Choosing data structures in different languages

Do you know about linked lists ?

As a Python programmer, it’s a data structure that I rarely see in the wild. Why is that?

In this post, I’ll show you an example where using an array is simpler than using a linked list in Python when compared to C.

The problem

We have a program that will accept lines of input containing 10 integers separated by a space. For example 1 2 3 4 5 6 7 8 9 10.

The program needs to group the lines that have the same sum together. For example, lines 1 2 3 4 5 6 7 8 9 10 and 10 2 3 4 5 6 7 8 9 1 have the same sum and will be grouped together.

In C

Our strategy is to use a hash table where each key will contain a linked list.

Here’s the linked list:

struct node {
  int line[10];
  struct node *next;
};

Then the main program will need to read the input, allocate memory, and store linked lists in a hash table:

#include <stdio.h>
#include <stdlib.h>

#define N 20
#define MAX_SUM 200000

...

int get_sum(int *line) {
  int i;
  int sum = 0;
  for (i = 0; i < 10; i++) {
    sum += *(line + i);
  }
  return sum;
}

int main(void) {
  int i, j;
  struct node *nodes[MAX_SUM] = {NULL};
  struct node *line;

  for (i = 0;  i < N; i++) {
    line = malloc(sizeof(struct node));

      for (j = 0; j < 10; j++) {
        scanf("%d", &line->line[j]);
      }
      sum = get_sum(&line->line[0]);
      line->next = nodes[sum];
      nodes[sum] = line;
  }
}

As you can see, we’re defining the max sum in order to create a hash table that’s large enough because sums will be used as keys in our hash table and linked lists as our buckets.

It’s a simple solution. But how would we solve this in a language like Python where dictionaries and nested arrays are available?

In Python

The Python solution is much simpler:

N =  20

lines = {}

for _ in range(N):
  line = input().strip().split()
  line = [int(i) for i in line]

  s = sum(line)
  if lines.get(s) is None:
    lines[s] = []

  lines[s].append(line)

No need to define a max sum or anything because dictionary sizing is handled automatically.

We can even use an array instead of a dictionary since we’re using integers for indexing.

No need to build a linked list either because Python has great matrix support out of the box. You can nest arrays as deeply as you want without having to worry about how it’s represented in memory.

Performance and Conclusion

Obviously, the Python solution uses more memory and CPU time because we’re letting the runtime handle everything for us whereas in C, we’re building everything from scratch.

To conclude, it means that the right data structure in one language might not be the best in another one.

Using a linked list in the Python solution might increase our memory usage and decrease performance whereas in C, it’s the best data structure to use.