Like most self-taught programmers, I spent a lot of my years dabbling with different technologies. But there comes a time in every programmer’s life when they have to learn data structures and algorithms to proceed in their careers. In this post, I will go through the basics of data structures, what purpose they serve, and what’s common between all of them.
A data structure is a particular way of organizing real-life data in a computer so that it can be used effectively.
To understand data structures, we must first understand data. In very programming language we have some data types. In some programming languages, we have to explicitly express the data type by using keywords like
Primitive Data Types and Memory
We must understand that for every data type the operating system allocated a specific amount of memory in RAM. Unlike our decimal number system, in which ones place can hold 10 distinct values (0 to 9), in the binary number system, ones place can only hold only 2 distinct values (0 and 1). One binary digit is called a bit (binary + digit). We store 8 bits together and call it a byte.
Because a bit can hold 2 values, 0 or 1, you can calculate the number of possible values by calculating 2^n where n is the number of bits. Given 8 bits per byte, a short integer which is allocated 2 bytes can store 2^16 (65,536) possible 0 and 1 combinations. If we split those between negative and positive values, the data range for a short is -32,768 to +32,767.
You can easily find how much a primitive data type takes in your favorite language, for Go, here is a document with such table: https://www.tutorialspoint.com/go/go_data_types.htm
Data Structure and Why We Study Them
Data structures are different from the primitive data types, but they are similar to them in the sense that it holds (structures) same data into certain predefined manners.
When we study data structure and algorithm, we are talking (and concern) about space (RAM) and processing (CPU) efficiency. Both processing power and memory are finite. So we need to keep things as optimized so that it takes least amount of space and processing time. You won’t want your Lambda function to be running forever, would you? You would want it to take the least amount of time and take least amount of space, right?
Types of Data Structures
Our RAM is basically cells of memory as seen in below image.
Linear Data Structures
- An array is a fundamentally a list of similar values.
- Elements of an array are stored in contagious memory blocks.
- All the elements have to have same data types.
- Arrays have a fixed length.
As you can see in the image, there is a section of RAM where all the data is stored in contagious memory block.
So Python list is not truly an array, but regarded as one. Dealing with similar value is always faster, that’s the reason you must have seen numpy using similar data type for defining new array.
- Unlike arrays, a linked list is a data structure, in which the elements are not stored at contiguous memory locations.
- In a linked list data structure, we generally have a payload variable, one or two pointer variables that point to the next element in the list.
- Being not stored in contagious memory, it gives flexibility to allocated memory anywhere in RAM. Which is beneficial and can leverage available space smartly than arrays.
- Variations of Linked list are singly liked list, doubly linked list, circular linked list. Singly linked list only has one pointer to the next element but the doubly linked list has a pointer to the next as well as the previous element.
- Circular linked list are used when scheduling jobs for the processor, if one job is waiting for IO or network resource, the next jobs comes in.
Now as you have seen in the case of arrays, each element of a linked list can be anywhere in the RAM.
There is a downside to too, you can’t access any item straight away in linked list as you would in an array.
- Queue is a data structure which can be related to a queue of students in a morning assembly. If entire queue has to go to the class, the first student in the queue has to move first; then follows the other students.
- Similarly, in queue, we can add data to the last of the queue and remove from the begining.
- Queues are also called FIFO (First In First Out) data structure. As seen from the perspective of element (data), the data first gets in, then the first data gets out.
- There have been software written just to manage task queues, which are highly used in distributed computing. RabbitMQ is one such example I have seen in wild.
Does StackOverflow or stacktraces strike something in your mind? Yes, they are both related to same stack data structure.
Stack can be related to a stack of plates, to first add something to a stack you put the element at the top, when removing, you remove from the top. If you have to access the n-1 element, you have to first access the all the intermediate elements.
Non-Linear Data Structures
- Heard about Graph Theory from discrete mathematics? Yes, they are the same. If you haven’t, go watch some youtube videos.
- A graph consists of nodes and vertices.
There is a plethora of algorithms related to this data structures which I will discuss in upcoming posts. Below is one of the first videos I watched on Graph Theory.
- Tree is a kind of a graph.
- There are many variations of particular data structure, like there multiple kinds of trees, including binary tree, AVL tree, red-black tree etc.
This is by no means an exhaustive list of data structures, some data structures can be created by mixing them together, hash map is an example of one such data structures which is known to be very efficient. On top of that, some data structures can be implemented by using other data types e.g. stacks and queues can be implemented by using array, or a linked list.
Every data structure has their own significance, and there is best one.
In this post I’ll not dig into manipulating these data structures and this will be reserved for some other post in future in which I’ll cover algorithms.
Operations of Data Structures
There are some generic operations we could do on a number of data structures. Let’s look at them one by one.
We can add a new item to an array of predefined length, we simply call them inserting. Push terminology is generally used when we add to the last of a data structure, like in stacks.
There can be multiple variations of the insert: add, addFirst, addLast, addByIndex
Variations depend upon the nature of data structure. Like, you may insert at first of a linked list or array, but you may not in a stack.
- Variations can include: getByIndex.
- You can’t get by index in a linked list.
Searching could be different depending on the algorithm we use. We can search an array in a linear fashion, but we can also use something called binary search where we eliminate half of the array by bisecting the array depending on the value we are searching for.
Prerequisite for binary search is the data structures has to be sorted.
When we remove an element from last of an structure, we call it popping. In case of queues, it’s called dequeue.
Variations may include: deleteFirst, deleteLast, deleteByIndex
- There is certain time taken by each of the operations, which I will discuss in a post where I discuss algorithms.
Where to Learn and Practice Data Structures
I am by no means a guru of DSA, but I have recently come up across this resource by freeCoceCamp.
Finally, this is not directly related to DSA, but having a Math background is always good and helps us in analytical thinking. For sake of that, if you don’t have a mathematical background, I would recommend solving problems from Project Euler. Going through the process of solving the problem will force you to look for a better optimized alternative which can only be done by researching on the topic.
If you found this post informative, please feel free to follow me on Twitter for related updates and send me a connection request on LinkedIn.
If you found something missing, or wrong, please let me know in comments so that everybody stays on the same page.