The most important data structures.
Linked data structures means the ones that involve pointers: lists, trees, as opposed to arrays, heaps, matrices and hash tables.
Each element has a function applied that maps it to array index. Looking up an index in an array is fast. Of course two elements can map to the same index, that’s why:
The example function for strings is treating the string as a number in a system of base n, where the n is the length of the alphabet:
H(“hello”) = n^5code(‘h’) + n^4code(‘e’) + … + n*code(‘o’)
Then we have to apply modulo (arbitrary) m, as otherwise the number would be too big. This article says why in Java it is 31
.
Hashing can be used for string matching - e.g. Rabin Karp algorithm that hashes each next segment using the information of the has of the previous one, and running in up to Θ(n + m), instead of the naive Θ(nm).
It works on elements that have priorities assigned. Priority queue has the two operations available:
It is most often implemented using heap, but can also be implemented using simple array, or binary tree.
A heap is a tree but arranged differently than the binary tree. It is normally stored in an array.
heap property - if A is parent of B it is ordered with respect to B with the same ordering across the heap
In the representing array the children of a node at position n
are at positions 2n
and 2n + 1
.
Most important operations:
Heap sort (O(n*lon(n)
)) is actually a form of selection sort but with a better data structure.
Searching in a heap is very ineffective.
For every element x
the following holds: all elements of right subtree of x
are >x
and all elements of the left subtree are <=x
. Watch out: not only immediate children matter, but all the descendants! immediate children can meet this criterion but the grandchildren not anymore.
Operations:
O(n)
, space complexity is O(height)
- for the recursion stack, even though we call the recursive function twice, it cannot be more than O(n)
as the max height is n
):
Note that insertions/deletions can create unbalanced trees which are no longer so optimal - that’s why data randomness is often desired. An example of always balanced trees are the red-black trees.
In a complete tree (all nodes have 2 children or are leaves) the number of nodes is 2^n -1
.
Balanced binary tree is such that its right and left subtrees’ heights differ by at most 1. AVL tree is an example of a balanced binary tree.
In AVL tree each node has assigned “balance” which is the height of the left subtree minus the height of the right subtree.
After each insert the tree is re-balanced: we rotate - left or right. If the child and grandchild in the unbalancing subtree are on the same side we need to rotate only once, if they are on opposite sides (left and right, or right and left) - we need to rotate twice, in opposite directions.
Graph consists of set of vertices and edges. They can be:
n!/2(n-2)
= “n over 2” (combinations)Graph can be stored in 2 ways:
0
s and 1
s - good for lookup and updatesIn adjacency list the order of elements in the list does not matter. To store a weighted graph in an adjacency list we simply add the information about the node’s weight in the vertex object, or keep a two element array instead.
We check the node, then we check each of its children, then for each child we check the grandchildren, and so on. Implemented using a queue.
Linear complexity O(n+m)
on both directed and undirected graph (m
- number of edges, n
- number of vertices), if we care about the edges. If we don’t then it is O(n)
.
Applications:
x
and y
(in an unweighted graph) - for that we need to do the traversal starting from x
; once we get to y
we backtrack to x
, provided for each node we remembered who the parent isWe check the node, then we check the first child, then the grandchild, and so on, watching out not to collide with an already visited node. Implemented using stack or recursively.
Complexity is same as for BFT.
Stack implementation is a bit tricky compared to the DFT queue, as the pushing order differs from popping order:
Applications:
It divides edges into tree and back edges (the one that point back to the tree) - don’t get it
Comments: