Data structures

Posted by Monik, 20 February 2017.
Programming Algorithms

The most important data structures.

Linked data structures are the ones that involve pointers: lists and trees, as opposed to contiguous ones such as arrays, heaps, matrices, and hash tables.

Hash table

Each element has a hash function applied that maps it to an array index. Looking up an index in an array is fast. Of course two elements can map to the same index, which is why collisions have to be resolved, e.g. by chaining (a list per bucket) or by open addressing.

The example function for strings treats the string as a number in base n, where n is the size of the alphabet:

H("hello") = n^4*code('h') + n^3*code('e') + n^2*code('l') + n*code('l') + code('o')

Then we have to apply modulo an (arbitrary) m, as otherwise the number would be too big. This article says why in Java the multiplier is 31.
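A minimal sketch of such a hash in Python (the base 31 and the large prime modulus are arbitrary choices for illustration, not taken from any particular library):

```python
def poly_hash(s, base=31, mod=2**61 - 1):
    """Treat the string as a number in base `base`, reduced modulo `mod`."""
    h = 0
    for ch in s:
        # Horner's rule: equivalent to summing code * base^position,
        # but without ever computing huge powers.
        h = (h * base + ord(ch)) % mod
    return h
```

Taking the modulo inside the loop keeps intermediate values small without changing the result.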

Hashing can be used for string matching - e.g. the Rabin-Karp algorithm, which hashes each next segment of the text using the hash of the previous one (a rolling hash), running in expected Θ(n + m) instead of the naive Θ(nm).
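A sketch of Rabin-Karp (base 256 and the prime modulus are arbitrary illustration choices); on a hash hit the strings are still compared directly, to rule out collisions:

```python
def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
    """Return all start indices of `pattern` in `text`, via a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)              # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):                        # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # On a hash match, compare the actual strings to rule out collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:                         # roll: drop text[i], append text[i+m]
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```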

Priority queue

It works on elements that have priorities assigned. A priority queue has the two operations available:

- insert - add an element with its priority
- extract - remove and return the element with the highest priority

It is most often implemented using a heap, but can also be implemented using a simple array or a binary tree.
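Python's standard heapq module is exactly such a heap-backed priority queue (a min-heap on top of a plain list), for example:

```python
import heapq

pq = []                                   # the heap lives in a plain Python list
heapq.heappush(pq, (2, "write code"))     # (priority, task); lower number = higher priority
heapq.heappush(pq, (1, "fix bug"))
heapq.heappush(pq, (3, "sleep"))

priority, task = heapq.heappop(pq)        # always returns the smallest priority
```

Since heapq is a min-heap, a max-priority queue is usually simulated by negating the priorities.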

Heap

A heap is a binary tree, but ordered differently than the binary search tree. It is normally stored in an array.

heap property - if A is a parent of B, then A is ordered with respect to B (e.g. A ≤ B in a min-heap), with the same ordering across the whole heap

In the representing array (1-indexed) the children of a node at position n are at positions 2n and 2n + 1 (with 0-based indexing they sit at 2n + 1 and 2n + 2).

Most important operations:

- insert - append the element and sift it up, O(log n)
- extract min/max - remove the root, move the last element to the root and sift it down, O(log n)
- peek - read the min/max without removing it, O(1)

Heap sort (O(n*log(n))) is actually a form of selection sort, but with a better data structure.
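A sketch of heap sort with 0-based indexing and a max-heap: build the heap once, then repeatedly "select" the maximum by swapping it to the end and sifting the new root down.

```python
def sift_down(a, i, size):
    """Restore the max-heap property for the subtree rooted at index i
    (0-based, so the children sit at 2i+1 and 2i+2)."""
    while True:
        largest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < size and a[l] > a[largest]:
            largest = l
        if r < size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build a max-heap, bottom-up
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):       # repeatedly "select" the max
        a[0], a[end] = a[end], a[0]       # move it past the shrinking heap
        sift_down(a, 0, end)
    return a
```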

Searching a heap for an arbitrary element is very inefficient - O(n), since the heap property says nothing about the relative order of siblings.

Binary Search Tree

For every element x the following holds: all elements of the right subtree of x are > x and all elements of the left subtree are ≤ x. Watch out: not only the immediate children matter, but all the descendants! The immediate children can meet this criterion while a grandchild no longer does.
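This "all descendants" caveat is why a correct validity check passes bounds down the tree instead of comparing only parents with children. A sketch, using a made-up representation of a node as a (key, left, right) tuple:

```python
def is_bst(node, low=float("-inf"), high=float("inf")):
    """A node is a (key, left, right) tuple, or None for an empty tree.
    The (low, high] bounds are tightened on the way down, so ALL
    descendants are constrained, not just the immediate children."""
    if node is None:
        return True
    key, left, right = node
    if not (low < key <= high):
        return False
    return is_bst(left, low, key) and is_bst(right, key, high)
```

The second tree below is the trap from the text: every parent-child pair looks fine locally, but the grandchild 12 sits in the left subtree of 10.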

Operations:

- search - follow the ordering left/right from the root, O(h)
- insert - search for the key and attach a new leaf, O(h)
- delete - replace the node with its in-order successor or predecessor, O(h)

where h is the height of the tree: O(log n) when it is balanced, O(n) in the worst case.

Note that insertions/deletions can create unbalanced trees which are no longer so optimal - that's why randomness in the input data is often desired. An example of always-balanced trees are the red-black trees.
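The unbalancing effect is easy to demonstrate: inserting keys in sorted order degenerates the tree into a linked list. A sketch with nodes as (key, left, right) tuples (a toy representation for illustration):

```python
def insert(node, key):
    """Insert into a BST of nested (key, left, right) tuples;
    keys <= node go left, keys > node go right."""
    if node is None:
        return (key, None, None)
    k, left, right = node
    if key <= k:
        return (k, insert(left, key), right)
    return (k, left, insert(right, key))

def height(node):
    if node is None:
        return 0
    _, left, right = node
    return 1 + max(height(left), height(right))
```

Inserting 1..5 in order yields height 5 (a list), while the same seven keys inserted in a "nice" order yield the optimal height 3.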

In a perfect binary tree (every node has 2 children or is a leaf, and all leaves are at the same depth) with n levels, the number of nodes is 2^n - 1.

AVL Tree

A balanced binary tree is one in which, for every node, the heights of the right and left subtrees differ by at most 1. An AVL tree is an example of a balanced binary tree.

In an AVL tree each node has an assigned "balance factor", which is the height of the left subtree minus the height of the right subtree.

After each insert the tree is re-balanced by rotating left or right. If the child and grandchild in the unbalanced subtree are on the same side, one rotation suffices; if they are on opposite sides (left-right, or right-left), we need two rotations, in opposite directions.
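The two rotations can be sketched like this (nodes as (key, left, right) tuples again - a toy illustration of the rotation step, not a full AVL implementation):

```python
def rotate_right(node):
    """Right rotation, fixing the left-left case: the left child
    becomes the new root of this subtree."""
    key, left, right = node
    lkey, lleft, lright = left
    return (lkey, lleft, (key, lright, right))

def rotate_left(node):
    """Left rotation, fixing the right-right case (mirror image)."""
    key, left, right = node
    rkey, rleft, rright = right
    return (rkey, (key, left, rleft), rright)
```

For the left-right case, the double rotation is: first rotate the left child left, then rotate the node itself right.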

Graph

A graph consists of a set of vertices and a set of edges. Graphs can be:

- directed or undirected
- weighted or unweighted

Storing the graph

A graph can be stored in 2 ways:

- adjacency matrix - an n × n matrix where cell (i, j) marks an edge (or its weight) between vertices i and j
- adjacency list - for each vertex, a list of its neighbours

In an adjacency list the order of elements in the list does not matter. To store a weighted graph in an adjacency list we simply add the edge's weight next to each neighbour, e.g. by keeping a two-element pair instead of a bare vertex.
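For example, with plain dicts and lists (a sketch, not a fixed API):

```python
# Unweighted: vertex -> list of neighbours.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}

# Weighted: keep (neighbour, weight) pairs instead.
weighted = {
    "a": [("b", 5), ("c", 1)],
    "b": [("c", 2)],
    "c": [],
}
```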

Breadth-first traversal

We check the node, then we check each of its children, then for each child we check the grandchildren, and so on. Implemented using a queue.

Linear complexity O(n + m) on both directed and undirected graphs (n - number of vertices, m - number of edges), since every vertex and every edge is processed at most once; if we only count vertex visits it is O(n).
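A minimal BFS sketch over such an adjacency list:

```python
from collections import deque

def bfs(graph, start):
    """Visit vertices in breadth-first order; `graph` is an adjacency
    list mapping vertex -> iterable of neighbours."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:   # mark when enqueued, not when dequeued,
                visited.add(w)     # so a vertex is never queued twice
                queue.append(w)
    return order
```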

Applications:

- finding the shortest path in an unweighted graph
- finding connected components

Depth-first traversal

We check the node, then its first child, then that child's first child, and so on, watching out not to revisit an already visited node. Implemented using a stack or recursively.

Complexity is same as for BFT.

The stack implementation is a bit tricky compared to the BFT queue, as the pushing order differs from the popping order: neighbours pushed first are popped last, and a vertex can end up on the stack more than once before it is visited.
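A sketch of this stack-based DFT: neighbours are pushed in reverse so they come off the stack in their original order, and the visited check happens on pop because a vertex can be pushed twice.

```python
def dfs(graph, start):
    """Iterative depth-first traversal over an adjacency list."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:           # the same vertex may have been pushed
            continue               # by two different neighbours
        visited.add(v)
        order.append(v)
        for w in reversed(graph[v]):   # reversed, so graph[v][0] is popped first
            if w not in visited:
                stack.append(w)
    return order
```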

Applications:

- topological sorting
- cycle detection
- finding connected components

It divides the edges into tree edges (those through which we discover new vertices) and back edges (those that point back to an ancestor already on the current path).

