Algorithms

Posted by Monik, 19 February 2017.
Programming Algorithms

Complexity analysis and the most important algorithms.


Complexity analysis

Random Access Machine

RAM stands here for Random Access Machine, and it is a computation model that we use to talk about algorithm time complexity: each simple operation (arithmetic, comparison, assignment) takes exactly one time step, loops are counted as the sum of the simple operations of their iterations, and each memory access also takes one step.

Algorithm properties

Stability of sorting: a stable sorting algorithm preserves the relative order of records with equal keys.

Numerical stability: a numerically stable algorithm avoids magnifying small errors.

Every algorithm should also satisfy these classic properties:

Input: the algorithm must have input (at least one) from the specified set.

Output: the algorithm must have at least one output, related to the specified set of inputs.

Finiteness: the algorithm must terminate after a finite number of steps.

Definiteness: all steps of the algorithm must be precisely defined.

Effectiveness: it must be possible to perform each step of the algorithm correctly and in a finite amount of time.

Complexity

We distinguish two types of complexity: time complexity and space complexity.

Complexity is a (simplified) function from problem size (n) into time or space. We use the following notations:

Name Symbol Meaning
Big Oh O(g(n)) Upper bound (at most, up to a constant; typically used for the worst case)
Theta Θ(g(n)) Tight bound (both upper and lower)
Omega Ω(g(n)) Lower bound (at least, up to a constant; typically used for the best case)

We are usually interested in the pessimistic worst case, which is what the "Big Oh" notation is typically used for.

The complexities from worst to best:

  1. n! - all permutations of a set of n elements
  2. 2^n - e.g. enumerating all subsets of a set
  3. n^3 - e.g. some dynamic programming algorithms
  4. n^2 - e.g. insertion sort, selection sort
  5. n*lg(n) - also called superlinear, e.g. quick sort, merge sort
  6. n - linear
  7. lg(n) - e.g. binary search, anything where we repeatedly divide into halves
  8. 1 - constant, single operations

Binary search

In a sorted list: take the middle element and compare it to the searched one; continue with either the right or the left half of the list (note: do not copy the arrays, maintain start and end indices).

Each level of a binary tree has 2^level nodes. The height of a tree with n leaves is log2(n), as log2(n) = h <=> 2^h = n.

Time complexity: O(log(n))
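The halving described above can be sketched in Python (an illustration, not from the original post):

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1     # maintain indices instead of copying the array
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1         # continue with the right half
        else:
            hi = mid - 1         # continue with the left half
    return -1
```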

Selection sort

Take the first position and look for the smallest element in the rest of the array. Swap. Take the second position, and so on.

Time complexity: Θ(n2)
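A minimal Python sketch of the swap-the-smallest idea (illustrative, not from the original post):

```python
def selection_sort(arr):
    """Sort arr in place by repeatedly selecting the smallest remaining element."""
    n = len(arr)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):     # scan the unsorted tail for the minimum
            if arr[j] < arr[smallest]:
                smallest = j
        arr[i], arr[smallest] = arr[smallest], arr[i]  # swap it into position i
    return arr
```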

Insertion sort

Take each element and move it towards the beginning of the array, swapping with its neighbour, until it is in the right place.

Time complexity: O(n2)
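The constant-swapping version described above, sketched in Python (an illustration):

```python
def insertion_sort(arr):
    """Sort arr in place: swap each element backwards until it is in order."""
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j - 1] > arr[j]:      # still out of order: keep swapping back
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1
    return arr
```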

Merge sort

Merge sort each half of the array recursively, and then merge the 2 sorted parts together in order.

Time complexity: O(n*logn)
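The split-and-merge recursion can be sketched like this in Python (illustrative):

```python
def merge_sort(arr):
    """Return a sorted copy of arr by sorting each half and merging."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # merge the two sorted halves in order
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one of these is empty
    merged.extend(right[j:])
    return merged
```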

Quick sort

  1. Pick the last element (pivot)
  2. Put a marker just before the first element
  3. Go from start to end, and whenever there is an element < pivot, put it just in front of the marker and move the marker one element towards the end
  4. The element just after the marker is now exactly where the pivot should be in the final sorted array. Swap the pivot and the element after the marker
  5. Sort the part before and after the pivot/marker separately.

Time complexity: average is Θ(n*logn), worst is O(n^2) (when the pivot happens to always be the smallest/largest element). We can avoid the worst case by randomizing the data set first, or simply choosing the pivot at random each time.
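The five steps above map directly onto a last-element-pivot partition; here is a Python sketch (illustrative):

```python
def quick_sort(arr, lo=0, hi=None):
    """Sort arr[lo..hi] in place with a last-element pivot."""
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return arr
    pivot = arr[hi]              # step 1: pick the last element as pivot
    marker = lo - 1              # step 2: marker just before the first element
    for i in range(lo, hi):      # step 3: move elements < pivot in front of the marker
        if arr[i] < pivot:
            marker += 1
            arr[marker], arr[i] = arr[i], arr[marker]
    # step 4: the slot after the marker is the pivot's final position
    arr[marker + 1], arr[hi] = arr[hi], arr[marker + 1]
    quick_sort(arr, lo, marker)      # step 5: sort the part before the pivot
    quick_sort(arr, marker + 2, hi)  # ...and the part after it
    return arr
```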

Bucket sort

We put elements into k buckets covering value ranges, sort the data in each bucket (with another algorithm), and then put the data back into the array.

Distributing the elements into the buckets is O(n) time. There are some catches, though, when it comes to the complexity of assembling the array back: if the elements are distributed unevenly, one bucket may receive most of them, and the total cost is then dominated by the inner sort's complexity on that bucket.
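A small Python sketch, assuming values in [0, 1) and Python's built-in sort as the inner sort:

```python
def bucket_sort(arr, k=10):
    """Distribute values (assumed in [0, 1)) into k buckets, sort each, concatenate."""
    buckets = [[] for _ in range(k)]
    for x in arr:
        buckets[int(x * k)].append(x)   # O(n) distribution step
    result = []
    for b in buckets:
        result.extend(sorted(b))        # any inner sort works here
    return result
```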

Radix sort

Similar to bucket sort, but we do it like this: distribute the numbers into 10 buckets by their least significant digit, collect them back in bucket order (keeping the distribution stable), then repeat with the next digit.

By repeating this until we have processed all digits, we end up with a sorted array.

Alternatively we can start from the most significant digit (using recursion), and that is how strings can be sorted efficiently.

The time complexity is O(w*n), where w is the max length of numbers/words.
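The least-significant-digit variant can be sketched in Python (illustrative, non-negative integers only):

```python
def radix_sort(arr):
    """LSD radix sort: bucket by each digit, least significant first."""
    digits = len(str(max(arr))) if arr else 0
    for d in range(digits):
        buckets = [[] for _ in range(10)]
        for x in arr:
            buckets[(x // 10 ** d) % 10].append(x)  # stable: preserves earlier passes
        arr = [x for bucket in buckets for x in bucket]
    return arr
```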

Graph algorithms

Minimum spanning tree

This is done in a weighted graph. In an unweighted graph, every spanning tree already has the minimum number of edges (n-1 edges).

Prim’s algorithm

It proceeds like BFT (breadth-first traversal), and at each step it chooses the minimum-cost edge leaving the tree built so far. Even though it is greedy, it always finds a minimum spanning tree. The complexity is O(n^2), but if we use e.g. a heap instead of a linked list we can reduce it to O(m*logn), where m is the number of edges.
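A heap-based sketch in Python, assuming an adjacency-list representation `{vertex: [(weight, neighbour), ...]}` (illustrative):

```python
import heapq

def prim_mst(graph, start):
    """Return the total weight of a minimum spanning tree, grown from start."""
    visited = {start}
    heap = list(graph[start])        # candidate edges leaving the tree
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < len(graph):
        w, v = heapq.heappop(heap)   # cheapest edge leaving the tree so far
        if v in visited:
            continue                 # both endpoints already in the tree
        visited.add(v)
        total += w
        for edge in graph[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total
```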

Kruskal algorithm

Go from the cheapest to the most expensive edges, and before adding each of them check that it does not create a cycle (we must not connect two vertices that are already in the same connected component). Time complexity: sorting the edges is O(m*logm); building the tree naively is O(m*n), but with a better data structure for the connected-component test (union-find) the whole algorithm is O(m*logm), which is better for sparse graphs.
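A Python sketch using union-find for the connected-component test, assuming edges given as `(weight, u, v)` tuples over vertices `0..n-1` (illustrative):

```python
def kruskal_mst(n, edges):
    """Return the total weight of a minimum spanning tree from (weight, u, v) edges."""
    parent = list(range(n))          # union-find forest, one component per vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):    # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                 # different components: adding w creates no cycle
            parent[ru] = rv
            total += w
    return total
```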

Shortest path in weighted graph

We look for the cheapest path between 2 vertices. In an unweighted graph we just do BFT starting from one of the vertices.

Dijkstra’s algorithm

Similar to Prim’s algorithm, but for each vertex we record tentative cost of travelling to it from the starting vertex. At the beginning the starting vertex has a cost of 0 and all other vertices have cost of infinity. When we find a lower cost for a vertex, we update it.

We also move in a BFT fashion, but always pick the vertex with the cheapest tentative cost first. We finish when we have traversed the whole graph.

An implementation usually uses a heap to keep track of the minimum tentative distance for each vertex.
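The tentative-cost bookkeeping with a heap can be sketched in Python, assuming the same `{vertex: [(weight, neighbour), ...]}` representation as above (illustrative):

```python
import heapq

def dijkstra(graph, start):
    """Return a dict of cheapest travel costs from start to every vertex."""
    dist = {v: float('inf') for v in graph}   # all tentative costs start at infinity
    dist[start] = 0
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)   # vertex with the cheapest tentative cost
        if d > dist[u]:
            continue                 # stale heap entry; a cheaper path was found
        for w, v in graph[u]:
            if d + w < dist[v]:      # found a lower cost: update it
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```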

Strings

String matching

Naive matching: align the pattern at each position of the big text and compare letter by letter; on the first mismatch, shift the pattern one letter to the right and start again. This is O(n*m) in the worst case.
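The naive scan sketched in Python (illustrative):

```python
def find_substring(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):           # try every alignment of the pattern
        if all(text[start + i] == pattern[i] for i in range(m)):
            return start                      # every letter matched
    return -1
```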

NP Completeness

The problems to be solved by algorithms have been divided to a number of problem classes:

  1. P - decision problems that can be solved in polynomial time (or better)
  2. NP - decision problems for which a given solution can be verified in polynomial time
  3. NP-hard - problems to which any NP problem can be reduced in polynomial time (they do not have to be NP problems themselves)
  4. NP-complete - problems that are both NP and NP-hard

Some notes:

  1. Whether P = NP is a famous open question; it is widely believed that P != NP.

To prove that a problem is NP-complete, do these 2 things:

  1. prove that the problem is in NP - give a polynomial-time verification algorithm
  2. find another NP-complete problem that reduces to your problem in polynomial time (so that a solution to your problem yields a solution to that problem)

NP Hard problems

Knapsack problem

Finding the optimal packing is NP-hard.

Deciding whether a packing of a given value exists is NP-complete.

Travelling salesman problem

The decision version is NP-complete; finding the optimal tour is NP-hard.

Known exact algorithms take exponential time.

Algorithm:

  1. write the matrix of costs of travelling between cities (the diagonal will be “-“)
  2. row minimisation - for each row: subtract the smallest element in the row from each element in the row
  3. column minimisation - for each column: subtract the smallest element in the column from each element in the column
  4. penalties - for each zero, sum the minimum element in its row and the minimum element in its column
  5. cross out the column and row of the zero with the biggest penalty (if multiple are equal then it does not matter which one)
  6. the crossed-out city pair is part of the solution (keep the direction the same)
  7. remove the crossed-out row and column and go to step 2.
