Sorting
From DmWiki
This article is a stub. You can help improve the article by expanding it.
In the field of computer science sorting a set of data, together with searching a piece of data, belongs to one of the most fundamental algorithmic operations. As with any algorithm sorting can be classified by it's asymptotical complexity. Historically sorting a set of data is considered to be a problem that can be solved in quadratic complexity. A few algorithms exist that solve the problem in this class of complexity. Insertion sort and bubble sort being two of them.
| Table of contents |
Bubble Sort
Bubble sort can be seen as one of the most straightforward sorting techniques. Given an array that we want to sort the algorithm picks the first element in the array and compares it to all the other elements following it. In case the first element is bigger than the element it is currently compared to, the two elements are swapped. This way the element rises up in the array like a bubble, if it is bigger than the others. Hence the name bubble sort.
The process is repeated for all elements in the array. In the worst case we can imagine the array would have to be traversed completely for each element in it. This gives a complexity of O(n2).
In peseudo code :
foreach(element e in array){
foreach(element x following e){
if(e > x) swap(e, x);
}
}
Quick Sort
One of two popular algorithms that try to reduce the complexity of sorting is the quick sort algorithm.
Quick sort is one of the clasic divide and conquer algorithms. It has two phases in which it first an element in the array, that is typically referred to as the pivot element. A common choice for the pivot element is the median of the first and last element. In the second phase the algorithm compares each element in the array to the selected pivot element. In this fashion bigger elements are sorted to the right of the pivot element and smaller ones to the left. Once this operation is completed the algorithm is recursively called again with the two partitions of the array, which where created in the sorting step.
...pseudo code...
As a matter of fact quick sort doesn't reduce the worst case complexity than bubble sort, it is still O(n2). Cases can be constructed in which quick sort performs with quadratic complexity. The average complexity of quick sort though, is
, which is significantly better. The worst case surprisingly occurs when all the elements are already sorted.
... piece about quick sort worst case...
Fortunatly the average case is much more common, which is why the quick sort algorithm persists in the world of computer science since it's invention.
Merge Sort
One shortcomming of quick sort is that it's
complexity is not stable. In cases where
a logarithmic complexity is required merge sort is usually employed.
... piece about merge sort...
The drawback inherent in merge sort is that, while it has a guaranteed complexity of
, it's memory usage grows immensly with bigger data sets, in contrast to quick sort, that doesn't need to allocate additional memory for sorting. Also in pratice quick sort is usually the algorithm of choice, since it's average complexity is smaller than the fixed complexity of merge sort.
Sorting in Linear Time
Looking at the sorting problem one might conclude that sorting a set of data in linear time is just impossible. While highly inefficent, an algorithm exists that perfectly sorts a set of data with linear complexity. Given in functional pseudo syntax :
For a sorted set A with k symbols and a word
For those not familiar with funtional programming syntax :
What this does is define a function that takes a word (array of symbols) and calls one of k other functions called bucket0 through bucketk - 1. Each of these functions then recursively filters out the i'th symbol of A (If the symbol x is exactly the i'th symbol of A, we take x and concatenate it with another call to bucketi, passing the remaining word; otherwise we just return the next call to bucketi). Called in order the functions obviously produce a sorted version of w, as A is a sorted set. Since each bucket function takes linear time to run (it works through the entire word once) and is only called a constant number of k times we can assume that it's overall complexity is linear.
Earlier it was mentioned that the algorithm is rather inefficent. The reason here is that the algorithm only performs well if the constant number of symbols in A, from which we draw the words we would like to sort, is much smaller than the length of the words. In real life scenarios this is rarely the case however. Imagine sorting numbers. The sorted set A here is the entire integer span of 32 bits. To produce a general sorting algorithm you would have to iterate 2^32 times over your array of numbers, which might in fact only be a 100 numbers long to begin with.
