What's the runtime of a good sorting algorithm?
The choice of the best sorting algorithm depends on the context.
Large Inputs
For large inputs, we first consider Mergesort and Heapsort. Both run in O(n log n) in the best, average, and worst cases.
Mergesort's running time follows the recurrence T(n) = 2T(n/2) + O(n) (two half-size recursive calls plus a linear-time merge), which solves to O(n log n).
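As a concrete illustration, here is a minimal top-down mergesort sketch (the function names are my own, not from any library):

```python
def merge_sort(a):
    """Sort a list with top-down mergesort; returns a new sorted list."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # T(n/2)
    right = merge_sort(a[mid:])   # T(n/2)
    return merge(left, right)     # O(n)

def merge(left, right):
    """Merge two sorted lists into one sorted list in O(n) time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps the sort stable
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])          # one side still has leftovers
    out.extend(right[j:])
    return out
```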
Mergesort does need a lot of auxiliary memory (the merge step above allocates O(n) extra space), which is one reason why you might not want to use it if you have a huge amount of data to sort.

Heapsort (ascending): build a max heap in place, then repeatedly swap the root with the last item of the heap and heapify the root again.
Heapsort is a good algorithm in practice, but isn't as fast as the other algorithms in some cases because it doesn't have good locality of reference. That said, the fact that it never degenerates and needs only O(1) auxiliary space is a huge selling point.
Note that we can use an array to represent the binary heap: if a node is stored at index i, its left child is at 2 * i + 1 and its right child at 2 * i + 2 (assuming the indexing starts at 0).
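A minimal in-place heapsort sketch following those steps and index formulas:

```python
def heapsort(a):
    """Sort list a in place, ascending, using a max heap."""
    n = len(a)
    # Build a max heap in place: sift down every non-leaf node.
    for i in range(n // 2 - 1, -1, -1):
        _sift_down(a, i, n)
    # Swap the root (current max) with the last heap item, shrink the
    # heap by one, and re-heapify the root.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        _sift_down(a, 0, end)

def _sift_down(a, root, size):
    """Restore the max-heap property for the subtree rooted at root."""
    while True:
        left, right = 2 * root + 1, 2 * root + 2
        largest = root
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest
```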
- Quicksort
Quicksort is also a candidate: it runs in O(n log n) on average, but its worst case is O(n^2). The worst case occurs very rarely and can be made unlikely by a good pivot-choosing strategy (you can choose the first, last, or a random element as the pivot), but it is not guaranteed never to occur.
How do you implement partition() around the pivot?
Maintain two indices i and j. Use j to scan the array; whenever you find an element smaller than the pivot, increment i and swap the elements at i and j. Since i always points to the last element of the less-than subarray, incrementing it first grows that subarray by one slot for the incoming element.
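A minimal sketch of that scheme (this is essentially Lomuto partitioning with the last element as the pivot), plus a simple recursive quicksort around it:

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo - 1                      # i marks the end of the less-than subarray
    for j in range(lo, hi):         # j scans the array
        if a[j] < pivot:            # found an element smaller than the pivot
            i += 1                  # grow the less-than subarray by one slot
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]   # put the pivot in its final place
    return i + 1

def quicksort(a, lo=0, hi=None):
    """Sort list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
```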
The introsort algorithm keeps track of the recursion depth and switches to heapsort if it looks like the quicksort will degenerate. This guarantees O(n log n) worst-case behavior with low memory overhead and maximizes the benefit you get from quicksort. Randomized quicksort, while still having an O(n^2) worst case, has a vanishingly small probability of actually hitting that worst case.
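A rough sketch of that idea, reusing the partition() from the quicksort sketch above. Real introsort also switches to insertion sort on tiny ranges and heapsorts the subarray in place; this version falls back to heapq for brevity:

```python
import heapq

def introsort(a):
    """Quicksort with a depth limit; degenerate ranges fall back to heapsort."""
    if a:
        _sort(a, 0, len(a) - 1, 2 * len(a).bit_length())  # limit ~ 2*log2(n)

def _sort(a, lo, hi, depth):
    if lo >= hi:
        return
    if depth == 0:
        # Depth limit reached: this branch looks degenerate, so finish the
        # range with heapsort.  (Sketch only: heapq copies the range, while
        # a real introsort would heapsort the subarray in place.)
        heap = a[lo:hi + 1]
        heapq.heapify(heap)
        for i in range(lo, hi + 1):
            a[i] = heapq.heappop(heap)
        return
    p = partition(a, lo, hi)        # the Lomuto partition() sketched earlier
    _sort(a, lo, p - 1, depth - 1)
    _sort(a, p + 1, hi, depth - 1)
```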
Short Inputs
For short inputs, insertion sort works best, with a best-case running time of O(n) and a worst-case running time of O(n^2).
- Insertion Sort works the way we sort playing cards in our hands.
Best: Ω(n), Average: Θ(n^2), Worst: O(n^2).
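A minimal sketch mirroring the playing-card analogy:

```python
def insertion_sort(a):
    """Sort list a in place, like inserting cards into a sorted hand."""
    for i in range(1, len(a)):
        key = a[i]                 # the next "card" to place
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]        # shift larger elements one slot right
            j -= 1
        a[j + 1] = key             # insert the card into its slot
    # Best case O(n): already-sorted input never enters the while loop.
```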
Input Distribution
If it's mostly sorted, then something like Timsort might be a great option, since it's designed to work well on sorted data.
Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data.
Best: O(n)
Average & Worst: O(n log n)
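In Python there is usually nothing to implement: CPython's built-in sorted() and list.sort() are Timsort-derived, so mostly sorted input is already handled well:

```python
# Timsort exploits existing sorted runs: on mostly sorted input the
# built-in sort approaches O(n).
data = list(range(1_000_000))
data[0], data[1] = data[1], data[0]   # small perturbation of sorted data
data.sort()                           # CPython's list.sort() is Timsort-derived
assert data == list(range(1_000_000))
```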
Element Type
If you are sorting generic objects, then you're pretty much locked into comparison sorting. If not, perhaps you could use a non-comparison sort like counting sort or radix sort.
The lower bound for comparison-based sorting algorithms (merge sort, heapsort, quicksort) is Ω(n log n).
- Counting sort
It is a linear-time sorting algorithm: it sorts in O(n + k) time when the elements are in the range from 1 to k.
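A minimal sketch for integers in the range 0..k (which covers 1..k):

```python
def counting_sort(a, k):
    """Sort integers in the range 0..k in O(n + k) time."""
    count = [0] * (k + 1)
    for x in a:                         # O(n): tally each value
        count[x] += 1
    out = []
    for value, c in enumerate(count):   # O(k): emit each value c times
        out.extend([value] * c)
    return out
```

This simple version just re-emits values, so it discards satellite data; the stable prefix-sum variant (used as the subroutine in the radix sort sketch below) preserves the relative order of equal keys.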
What if the elements are in the range from 1 to n^2?
Counting sort would take O(n^2). Radix sort is the answer.
The idea of radix sort is to sort digit by digit, starting from the least significant digit and moving to the most significant. Radix sort uses counting sort as a stable subroutine on each digit (counting sort takes extra space to sort the numbers).
Let there be d digits in the input integers. Radix sort takes O(d * (n + b)) time, where b is the base used to represent the numbers; for the decimal system, b is 10. What is the value of d? If k is the maximum possible value, then d is O(log_b(k)), so the overall time complexity is O((n + b) * log_b(k)). This looks worse than comparison-based sorting for large k, so let us first limit k: let k <= n^c for some constant c. The complexity then becomes O(n log_b(n)), which still doesn't beat comparison-based sorting.
What if we make the value of b larger? What value of b makes the time complexity linear? If we set b = n, we get O(n). In other words, we can sort an array of integers in the range from 1 to n^c if the numbers are represented in base n (or, equivalently, every digit takes log2(n) bits).
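A minimal LSD radix sort sketch with a configurable base, using a stable counting sort on each digit; calling radix_sort(a, base=len(a)) matches the base-n choice described above:

```python
def radix_sort(a, base=10):
    """LSD radix sort for non-negative integers: O(d * (n + base)) overall."""
    if not a:
        return a
    largest, exp = max(a), 1
    while largest // exp > 0:           # one pass per digit; d = log_base(k) passes
        a = _sort_by_digit(a, exp, base)
        exp *= base
    return a

def _sort_by_digit(a, exp, base):
    """Stable counting sort of a by the digit (x // exp) % base."""
    count = [0] * base
    for x in a:
        count[(x // exp) % base] += 1
    for d in range(1, base):            # prefix sums: end position of each digit
        count[d] += count[d - 1]
    out = [0] * len(a)
    for x in reversed(a):               # reverse scan keeps the pass stable
        count[(x // exp) % base] -= 1
        out[count[(x // exp) % base]] = x
    return out
```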
Data Structure
How is your data represented? If it's stored in an array, quicksort or a quicksort variant will likely do well because of locality of reference, while mergesort might be slow due to the extra memory needed. If it's in a linked list, though, quicksort's locality-of-reference advantage goes away and mergesort suddenly becomes competitive again, since it can sort the list just by relinking nodes (see the sketch below).
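A rough sketch of mergesort on a singly linked list, assuming a minimal hand-rolled Node class (not from any library): it splits with slow/fast pointers and merges by rewiring next pointers, so no auxiliary arrays are needed:

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def merge_sort_list(head):
    """Sort a singly linked list by relinking nodes; O(n log n) time."""
    if head is None or head.next is None:
        return head
    # Split the list in half using slow/fast pointers.
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    left, right = merge_sort_list(head), merge_sort_list(mid)
    # Merge the two sorted halves by rewiring pointers (no copying).
    dummy = tail = Node(None)
    while left is not None and right is not None:
        if left.value <= right.value:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left if left is not None else right
    return dummy.next
```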
Reference:
https://www.geeksforgeeks.org/time-complexities-of-all-sorting-algorithms/