Princeton Alogorithm COS226 Week9 Radix Sort

Alphabets

image.png

review: summary of the performance of sorting algorithms

image.png

we can do better if we don't depend on key compares.
assume that keys are integers between 0 and R-1

image.png

LSD Radix sort

consider character from right to left, stably sort using d th character as the key (using key-indexed counting)

image.png

How to sort one million 32 bit integers?

if we use quick sort, we will cost 1.39 * log(1 million) * n = 28 N
if we use LSD, we will cost 2 * 10 * n = 20N
but the space cost quick sort wins; for stability, LSD sort wins.

MSD Radix sort

partition array into R pieces according to first character
recursively sort all strings that start with each character

image.png

treat strings as if they had an extra char at end (this char should smaller than any char)

image.png

improvement : cutoff to insertion sort

image.png

MSD vs quick sort

msd need extra space for aux[] and count[], inner loop has a lot of instructions, access memory randomly
linearithmic number of string compares
has to rescan many character in keys with long prefix matches

combine MSD and quick sort

do 3-way partitioning on the dth character
less overhead than R-way partitioning in MSD string sort
does not re-examine character equal to the partitioning char

image.png

suffix array

image.png

Manber-Myers algorithm implementation is added into extra credit

summary

we can develop linear-time sorts. key compares not necessary for string keys.

we can develop sublinear-time sorts, input size is amount of data in keys, not all of the data has to be examined
input size is Radix, not number of keys.

3-way string quicksort is asymptotically optimal.
・1.39 N lg N chars for random data.

QUIZ

Q1 2-sum

image.png

just sort with LSD, then use two pointer

Q2 American flag sort

image.png

counting sort

Q3 Cyclic rotations.

image.png

foreach string, we could build suffix string sorting array, and use the first string in array as the finger print, check the finger print have same, we could use hashset to achieve it.
if we use Ukkonen’s algorithm, we could achieve it in O(L), then the total time complexity is O(nL)
in my github, i use manber myer which time complexity is O(L * log L * n)

course quiz and project : https://github.com/yixuaz/algorithm4-princeton-cos226