问题:We aim to find the i-th smallest element from a set A containing n elements, 1 <= i <= n.
思路: The general strategy is to find a pivot element x in A such that a constant fractional number of elements in A can be discarded for further consideration.
The key to the proposed algorithm is how to find such a pivot element x within each iteration efficiently!
Linear-Time Selection
Step 1. Divide the n elements into n/5 groups of 5 elements each (each group contains exactly 5 elements except the last group which may contain less than 5 elements).
Step 2. Find the median of each group by any sorting method.
(If the number of elements in the last group is even, take either one of the two medians).
Consequently,a median sequence consisting of group medians is formed, which contains n/5 elements exactly with each group having its median there.
Step 3. Find the median x from the median sequence of n/5 “group median” elements, using the linear selection algorithm recursively, where element x is the pivot element that will be used to partition the set A into three disjoint subsets R1, R2 and R3.
Step 4. Partition the n-element set A into three disjoint subsets R1, R2 and R3, using the found x (as the pivot element), i.e., the set A is partitioned into three disjoints subsets R1, R2, and R3 . R2 = { x}
Analysis of the Linear Selection Algorithm
Step 1. Partition the n elements into n/5 groups, it take linear time to scan the n elements, i.e., O(n) time
Step 2. Perform sorting within each group, thus it takes constant time to sort the 5-element in each group. In total, this step takes n/5 * O(1) = O(n) time, where each group contains no more than 5 elements, its sorting time is O(1).
Step 3. Find the median x of the median sequence of n/5 elements, which takes T (n/5) time. Notice that the running time of the selection algorithm is applicable for any parameter i with 1 <= i <= |A|.
Step 4. Use the value of x to partition the set A into three disjoint subsets R1, R2 and R3, it takes O(n) time
Step 5. If |R1| < i <= |R1|+|R2|, done.
Otherwise, call the selection algorithm either on R1 or R3, not both of them, it takes max{T (|R1|), T (|R3|)}. Fortunately we can show that both |R1| and |R3| are less than 7n/10 + 6, i.e., a fraction length of the original length n. Thus, this step takes T (7n/10 + 6) time.
证明:这里有n个元素,有n/5个组。通过一系列比较,可以得到median x 至少要大于3((n/5)/2 - 2)个数。减2是因为要减去它自己本身和最后一个少于五个元素的数组。
所以the running time of the linear selection
algorithm: T(n) < T(n/5) + T(n - 3*((n/5) / 2 - 2)) + O(n) (n > 140)
or T(n) = O(1) ( n <= 140)
To prove the time complexity of the linear selection
algorithm is O(n), I used the Substitution method