查找与排序总结

查找

线性查找

def seq_search(target,alist):
   found = False
   pos = 0
   count = 0
   while not found and pos < len(alist):
       count = count + 1
       if alist[pos] == target:
           found = True
       else:
           pos = pos + 1
   return found,count

时间复杂度
情况1: item is present:

best : O(1)
worst : O(n)
avg : O(n/2)

情况2 : item is not present:

best : O(n)
worst : O(n)
avg : O(n)

优化：当要查找的列表是有序列表时，可以加入一个提前结束判断来优化时间复杂度。

def seq_search_ordered(target,alist):
    found = False
    pos = 0
    stop = False
    count = 0
    while not found and pos < len(alist) and not stop:
        count = count + 1
        if alist[pos] == target:
            found = True
        
        if alist[pos] > target:
            stop = True #如果当前元素大于目标元素，则停止查找
        
        pos = pos + 1
    return found,count

二分查找

def binary_search(target,alist):
    low = 0
    high = len(alist) - 1
    found = False
    while not found and low <= high:
        mid = (high+low)//2
        if alist[mid] == target:
            found = True
        else:
            if alist[mid] > target:
                high = mid - 1 
                
            else:
                low = mid + 1
                
                
    return found

时间复杂度：

worst ： O(n) 逆序的
best : O(1) 本身排好序的
avg : O(logn)

哈希查找

键(key)：要存储在槽中的数据或数据的一部分。
槽(slot)：哈希表中用于保存数据的一个单元，也就是数据真正存放的容器。
哈希函数(hash function)：将键(key)映射(map)到数据应该存放的槽(slot)所在位置的函数。
base: remainder
扩展：
1. 折叠法
  将数据均分成相等的长度（最后一个部分可能不等），然后相加取余。
  For example, if our item was the phone number 436-555- 4601, we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the addition, 43 + 65 + 55 + 46 + 01, we get 210.
  假设slot总数为11，则hash_num = 210 % 11 = 10
2. 平方取中法
  数据取平方然后取中间的数。
  44^2 = 1936 hash_num = 93 % 11 = 5
3. 对字符串来说
  ‘cat’ 99 + 97 + 116 = 312 hash_num = 312%11 = 4
哈希冲突(hash collision)：哈希函数将两个不同的键映射到同一个索引的情况。
解决方案：
1. 开放地址法。open addressing
  如果slot不为空，则继续寻找直到找到下一个空槽。
- 线性探查（linear probing）
  线性探查，如果T(n)不为空，则T(n+1)...直到T(n-1)
  问题：产生聚集
  再哈希法
  当产生冲突时，再hash一遍来跳过n个槽
- 平方探测（线性探测的变种）
  每次跳过1，3，5，7，..个slot而不是常数个。
  如果第一个哈希值时h，则h+1,h+4,h+9,h+16
1. 链地址法
  这种方法的基本思想是将所有哈希地址为i的元素构成一个称为同义词链的单链表，并将单链表的头指针存在哈希表的第i个单元中，因而查找、插入和删除主要在同义词链中进行。链地址法适用于经常进行插入和删除的情况。

分块/索引查找

块间有序，块内无序
先使用二分查找搜索索引表，找到具体块，然后在块内用线性查找。
索引表包含两个元素：该快的起始元素的位置；该块中的最大值（或最小值）
ASL 平均搜索长度：索引表二分查找平均搜索长度+ 块内线性查找平均搜索长度
假设共有s个元素，分为m块，每块中有n个元素， (1+2+...+m)/m + (1+2+...+n)/n = (m+n)/2 + 1

搜索总结

由于binary search要求列表有序，所以对于排序一次可搜索多次的小列表来说较好
对于大列表来说，线性搜索而不是先排序再使用二分搜索可能更高效。

排序

冒泡排序

def bubble_sort(alist):
    for i in range(len(alist)-1):
        exchange = False # exchange flag
        for j in range(len(alist)-i-1):
            if alist[j] > alist[j+1]:
                alist[j],alist[j+1] = alist[j+1],alist[j]
                exchange = True
        if not exchange : #优化算法，如果一次遍历中没有交换，则已排好序
            return alist
    return alist

时间复杂度：

worst ： O(n^2) 如列表本身为逆序时
best : O(n) 如列表本身排好序的

快速排序

def quick_sort(alist):
    if len(alist) == 0:
        return alist
    pivot = alist[0]
    left = [each for each in alist if each < pivot]
    right = [each for each in alist if each > pivot]
    return quick_sort(left) + [pivot] + quick_sort(right)

时间复杂度：

worst : O(n^2) 如pivot无法很好的分割，或者列表本身逆序
avg : O(nlogn)

扩展
上述方法可以直接用于数组去重
若数组可重复，则将上述left\right任一判断方式改为<=或>=即可。

选择排序
每次选取当前最大的数并放在合适的位置。

def selection_sort(alist):
    for pos in range(len(alist)-1, 0, -1):
        max_pos = 0
        for i in range(0,pos+1):
            if alist[max_pos] < alist[i]:
                max_pos = i
        alist[max_pos], alist[pos] = alist[pos],alist[max_pos]
    return alist

T(n) = n + (n-1) + (n-2) + ... + 1 = (n^2 + n)/2
时间复杂度为O(n^2). 但是由于更少的交换次数，还是比冒泡排序法要优化。

查找与排序总结

查找

搜索总结

排序

推荐阅读更多精彩内容