top-10

top 10的算法:我们只需要维护一个10个大小的数组,初始化放入10Query,按照每个Query的统计次数由大到小排序,然后遍历这300万条记录,每读一条记录list后进行从大到小排序。如果list长度为11,则pop()默认删除最后一个元素。
不难分析出,这样的算法的时间复杂度是N*K, 其中K是指top多少。

#!/usr/bin/python
"""
Your mapper function should print out 10 lines containing longest posts, sorted in
ascending order from shortest to longest.
Please do not use global variables and do not change the "main" function.
"""
import sys
import csv


def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    
    a= []
    for line in reader:
        a.append(line)# YOUR CODE HERE
        a.sort(key=lambda x: len(x[4]),reverse=True)
        if len(a) == 11:
            a.pop()
    for line in reversed(a[0:10]):
        writer.writerow(line)


test_text = """\"\"\t\"\"\t\"\"\t\"\"\t\"333\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"88888888\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"1\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"11111111111\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"1000000000\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"22\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"4444\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"666666\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"55555\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"999999999\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"7777777\"\t\"\"
"""

# This function allows you to test the mapper with the provided test string
def main():
    import StringIO
    sys.stdin = StringIO.StringIO(test_text)
    mapper()
    sys.stdin = sys.__stdin__

main()

这里top10代码的核心是:

def mapper():
    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    
    a = []
    for line in reader:
        a.append(line)# YOUR CODE HERE
        a.sort(key=lambda x: len(x[4]),reverse=True)
        if len(a) == 11:
            a.pop()
    for line in reversed(a[0:10]):
        writer.writerow(line)

python中的lambda表达式可以减少代码量。参数为列表x,返回x[4]的长度作为排序的key,reverse=True表示降序排列。 a.sort(key=lambda x: len(x[4]),reverse=True)

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容