Question
Use MapReduce to count word frequencies.
Example
chunk1: "Google Bye GoodBye Hadoop code"
chunk2: "lintcode code Bye"
The MapReduce result is:
Bye: 2
GoodBye: 1
Google: 1
Hadoop: 1
code: 2
lintcode: 1
Solution
This exercises the basic map and reduce operations of MapReduce.
class WordCount:

    # @param {str} line a text, for example "Bye Bye see you next"
    def mapper(self, _, line):
        # Write your code here
        # Please use 'yield key, value'
        # This is plain word-frequency counting. Because we use yield,
        # the emitted pairs are collected in a buffer by the framework.
        for word in line.split():
            yield word, 1

    # @param key is from mapper
    # @param values is a collection of the values emitted for the same key
    def reducer(self, key, values):
        # Write your code here
        # Please use 'yield key, value'
        # values is a group of numbers: the counts of key coming from
        # different mappers or chunks.
        yield key, sum(values)
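
To check the solution against the example without a judge or a cluster, the map, shuffle and reduce steps can be simulated in one process. The sketch below is only an illustration: run_local is a hypothetical helper, not part of LintCode or Hadoop, and it assumes the framework groups mapper output by key before calling the reducer once per key. WordCount is repeated so the snippet runs on its own.

```python
from collections import defaultdict


class WordCount:
    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated word.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # Add up all the 1s emitted for this word.
        yield key, sum(values)


def run_local(job, chunks):
    """Hypothetical single-process driver: map, group by key, reduce."""
    grouped = defaultdict(list)
    # Map phase: feed each chunk to the mapper and group values by key
    # (this grouping stands in for the framework's shuffle step).
    for index, chunk in enumerate(chunks):
        for key, value in job.mapper(index, chunk):
            grouped[key].append(value)
    # Reduce phase: call the reducer once per key, in sorted key order.
    result = {}
    for key in sorted(grouped):
        for out_key, out_value in job.reducer(key, grouped[key]):
            result[out_key] = out_value
    return result


chunks = ["Google Bye GoodBye Hadoop code", "lintcode code Bye"]
counts = run_local(WordCount(), chunks)
for word, count in counts.items():
    print(f"{word}: {count}")
```

With the two chunks from the example, this prints the same six lines as the expected result above, from Bye: 2 down to lintcode: 1.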