Map Reduce笔记

MapReduce: Simplified Data Processing on Large Clusters acm pdf

What is Map Reduce?

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.

..in particular, it “runs on a large cluster of** commodity machines** and is highly scalable.”

Who built it?

Google, for their search indexing.

Why was Map Reduce successful?

Easy to use, expressive (to an extent), scaleable implementation.

  • it hides the details of parallelization, fault-tolerance, locality optimization, and load balancing.
  • MapReduce is used for the generation of data for Google’s production web search service, for sorting, for data mining, for machine learning.
  • scales to large clusters of machines comprising thousands of machines.

What were the takeaways from Map Reduce?

  • Restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant.
  • Network bandwidth is a scarce resource. Optimizations targeted at reducing the amount of data sent across the network: the locality optimization allows us to read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth. Cheaper to send code to data than sending data to code.
  • Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.

What is map reduce good for?

What is it not good for?

  • Iterative algorithms
  • Interactive queries
    Both have incredibly slow perf.

What influences did MR have on later systems and usage today?

  • It influenced DryadLINQ, which in turn inspired Spark.
  • Basically pioneered the idea of doing large scale computations over distributed commodity clusters.
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 这不是美国电视剧,而是发生在甘肃农村的真实事情,甘肃农妇杨改兰,杀了自己的四个孩子后自杀。都说虎毒不食子,究竟是什...
    葵宝贝520阅读 1,328评论 1 1
  • 有一天你消失了 我当然会去找你 然后想起我们的过往 我就只有等你而已 我还会认识新人 你也有新的相信 我还会面临磕...
    一首诗和小H阅读 1,166评论 0 1
  • 我是个混日子的魔术师,这并不是什么谦虚的说法。跟着这个剧团已经七年了,祖国的大好河山基本也跑遍了。当然,我们去的地...
    红酥手贱阅读 3,237评论 0 1
  • 21我们的睡眠不能超过8个小时,也尽量不要低于7个半小时,多了或少了对健康都不利。 22运用软件记录睡眠,不要凭感...
    锐_95ec阅读 1,288评论 0 0

友情链接更多精彩内容