What Is Flink
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
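The passage above comes from the Apache Flink website. In short, Flink is a stateful, distributed computation framework for both bounded and unbounded data streams. It runs on common cluster environments, computes at in-memory speed, and scales to any size.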
Unbounded vs Bounded Data
Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which events occurred, to be able to reason about result completeness.
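In short, an unbounded stream has a start but no end, and data keeps arriving as it is produced. It must be processed continuously, handling each event promptly after ingestion, because the input is never complete and there is no point at which to wait for all of it. To reason about whether a result is complete, events often need to be ingested in a specific order, such as the order in which they actually occurred.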
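To make continuous processing concrete, here is a minimal plain-Python sketch (not Flink code). `unbounded_source` is a hypothetical stand-in for a real source such as a message queue. It never ends, so the consumer updates its result after every event instead of waiting for all input. `islice` is used only to look at the first few updates.

```python
import itertools
from typing import Iterable, Iterator, Tuple

def unbounded_source() -> Iterator[Tuple[int, int]]:
    """Hypothetical unbounded source: yields (event_time, value) forever."""
    for t in itertools.count():
        yield (t, t % 3 + 1)

def running_sum(events: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Handle each event as soon as it arrives and emit the updated total.
    There is no 'end of input' to wait for, so results are produced incrementally."""
    total = 0  # state carried from one event to the next
    for event_time, value in events:
        total += value
        yield (event_time, total)

# The pipeline would run forever; take only the first 5 results.
first_five = list(itertools.islice(running_sum(unbounded_source()), 5))
print(first_five)  # [(0, 1), (1, 3), (2, 6), (3, 7), (4, 9)]
```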
Bounded streams have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations. Ordered ingestion is not required to process bounded streams because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing.
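A bounded stream has a defined start and end, so all of its data can be ingested before any computation starts. Ordered ingestion is not required, because a finite data set can always be sorted afterwards. Processing bounded streams is what is usually called batch processing.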
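For contrast, a bounded data set can be collected completely before computing. In this plain-Python sketch (the function name is hypothetical, not a Flink API), events arrive out of order. Because the whole set is available, sorting by event time restores the order first.

```python
from typing import List, Tuple

def process_bounded(events: List[Tuple[int, str]]) -> List[str]:
    """Batch-style processing: all input is already present, so sort by
    event time first; the order in which events were ingested is irrelevant."""
    ordered = sorted(events, key=lambda e: e[0])  # sort by event time
    return [payload for _, payload in ordered]

arrived = [(3, "c"), (1, "a"), (2, "b")]  # ingested out of order
print(process_bounded(arrived))  # ['a', 'b', 'c']
```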
Apache Flink excels at processing unbounded and bounded data sets. Precise control of time and state enable Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed sized data sets, yielding excellent performance.
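In short, Flink handles both kinds of streams. Its precise control over time and state is what lets it run any kind of application on unbounded streams. Bounded streams, meanwhile, are processed internally with algorithms and data structures designed specifically for fixed-size data sets, and that is where its batch performance comes from.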
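What "precise control of time and state" buys can be sketched with event-time tumbling windows and a watermark. A watermark is the mechanism Flink uses to decide when an event-time window is complete. The plain-Python sketch below is a simplification under stated assumptions, not Flink's implementation:

- The window size and the out-of-orderness bound are illustrative.
- Dropping late events is one illustrative policy.

The per-window counts are the operator's state. The watermark (largest event time seen minus the allowed out-of-orderness) decides when a window may fire.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SIZE = 10          # window length in event-time units (illustrative)
MAX_OUT_OF_ORDERNESS = 5  # how far behind an event may arrive (illustrative)

def tumbling_window_counts(
    events: Iterable[Tuple[int, str]],
) -> List[Tuple[int, Dict[str, int]]]:
    """Count events per key in event-time tumbling windows of WINDOW_SIZE."""
    # State: window start -> key -> count, kept until the window fires.
    state: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    watermark = float("-inf")
    results: List[Tuple[int, Dict[str, int]]] = []
    for ts, key in events:
        start = ts - ts % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            continue  # its window has already fired: late event, dropped here
        state[start][key] += 1
        watermark = max(watermark, ts - MAX_OUT_OF_ORDERNESS)
        # Fire (emit and clear) every window that ends at or before the watermark.
        for s in sorted(w for w in state if w + WINDOW_SIZE <= watermark):
            results.append((s, dict(state.pop(s))))
    # This sample input is finite, so flush whatever windows remain at the end.
    for s in sorted(state):
        results.append((s, dict(state.pop(s))))
    return results

events = [(1, "a"), (3, "b"), (12, "a"), (2, "a"), (16, "a"), (4, "b"), (25, "a")]
print(tumbling_window_counts(events))
# [(0, {'a': 2, 'b': 1}), (10, {'a': 2}), (20, {'a': 1})]
```

Here `(2, "a")` arrives after `(12, "a")` but is still counted in window 0, because the watermark (7) has not yet reached that window's end (10). By the time `(4, "b")` arrives, the watermark is 11, so window 0 has already fired and the event is dropped.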
Layered APIs
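Flink exposes three layers of APIs. From the top, they are the SQL/Table API, the DataStream API, and ProcessFunction. The first two layers are the ones commonly used. The lowest layer, ProcessFunction, requires more hand-written code and is more complex, in exchange for the finest-grained control over events, state, and time.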
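The difference between the layers can be illustrated with a plain-Python analogy (not Flink code). The same per-word count is written two ways:

- declaratively, in the spirit of a SQL GROUP BY;
- by hand, event by event, the way a low-level function must manage its own state and output.

The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

words = ["flink", "spark", "flink", "storm", "flink"]

# High-level style (like the SQL/Table API): state WHAT to compute.
# Roughly: SELECT word, COUNT(*) FROM words GROUP BY word
declarative_counts = dict(Counter(words))

# Low-level style (like ProcessFunction): spell out HOW, event by event.
class CountPerWord:
    def __init__(self) -> None:
        self.state: Dict[str, int] = defaultdict(int)  # per-key state kept by hand

    def on_event(self, word: str, emit: Callable[[Tuple[str, int]], None]) -> None:
        self.state[word] += 1
        emit((word, self.state[word]))  # emit the updated count for this key

counter = CountPerWord()
updates: List[Tuple[str, int]] = []
for w in words:
    counter.on_event(w, updates.append)

print(declarative_counts)  # {'flink': 3, 'spark': 1, 'storm': 1}
print(updates)  # [('flink', 1), ('spark', 1), ('flink', 2), ('storm', 1), ('flink', 3)]
```

The hand-written version needs more code, but it also exposes every intermediate update and full control over what is emitted, which the declarative version hides.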
Comparison of Stream Processing Frameworks
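Spark: Spark Streaming / Structured Streaming. Batch-first; stream processing is treated as a special case of batch processing (micro-batch).
Flink: streaming-first; batch processing is treated as a special case of stream processing.
Storm: pure streaming, processing data one tuple at a time.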
In summary, Flink is a true event-driven, real-time stream processing framework, whereas Spark follows a micro-batch model.
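The two models can be contrasted in plain Python (the function names are hypothetical, not Spark or Flink APIs). The event-driven function hands each event on immediately. The micro-batch function buffers events and releases them in small batches. Real micro-batching is typically triggered by a time interval; a fixed batch size is assumed here only to keep the sketch deterministic.

```python
from typing import Iterable, Iterator, List

def event_driven(events: Iterable[int]) -> Iterator[List[int]]:
    """Event-driven (Flink, Storm): each event is processed on arrival."""
    for e in events:
        yield [e]

def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Micro-batch (Spark Streaming): buffer events, process one batch at a time."""
    batch: List[int] = []
    for e in events:
        batch.append(e)
        if len(batch) == batch_size:
            yield batch  # nothing in this batch is processed until it is full
            batch = []
    if batch:
        yield batch  # flush the final partial batch when the input ends

stream = [1, 2, 3, 4, 5]
print(list(event_driven(stream)))    # [[1], [2], [3], [4], [5]]
print(list(micro_batch(stream, 2)))  # [[1, 2], [3, 4], [5]]
```

Under micro-batching, event 1 is not processed until event 2 has arrived. That waiting is where the extra latency of the micro-batch model comes from.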