前言
在Flink中比如某些算子(join,coGroup,keyBy,groupBy)要求在数据元上定义key。另外有些算子操作,例如reduce,groupReduce,Aggregate,Windows需要数据在处理之前根据key进行分组。
If you want to use keyed state, you first need to specify a key on a DataStream that should be used to partition the state (and also the records in the stream themselves). You can specify a key using keyBy(KeySelector) on a DataStream. This will yield a KeyedDataStream, which then allows operations that use keyed state.
A key selector function takes a single record as input and returns the key for that record. The key can be of any type and must be derived from deterministic computations.
The data model of Flink is not based on key-value pairs. Therefore, you do not need to physically pack the data set types into keys and values. Keys are “virtual”: they are defined as functions over the actual data to guide the grouping operator.
创建环境:
一、根据上一个算子输出的顺序获取
java中第一个是0,第二个是1,以此类推,scala第一个是_1,第二个是_2,以此类推
二、根据上一个算子输出的顺序的默认field值获取
java中第一个是f0,第二个是f1,以此类推
三、根据上一次算子输出的pojo类来获取对应的属性
如果pojo类有嵌套,则用“.”嵌套获取。注意必须是pojo类
四、根据上一个算子输出的2个及以上值来做组合键
五、通过创建KeySelector类,继承getKey方法来创建key
六、测试
在cmd中启动nc:
控制台输出:
七、源码
Flink源码:https://github.com/apache/flink
Flink官网:https://ci.apache.org/projects/flink/flink-docs-release-1.12/