flink状态容错

什么是State(状态)？

某task/operator在某时刻的一个中间结果
快照(shapshot)
在flink中状态可以理解为一种数据结构
举例
对输入源为<key,value>的数据,计算其中某key的最大值，如果使用HashMap，也可以进行计算，但是每次都需要重新遍历，使用状态的话，可以获取最近的一次计算结果，减少了系统的计算次数
程序一旦crash，恢复
程序扩容

State类型

Operator State(算子状态)

With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance. The Kafka Connector is a good motivating example for the use of Operator State in Flink. Each parallel instance of the Kafka consumer maintains a map of topic partitions and offsets as its Operator State.
The Operator State interfaces support redistributing state among parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.

kafka示例

flink官方文档用kafka的消费者举例，认为kafka消费者的partitionId和offset类似flink的operator state

提供的数据结构：ListState<T>
每一个Operator都存在自己的状态

key State

Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream.
You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>.
Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.

基于KeyStream之上的状态
可理解为dataStream.keyBy()之后的Operator State,Operator State是对每一个Operator的状态进行记录，而key State则是在dataSteam进行keyBy()后，记录相同keyId的keyStream上的状态
key State提供的数据类型：ValueState<T>、ListState<T>、ReducingState<T>、MapState<T>

状态容错

Introduction
Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exactly once. Note that there is a switch to downgrade the guarantees to at least once (described below).
The fault tolerance mechanism continuously draws snapshots of the distributed streaming data flow. For streaming applications with small state, these snapshots are very light-weight and can be drawn frequently without impacting the performance much. The state of the streaming applications is stored at a configurable place (such as the master node, or HDFS).
In case of a program failure (due to machine-, network-, or software failure), Flink stops the distributed streaming dataflow. The system then restarts the operators and resets them to the latest successful checkpoint. The input streams are reset to the point of the state snapshot. Any records that are processed as part of the restarted parallel dataflow are guaranteed to not have been part of the checkpointed state before.
Note: For this mechanism to realize its full guarantees, the data stream source (such as message queue or broker) needs to be able to rewind the stream to a defined recent point. Apache Kafka has this ability and Flink’s connector to Kafka exploits this ability.
Note: Because Flink’s checkpoints are realized through distributed snapshots, we use the words snapshot and checkpointinterchangeably.

依靠checkPoint

checkPoint概念：进行全局快照，持久化保存所有的task/operator的State

特点：
异步：轻量级，不影响系统处理数据
Barrier机制
失败情况下可回滚致最近一次成功的checkpoint
周期性
保证exactly-once

chcekPoint

Restore

shapshot(快照)

Barriers(屏障)
Barriers是flink分布式快照中的重要元素

单并行度Barriers

多并行度Barriers

Barrier被注入数据流中，并随着数据流和记录一起流动，每一个Barrier携带者快照ID，并且十分轻量级，不会打断数据的流动，不同时期的快照的barrier可以同时存在数据流中，所以各种快照可以同时发生。
相对于单并行度，多并行度的快照需要不同数据流中携带相同快照ID的Barrier经过operator之后，才能进行checkpoint。

image.png

个人理解：感觉对于Flink的状态迁移和容错来说，主要依赖checkpoint机制，而其中最重要的元素就是Barrier，通过Barrier保证流入Operator的数据都进行了checkpoint

flink状态容错