Kafka partition reassignment: a walkthrough of the logic

Generating the assignment

bin/kafka-reassign-partitions.sh --zookeeper <zk_address> --topics-to-move-json-file topic.json --broker-list "1,2,3" --generate

topic.json looks like this:

{
    "topics": [{
        "topic": "topic-name"
    }],
    "version": 1
}

The output looks like this:

Current partition replica assignment
{"version":1,"partitions":[{"topic":"test-topic","partition":0,"replicas":[1417718,1417973]},{"topic":"test-topic","partition":3,"replicas":[1417718,1417974]},{"topic":"test-topic","partition":2,"replicas":[1417974,1417718]},{"topic":"test-topic","partition":5,"replicas":[1417974,1417973]},{"topic":"test-topic","partition":1,"replicas":[1417973,1417974]},{"topic":"test-topic","partition":4,"replicas":[1417973,1417718]}]}

Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"test-topic","partition":0,"replicas":[1417973,1417718]},{"topic":"test-topic","partition":3,"replicas":[1417973,1417974]},{"topic":"test-topic","partition":2,"replicas":[1417974,1417973]},{"topic":"test-topic","partition":5,"replicas":[1417974,1417718]},{"topic":"test-topic","partition":1,"replicas":[1417718,1417974]},{"topic":"test-topic","partition":4,"replicas":[1417718,1417973]}]}
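The two JSON documents are hard to compare by eye, so a small helper can show what the proposal actually changes. The following is a minimal sketch, not part of the Kafka tooling: the function name diff_assignments is hypothetical, and the input is a two-partition excerpt of the output above.

```python
import json


def diff_assignments(current_json, proposed_json):
    """Return {(topic, partition): (added, removed, reordered)} for changed partitions.

    Hypothetical helper for illustration; it only compares the two JSON
    documents printed by --generate and does not talk to Kafka.
    """
    def index(doc):
        return {(p["topic"], p["partition"]): p["replicas"]
                for p in json.loads(doc)["partitions"]}

    current, proposed = index(current_json), index(proposed_json)
    changes = {}
    for key, new in proposed.items():
        old = current.get(key, [])
        if old == new:
            continue  # nothing to move for this partition
        added = sorted(set(new) - set(old))
        removed = sorted(set(old) - set(new))
        # Same brokers in a different order: only the preferred leader changes.
        reordered = not added and not removed
        changes[key] = (added, removed, reordered)
    return changes


# Excerpt of the output above (partitions 0 and 3 only).
current = ('{"version":1,"partitions":['
           '{"topic":"test-topic","partition":0,"replicas":[1417718,1417973]},'
           '{"topic":"test-topic","partition":3,"replicas":[1417718,1417974]}]}')
proposed = ('{"version":1,"partitions":['
            '{"topic":"test-topic","partition":0,"replicas":[1417973,1417718]},'
            '{"topic":"test-topic","partition":3,"replicas":[1417973,1417974]}]}')

for key, change in sorted(diff_assignments(current, proposed).items()):
    print(key, change)
```

In this excerpt, partition 0 keeps the same two brokers and only swaps their order, while partition 3 replaces broker 1417718 with 1417973, which requires copying data.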

kafka.admin.AdminUtils#assignReplicasToBrokers

/**
   * There are 2 goals of replica assignment:
   * 1. Spread the replicas evenly among brokers.
   * 2. For partitions assigned to a particular broker, their other replicas are spread over the other brokers.
   *
   * To achieve this goal, we:
   * 1. Assign the first replica of each partition by round-robin, starting from a random position in the broker list.
   * 2. Assign the remaining replicas of each partition with an increasing shift.
   *
   * Here is an example of assigning
   * broker-0  broker-1  broker-2  broker-3  broker-4
   * p0        p1        p2        p3        p4       (1st replica)
   * p5        p6        p7        p8        p9       (1st replica)
   * p4        p0        p1        p2        p3       (2nd replica)
   * p8        p9        p5        p6        p7       (2nd replica)
   * p3        p4        p0        p1        p2       (3rd replica)
   * p7        p8        p9        p5        p6       (3rd replica)
   */
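The round-robin placement with an increasing shift can be sketched in a few lines. This is a simplified Python rendering of the rack-unaware case as described in the comment, not the Kafka source itself: the function names are hypothetical, and the start index and initial shift are fixed at 0 here (Kafka picks them at random) so that the result reproduces the example table.

```python
def replica_index(first_replica_index, shift_base, replica_number, n_brokers):
    """Position of a follower replica relative to the first replica."""
    # The shift is always in 1..n_brokers-1, so a follower never lands
    # on the same broker as the first replica.
    shift = 1 + (shift_base + replica_number) % (n_brokers - 1)
    return (first_replica_index + shift) % n_brokers


def assign_replicas(n_partitions, replication_factor, brokers,
                    start_index=0, next_replica_shift=0):
    """Sketch of rack-unaware assignment; start values are assumptions."""
    n = len(brokers)
    assignment = {}
    for partition in range(n_partitions):
        # After each full pass over the broker list, increase the shift
        # so followers of the next "row" land on different brokers.
        if partition > 0 and partition % n == 0:
            next_replica_shift += 1
        first = (partition + start_index) % n
        replicas = [brokers[first]]
        for j in range(replication_factor - 1):
            replicas.append(brokers[replica_index(first, next_replica_shift, j, n)])
        assignment[partition] = replicas
    return assignment


# 10 partitions, replication factor 3, brokers 0..4, as in the table above.
for partition, replicas in assign_replicas(10, 3, [0, 1, 2, 3, 4]).items():
    print(f"p{partition}: {replicas}")
```

With these fixed start values, p0 is placed on brokers [0, 1, 2] and p5 on [0, 2, 3], matching the columns of the table in the comment.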

Executing the reassignment

bin/kafka-reassign-partitions.sh --zookeeper ${zookeeper_path} --reassignment-json-file ${reassignment_node_rar_partition_json} --execute

kafka.controller.KafkaController#onPartitionReassignment

/**
   * This callback is invoked by the reassigned partitions listener. When an admin command initiates a partition
   * reassignment, it creates the /admin/reassign_partitions path that triggers the zookeeper listener.
   * Reassigning replicas for a partition goes through a few steps listed in the code.
   * RAR = Reassigned replicas
   * OAR = Original list of replicas for partition
   * AR = current assigned replicas
   *
   * 1. Update AR in ZK with OAR + RAR.
   * 2. Send LeaderAndIsr request to every replica in OAR + RAR (with AR as OAR + RAR). We do this by forcing an update
   *    of the leader epoch in zookeeper.
   * 3. Start new replicas RAR - OAR by moving replicas in RAR - OAR to NewReplica state.
   * 4. Wait until all replicas in RAR are in sync with the leader.
   * 5. Move all replicas in RAR to OnlineReplica state.
   * 6. Set AR to RAR in memory.
   * 7. If the leader is not in RAR, elect a new leader from RAR. If new leader needs to be elected from RAR, a LeaderAndIsr
   *    will be sent. If not, then leader epoch will be incremented in zookeeper and a LeaderAndIsr request will be sent.
   *    In any case, the LeaderAndIsr request will have AR = RAR. This will prevent the leader from adding any replica in
   *    RAR - OAR back in the isr.
   * 8. Move all replicas in OAR - RAR to OfflineReplica state. As part of OfflineReplica state change, we shrink the
   *    isr to remove OAR - RAR in zookeeper and send a LeaderAndIsr ONLY to the Leader to notify it of the shrunk isr.
   *    After that, we send a StopReplica (delete = false) to the replicas in OAR - RAR.
   * 9. Move all replicas in OAR - RAR to NonExistentReplica state. This will send a StopReplica (delete = true) to
   *    the replicas in OAR - RAR to physically delete the replicas on disk.
   * 10. Update AR in ZK with RAR.
   * 11. Update the /admin/reassign_partitions path in ZK to remove this partition.
   * 12. After electing leader, the replicas and isr information changes. So resend the update metadata request to every broker.
   *
   * For example, if OAR = {1, 2, 3} and RAR = {4,5,6}, the values in the assigned replica (AR) and leader/isr path in ZK
   * may go through the following transition.
   * AR                 leader/isr
   * {1,2,3}            1/{1,2,3}           (initial state)
   * {1,2,3,4,5,6}      1/{1,2,3}           (step 2)
   * {1,2,3,4,5,6}      1/{1,2,3,4,5,6}     (step 4)
   * {1,2,3,4,5,6}      4/{1,2,3,4,5,6}     (step 7)
   * {1,2,3,4,5,6}      4/{4,5,6}           (step 8)
   * {4,5,6}            4/{4,5,6}           (step 10)
   *
   * Note that we have to update AR in ZK with RAR last since it's the only place where we store OAR persistently.
   * This way, if the controller crashes before that step, we can still recover.
   */
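The AR and leader/isr transitions in the example can be replayed with a small simulation. This is only an illustrative sketch of the bookkeeping: it does not model ZooKeeper, LeaderAndIsr requests, or replica state machines. The function name is hypothetical, and electing the first replica in RAR as the new leader is a simplifying assumption.

```python
def reassignment_states(oar, rar, leader):
    """Return (label, AR, leader, ISR) tuples for the key steps in the comment."""
    states = []
    ar, isr = list(oar), list(oar)
    states.append(("initial state", list(ar), leader, list(isr)))

    # Steps 1-2: AR is widened to OAR + RAR; the ISR is unchanged so far.
    ar = list(oar) + [r for r in rar if r not in oar]
    states.append(("step 2", list(ar), leader, list(isr)))

    # Steps 3-4: the new replicas start, catch up, and join the ISR.
    isr = isr + [r for r in ar if r not in isr]
    states.append(("step 4", list(ar), leader, list(isr)))

    # Step 7: if the leader is not in RAR, elect one from RAR.
    # Assumption: pick the first RAR replica (all are in sync after step 4).
    if leader not in rar:
        leader = rar[0]
    states.append(("step 7", list(ar), leader, list(isr)))

    # Step 8: replicas in OAR - RAR go offline and leave the ISR.
    isr = [r for r in isr if r in rar]
    states.append(("step 8", list(ar), leader, list(isr)))

    # Step 10: AR in ZK is finally narrowed to RAR.
    ar = list(rar)
    states.append(("step 10", list(ar), leader, list(isr)))
    return states


for label, ar, leader, isr in reassignment_states([1, 2, 3], [4, 5, 6], leader=1):
    print(f"{ar!s:<22} {leader}/{isr!s:<22} ({label})")
```

Note how AR stays at OAR + RAR until the very last row. That mirrors the closing remark in the comment: AR in ZK is the only persistent record of OAR, so it is narrowed to RAR only at the end.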