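Apache Kafka is an open-source distributed event streaming platform used for data pipelines, streaming analytics, data integration, and more.
Kafka consists of servers and clients that communicate over a TCP network protocol.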
Event (record or message)
Event key: "Alice"
Event value: "Made a payment of $200 to Bob"
Event timestamp: "Jun. 25, 2020 at 2:06 p.m."
optional metadata headers
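The fields above can be pictured as a small record type. This is only a sketch for illustration: the `Event` class and its field names are hypothetical and not part of Kafka or of any client library.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory model of a Kafka event (not a library class)."""
    key: Optional[str]          # event key, may be absent
    value: str                  # event value (the payload)
    timestamp: datetime         # event timestamp
    headers: dict = field(default_factory=dict)  # optional metadata headers


# The example event from the notes above.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6),
)
print(payment)
```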
Producer vs Consumer
Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events.
Topic
Events are stored in topics.
Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events.
In short, every topic supports multiple producers and multiple consumers.
Partition
Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers.
In other words, the data of one topic is split into partitions that live on different brokers.
Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
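Within a partition, the consumption order strictly matches the production order. The guarantee is per partition, not across the whole topic.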
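The per-partition ordering can be sketched with a small in-memory simulation. It assumes that events with the same key are routed to the same partition, which is how Kafka behaves with its default partitioner. The hash used here (CRC32) is only a stand-in for Kafka's own hash, and `pick_partition`, `NUM_PARTITIONS` and the list-based "partitions" are hypothetical names made up for this illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for this illustration


def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in, not Kafka's actual hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modeled as an append-only list.
partitions = defaultdict(list)

events = [
    ("Alice", "payment 1"),
    ("Bob", "payment 2"),
    ("Alice", "payment 3"),
    ("Alice", "payment 4"),
]
for key, value in events:
    partitions[pick_partition(key)].append((key, value))

# Reading one partition returns its events in the order they were appended.
alice_partition = partitions[pick_partition("Alice")]
alice_values = [v for k, v in alice_partition if k == "Alice"]
print(alice_values)  # ['payment 1', 'payment 3', 'payment 4']
```

All of Alice's events land in one partition, so their relative order is preserved. Events for different keys may sit in different partitions, and no order is promised between those.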
Data safety
A common production setting is a replication factor of 3, i.e., there will always be three copies of your data.
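Data is replicated across brokers. Three copies is a common production setting rather than a built-in default: the broker-level default (`default.replication.factor`) is 1.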
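A minimal sketch of what a replication factor of 3 means in practice, assuming kafka-python is installed and a cluster with at least three brokers is reachable. The broker address `localhost:9092`, the partition count and the helper names are assumptions for illustration. The failure arithmetic is a simplification that ignores producer and broker settings such as `acks` and `min.insync.replicas`.

```python
def tolerated_broker_failures(replication_factor: int) -> int:
    """With N copies of a partition, N - 1 brokers can fail before every copy is gone."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor - 1


def create_replicated_topic(name: str, bootstrap_servers: str = "localhost:9092") -> None:
    """Create a topic with 3 partitions and replication factor 3 (needs >= 3 brokers)."""
    # Imported lazily so the helper above works without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        topic = NewTopic(name=name, num_partitions=3, replication_factor=3)
        admin.create_topics(new_topics=[topic])
    finally:
        admin.close()


print(tolerated_broker_failures(3))  # 2
```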
APIs
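Admin API: management
Producer API: producing (publishing) events
Consumer API: consuming (subscribing to) events
Streams API: higher-level event stream processing
Connect API: importing and exporting data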
Installation
- Install from the tar package, like ES
- Docker image
- ZooKeeper
Python SDK: kafka-python (kafka-python-ng)
See the official documentation: https://kafka-python.readthedocs.io/en/master/index.html
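A minimal producer and consumer sketch with kafka-python, assuming a broker is reachable at `localhost:9092` and a topic named `payments` exists or can be auto-created. The address, the topic name, the JSON encoding and the function names are assumptions for illustration. The Kafka imports are done inside the functions so the serialization helpers can be tried without a broker.

```python
import json


def encode_value(payload: dict) -> bytes:
    """Serialize an event value to UTF-8 JSON bytes."""
    return json.dumps(payload).encode("utf-8")


def decode_value(raw: bytes) -> dict:
    """Inverse of encode_value."""
    return json.loads(raw.decode("utf-8"))


def produce_payment(bootstrap_servers="localhost:9092", topic="payments"):
    """Publish one event with key "Alice" to the topic."""
    from kafka import KafkaProducer  # requires kafka-python

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        key_serializer=str.encode,
        value_serializer=encode_value,
    )
    producer.send(topic, key="Alice", value={"to": "Bob", "amount": 200})
    producer.flush()  # block until buffered events are sent
    producer.close()


def consume_payments(bootstrap_servers="localhost:9092", topic="payments"):
    """Read the topic from the beginning; stop after 5 s without new events."""
    from kafka import KafkaConsumer  # requires kafka-python

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="earliest",
        value_deserializer=decode_value,
        consumer_timeout_ms=5000,
    )
    for message in consumer:
        print(message.partition, message.offset, message.key, message.value)
    consumer.close()
```

With a broker running, calling `produce_payment()` followed by `consume_payments()` should print the partition, offset, key and decoded value of each event in the topic.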