Kafka Overview


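Apache Kafka is an open-source distributed event streaming platform used for data pipelines, streaming analytics, data integration, and more.
Kafka consists of servers and clients that communicate over a TCP network protocol.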
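
Because clients and servers talk over TCP, a plain connection check can rule out network problems before debugging client code. The following is a minimal sketch using only the standard library; the function name `broker_reachable` is made up for illustration, and it only confirms that a TCP connection can be opened, not that the peer actually speaks the Kafka protocol.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For a local broker on the conventional listener port this would be called as `broker_reachable("localhost", 9092)`; both the host and the port are assumptions about your setup.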

event (record or message)

Event key: "Alice"
Event value: "Made a payment of $200 to Bob"
Event timestamp: "Jun. 25, 2020 at 2:06 p.m."
optional metadata headers
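
To make the shape of an event concrete, here is a small sketch that models the four parts listed above as a Python dataclass. The `Event` class is purely illustrative and is not part of any Kafka client library; the header type (pairs of string and bytes) is an assumption modeled on how clients usually represent headers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Illustrative model of a Kafka event; not a kafka-python class."""
    key: Optional[str]
    value: str
    timestamp: datetime
    headers: Tuple[Tuple[str, bytes], ...] = ()  # optional metadata


# The example event from the list above.
payment = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6),
)
```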

Producer vs Consumer

Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events.
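
A rough sketch of both roles with the kafka-python client follows. It assumes a broker at `localhost:9092`; the topic name `payments`, the group name `payments-reader`, and the JSON encoding are all choices made for this example rather than anything Kafka requires.

```python
import json


def encode_value(payload: dict) -> bytes:
    """Serialize an event value to UTF-8 JSON bytes."""
    return json.dumps(payload).encode("utf-8")


def decode_value(raw: bytes) -> dict:
    """Inverse of encode_value."""
    return json.loads(raw.decode("utf-8"))


def produce_payment(bootstrap: str = "localhost:9092", topic: str = "payments") -> None:
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        key_serializer=str.encode,
        value_serializer=encode_value,
    )
    producer.send(topic, key="Alice", value={"to": "Bob", "amount": 200})
    producer.flush()  # block until buffered events have been sent
    producer.close()


def consume_payments(bootstrap: str = "localhost:9092", topic: str = "payments") -> None:
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap,
        group_id="payments-reader",    # hypothetical consumer group name
        auto_offset_reset="earliest",  # start from the oldest event if no offset is stored
        value_deserializer=decode_value,
        consumer_timeout_ms=5000,      # stop iterating after 5 s without new events
    )
    for record in consumer:
        print(record.partition, record.offset, record.key, record.value)
    consumer.close()
```

Calling `produce_payment()` and then `consume_payments()` against a running broker should print the event that was just written, along with the partition and offset it landed on.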

Topic

Events are stored in topics.
Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events.

Partition

Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers.
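
Events that share a key are written to the same partition. The sketch below illustrates the idea by hashing the key and taking it modulo the partition count. Note the assumption: `zlib.crc32` is only a stand-in here. Kafka's default partitioner uses a murmur2 hash, so the partition numbers printed by this code will not match what a real cluster assigns.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key always maps to the same partition.
for key in ["Alice", "Bob", "Alice"]:
    print(key, "->", partition_for(key, num_partitions=4))
```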


Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
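Within a single partition, events are consumed in exactly the order in which they were produced. The guarantee applies per partition; there is no ordering guarantee across the different partitions of a topic.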
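
One way to picture this guarantee is to treat a partition as an append-only log in which every event gets an increasing offset. The toy class below is an in-memory illustration only; `PartitionLog` and its methods are invented for this sketch and do not correspond to any Kafka API.

```python
class PartitionLog:
    """Toy append-only log standing in for one topic-partition."""

    def __init__(self) -> None:
        self._events: list = []

    def append(self, event: str) -> int:
        """Store an event at the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int) -> list:
        """Return all events at or after the given offset, in write order."""
        return self._events[offset:]


log = PartitionLog()
for e in ["e0", "e1", "e2"]:
    log.append(e)

# Two independent readers see the same order; reading does not remove events.
reader_a = log.read_from(0)
reader_b = log.read_from(1)
```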

Data Safety

A common production setting is a replication factor of 3, i.e., there will always be three copies of your data.
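Topic data is replicated across brokers. Note that 3 is a common production choice rather than a built-in default: the broker setting `default.replication.factor` defaults to 1.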
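
As a hedged sketch, a topic with a replication factor of 3 could be created through kafka-python's admin client roughly as follows. The helper names `topic_settings` and `create_topic`, the partition count, and the broker address `localhost:9092` are assumptions for illustration. A replication factor of 3 also needs at least three brokers in the cluster, so this will fail against a single-broker test setup.

```python
def topic_settings(name: str, partitions: int = 3, replication_factor: int = 3) -> dict:
    """Collect the settings for a new topic as keyword arguments."""
    if partitions < 1 or replication_factor < 1:
        raise ValueError("partitions and replication_factor must be >= 1")
    return {
        "name": name,
        "num_partitions": partitions,
        "replication_factor": replication_factor,
    }


def create_topic(settings: dict, bootstrap: str = "localhost:9092") -> None:
    # Imported lazily so topic_settings works without kafka-python installed.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=bootstrap)
    try:
        admin.create_topics([NewTopic(**settings)])
    finally:
        admin.close()
```

With a suitable cluster running, `create_topic(topic_settings("payments"))` would request a three-partition topic whose data is kept on three brokers.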

APIs

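Admin API: manage topics, brokers, and other Kafka objects
Producer API: publish (write) events
Consumer API: subscribe to (read) events
Streams API: higher-level event stream processing
Connect API: import and export data to and from external systems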

Installation

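  1. Install from the tar archive, similar to Elasticsearch
  2. Docker image
  3. ZooKeeper (needed by ZooKeeper-based deployments; newer Kafka releases can run without it in KRaft mode)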

Python SDK: kafka-python (kafka-python-ng)

See the official documentation: https://kafka-python.readthedocs.io/en/master/index.html
