Introducing KSQL

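What is it for?

Previously we treated Kafka as a data hub: a pipe for moving data on its way into a database. KSQL lets us treat Kafka itself as the database, and read, write, and transform data in place.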

In general, KSQL is the tool to reach for when transformations, integrations, and analytics need to happen on the fly, while the data is still streaming. KSQL keeps Kafka as the single data hub: there is no need to take data out, transform it, and re-insert it into Kafka. Every transformation can be done inside Kafka using SQL. Kafka + KSQL turn the database inside out.

Features

SQL interface: KSQL solves the main problem of providing a SQL interface over Kafka, without the need to write code in external languages like Python or Java.
Continuous queries: with KSQL, transformations are applied continuously as new data arrives in the Kafka topic.
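
To make "continuous query" concrete, here is a minimal Python sketch of the idea. It is not KSQL and not how KSQL is implemented: it only shows a filter and projection being applied to each event as it arrives, rather than once over a finished data set. The event fields (`user`, `page`, `ms`) and the function name `continuous_query` are made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def continuous_query(events: Iterable[Event], min_ms: int) -> Iterator[Event]:
    """Yield a transformed event for every matching input, one at a time.

    Roughly the shape of: SELECT user, page FROM events WHERE ms >= min_ms
    """
    for event in events:              # with a real topic, this loop never ends
        if event["ms"] >= min_ms:     # the WHERE clause
            # the projection: keep only the selected columns
            yield {"user": event["user"], "page": event["page"]}


# A finite list stands in for a Kafka topic (hypothetical sample data).
incoming = [
    {"user": "alice", "page": "/home", "ms": 120},
    {"user": "bob", "page": "/cart", "ms": 900},
    {"user": "alice", "page": "/pay", "ms": 1500},
]

slow_views = list(continuous_query(incoming, min_ms=500))
```

Because the function is a generator, each result is available as soon as its input event has been seen, which is the property the "continuous" in continuous queries refers to.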

Use cases

Typical use cases include real-time analytics, security and anomaly detection, online data integration, and general application development.

How is it used?

Key terms

streams and tables

A Stream is a sequence of structured data. Once an event has been added to a stream, it is immutable.
A Table, on the other hand, represents the current state derived from the events coming from a stream.

A topic in Apache Kafka can be represented as either a STREAM or a TABLE in KSQL, depending on the intended semantics of the processing on the topic.

Example

For instance, if you want to read the data in a topic as a series of independent values, you would use CREATE STREAM. An example of such a stream is a topic that captures page view events, where each page view event is unrelated to and independent of the others. If, on the other hand, you want to read the data in a topic as an evolving collection of updatable values, you’d use CREATE TABLE. An example of a topic that should be read as a TABLE in KSQL is one that captures user metadata, where each event represents the latest metadata for a particular user id, be it the user’s name, address, or preferences.
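
The difference between the two readings can be sketched in a few lines of plain Python. This is only an illustration of the semantics, not KSQL syntax; the sample records and the helper names `read_as_stream` and `read_as_table` are invented for the example.

```python
from typing import Any, Dict, List, Tuple

# Each record is (key, value), in arrival order. Both data sets are invented.
page_views: List[Tuple[str, str]] = [
    ("alice", "/home"),
    ("bob", "/cart"),
    ("alice", "/home"),            # a repeat view is still a separate event
]

user_updates: List[Tuple[str, Dict[str, Any]]] = [
    ("alice", {"city": "Paris"}),
    ("bob", {"city": "Oslo"}),
    ("alice", {"city": "Rome"}),   # supersedes alice's earlier record
]


def read_as_stream(records):
    """Stream semantics: every record is an independent, immutable event."""
    return list(records)


def read_as_table(records):
    """Table semantics: a later record for a key replaces the earlier one."""
    table = {}
    for key, value in records:
        table[key] = value
    return table


views = read_as_stream(page_views)    # keeps all three events
users = read_as_table(user_updates)   # keeps one row per user id
```

Read as a stream, the topic keeps every page view, including the duplicate. Read as a table, only the most recent record per key survives, so `users["alice"]` holds the Rome record.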

How it works

KSQL lets you define streams and tables in a simple SQL dialect. Streams and tables coming from different sources can be joined directly in KSQL, so data can be combined and transformed on the fly.
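
As a rough illustration of what joining a stream with a table means, the Python sketch below enriches each page-view event with the current row for its user. It is a simplification under stated assumptions: the table is a fixed dictionary here, whereas a real table keeps changing while the stream flows, and the names `enrich`, `users_table`, and `view_stream` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def enrich(
    views: Iterable[Dict[str, Any]],
    users: Dict[str, Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Left-join each page-view event with the table row for its user."""
    for view in views:
        profile = users.get(view["user"], {})   # empty if the user is unknown
        yield {**view, "city": profile.get("city")}


# Invented sample data: a table keyed by user id and a short stream of events.
users_table = {"alice": {"city": "Rome"}, "bob": {"city": "Oslo"}}
view_stream = [
    {"user": "alice", "page": "/home"},
    {"user": "carol", "page": "/cart"},         # no matching row in the table
]

enriched = list(enrich(view_stream, users_table))
```

An event with no matching row is still emitted, with `city` set to `None`, which mirrors left-join behavior.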

Each stream or table created in KSQL is stored in its own topic, so the usual connectors or scripts can be used to extract information from it.

In practice

standalone and client-server mode

KSQL can run in either standalone or client-server mode. Standalone mode is aimed at development and testing scenarios, while client-server mode supports production environments.

Syntax Reference

What’s Next for KSQL?

Now: KSQL is being released as a developer preview, to start building a community around it and to gather feedback.
Plan: add several more capabilities, working with the open source community, to turn it into a production-ready system.
Note: the planned work ranges from the quality, stability, and operability of KSQL to supporting a richer SQL grammar, including further aggregation functions and point-in-time SELECT on continuous tables, i.e., enabling quick lookups against what has been computed so far, in addition to the current functionality of continuously computing results off of a stream.