What is Kafka
The official definition originally described Kafka as a high-performance, open-source, distributed publish/subscribe message queue.
Based on that definition, Kafka has the following characteristics:
- High performance
- Open source
- Distributed
- Based on publish/subscribe
With Kafka Streams, however, Kafka can not only distribute and store messages but also ingest and process them, so Kafka now officially describes itself as a streaming platform.
Although Kafka supports stream processing, most people still use it mainly as a message queue. When stream processing is needed, there are many other options, such as Flink, Storm and Spark Streaming, so Kafka Streams is often not the one chosen.
A brief history of Kafka
- Kafka was originally developed at LinkedIn.
- In 2011, LinkedIn donated Kafka to the Apache Software Foundation. The first Apache release, 0.7.0, added message compression and cross-cluster data mirroring.
- On October 23, 2012, Kafka graduated from the Apache Incubator and officially became an Apache top-level project; version 0.8.0 followed.
Kafka programming language
Kafka is written in Scala and Java and runs on the JVM, so it is compatible with existing Java programs. Deploying Kafka therefore requires installing a JDK first.
Important versions and features of Kafka
See Kafka version evolution
Core Kafka concepts
Kafka is a publish/subscribe based message queue, so it inevitably involves the following concepts:
- Topic: a message topic, equivalent to a Queue in a standard MQ
- Producer: a message producer
- Consumer: a message consumer
- Consumer Group: a group of Consumers
Kafka uses a client/server (C/S) architecture, and both Producers and Consumers can be thought of as clients.
Kafka is also a distributed system, so a cluster contains multiple Kafka instances; Kafka calls each instance a Broker.
In addition, Kafka stores the messages of a Topic in a distributed manner: it splits a Topic's messages into several parts and stores them on different Brokers. Kafka calls each of these parts a Partition.
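To make the idea concrete, here is a minimal, self-contained Java sketch of how one Topic's messages might be spread across Partitions, and Partitions across Brokers. It is a toy model, not Kafka's client code: the class `ToyPartitioner` and its methods are hypothetical, the key hash uses `String.hashCode()` as a stand-in (the real Java client's default partitioner hashes keys with murmur2), and the Partition-to-Broker placement is plain round-robin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: a Topic split into Partitions that live on different Brokers.
class ToyPartitioner {
    // Pick a Partition for a message key. Real Kafka hashes keys with murmur2;
    // String.hashCode() is used here only to keep the sketch dependency-free.
    static int partitionFor(String key, int numPartitions) {
        // Math.floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Simplified placement: Partition p lives on Broker p % numBrokers.
    static int brokerFor(int partition, int numBrokers) {
        return partition % numBrokers;
    }

    // Group message keys by the Partition they are routed to.
    static List<List<String>> split(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        return partitions;
    }
}
```

In this sketch the same key always maps to the same Partition, so every message keyed `"order-42"` lands in one Partition and therefore on one Broker, while different keys spread the Topic's messages over all Brokers.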
Storing only a single copy of each Partition is not safe. For data safety and high availability, Kafka can keep copies of a Partition on several different Brokers and keep them in sync; Kafka calls each such copy a Replication.
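The following is a minimal sketch, assuming a simplified round-robin placement rather than Kafka's exact assignment algorithm: each Partition gets `replicationFactor` copies on consecutive, distinct Brokers, so losing any single Broker never removes every copy of a Partition. `ToyReplicaPlacement` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: spread the copies (Replications) of each Partition over distinct Brokers.
class ToyReplicaPlacement {
    // Returns, for each Partition, the list of Broker ids holding a copy of it.
    static List<List<Integer>> assign(int numPartitions, int numBrokers, int replicationFactor) {
        if (replicationFactor > numBrokers) {
            // Two copies on the same Broker would not survive that Broker failing.
            throw new IllegalArgumentException("replication factor cannot exceed the number of brokers");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<Integer> brokers = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Start at Broker p % numBrokers and take the following Brokers in turn.
                brokers.add((p + r) % numBrokers);
            }
            assignment.add(brokers);
        }
        return assignment;
    }

    // True if every Partition still has at least one copy after `failedBroker` goes down.
    static boolean survivesFailure(List<List<Integer>> assignment, int failedBroker) {
        for (List<Integer> brokers : assignment) {
            boolean hasLiveCopy = false;
            for (int b : brokers) {
                if (b != failedBroker) {
                    hasLiveCopy = true;
                }
            }
            if (!hasLiveCopy) {
                return false;
            }
        }
        return true;
    }
}
```

With 3 Partitions, 3 Brokers and 2 copies each, this sketch yields `[[0, 1], [1, 2], [2, 0]]`: every Broker holds two copies, and any one Broker can fail without losing a Partition. With a single copy per Partition, one Broker failure already loses data, which is the risk described above.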
A Partition is a logical concept in Kafka: logically, a Partition lives on a Broker, but physically its data must be stored on disk. Kafka splits a Partition into multiple pieces and stores them in the file system; each piece consists of two files, xxx.index and xxx.log. Kafka calls each such piece a Segment.
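In a real Kafka log directory, the xxx part of a Segment's file names is the position of the first message in that Segment (its base offset), zero-padded to 20 digits, for example `00000000000000000000.log` and `00000000000000000000.index`. The minimal Java sketch below uses hypothetical class and method names to show that naming scheme and how the Segment containing a given message position can be found: it is the Segment with the largest base offset not greater than that position. Here the lookup is modeled with `TreeMap.floorEntry`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how a Partition's Segments are named and located on disk.
class ToySegments {
    // Segment files are named after the base offset of their first message, zero-padded to 20 digits.
    static String logFileName(long baseOffset) {
        return String.format("%020d.log", baseOffset);
    }

    static String indexFileName(long baseOffset) {
        return String.format("%020d.index", baseOffset);
    }

    // Segments sorted by base offset; the value is just the Segment's .log file name here.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset) {
        segments.put(baseOffset, logFileName(baseOffset));
    }

    // The Segment holding `offset` is the one with the greatest base offset <= offset.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + offset + " is before the first segment");
        }
        return entry.getValue();
    }
}
```

For instance, with Segments starting at 0, 368769 and 737337, a message at position 368770 is found in `00000000000000368769.log`. Sorting Segments by base offset means this lookup needs no scan over every file.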
Finally, Kafka records the progress of message production and consumption within each Partition. Kafka calls this position an Offset, which you can think of simply as an auto-increment primary key in a relational database.
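A minimal sketch of the "auto-increment primary key" analogy, assuming a single in-memory Partition: each appended message receives the next Offset, and a Consumer's progress is simply the Offset of the next message it wants to read. `ToyPartitionLog` and `ToyConsumer` are hypothetical classes; the sketch ignores everything real Kafka adds, such as persistence, Segments and committing Offsets per Consumer Group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory Partition: Offsets behave like an auto-increment primary key.
class ToyPartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Producer side: append a message and return the Offset it was assigned.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1; // first message gets Offset 0, then 1, 2, ...
    }

    // Offset that the next appended message will receive (the end of the log).
    long endOffset() {
        return messages.size();
    }

    // Consumer side: read up to maxMessages messages starting at fromOffset.
    List<String> read(long fromOffset, int maxMessages) {
        int start = (int) fromOffset;
        int end = (int) Math.min(messages.size(), fromOffset + maxMessages);
        if (start >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(messages.subList(start, end));
    }
}

// Hypothetical Consumer that tracks its own progress as "next Offset to read".
class ToyConsumer {
    private long position = 0;

    List<String> poll(ToyPartitionLog log, int maxMessages) {
        List<String> batch = log.read(position, maxMessages);
        position += batch.size(); // move past everything just consumed
        return batch;
    }

    long position() {
        return position;
    }
}
```

After appending `"a"`, `"b"` and `"c"` (Offsets 0, 1 and 2), a Consumer polling two messages at a time first gets `[a, b]` with its position at 2, then `[c]` with its position at 3, the same value as the log's end Offset.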
To summarize the concepts mentioned above:
Kafka architecture concepts
- Broker: a Kafka instance
Topic message concepts
- Partition: one part of a Topic's messages
- Replication: a copy of a Partition; logically it is essentially the same as the Partition and also holds part of a Topic's messages
- Offset: the progress of message production and consumption
Data storage concepts
- Segment: a piece of a Partition's data as stored in the file system
Of course, Kafka has more concepts, such as Leader Replication and Follower Replication; these are discussed in detail in the corresponding chapters.
Kafka close-up view
Broker
See Kafka Broker
Partition and Replication
For details, see Kafka Partition and Replication
Segment
See Kafka Segment
Producer
See Kafka Producer
Consumer
See Kafka Consumer