1, an overview of the
Kafka is a distributed, publish-subscribe based messaging system that addresses application decoupling, asynchronous messaging, and traffic peak clipping.
2. Publish and subscribe model
A message producer publishes a message to a Topic, multiple message consumers subscribe to the message, and the message is not cleared after the consumer consumes the data. Belongs to the one-to-many mode, as shown in the figure:
3. System architecture
I found a good architecture diagram on the Internet:
The figure above identifies a Kafka architecture consisting of producers, brokers, consumers, and a ZooKeeper cluster. Create a Partition for each Topic:
Here are the characters:
3.1, Producer
Message producers, brokers that push messages into a Kafka cluster.
3.2, Consumer
Message consumers pull messages from the Kafka cluster and consume messages.
3.3, Consumer Group
A Consumer Group consists of one or more consumers, each of whom belongs to a Consumer Group. A consumer group is logically a subscriber. Each consumer in the consumer group is responsible for consuming data of different partitions, and a partition can only be consumed by one consumer in the group; Consumer groups do not influence each other. That is, each message can only be consumed by one Consumer in the Consumer Group. But it can be consumed by multiple Consumer groups. In this way unicast and multicast are realized.
3.4, the Broker
A Kafka server is a Broker, and a cluster consists of multiple brokers, each of which can hold multiple topics.
3.5, the Topic
The category or topic of a message, logically understood as a queue. Producers focus only on which topics they push messages to, and consumers focus only on which topics they subscribe to.
3.6, Partition
For load balancing and scalability considerations, a Topic can be divided into multiple partitions and physically stored on multiple brokers in a Kafka cluster. For reliability, each Partition has a backup Replica.
3.7, up
To ensure that the Partition data on a node in the cluster is not lost and Kafka can continue to work, Kafka provides a replication mechanism. Each Partition in a Topic has several copies. A Leader and several followers.
3.8, Leader
Replica’s main role, Producer and Consumer, only interact with the Leader.
3.9, followers
Replica slave role synchronizes data from the Leader in real time and keeps data synchronization with the Leader. When the Leader fails, a Follower becomes the new Leader.
3.9, the Controller
One of the servers in a Kafka cluster that performs the Leader election and various Failover operations.
3.9, the ZooKeeper
Kafka uses Zookeeper to store cluster meta information.
4, Topic and Partition
A Topic can be thought of as a class of information, a logical queue, with Topic specified for each message. In order for Kafka throughput to increase linearly, the Topic is physically divided into one or more partitions. Each Partition appends the log file at the storage layer. After the message is pushed in, it is appended to the end of the log file. The position of each message in the file is called offset, which is a long number that uniquely identifies a piece of information. Since each message is appended to the end of the Partition, it is sequentially written to the disk, which is very efficient. As shown in figure:
5. Network model
Kafka’s network model is based on Reactor model, which is the response model. The Kafka network model is divided into two parts: Kafka client (Consumer and Producer) is a single-thread Reactor model, and Kafka server is a multi-thread Reactor model.
5.1. Single-thread Reactor
As shown in figure:
The Reactor thread is responsible for multiplexing sockets, accepting new connections, and dispatching requests to the Handler Handler.
5.2. Multithreaded Reactor
The following is from; Message-oriented middleware — A brief description of the NIO network communication model in Kafka
- Acceptor: An Acceptor thread that listens for new connection requests, registers OP_ACCEPT events, and sends new connections to the Processor thread in the “round Robin “manner.
- Processor: N Processor threads, each of which has its own selector, register the corresponding OP_READ event to the Acceptor’s SocketChannel. The size of the OP_READ event is determined by num.net worker.Threads.
- KafkaRequestHandler: M request processing threads, contained in the thread pool KafkaRequestHandlerPool, fetch request data from RequestChannel’s global requestQueue and hand it to KafkaApis. The size of M is determined by num.io. Threads;
- RequestChannel: It is the request channel of the Kafka server. This data structure contains a global requestQueue requestQueue and multiple response queues corresponding to the Processor responseQueue. A place for the Processor to exchange data with the request-processing threads KafkaRequestHandler and KafkaApis.
- NetworkClient: The underlying layer is the corresponding encapsulation of Java NIO, located in the Kafka network interface layer. Kafka message producer object – The Send method of KafkaProducer mainly calls NetworkClient to send messages.
- SocketServer: A NIO service that starts an Acceptor Acceptor thread and multiple Processor threads. A typical Reactor multithreading model is provided, which separates receiving client requests from processing requests.
- KafkaServer: represents an instance of a Kafka Broker; The startup method is the entry for instance startup.
- KafkaApis: Kafka’s business logic processing Api, which handles different types of requests; For example, “send message”, “get message offset-offset” and “Process heartbeat request”.
reference
Kafka architecture principle Kafka architecture diagram Kafka design analysis (a) : Introduction to message-oriented middleware — A brief talk about the NIO Network communication model (Reactor) pattern in Kafka