Principles matter. Interviews rarely ask you to recite commands; they ask about principles, and if you understand the principles you can quickly locate problems when using Kafka instead of being completely lost. You must understand the principles: jumping straight into hands-on work without them is little more than moving bricks.

Topic

Suppose we create a topic named TopicA with three partitions stored on different servers. Note that a topic is a logical concept.

Partition & Partition copy

Kafka’s topics can be divided into one or more partitions, which are physical concepts. If the replication factor of a topic is set to 3, each partition has 3 identical replicas. Below, the replicas of TopicA’s partitions 0, 1, and 2 are distributed across broker 0, broker 1, and broker 2.
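To make the distribution concrete, here is a small sketch of round-robin replica assignment across brokers. It is only illustrative, in the spirit of Kafka's default assignment but simplified: no rack awareness, and a fixed starting broker instead of a random one.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Return {partition: [broker ids]}; the first broker in each list is the leader."""
    assert replication_factor <= len(brokers)
    assignment = {}
    for p in range(num_partitions):
        # the leader rotates across brokers; followers are the next brokers in order
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

print(assign_replicas([0, 1, 2], 3, 3))
# {0: [0, 1, 2], 1: [1, 2, 0], 2: [2, 0, 1]}
```

Each partition's replica list starts on a different broker, so the three leaders end up on three different machines.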

Segmented log storage

Since messages produced by producers are continuously appended to the end of log files, Kafka uses segmentation and indexing to prevent inefficient data lookup in overly large log files.

It divides each partition into multiple segments, and each segment corresponds to two files: a .index index file and a .log data file.

Leader & Follower

Moreover, each replica has a role: one replica is elected leader and the rest become followers. Producers send data directly to the leader partition, and the follower partitions then synchronize data from the leader. Consumers likewise consume data from the leader. For example, broker 0 is the leader for TopicA-partition-0, while the leaders of the other partitions sit on other brokers.

Consumer & Consumer group

A consumer group consists of one or more consumer instances, which makes it easy to scale out and to tolerate failures. Within a consumer group, a partition will not be consumed by more than one consumer, but a single consumer can consume data from multiple partitions.
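This one-partition-per-consumer rule can be sketched with a range-style assignment inside one group (a simplified take on Kafka's range assignor; consumer names are illustrative):

```python
def range_assign(consumers, partitions):
    """Assign each partition to exactly one consumer; a consumer may own several."""
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)   # the first `extra` consumers get one more
        assignment[c] = partitions[start:start + n]
        start += n
    return assignment

print(range_assign(["c1", "c2"], [0, 1, 2]))  # {'c1': [0, 1], 'c2': [2]}
```

No partition appears in two consumers' lists, yet c1 owns two partitions — exactly the rule stated above.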

Kafka network design

  1. A client sends a request to the Acceptor. The broker has several Processor threads (three by default). The Acceptor does not process client requests itself; it hands new connections (SocketChannels) to the Processors in round-robin fashion: the first connection to the first Processor, the next to the second, then the third…
  2. The Processor threads read requests from these SocketChannels and put them on a shared request queue.
  3. A handler thread pool (eight threads by default) takes requests off the queue, parses and processes them, and produces a response.
  4. The Processor reads the response from its response queue and sends it back to the client.

So if we need to tune Kafka, we can add more Processors and more handler threads to the thread pool. The request and response queues effectively act as buffers, for the case where the Processors enqueue requests faster than the handler threads can process them. So this is essentially an enhanced version of the Reactor network threading model.
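The steps above can be modeled as a toy, single-threaded simulation (no real sockets or threads; sizes mirror the defaults mentioned in the text: 3 Processors, one shared request queue, per-Processor response queues):

```python
from queue import Queue

NUM_PROCESSORS = 3
request_queue = Queue()                              # the shared request channel
response_queues = [Queue() for _ in range(NUM_PROCESSORS)]

def accept(connections):
    """Acceptor: round-robin — connection i goes to Processor i % NUM_PROCESSORS."""
    for i, conn in enumerate(connections):
        processor_id = i % NUM_PROCESSORS
        request_queue.put((processor_id, conn))      # that Processor reads the request

def handle_all():
    """Handler pool (simulated sequentially): process each request, route the
    response back to the queue of the Processor that owns the connection."""
    while not request_queue.empty():
        processor_id, conn = request_queue.get()
        response_queues[processor_id].put(f"response-to-{conn}")

accept(["conn0", "conn1", "conn2", "conn3"])
handle_all()
routed = {i: [] for i in range(NUM_PROCESSORS)}
for i, q in enumerate(response_queues):
    while not q.empty():
        routed[i].append(q.get())
print(routed)
# {0: ['response-to-conn0', 'response-to-conn3'], 1: ['response-to-conn1'], 2: ['response-to-conn2']}
```

Note how conn3 wraps back around to Processor 0 — that is the round-robin dispatch, and the queues between the stages are exactly the buffers the tuning advice above refers to.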

Kafka zero copy

Traditional IO:

buffer = file.read()
socket.send(buffer)

  1. First copy: read the disk file into the operating system kernel buffer.
  2. Second copy: copy the kernel buffer data into the application buffer.
  3. Third copy: copy the data from the application buffer into the socket send buffer (which belongs to the operating system kernel).
  4. Fourth copy: copy the socket buffer data to the network adapter for transmission over the network.

In the traditional approach, reading a disk file and sending it over the network involves four tedious copies. The actual I/O reads and writes also require I/O interrupts and CPU responses to those interrupts (bringing context switches). Although DMA was later introduced to take over interrupt requests from the CPU, the four-copy path still contains “unnecessary copies”.

Zero copy:

The zero-copy technique used by Kafka asks the kernel to copy data directly from the disk file to the socket, without going through the application. Zero copy not only greatly improves application performance, but also reduces context switching between kernel mode and user mode.
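A minimal demonstration of this idea (on Linux) is `os.sendfile`, which asks the kernel to move bytes from a file descriptor straight to a socket, with no intermediate `read()` into a user-space buffer. The sketch below uses a Unix socket pair in place of a network connection; behavior on other platforms may differ.

```python
import os
import socket
import tempfile

payload = b"kafka-log-segment-bytes" * 100

# write a throwaway "log segment" to disk
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

left, right = socket.socketpair()
with open(path, "rb") as f:
    # kernel-to-kernel transfer: file -> socket, bypassing user space
    sent = os.sendfile(left.fileno(), f.fileno(), 0, len(payload))

received = right.recv(len(payload), socket.MSG_WAITALL)
print(sent == len(payload) and received == payload)

left.close()
right.close()
os.unlink(path)
```

Compare this with the `file.read()` / `socket.send()` pair above: the data never materializes in an application buffer, which removes two of the four copies and the associated mode switches.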

The role of ZooKeeper in Kafka clusters

1. Broker registration

Brokers are distributed and independent of each other, so a registration system is needed to manage the brokers across the cluster, and ZooKeeper plays this role. ZooKeeper has a node dedicated to recording the broker server list: /brokers/ids.

At startup, each broker registers with ZooKeeper by creating its own node under /brokers/ids, such as /brokers/ids/[0…N].

Kafka uses a globally unique numeric ID to refer to each broker server. After creating its node, each broker records its IP address and port information in it. The node a broker creates is an ephemeral node, so if the broker fails, the corresponding node is deleted automatically.
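The ephemeral-node behavior can be illustrated with a toy in-memory registry (this is a simulation, not ZooKeeper; the paths match the ones described above):

```python
class Registry:
    """Toy ZooKeeper-like store: ephemeral nodes die with their session."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session)

    def create_ephemeral(self, path, data, session):
        self.nodes[path] = (data, session)

    def close_session(self, session):
        # like ZooKeeper: ephemeral nodes vanish when the owning session ends
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}

    def children(self, prefix):
        return sorted(p.rsplit("/", 1)[1]
                      for p in self.nodes if p.startswith(prefix + "/"))

zk = Registry()
zk.create_ephemeral("/brokers/ids/0", "host0:9092", session="broker0")
zk.create_ephemeral("/brokers/ids/1", "host1:9092", session="broker1")
print(zk.children("/brokers/ids"))  # ['0', '1']

zk.close_session("broker0")         # broker 0 crashes; its node disappears
print(zk.children("/brokers/ids"))  # ['1']
```

Anyone watching /brokers/ids therefore sees the live broker list without any explicit deregistration step.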

2. Topic registration

In Kafka, the mapping between topic partitions and brokers is maintained by ZooKeeper and recorded under dedicated nodes such as /brokers/topics.

Every topic in Kafka is recorded as /brokers/topics/[topic], such as /brokers/topics/login and /brokers/topics/search. After a broker starts, it registers its broker ID under the corresponding topic node (/brokers/topics) and writes the number of partitions it holds for that topic. For example, /brokers/topics/login/3->2 means that the broker with ID 3 provides two partitions for storing messages of the “login” topic. These partition nodes are, again, ephemeral nodes.

3. Consumer registration

① Registering a node with the consumer group. When each consumer server starts, it creates its own consumer node under the designated ZooKeeper node, for example /consumers/[group_id]/ids/[consumer_id]. After creating the node, the consumer writes the topics it subscribes to into this ephemeral node.

② Registering to watch changes in the consumer group. Every consumer needs to watch for changes to the other consumer servers in its group, i.e. register a Watcher on the children of the /consumers/[group_id]/ids node. As soon as consumers are added or removed, consumer load balancing (a rebalance) is triggered.

4. The relationship between zoning and consumers

In Kafka, each message partition can be consumed by only one consumer in the same group, so the relationship between message partitions and consumers needs to be recorded in ZooKeeper. Once a consumer acquires the right to consume a partition, its consumer ID is written to the ephemeral node corresponding to that partition, for example:

/consumers/[group_id]/owners/[topic]/[broker_id-partition_id]

Where [broker_id-partition_id] identifies a message partition, and the node content is the consumer ID of the consumer that owns that partition.

5. Record message consumption progress Offset

When a consumer consumes a given message partition, it needs to record the partition’s consumption progress (offset) in ZooKeeper periodically. That way, after the consumer restarts, or another consumer takes over the partition, consumption can resume from the previous position. The offset is recorded by a dedicated ZooKeeper node whose path is:

/consumers/[group_id]/offsets/[topic]/[broker_id-partition_id]

The content of the node is the value of Offset.
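A small sketch of this commit-and-resume cycle, with a plain dictionary standing in for the ZooKeeper nodes (group, topic, and partition names are illustrative):

```python
offsets = {}  # stands in for the ZooKeeper offset nodes

def commit(group, topic, partition, offset):
    """Periodically record consumption progress under the offsets path."""
    offsets[f"/consumers/{group}/offsets/{topic}/{partition}"] = offset

def resume_from(group, topic, partition):
    """A (re)starting consumer reads the node to know where to continue."""
    return offsets.get(f"/consumers/{group}/offsets/{topic}/{partition}", 0)

commit("g1", "login", "0-0", 42)          # consumed up to offset 42, then the consumer dies
print(resume_from("g1", "login", "0-0"))  # 42 — the new owner continues from here
```

Because the commit is periodic rather than per-message, a takeover may re-read a few messages after the last committed offset — the usual at-least-once trade-off.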

6. Producer load balancer

Messages of the same topic can be partitioned and distributed across multiple brokers, so producers need to send messages to these distributed brokers sensibly. How is load balancing among producers implemented? Kafka supports both traditional layer-4 load balancing and ZooKeeper-based load balancing.

(1) Layer-4 load balancing. This usually maps one producer to a single broker, and all messages generated by that producer are sent to that broker. The approach is logically simple: each producer needs to maintain only a single TCP connection with its broker and no additional connections to other systems. However, it cannot achieve true load balancing, because in a real system the amount of messages produced by each producer, and the amount stored on each broker, differ. If some producers generate far more messages than others, the totals received by different brokers will vary greatly. Producers also cannot perceive broker additions and removals in real time.

(2) ZooKeeper-based load balancing. Since every broker completes the registration process when it starts, producers can dynamically perceive changes to the broker server list by watching those nodes, which enables a dynamic load-balancing mechanism.

7. Consumer load balancing

Like producers, consumers in Kafka also need to perform load balancing so that multiple consumers can reasonably receive messages from the corresponding Broker server.