
The interview in this article is fictional, but the knowledge is real. Please read it carefully, ideally in the company of family or friends, so you don't drift off halfway through and walk away with no knowledge at all.

Performance

A middle-aged man in a plaid shirt, with a hairstyle like One-Punch Man's, walked over. Yes, he was the interviewer. Resume in hand, he mused for a moment (I was nervous), then said: Young man, we are the infrastructure middleware group. Since you mentioned Kafka on your resume, I'll ask you about Kafka next.

Me: Well, I haven't read that much about Kafka, I only know a little, which is why I didn't write much about it on the resume.

The interviewer stroked his sparse beard: Then let's get started. Where do Kafka's log files live?

Me: Kafka topics can be partitioned, so the log corresponds to folders named topic-partition. For example, for a topic with two partitions, its logs live in xxx/topic-1 and xxx/topic-2.

Interviewer: So the log file itself would be xxx/topic-1/data.log or xxx/topic-2/data.log?

Me: No, Kafka logs are segmented. Under each partition folder there are actually many log segments that together make up the log. Each log segment is 1 GB by default, and when a segment is full, writing rolls over to a new segment.

Interviewer: Why segment them? What would happen without segmentation?

Me: Segmentation makes the data easier to maintain. First, without segmentation, finding a particular piece of data would be very troublesome, like looking something up in a Xinhua dictionary with no index; with segments, we only need to know which segment the data is in and then search inside that segment. Second, since logs are persisted to disk and disk space is not infinite, old data eventually has to be deleted, and with segmentation we only need to delete the older segments.

Interviewer: Hold on, hold on ~ you said we only need to know which segment the data is in, so how do we know which segment that is?

Me: Easy. Kafka maintains a skip list internally whose nodes are the segment numbers of the segments, so when querying, the target segment can be located quickly through the skip list.

Interviewer: A skip list can speed up the lookup, but how is each segment's number determined?

Me: A Kafka segment's number is actually an offset: it is the smallest offset among the data stored in that segment. For example:

If Segment1's number is 200 and Segment2's number is 500, then Segment1 stores the messages with offsets 200-499.
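As a rough sketch of that lookup (not Kafka's actual code; the file names below just mimic Kafka's zero-padded base-offset naming), a sorted map keyed by base offset does the job in Java: floorEntry returns the segment whose base offset is the largest one not exceeding the target offset.

```java
import java.util.Map;
import java.util.TreeMap;

public class SegmentLookup {
    public static void main(String[] args) {
        // base offset -> segment file, mirroring the example above
        TreeMap<Long, String> segments = new TreeMap<>();
        segments.put(200L, "00000000000000000200.log");
        segments.put(500L, "00000000000000000500.log");

        long targetOffset = 350L;
        // the segment with the largest base offset <= targetOffset holds the message
        Map.Entry<Long, String> segment = segments.floorEntry(targetOffset);
        System.out.println("offset " + targetOffset + " lives in " + segment.getValue());
        // prints: offset 350 lives in 00000000000000000200.log
    }
}
```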

Interviewer: Hmm. So after locating the segment, how do you locate the specific message? By traversing it directly?

Me: Kafka uses sparse indexes to locate specific messages. Alongside each log segment there are two index files: .index and .timeindex.

The .index file is the offset index file. It does not create an index entry for every message; instead it creates one entry for every fixed amount of written data, which is why it is called a sparse index.

For example, to find message 6, we first load the sparse .index file into memory, use binary search to find the closest indexed entry at or before it (say message 5), jump to the physical position that entry points to, and then scan forward until message 6 is found.
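Here is a minimal sketch of that two-step search, assuming a made-up in-memory entry layout rather than Kafka's real .index format: binary-search the sparse entries for the greatest indexed offset not exceeding the target, then scan the log forward from the position that entry points to.

```java
public class SparseIndexLookup {
    // one sparse entry: a message offset and that message's physical position in the .log file
    record IndexEntry(long offset, long position) {}

    /** Returns the log position to start scanning from when looking for targetOffset. */
    static long startPositionFor(IndexEntry[] entries, long targetOffset) {
        int lo = 0, hi = entries.length - 1, best = 0;
        while (lo <= hi) {                       // entries are sorted by offset
            int mid = (lo + hi) >>> 1;
            if (entries[mid].offset() <= targetOffset) { best = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return entries[best].position();
    }

    public static void main(String[] args) {
        IndexEntry[] entries = {
            new IndexEntry(0, 0), new IndexEntry(3, 420), new IndexEntry(5, 811)
        };
        // looking for message 6: start scanning from message 5's position and walk forward
        System.out.println(startPositionFor(entries, 6)); // 811
    }
}
```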

The interviewer: What are the benefits of sparse indexes?

Me: A sparse index is a compromise: it doesn't take up much space, yet it still provides reasonably fast retrieval.

Interviewer: What is the .timeindex file for?

Me: By default, Kafka keeps data for 7 days and deletes anything older. The cleanup logic is based on the largest timestamp recorded in a segment's .timeindex file: if the difference between that largest timestamp and the current time exceeds 7 days, the corresponding segment is cleaned up.

Interviewer: Speaking of data cleanup, what other strategies are there besides the time-based one you just mentioned?

Me: There is cleanup based on log size: if the log (the sum of all its segments) grows beyond the configured threshold, segments are deleted starting from the oldest until the size is back under the limit. There is also cleanup based on the log start offset: segments whose data falls entirely below the log start offset are cleaned up.
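To make the three cleanup strategies concrete, here is a toy sketch (the field names and structure are made up; this is not Kafka's log cleaner code). The corresponding broker settings are log.retention.hours / log.retention.ms for the time-based check and log.retention.bytes for the size-based check.

```java
import java.time.Duration;
import java.util.List;

public class RetentionSketch {
    // hypothetical view of a segment: base offset, size on disk, and the largest timestamp it contains
    record Segment(long baseOffset, long sizeBytes, long maxTimestampMs) {}

    // Time-based cleanup (log.retention.hours / log.retention.ms):
    // a segment is deletable when its newest record is older than the retention window.
    static boolean breachesTime(Segment s, long nowMs, Duration retention) {
        return nowMs - s.maxTimestampMs() > retention.toMillis();
    }

    // Size-based cleanup (log.retention.bytes):
    // oldest segments are deleted while the whole log exceeds the threshold.
    static boolean breachesSize(List<Segment> log, long retentionBytes) {
        long total = log.stream().mapToLong(Segment::sizeBytes).sum();
        return total > retentionBytes;
    }

    // Log-start-offset-based cleanup: a segment is deletable once it lies entirely
    // below the log start offset, i.e. the next segment starts at or below that offset.
    static boolean breachesStartOffset(long nextSegmentBaseOffset, long logStartOffset) {
        return nextSegmentBaseOffset <= logStartOffset;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Segment old = new Segment(200, 1L << 30, now - Duration.ofDays(8).toMillis());
        System.out.println(breachesTime(old, now, Duration.ofDays(7))); // true: older than 7 days
    }
}
```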

Interviewer: Do you know about message batching (merging messages)? If so, what are its benefits?

Me: Batching combines multiple messages and sends them to the broker in a single RPC call, which clearly saves a lot of network I/O. Second, messages carry a CRC check, and batching them reduces the overhead of computing and verifying CRCs one message at a time.

Interviewer: When are batched messages actually sent to the broker?

Me: Batched messages sit in a buffer and are sent to the broker either when the batch is nearly full or when no new message has been produced for a while.
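Those two triggers correspond to the producer settings batch.size and linger.ms. A minimal producer sketch, with the broker address and topic name as placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "32768");   // send when a batch reaches ~32 KB
        props.put("linger.ms", "10");       // or after waiting at most 10 ms for more records

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i));
            }
        } // close() flushes any batch still sitting in the buffer
    }
}
```

Raising linger.ms trades a little latency for larger batches and better throughput.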

The interviewer: Do you know about message compression?

Me: Yes. Compression trades CPU time for bandwidth: it makes the payload smaller. The producer compresses the messages, and the consumer decompresses them after receiving them.

Interviewer: Is it only the producer that can compress?

Me: No, the broker can compress too. If the compression algorithm the producer used differs from the one configured on the broker, the broker will decompress the data and recompress it with its own algorithm, which can hurt overall throughput. The same thing can happen when old and new versions are incompatible, for example when the broker is older and does not support the producer's newer compression algorithm or message format, forcing a conversion on the broker side.
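As a sketch, the producer side is controlled by the compression.type setting; the broker has a setting of the same name whose default, "producer", keeps whatever codec the producer used (the broker address below is a placeholder):

```java
import java.util.Properties;

public class CompressionConfig {
    public static void main(String[] args) {
        // Producer side: compress batches with lz4 before sending
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");  // placeholder
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("compression.type", "lz4");  // none, gzip, snappy, lz4 or zstd

        // Broker side (server.properties): compression.type=producer keeps the producer's codec;
        // configuring a concrete codec there instead forces the broker to decompress and
        // recompress, which costs throughput.
        System.out.println(producerProps);
    }
}
```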

Interviewer: We know Kafka messages are written to disk. Isn't disk I/O slow?

Me: Kafka reads and writes messages to disk sequentially. Tests show that a RAID-5 array made up of six 7200 rpm disks can achieve linear write speeds of about 600 MB/s but random write speeds of only about 100 KB/s, a performance difference of 6000x. Operating systems also optimize heavily for linear reads and writes, for example read-ahead (prefetching a large disk block into memory) and write-behind (merging many small logical writes into one large physical write). Sequential disk writes are not only faster than random disk writes but can even be faster than random memory writes.

Interviewer: Sequential reads and writes address the slow-disk problem. Are there any optimizations on the network side?

Me: Yes, zero copy. Without zero copy, sending a message out looks like this:

  1. Switch to kernel mode: the kernel copies disk data into the kernel buffer
  2. Switch to user mode: the kernel buffer's data is copied into the user-space program
  3. Switch back to kernel mode: the user-space data is copied into the kernel socket buffer
  4. The socket buffer's data is copied to the network adapter

You can see the same data is copied many times and ends up back in kernel space anyway, which is wasteful.

When you have zero copy:

  1. Disk data is copied to the kernel buffer
  2. The kernel buffer passes the descriptor and length to the socket buffer, and the data is sent directly to the NIC

You can see that zero copy eliminates two copies (and the associated user/kernel mode switches), greatly reducing the overhead.
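In Java this is exposed as FileChannel.transferTo, the sendfile-style path that Kafka's design docs describe for serving consumer fetches. A minimal sketch, with the segment file name and destination address as placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        // placeholder log segment and destination; adjust for a real environment
        Path segment = Path.of("00000000000000000200.log");
        try (FileChannel file = FileChannel.open(segment, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = file.size();
            // transferTo hands the copy to the kernel (sendfile), avoiding the round trip
            // through user space that read() + write() would require
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```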

Reliability

Interviewer: How does Kafka's multi-consumer model work?

Me: If you want multiple consumers to consume the same topic at the same time, the simplest way would be to copy the topic for each of them, but that would obviously waste a lot of space, especially when there are many consumers.

So Kafka uses offsets instead: the data is stored only once, and each consumer reads it according to its own position (offset).

Interviewer: Do you know where consumer offsets are stored?

Me: A long time ago they were stored in ZooKeeper, but offsets need to be updated frequently and ZooKeeper is not suited to frequent writes. So consumer offsets were later moved into an internal topic called __consumer_offsets, which is created automatically when the first consumer starts, with 50 partitions and 3 replicas by default.

Interviewer: What exactly is stored in __consumer_offsets?

Me: The value can simply be thought of as the consumer's offset. The key deserves more explanation: every consumer belongs to a consumer group and actually consumes a particular partition of a topic, so group-topic-partition is enough to identify the corresponding consumer, and that is what the key is made of.

Interviewer: Can you talk about the ways consumers commit offsets?

Me: There are automatic commits and manual commits. With automatic commits we don't need to intervene; Kafka commits for us after we consume messages. With manual commits, we commit ourselves after processing the messages.

Interviewer: What's the problem with automatic commits?

Me: The auto-commit policy commits the offset every 5 seconds by default. If the consumer has no new data to consume for a long time, it keeps committing the same offset over and over, which writes a lot of duplicate messages into __consumer_offsets.

Interviewer: What's the solution?

Me: The core problem in that scenario is that a large number of duplicate messages take up storage space, so we just need to remove the duplicates. Kafka provides a mechanism called compaction, a bit like Redis's AOF rewrite, carried out by a log cleaner thread; it cleans up duplicate and older messages.
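A minimal Java consumer sketch with manual commits (the broker address, group id and topic are placeholders); setting enable.auto.commit to true instead would make the client commit on its own every auto.commit.interval.ms, 5 seconds by default:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "demo-group");                 // placeholder consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");            // we commit offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // business processing goes here
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // commit after processing; a crash between processing and commit
                // still means re-consumption, so processing should be idempotent
                consumer.commitSync();
            }
        }
    }
}
```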

Interviewer: What if the consumer restarts and fails to commit the offset?

Me: That causes repeated consumption; generally the business side needs to cooperate by making processing idempotent.

Interviewer: Can manual commits solve this problem?

Me: No. Even if we commit manually after processing the business logic, the commit can still fail because of a restart or some other reason, and once the consumer recovers, repeated consumption will still happen.

Interviewer: What if I commit first and process the business logic afterwards?

Me: That cannot guarantee 100% correctness either. If the commit succeeds but processing the business logic fails, the data normally cannot be consumed again because the offset has already been committed, unless you reset the offset. In short, no scheme is 100% perfect; we need to implement idempotence based on the business, or recover missing data from logs.

Interviewer: When a consumer commits its offset, does it commit the offset of the latest message it consumed, or offset+1?

Me: offset + 1.

Interviewer: From the producer's point of view, talk about how not to lose messages.

Me: Regarding message loss, Kafka's producer offers three strategies to choose from. Each has its own advantages and disadvantages, and the choice depends on the actual business situation.

  1. The first: the producer does not care what happens to the message and is only responsible for sending it. This mode is undoubtedly the fastest and has the best throughput, but it can lose a lot of data. For example, if the broker has a problem, the producer keeps sending anyway, and everything sent while the broker is recovering is lost.
  2. The second: the producer requires every replica, Leader and Follower alike, to be written successfully. The more Follower replicas there are, the worse the throughput will theoretically be, but in this mode the message is the safest.
  3. The third: the producer only needs the ACK from the Leader replica and does not care whether the Follower replicas have written the data. This is a compromise that provides a degree of safety without hurting throughput too much.

If you don't mind losing data and just want to push things like logs through as fast as possible, use the first option. If you care most about data safety, use the second. The third is recommended if you want slightly better throughput while keeping the data reasonably safe, with the caveat that problems on the Follower replicas are invisible to the producer.
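These three strategies correspond to the producer's acks setting: 0 (fire and forget), all (wait for every in-sync replica) and 1 (wait for the Leader only). A minimal sketch, with the broker address and topic as placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksConfigDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // acks=0   -> fire and forget (strategy 1)
        // acks=all -> wait for all in-sync replicas (strategy 2)
        // acks=1   -> wait for the Leader only (strategy 3)
        props.put("acks", "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "hello"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();  // handle a failed send
                        }
                    });
        }
    }
}
```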

Interviewer: Can you tell me how a Follower replica is elected Leader?

Me: There are several concepts in Kafka:

  • AR (Assigned Replicas): the set of all replicas
  • ISR (In-Sync Replicas): the set of replicas that are in sync and eligible for election
  • OSR (Out-of-Sync Replicas): the set of replicas that have fallen too far behind or failed

AR = ISR + OSR. Under normal circumstances, AR should be the same as ISR. However, when a Follower replica falls too far behind or fails, it is removed from the ISR and placed into the OSR. When a new Leader is needed, the first replica in the ISR is elected. For example, if AR=[1,2,3] and replica 1 fails, then ISR=[2,3] and replica 2 will be elected as the new Leader.
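A toy sketch of that election rule (this is not the broker's controller logic, just the AR/ISR/OSR bookkeeping described above):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ToyLeaderElection {
    public static void main(String[] args) {
        // AR: all assigned replicas, in their preferred order
        List<Integer> ar = List.of(1, 2, 3);
        // replica 1 has failed, so it drops out of the ISR and into the OSR
        Set<Integer> failed = Set.of(1);

        Set<Integer> isr = new LinkedHashSet<>();
        Set<Integer> osr = new LinkedHashSet<>();
        for (int replica : ar) {
            if (failed.contains(replica)) osr.add(replica); else isr.add(replica);
        }

        // the first replica still in the ISR becomes the new Leader
        int newLeader = isr.iterator().next();
        System.out.println("ISR=" + isr + " OSR=" + osr + " new leader=" + newLeader);
        // prints: ISR=[2, 3] OSR=[1] new leader=2
    }
}
```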

The interviewer smoothed his fringe to the left: Is there anything you'd like to ask me?

Me: Teacher, could you show me a combination punch?

Interviewer: I don't know any combination punches, but there will be plenty of combinations coming up to meet you.

Me: …

To be continued…

Past highlights:

Kafka Literacy – Thinking with a simple delete, I found so much knowledge… Learn about rollback and persistence

Finally, you can find me on WeChat by searching [pretend to understand programming]. If you have any questions, feel free to contact me; if there is a problem with my article, corrections are welcome; and if you love learning and studying, feel free to follow me.