Interviewer: Why don’t we talk about message queues today? I see a lot of Kafka in your project
Candidate: Mm-hmm
Interviewer: Why don’t you briefly describe a situation in which you use Kafka
Candidate: Message queues serve three general purposes: decoupling, asynchrony, and peak shaving
Candidate: For example, in my current project I maintain a message management platform that exposes an API for various business teams to call
Candidate: When they invoke the API, the message is actually delivered asynchronously
Candidate: The interface layer simply puts the message onto the message queue and then returns a result to the caller right away
Candidate: The benefits are:
Candidate: 1. The interface's throughput improves greatly (the interface RT stays very low because no real downstream calls are made on the request path)
Candidate: 2. Even a flood of calls to the message interface won't hurt the system (the traffic is absorbed by the message queue)
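The "enqueue and return immediately" interface layer described above can be sketched with a tiny in-process stand-in. A BlockingQueue plays the role of Kafka here, and all class and method names (MessagePlatform, send, pending) are made up for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePlatform {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // The call the business side invokes: an O(1) enqueue with no downstream
    // calls, so the interface RT stays low even under heavy traffic.
    public String send(String message) {
        queue.offer(message);   // hand off to the "message queue"
        return "ACCEPTED";      // return to the caller immediately
    }

    // How many messages are waiting to be delivered.
    public int pending() {
        return queue.size();
    }

    // A background consumer drains the queue and does the real (slow) work,
    // completely off the request path.
    public void startConsumer() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();  // blocks until a message arrives
                    deliver(msg);               // the actual delivery happens here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    private void deliver(String msg) { /* call SMS/email/push channels here */ }
}
```

In the real system the queue is a Kafka topic rather than an in-memory structure, so the buffered traffic also survives a restart of the interface layer.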
Interviewer: Hmm…
Candidate: Another example is my ad order attribution project, whose main job is to take order data and calculate the commission for each business's ads
Candidate: The order data is pulled from the message queue
Candidate: The benefits of this design are:
Candidate: 1. The trading team only has to write each order message to the message queue, and every business party consumes the order-data Topic on its own [decoupling] [asynchrony]
Candidate: 2. Even if order QPS soars, the downstream businesses barely notice (they only consume from the message queue, so the spike doesn't hit their machines directly)
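The decoupling above can be sketched as a toy publish/subscribe topic. Real Kafka consumers each track their own offset in the topic, but a simple subscriber list is enough to show the key point: the producer (the trading team) knows nothing about who consumes. All names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OrderTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Each business party (commission calculation, reporting, ...) registers
    // itself; the producer never has to change when a new consumer appears.
    public void subscribe(Consumer<String> businessParty) {
        subscribers.add(businessParty);
    }

    // The trading team publishes an order once; every subscriber gets a copy.
    public void publish(String order) {
        for (Consumer<String> s : subscribers) {
            s.accept(order);
        }
    }
}
```

Adding a new downstream consumer is just one more `subscribe` call, which is exactly the decoupling benefit described above.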
Interviewer: Well, why can a message queue do peak shaving?
Interviewer: Or to put it another way, why can Kafka carry such a high QPS?
Candidate: The "core" job of a message queue is to store data from producers and let the various businesses read it back out
Candidate: That's unlike ordinary request handling, where the business logic may call someone else's API, query a database, and so on
Candidate: Kafka applies many optimizations to both the "store" and "read" paths
Candidate: To name a few:
Candidate: When we send messages to or read messages from a Topic, there are actually multiple partitions working in parallel behind it
Candidate: When storing messages, Kafka appends to disk sequentially and leans on the operating system's page cache to boost performance
Candidate: It also cuts down the number of CPU copies made when reading and writing data [zero copy]
Interviewer: Well, since you mentioned reducing the number of times the CPU copies files, can you tell me about this technique?
Candidate: Well, yes, it’s zero copy technology.
Candidate: For example, when we normally call the read function, the following steps occur:
Candidate: 1. DMA copies data from disk into the kernel read buffer
Candidate: 2. The CPU copies the data from the kernel read buffer into a user-space buffer
Candidate: When the write function is called normally, the following steps occur:
Candidate: 1. The CPU copies the user-space data into the socket kernel buffer
Candidate: 2. DMA copies the data from the socket kernel buffer to the NIC
Candidate: So one full "read plus write" costs 2 DMA copies and 2 CPU copies. The DMA copies can't be saved, so what we call zero-copy technology is really about saving the CPU copies
Candidate: Also, since user processes aren't allowed to manipulate the kernel directly (for safety), every system call triggers user/kernel context switches, and the read-plus-write above incurs 4 of them in total
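The four copies and four context switches above come from the classic read()/write() loop. A minimal Java version might look like the following (copying to a file rather than a socket, but the copy path through a user-space buffer is the same):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopy {
    // Each read(): DMA copies disk -> kernel read buffer, then the CPU copies
    //              kernel read buffer -> userBuffer (user space).
    // Each write(): the CPU copies userBuffer -> destination kernel buffer,
    //               then DMA copies kernel buffer -> device.
    // Each read()/write() pair also costs 4 user/kernel context switches.
    public static void copy(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] userBuffer = new byte[8192]; // the data detours through user space here
            int n;
            while ((n = in.read(userBuffer)) != -1) {
                out.write(userBuffer, 0, n);
            }
        }
    }
}
```

The `userBuffer` array is exactly the user-space stop that zero-copy techniques eliminate.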
Interviewer: …
Candidate: The main zero-copy technologies today are mmap and sendfile, which reduce context switches and CPU copies to varying degrees
Candidate: mmap, for example, maps the kernel read buffer into the user-space address space, so the kernel read buffer and the application buffer share the same pages
Candidate: That removes the CPU copy from the kernel read buffer to the user buffer
Candidate: With mmap, the read-and-write above simplifies to:
Candidate: 1. DMA copies disk data into the kernel read buffer
Candidate: 2. The CPU copies the kernel read buffer into the socket kernel buffer
Candidate: 3. DMA copies the socket kernel buffer to the NIC
Candidate: One CPU copy is saved because the kernel read buffer is mapped into user space
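In Java, `FileChannel.map` is how the JDK exposes mmap: the mapped buffer is backed by the same page-cache pages the kernel uses, so reading it needs no extra CPU copy into a separate user buffer. A minimal sketch:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapRead {
    // Map the whole file into this process's address space and read it.
    // The bytes come straight out of the (shared) page cache; no CPU copy
    // from a kernel read buffer into a user buffer is needed.
    public static String readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes); // reads from the mapped pages directly
            return new String(bytes);
        }
    }
}
```

This is the same mechanism Kafka relies on when it persists incoming data through memory-mapped files.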
Interviewer: Hmm…
Candidate: sendfile with DMA scatter/gather passes only the file descriptor and length information from the kernel read buffer to the socket kernel buffer, so the CPU does zero copies
Candidate: With sendfile + DMA scatter/gather, the read-and-write simplifies to:
Candidate: 1. DMA copies disk data into the kernel read buffer
Candidate: 2. The CPU sends the read buffer's file descriptor and data length to the socket buffer
Candidate: 3. DMA copies data straight from the kernel read buffer to the NIC, guided by that file descriptor and length
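In Java, `FileChannel.transferTo` is how the JDK exposes sendfile (on Linux): the kernel moves the data from the source file's page cache to the target channel without it ever passing through a user-space buffer. A minimal sketch, here transferring between two files rather than to a socket:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SendfileCopy {
    // transferTo may move fewer bytes than requested, so loop until done.
    // The data never enters user space; the kernel handles the whole move.
    public static void transfer(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```

When the target is a SocketChannel instead of a file, this is exactly the disk-to-NIC path the steps above describe.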
Candidate: Back to Kafka
Candidate: On the Producer -> Broker path, Kafka uses mmap to persist the data arriving from the network card (cutting the CPU copies from 2 to 1)
Candidate: On the Broker -> Consumer path, Kafka uses sendfile to ship data from disk to the NIC (zero CPU copies)
Interviewer: Let me stop you there for a second; something's come up on my end. Let me summarize what you've said
Interviewer: You use Kafka for asynchrony, peak shaving, and decoupling
Interviewer: What makes Kafka so fast is parallelism, full use of operating system caches, sequential writes, and zero copy
Interviewer: Is that right?
Candidate: Mm-hmm
Interviewer: OK, let’s continue next time. I’m a little busy here
Welcome to follow my WeChat official account [Java3y] to chat about Java and interviews
The [Online Interviewer] series gets two new installments a week!
Original content isn't easy to write, so likes, comments, and shares are much appreciated!