Interviewer: Why don’t we talk about message queues today? I see a lot of Kafka in your project
Candidate: Mm-hmm
Interviewer: Why don’t you briefly describe a situation in which you use Kafka
Candidate: Message queues serve three general purposes: decoupling, asynchrony, and peak shaving
Candidate: For example, in my current project I maintain a message management platform that exposes an API for various business teams to call
Candidate: When they invoke the API, the message is actually delivered asynchronously
Candidate: The interface layer simply puts the message onto the message queue and then returns a result to the caller right away
Candidate: The benefits are:
Candidate: 1. The interface's throughput improves greatly (the interface RT stays very low because no real downstream calls are made on the request path)
Candidate: 2. Even a flood of calls to the message interface won't hurt the system (the traffic is absorbed by the message queue)
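The "enqueue and return immediately" interface layer described above can be sketched with a tiny in-process stand-in. A BlockingQueue plays the role of Kafka here, and all class and method names (MessagePlatform, send, pending) are made up for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePlatform {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // The call the business side invokes: an O(1) enqueue with no downstream
    // calls, so the interface RT stays low even under heavy traffic.
    public String send(String message) {
        queue.offer(message);   // hand off to the "message queue"
        return "ACCEPTED";      // return to the caller immediately
    }

    // How many messages are waiting to be delivered.
    public int pending() {
        return queue.size();
    }

    // A background consumer drains the queue and does the real (slow) work,
    // completely off the request path.
    public void startConsumer() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();  // blocks until a message arrives
                    deliver(msg);               // the actual delivery happens here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    private void deliver(String msg) { /* call SMS/email/push channels here */ }
}
```

In the real system the queue is a Kafka topic rather than an in-memory structure, so the buffered traffic also survives a restart of the interface layer.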
Interviewer: Hmm…
Candidate: Another example is my ad order attribution project, whose main job is to take order data and calculate the commission for each business's ads
Candidate: The order data is pulled from the message queue
Candidate: The benefits of this design are:
Candidate: 1. The trading team only has to write each order message to the message queue, and every business party consumes the order-data Topic on its own [decoupling] [asynchrony]
Candidate: 2. Even if order QPS soars, the downstream businesses barely notice (they only consume from the message queue, so the spike doesn't hit their machines directly)
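The decoupling above can be sketched as a toy publish/subscribe topic. Real Kafka consumers each track their own offset in the topic, but a simple subscriber list is enough to show the key point: the producer (the trading team) knows nothing about who consumes. All names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OrderTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Each business party (commission calculation, reporting, ...) registers
    // itself; the producer never has to change when a new consumer appears.
    public void subscribe(Consumer<String> businessParty) {
        subscribers.add(businessParty);
    }

    // The trading team publishes an order once; every subscriber gets a copy.
    public void publish(String order) {
        for (Consumer<String> s : subscribers) {
            s.accept(order);
        }
    }
}
```

Adding a new downstream consumer is just one more `subscribe` call, which is exactly the decoupling benefit described above.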
Interviewer: Well, why can a message queue do peak shaving?
Interviewer: Or to put it another way, why can Kafka carry such a high QPS?
Candidate: The "core" job of a message queue is to store data from producers and let the various businesses read it back out
Candidate: That's unlike ordinary request handling, where the business logic may call someone else's API, query a database, and so on
Candidate: Kafka applies many optimizations to both the "store" and "read" paths
Candidate: To name a few:
Candidate: When we send messages to or read messages from a Topic, there are actually multiple partitions working in parallel behind it
Candidate: When storing messages, Kafka appends to disk sequentially and leans on the operating system's page cache to boost performance
Candidate: It also cuts down the number of CPU copies made when reading and writing data [zero copy]
Interviewer: Well, since you mentioned reducing the number of times the CPU copies files, can you tell me about this technique?
Candidate: Well, yes, it’s zero copy technology.
Candidate: For example, when we normally call the read function, the following steps occur:
Candidate: 1. DMA copies data from disk into the kernel read buffer
Candidate: 2. The CPU copies the data from the kernel read buffer into a user-space buffer
Candidate: When the write function is called normally, the following steps occur:
Candidate: 1. The CPU copies the user-space data into the socket kernel buffer
Candidate: 2. DMA copies the data from the socket kernel buffer to the NIC
Candidate: So one full "read plus write" costs 2 DMA copies and 2 CPU copies. The DMA copies can't be saved, so what we call zero-copy technology is really about saving the CPU copies
Candidate: Also, since user processes aren't allowed to manipulate the kernel directly (for safety), every system call triggers user/kernel context switches, and the read-plus-write above incurs 4 of them in total
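The four copies and four context switches above come from the classic read()/write() loop. A minimal Java version might look like the following (copying to a file rather than a socket, but the copy path through a user-space buffer is the same):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopy {
    // Each read(): DMA copies disk -> kernel read buffer, then the CPU copies
    //              kernel read buffer -> userBuffer (user space).
    // Each write(): the CPU copies userBuffer -> destination kernel buffer,
    //               then DMA copies kernel buffer -> device.
    // Each read()/write() pair also costs 4 user/kernel context switches.
    public static void copy(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] userBuffer = new byte[8192]; // the data detours through user space here
            int n;
            while ((n = in.read(userBuffer)) != -1) {
                out.write(userBuffer, 0, n);
            }
        }
    }
}
```

The `userBuffer` array is exactly the user-space stop that zero-copy techniques eliminate.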
Interviewer: …
Candidate: The main zero-copy technologies today are mmap and sendfile, which reduce context switches and CPU copies to varying degrees
Candidate: mmap, for example, maps the kernel read buffer into the user-space address space, so the kernel read buffer and the application buffer share the same pages
Candidate: That removes the CPU copy from the kernel read buffer to the user buffer
Candidate: With mmap, the read-and-write above simplifies to:
Candidate: 1. DMA copies disk data into the kernel read buffer
Candidate: 2. The CPU copies the kernel read buffer into the socket kernel buffer
Candidate: 3. DMA copies the socket kernel buffer to the NIC
Candidate: One CPU copy is saved because the kernel read buffer is mapped into user space
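In Java, `FileChannel.map` is how the JDK exposes mmap: the mapped buffer is backed by the same page-cache pages the kernel uses, so reading it needs no extra CPU copy into a separate user buffer. A minimal sketch:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapRead {
    // Map the whole file into this process's address space and read it.
    // The bytes come straight out of the (shared) page cache; no CPU copy
    // from a kernel read buffer into a user buffer is needed.
    public static String readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes); // reads from the mapped pages directly
            return new String(bytes);
        }
    }
}
```

This is the same mechanism Kafka relies on when it persists incoming data through memory-mapped files.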
Interviewer: Hmm…
Candidate: sendfile with DMA scatter/gather passes only the file descriptor and length information from the kernel read buffer to the socket kernel buffer, so the CPU does zero copies
Candidate: With sendfile + DMA scatter/gather, the read-and-write simplifies to:
Candidate: 1. DMA copies disk data into the kernel read buffer
Candidate: 2. The CPU sends the read buffer's file descriptor and data length to the socket buffer
Candidate: 3. DMA copies data straight from the kernel read buffer to the NIC, guided by that file descriptor and length
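In Java, `FileChannel.transferTo` is how the JDK exposes sendfile (on Linux): the kernel moves the data from the source file's page cache to the target channel without it ever passing through a user-space buffer. A minimal sketch, here transferring between two files rather than to a socket:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SendfileCopy {
    // transferTo may move fewer bytes than requested, so loop until done.
    // The data never enters user space; the kernel handles the whole move.
    public static void transfer(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```

When the target is a SocketChannel instead of a file, this is exactly the disk-to-NIC path the steps above describe.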
Candidate: Back to Kafka
Candidate: On the Producer -> Broker path, Kafka uses mmap to persist the data arriving from the network card (cutting the CPU copies from 2 to 1)
Candidate: On the Broker -> Consumer path, Kafka uses sendfile to ship data from disk to the NIC (zero CPU copies)
Interviewer: Let me stop you there for a second; something's come up on my end. Let me summarize what you've said
Interviewer: You use Kafka for asynchrony, peak shaving, and decoupling
Interviewer: What makes Kafka so fast is parallelism, full use of operating system caches, sequential writes, and zero copy
Interviewer: Is that right?
Candidate: Mm-hmm
Interviewer: OK, let’s continue next time. I’m a little busy here
Welcome to follow my WeChat official account [Java3y] to chat about Java and interviews
The [Online Interviewer] series gets two new installments a week!
Original content isn't easy to write, so likes, comments, and shares are much appreciated!