Kafka's architecture is one of the most frequently asked topics in interviews at Internet companies, so let's take a look at some of it.
Kafka is a high-throughput, low-latency, highly concurrent, high-performance messaging middleware, widely used in the big data field. A well-tuned Kafka cluster can sustain extremely high write rates of hundreds of thousands or even millions of messages per second.
So how does Kafka achieve such high throughput and performance?
Let's walk through it step by step.
Page caching + sequential disk writes
First, Kafka writes to disk every time it receives data, as shown in the following figure:
A natural question arises here: if the data is stored on disk, and Kafka frequently writes data to disk files, won't performance be poor? Surely everyone assumes that disk writes are terribly slow.
Yes, if it were really as simple as the diagram above, performance would indeed be quite poor.
But in fact Kafka uses an excellent design here to guarantee write performance: Kafka writes files through the operating system's page cache.
The operating system itself maintains a layer of in-memory cache called the page cache. We can also call it the OS cache, meaning the cache managed by the operating system itself.
When you write to a disk file, you can write directly into this OS cache, that is, into memory only, and the operating system then decides when to actually flush the data from the OS cache to the disk file.
This step alone can improve disk file write performance significantly, because you are actually writing to memory, not disk, as shown in the following figure:
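As a minimal sketch of this idea (this is illustrative Java, not Kafka's actual code; the `PageCacheWrite` class and its method names are made up for the example), a `FileChannel` write returns as soon as the bytes are in the OS page cache, long before they reach the physical disk:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheWrite {
    // Appends a payload to a log file. FileChannel.write() returns as soon as
    // the bytes are in the OS page cache; the kernel decides when to flush them
    // to the physical disk, which is why the call itself runs at memory speed.
    static long append(Path log, byte[] payload) {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap(payload)); // lands in the page cache, not the disk
            // ch.force(false); // uncomment to force a flush (fsync) to disk
            return ch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-contained demo: append one 10-byte message to a temp file.
    static long demo() {
        try {
            Path log = Files.createTempFile("kafka-demo", ".log");
            return append(log, "message-1\n".getBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("log size after append: " + demo()); // 10
    }
}
```

Note that opening the channel in `APPEND` mode also gives us the sequential, append-only write pattern discussed next.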
The other crucial point is that Kafka writes data to disk sequentially. That is, data is simply appended to the end of the file, never modified at random locations within it.
Ordinary mechanical disks do perform poorly under random writes, that is, when you pick an arbitrary position in a file and write data there.
But if you append sequentially to the end of the file, sequential disk writes can be almost as fast as writing to memory.
On the one hand, Kafka writes into the OS page cache, so writes are essentially memory writes and perform very well.
On the other hand, it writes to disk sequentially, so even when data is flushed to disk, performance remains very high, close to that of memory writes.
Based on the above two points, Kafka achieves extremely high write performance.
If Kafka takes 1 millisecond to write a single message, that is 1,000 messages per second. But what if Kafka's performance is so high that writing one message takes only 0.01 milliseconds? Then it can write 100,000 messages per second.
Therefore, the key to writing tens or even hundreds of thousands of messages per second is to make each individual write as fast as possible, so that more data can be written per unit of time and throughput goes up.
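The arithmetic above can be captured in a tiny sketch (the `writesPerSecond` helper is purely illustrative):

```java
public class Throughput {
    // If one write takes `latencyMs` milliseconds, the number of writes a
    // single thread can do per second is 1000 / latencyMs.
    static long writesPerSecond(double latencyMs) {
        return Math.round(1000.0 / latencyMs);
    }

    public static void main(String[] args) {
        System.out.println(writesPerSecond(1.0));   // 1 ms per write    -> 1,000/s
        System.out.println(writesPerSecond(0.01));  // 0.01 ms per write -> 100,000/s
    }
}
```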
Zero copy technology
Let’s talk about consumption.
As you probably know, we frequently consume data from Kafka; when we do, Kafka actually reads data from its disk files and sends it to the downstream consumer, as shown in the figure below.
So what’s the performance bottleneck here if you’re constantly reading data from disk and sending it to consumers?
Assuming that Kafka does nothing but simply read data from disk and send it to downstream consumers, the process looks something like this:
Check whether the data to be read is in the OS cache. If not, read the data from the disk file and put it in the OS cache.
Then the data is copied from the OS cache into the application process's buffer, then from the application buffer into the operating system's socket buffer, and finally from the socket buffer to the network card, which sends it on to the downstream consumer.
The whole process is shown in the figure below:
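The steps above can be sketched roughly as follows (illustrative only; the `TraditionalCopy` class and the `ByteArrayOutputStream` standing in for a real socket stream are assumptions for the example, not Kafka code):

```java
import java.io.*;
import java.nio.file.*;

public class TraditionalCopy {
    // Classic read-then-send loop. Each read() copies bytes from the OS cache
    // into this process's buffer; each write() copies them back out toward the
    // socket buffer: the two extra copies described above.
    static long send(Path file, OutputStream socketOut) throws IOException {
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];          // application-process buffer
            int n;
            while ((n = in.read(buf)) != -1) {    // copy: OS cache -> app buffer
                socketOut.write(buf, 0, n);       // copy: app buffer -> socket buffer
                total += n;
            }
        }
        return total;
    }

    // Self-contained demo: "send" a 14-byte file to an in-memory sink.
    static long demo() {
        try {
            Path f = Files.createTempFile("segment", ".log");
            Files.write(f, "hello consumer".getBytes());
            ByteArrayOutputStream fakeSocket = new ByteArrayOutputStream(); // stand-in for a socket stream
            return send(f, fakeSocket);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 14
    }
}
```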
If you look at the image above, you can obviously see that there are two unnecessary copies!
It is copied from the operating system cache to the application process cache, and then copied from the application cache back to the operating system Socket cache.
Carrying out these two copies also involves several context switches between the application and the operating system.
So reading data this way is quite costly in performance terms.
Kafka addresses this problem by introducing zero-copy technology when reading data.
In other words, data is sent from the OS cache directly to the network card and on to the downstream consumer, skipping the two intermediate data copies. Only a file descriptor is copied into the socket buffer; no actual data is copied there.
Take a look at the picture below to get a sense of the subtle process:
With zero-copy technology, there is no need to copy data from the OS cache into the application buffer and then from the application buffer into the socket buffer; both copies are skipped, hence the name zero copy.
The socket buffer only receives a copy of the descriptor; the data itself goes straight from the OS cache to the network card. This greatly improves the performance of reading file data during consumption.
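In Java, this mechanism is exposed as `FileChannel.transferTo()`, which on Linux maps to the `sendfile` system call, and Kafka's network layer relies on it when sending log data to consumers. A minimal sketch (the `ZeroCopySend` class and the null-output channel standing in for a real socket channel are illustrative assumptions):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // transferTo() asks the kernel to move bytes from the page cache straight
    // to the target channel (a socket in real life); the data never surfaces
    // in the JVM heap, so the two user-space copies are skipped entirely.
    static long send(FileChannel file, WritableByteChannel socket) throws IOException {
        long pos = 0, size = file.size();
        while (pos < size) {
            pos += file.transferTo(pos, size - pos, socket); // may transfer less than requested
        }
        return pos;
    }

    // Self-contained demo: transfer a 17-byte file into a discarding channel.
    static long demo() {
        try {
            Path f = Files.createTempFile("segment", ".log");
            Files.write(f, "zero copy payload".getBytes());
            try (FileChannel ch = FileChannel.open(f, StandardOpenOption.READ);
                 WritableByteChannel sink =
                         Channels.newChannel(OutputStream.nullOutputStream())) {
                return send(ch, sink); // stand-in for a real socket channel
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 17
    }
}
```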
You will also notice that when reading from disk, Kafka first checks whether the data is already in the OS cache; if so, it is read directly from memory.
If kafka clusters are well tuned, you will find that a large amount of data is written directly to the OS cache and then read from the OS cache.
In effect, Kafka serves both writes and reads almost entirely from memory, so its overall performance is extremely high.
As a side note, the OS cache is also what powers Elasticsearch's high-performance search.
Final conclusion
This article has walked through Kafka's use of page caching and sequential disk writes at the storage layer, as well as its use of zero-copy technology when serving reads. You should now understand how each Kafka machine writes and reads data under the hood, and why it can achieve such high performance, with throughput of hundreds of thousands of messages per second.
These design ideas are a great help to us, both when we design middleware architectures ourselves and when we go out for interviews.