Contents

1. Kafka parameters that many students don't understand

2. Example code for a Kafka producer

3. Size of memory buffer

4. How much data is appropriate to pack into a Batch?

5. What if one Batch can’t be filled?

6. Maximum request size

7. Retry mechanism

8. Persistence

1. Kafka parameters that many students don't understand

Today we will talk about an interesting topic. As you know, many companies build complex, large-scale systems on top of Kafka as their MQ.

When you write code that uses the Kafka client to talk to the server, you have to set quite a few parameters on the client.

I have seen many younger engineers, often new to a team, who don’t know Kafka very well.

When they read code written by senior colleagues, they can’t tell what is going on or what the reasoning behind it is, especially when it comes to the Kafka parameter settings.

In this article we will go through, in our usual style, the most common parameter settings on the Kafka producer side, so that the next time you see a block of Kafka client settings it will no longer look intimidating.

2. Example code for a Kafka producer

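import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("buffer.memory", 67108864);     // 64MB client-side memory buffer
props.put("batch.size", 131072);          // 128KB per Batch
props.put("linger.ms", 100);              // send a Batch after at most 100ms
props.put("max.request.size", 10485760);  // 10MB per request
props.put("acks", "1");                   // wait for the leader's acknowledgement
props.put("retries", 10);                 // retry a failed request up to 10 times
props.put("retry.backoff.ms", 500);       // wait 500ms between retries
KafkaProducer<String, String> producer = new KafkaProducer<>(props);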

3. Size of memory buffer

First, let’s look at what the “buffer.memory” parameter means.

When the Kafka client sends data to the server, it goes through a memory buffer: messages sent through KafkaProducer first go into a local buffer on the client, where they are collected into Batches and then sent to the Broker.

So “buffer.memory” is essentially a limit on the size of the buffer KafkaProducer may use. Its default value is 32MB.

Now that we know what it means, let’s think about how this parameter should be set in a production project.

What can happen if the buffer is set too small?

First, be clear that messages accumulate in the memory buffer, where they are grouped into Batches, each containing multiple messages.

KafkaProducer then has a Sender thread that packs Batches into a Request and sends it to the Kafka server.

If the buffer is too small, a problem can arise: messages are written into the memory buffer quickly, but the Sender thread cannot send Requests to the Kafka server fast enough.

Won’t the memory buffer fill up quickly in that case? Once it is full, the user thread is blocked and cannot keep writing messages to Kafka.

So “buffer.memory” should be load-tested against your own workload. You need to work out how many messages per second your user threads write into the buffer in production.

Say it is 300 messages per second. Then you need to test whether the buffer keeps filling up when 300 messages per second are written into 32MB of memory. After this load test you can tune the buffer to a reasonable size.
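
Before running such a load test, a back-of-the-envelope estimate can give a starting point. The sketch below is only an illustration and not part of the Kafka client: the class name BufferSizing and the 1KB average message size are assumptions, and it ignores per-record overhead and compression. It estimates how long the buffer could absorb incoming messages if the Sender thread drained nothing at all.

```java
// Rough sizing helper for buffer.memory; all numbers are illustrative assumptions.
final class BufferSizing {

    /**
     * Seconds until the buffer is full if messages keep arriving at the given
     * rate while nothing is drained (for example, the Sender thread is stalled).
     */
    static double secondsUntilFull(long bufferBytes, int messagesPerSecond, int avgMessageBytes) {
        long bytesPerSecond = (long) messagesPerSecond * avgMessageBytes;
        return (double) bufferBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long bufferMemory = 32L * 1024 * 1024; // the 32MB default
        int rate = 300;                        // messages per second, from the text
        int avgSize = 1024;                    // assumed average message size: 1KB
        System.out.printf("buffer absorbs about %.0f s of traffic%n",
                secondsUntilFull(bufferMemory, rate, avgSize));
    }
}
```

If the estimate comes out at only a few seconds for your real message rate and size, the buffer is likely to fill up during any brief slowdown, which is a hint to test with a larger value.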

4. How much data is appropriate to pack into a Batch?

The second question to think about is how to set “batch.size”. It determines how much data each Batch holds before it is sent out.

For example, if the Batch size is 16KB, a Batch can be sent once it has collected 16KB of data.

The default value of this parameter is 16KB. You can generally try increasing it and test the result against the message load of your production environment.

For example, at 300 messages per second, does raising batch.size to 32KB or 64KB improve the overall send throughput?

In theory, a larger batch size lets more data be buffered in each Batch, so more data is sent per request, which can increase throughput.

However, it cannot grow without limit. If it is too large, data sits in the Batch waiting to be sent, and message delivery is delayed.

For example, a message enters a Batch, but it takes 5 seconds for the Batch to fill up to 64KB before it can be sent. That message is then delayed by 5 seconds.

You therefore need to try different Batch sizes against the message rate of your production environment, measure the resulting throughput and message latency, and settle on the most reasonable value.
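
The trade-off between Batch size and delay can be estimated roughly before testing. The following sketch is an illustration under stated assumptions: the class BatchFillTime and the 1KB message size are hypothetical, record overhead is ignored, and the rate is taken to be the rate into a single partition, since the producer accumulates Batches per partition. It computes how long a Batch of a given size takes to fill, which is roughly the worst-case wait for the first message in that Batch when no time limit applies.

```java
// Estimates how long a Batch takes to fill; values are illustrative assumptions.
final class BatchFillTime {

    /** Milliseconds needed to accumulate batchBytes at the given message rate. */
    static double millisToFill(int batchBytes, double messagesPerSecond, int avgMessageBytes) {
        double bytesPerSecond = messagesPerSecond * avgMessageBytes;
        return batchBytes / bytesPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        double rate = 300.0; // messages per second, from the text
        int avgSize = 1024;  // assumed average message size: 1KB
        for (int batchKb : new int[] {16, 32, 64}) {
            System.out.printf("batch.size=%dKB fills in about %.0f ms%n",
                    batchKb, millisToFill(batchKb * 1024, rate, avgSize));
        }
    }
}
```

Doubling batch.size doubles the fill time at a fixed rate, which is why throughput and latency have to be measured together.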

5. What if one Batch can’t be filled?

For the case where a Batch is slow to fill, another parameter needs to be introduced: “linger.ms”.

It means that once a Batch has been created, it must be sent out after at most this much time, whether or not it is full.

For example, suppose batch.size is 16KB, but during off-peak hours messages arrive very slowly.

After a Batch is created, messages trickle in, but it takes a long time to reach 16KB. Do you have to keep waiting?

Of course not. If you set “linger.ms” to 50ms, the Batch will be sent once 50ms have passed since it was created, even if it has not reached 16KB.

So “linger.ms” sets the maximum time a message waits after being written into a Batch before it must go out with that Batch.

This prevents messages from being held back indefinitely because their Batch never fills up. It is a critical parameter.

This parameter should be set carefully, in conjunction with batch.size.

For example, suppose your Batch is 32KB. You then need to estimate how long it normally takes to fill a Batch, say 20ms.

Then linger.ms can be set to 25ms. Normally most Batches fill within 20ms, but linger.ms guarantees that even during off-peak periods, when a Batch cannot be filled within 20ms, it is still forced out after 25ms.

If linger.ms is too small, for example the default of 0ms or a setting of 5ms, then even with a 32KB Batch you will often fail to collect 32KB of data: the Batch is sent after 5ms regardless. That defeats the purpose, leaving you with batching in name only.
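
The reasoning above can be put into a small calculation. This is a minimal sketch, not Kafka code: the class LingerTuning, the 25% margin, and the 1KB message size at 300 messages per second are all assumptions for illustration. It derives a linger.ms value from the expected fill time, and estimates how many bytes a Batch actually holds when linger.ms expires first.

```java
// Relates linger.ms to batch.size; names and numbers are illustrative assumptions.
final class LingerTuning {

    /** Suggested linger.ms: the expected fill time plus a safety margin (0.25 = 25%). */
    static long suggestLingerMs(double expectedFillMs, double margin) {
        return (long) Math.ceil(expectedFillMs * (1.0 + margin));
    }

    /** Bytes a Batch holds when sent: the full size, or whatever arrived within linger.ms. */
    static long expectedBatchBytes(int batchBytes, long lingerMs,
                                   double messagesPerSecond, int avgMessageBytes) {
        double arrived = messagesPerSecond * avgMessageBytes * lingerMs / 1000.0;
        return Math.min(batchBytes, (long) arrived);
    }

    public static void main(String[] args) {
        // A Batch that normally fills in 20ms gets a 25ms limit, as in the text.
        System.out.println("suggested linger.ms = " + suggestLingerMs(20.0, 0.25));
        // With linger.ms=5 and an assumed 300 messages/s of 1KB each,
        // a 32KB Batch is sent long before it is full.
        System.out.println("bytes per batch at 5ms = "
                + expectedBatchBytes(32 * 1024, 5, 300.0, 1024));
    }
}
```

When the second number is far below batch.size, the Batch exists in name only, and either linger.ms should be raised or batch.size lowered to match the real traffic.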

6. Maximum request size

The “max.request.size” parameter determines the maximum size of each request sent to the Kafka server. It also caps the size of a single message, and can be adjusted according to the size of your own messages.

For example, suppose your company sends large messages, each carrying a lot of data, so that a single message might be 20KB.

Should batch.size then be made larger, say 512KB? And should buffer.memory be increased as well, say to 128MB?

Only then can a Batch still pack multiple messages in a large-message scenario. max.request.size should be raised at the same time.

A single request may now be very large while the default limit is 1MB, so consider raising it, for example to 5MB.
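
As an illustration of the large-message case, the sketch below collects the three settings mentioned above in a plain java.util.Properties object and checks how many 20KB messages fit into one Batch. The class LargeMessageConfig and its helper methods are hypothetical, and record overhead is ignored. Note also that the broker enforces its own size limit (message.max.bytes), which would need to allow the larger messages as well.

```java
import java.util.Properties;

// Producer settings for a large-message scenario; values follow the text above.
final class LargeMessageConfig {

    static Properties build() {
        Properties props = new Properties();
        props.put("batch.size", 512 * 1024);            // 512KB per Batch
        props.put("buffer.memory", 128L * 1024 * 1024); // 128MB buffer
        props.put("max.request.size", 5 * 1024 * 1024); // 5MB per request
        return props;
    }

    /** True if a single message of this size stays within max.request.size. */
    static boolean fitsInRequest(Properties props, int messageBytes) {
        int maxRequest = (Integer) props.get("max.request.size");
        return messageBytes <= maxRequest;
    }

    /** Number of messages of this size that fit into one Batch (overhead ignored). */
    static int messagesPerBatch(Properties props, int messageBytes) {
        int batchSize = (Integer) props.get("batch.size");
        return batchSize / messageBytes;
    }

    public static void main(String[] args) {
        Properties props = build();
        int messageBytes = 20 * 1024; // the 20KB message from the text
        System.out.println("fits in request: " + fitsInRequest(props, messageBytes));
        System.out.println("messages per batch: " + messagesPerBatch(props, messageBytes));
    }
}
```

With the default 16KB batch.size the same calculation gives zero whole messages per Batch, which is why batching is ineffective for 20KB messages until batch.size is raised.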

7. Retry mechanism

“retries” and “retry.backoff.ms” determine the retry mechanism: how many times a failed request can be retried, and how many milliseconds to wait between retries.

It is reasonable to allow several retries with some interval between them, for example 100ms.
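
To get a feel for what a given combination means, it helps to compute the total time spent waiting between retries. The sketch below is an assumption-laden illustration: the class RetryBudget is hypothetical, it assumes a fixed interval between retries, and it ignores the time each attempt itself takes. Newer client versions also bound the total delivery time with delivery.timeout.ms, which is not modelled here.

```java
// Worst-case waiting time introduced by retries; a simplified illustration.
final class RetryBudget {

    /** Total milliseconds spent backing off if every retry is used. */
    static long totalBackoffMs(int retries, long retryBackoffMs) {
        return retries * retryBackoffMs;
    }

    public static void main(String[] args) {
        // Values from the example code: retries=10, retry.backoff.ms=500.
        System.out.println("example code: " + totalBackoffMs(10, 500) + " ms");
        // A lighter setting: 3 retries at the 100ms interval mentioned in the text.
        System.out.println("lighter setting: " + totalBackoffMs(3, 100) + " ms");
    }
}
```

The result is the extra delay a message may experience before the producer gives up, so it should be weighed against how long the application can tolerate waiting.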

8. Persistence

The “acks” parameter determines the persistence strategy for sent messages. Many other concepts are involved here; you can refer to our previous article on “acks”:

“If you put Kafka on your résumé, the interviewer will probably ask you to explain the impact of the acks parameter on message persistence.”
