Broker configuration

Kafka has many configuration options, covering both installation and tuning. Most tuning options can be left at their defaults unless there is a specific tuning requirement.

Note: Some settings work with their default values on a single machine, but most of them need to be changed when Kafka is deployed as a cluster in other environments.

I. General configuration

1. broker.id

A single Kafka server is called a broker, and each broker needs an identifier, set by broker.id. It defaults to 0 and can be set to any other integer, but the value must be unique within the Kafka cluster. It is recommended to choose an integer related to the machine name so that the ID maps easily back to the machine. For example, if the machine name is the project name followed by -1, set broker.id to 1.
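The naming convention above could be automated with a small helper. This is only a sketch: the function name broker_id_from_hostname and the host name myproject-1 are hypothetical, and it assumes a short host name that ends in the number you want to use.

```python
import re


def broker_id_from_hostname(hostname: str) -> int:
    """Return the trailing integer of a host name, e.g. 'myproject-1' -> 1."""
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no numeric suffix in host name: {hostname!r}")
    return int(match.group(1))


# Hypothetical host name, used only for illustration.
print(broker_id_from_hostname("myproject-1"))  # prints 1
```

The result would then be written to broker.id in the broker's configuration file. Uniqueness across the cluster still has to be guaranteed by the naming scheme itself.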

2. port

By default, the broker listens on port 9092. You can change this parameter to any other available port. If the port number is below 1024, Kafka must be started with root privileges, which is not recommended.

3. zookeeper.connect

The default is localhost:2181. This parameter specifies the address of the Zookeeper that stores the broker metadata. Its value is a comma-separated list in hostname:port/path form, with the following meanings:

  • Hostname: indicates the name or IP address of the Zookeeper server
  • Port: indicates the Zookeeper client connection port
  • Path: indicates the optional Zookeeper path. If this parameter is not specified, the root path is used by default

It is best to specify a set of Zookeeper servers in the configuration file, separated by commas. If one Zookeeper server goes down, the broker can connect to another node in the Zookeeper group.
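To make the format concrete, here is a minimal sketch that splits a zookeeper.connect value into its servers and path. The function name parse_zookeeper_connect and the host names zk1 to zk3 are hypothetical. The sketch assumes that every server entry carries an explicit port and that the optional path is appended once, after the last server.

```python
def parse_zookeeper_connect(value: str):
    """Split 'host1:2181,host2:2181/path' into ([(host, port), ...], path)."""
    # Everything after the first '/' is treated as the optional path.
    servers_part, slash, path = value.partition("/")
    chroot = "/" + path if slash else "/"  # root path when none is given

    servers = []
    for entry in servers_part.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers, chroot


servers, chroot = parse_zookeeper_connect("zk1:2181,zk2:2181,zk3:2181/kafka")
print(servers)
print(chroot)
```

Listing three servers, as in the usage line, is what lets the broker fall back to another node when one of them is unavailable.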

4. log.dirs

Kafka stores all messages on disk, and the directories that hold these log segments are specified by log.dirs. Its value is a comma-separated list of local file system paths. If multiple paths are specified, the broker chooses a path for each partition by the “least used” rule, and all log segments of the same partition are stored in the same path.

Note that the broker adds a new partition to the path that currently holds the fewest partitions, not to the path with the least disk space in use.
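The “least used” rule can be illustrated with a few lines of Python. This is a simplified model, not the broker's actual code: the function name pick_log_dir and the directory paths are hypothetical, and the partition counts are supplied by hand.

```python
from typing import Dict


def pick_log_dir(partition_counts: Dict[str, int]) -> str:
    """Return the log directory that currently holds the fewest partitions."""
    return min(partition_counts, key=partition_counts.get)


# Hypothetical directories with the number of partitions each one holds.
counts = {"/data/kafka1": 12, "/data/kafka2": 7, "/data/kafka3": 9}
print(pick_log_dir(counts))  # prints /data/kafka2
```

Disk usage never enters the decision, so a directory holding a few very large partitions can still be selected.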

5. num.recovery.threads.per.data.dir

Sets the number of processing threads for each log directory. The default value is 1. Because these threads are used only when the server starts up and shuts down, it is reasonable to configure a large number of them so that the work runs in parallel. For a server with a large number of partitions, parallel processing can save hours of recovery time after a crash.

Note: This setting applies to each log directory specified by log.dirs. For example, if it is set to 8 and log.dirs specifies three paths, a total of 24 threads are used.
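The arithmetic in the note can be checked with a short sketch. The function name total_recovery_threads and the directory paths are hypothetical.

```python
def total_recovery_threads(threads_per_dir: int, log_dirs: str) -> int:
    """Multiply the per-directory thread count by the number of log.dirs paths."""
    dirs = [d.strip() for d in log_dirs.split(",") if d.strip()]
    return threads_per_dir * len(dirs)


print(total_recovery_threads(8, "/data/kafka1,/data/kafka2,/data/kafka3"))  # prints 24
```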

6. auto.create.topics.enable

By default, Kafka automatically creates a topic in any of the following three situations:

  • When a producer starts writing messages to a topic
  • When a consumer starts reading messages from a topic
  • When any client sends a metadata request for a topic

Suggestion: It is better to set this parameter to false.

II. Default topic configuration

Kafka provides many default configuration parameters for newly created topics. Some parameters, such as the number of partitions and the data retention policy, can be configured separately for each topic using the administration tool.

7. num.partitions

Specifies how many partitions a newly created topic contains. The default value is 1. Note that the number of partitions of a topic can be increased but not decreased, so if you want a topic to have fewer partitions than the value of num.partitions, you need to create the topic manually. Kafka clusters scale horizontally through partitions, so when new brokers join the cluster, partitions are what allow the load to be balanced across them. This does not mean that the number of partitions of a topic must be greater than the number of brokers: when there are multiple topics, their partitions are spread across the brokers anyway. However, a topic that carries a large volume of messages needs a large number of partitions if its load is to be balanced.

Q: How do you choose the number of partitions?

A: Consider the following factors when choosing the number:

  • How much throughput does the topic need to achieve? For example, do you want to write 100 KB or 1 GB per second?
  • What is the maximum throughput for reading data from a single partition? Each partition typically has one consumer. If that consumer can write data to the database at no more than 50 MB per second, then the throughput of reading from one partition will not exceed 50 MB per second.
  • The throughput of a producer writing to a single partition can be estimated in the same way, but producers are generally much faster than consumers, so it is best to allow more throughput for producers.
  • The number of partitions each broker holds, the available disk space, and the network bandwidth.

Summary: Number of partitions = topic throughput / consumer throughput
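As a worked example of the summary formula, the sketch below uses the figures mentioned above: a topic that must handle 1 GB per second (taken as 1000 MB) and a consumer limited to 50 MB per second. The function name estimate_partitions is hypothetical, and rounding up is an added assumption so that a fractional result never under-provisions.

```python
import math


def estimate_partitions(topic_mb_per_s: float, consumer_mb_per_s: float) -> int:
    """Number of partitions = topic throughput / consumer throughput, rounded up."""
    if consumer_mb_per_s <= 0:
        raise ValueError("consumer throughput must be positive")
    return max(1, math.ceil(topic_mb_per_s / consumer_mb_per_s))


print(estimate_partitions(1000, 50))  # prints 20
```

The result is a starting point only. The broker-side factors in the last bullet (partitions per broker, disk space, network bandwidth) can still force a different number.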

8. log.retention.hours

Specifies how long data is retained. The default is 168 hours, or one week. The same limit can also be expressed with log.retention.minutes or log.retention.ms. If more than one of these parameters is specified, Kafka gives priority to the one with the smallest unit.

9. log.retention.bytes

Specifies a size limit, in bytes of retained messages, that determines when messages expire. The limit applies to each partition: if a topic has eight partitions and log.retention.bytes is set to 1 GB, the topic can hold up to 8 GB of data. As the number of partitions increases, so does the amount of data that can be retained for the whole topic.

Note: If both log.retention.hours and log.retention.bytes are specified, messages are deleted as soon as either condition is met.
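The interaction of the two limits can be sketched as follows. This is a simplified model rather than the broker's cleanup code: the function names are hypothetical, and the defaults of 168 hours and 1 GB simply mirror the values used in the text.

```python
GB = 1024 ** 3


def oldest_data_expired(age_hours: float, partition_bytes: int,
                        retention_hours: float = 168,
                        retention_bytes: int = GB) -> bool:
    """Data is eligible for deletion as soon as either limit is exceeded."""
    too_old = age_hours > retention_hours
    too_big = partition_bytes > retention_bytes
    return too_old or too_big


def max_topic_bytes(partitions: int, retention_bytes: int = GB) -> int:
    """The size limit applies per partition, so the topic total scales with it."""
    return partitions * retention_bytes


print(oldest_data_expired(age_hours=24, partition_bytes=2 * GB))  # prints True
print(max_topic_bytes(8) // GB)  # prints 8
```

In the first call the data is only one day old, yet it expires because the partition has grown past the size limit.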

10. log.segment.bytes

When a log segment reaches the specified size (default: 1 GB), the current segment is closed and a new one is opened. Only after a log segment is closed does it begin to wait for expiration. The smaller this value is, the more frequently files are closed and new ones allocated, which reduces overall disk write efficiency.

11. log.segment.ms

log.segment.ms and log.segment.bytes are not mutually exclusive. A log segment is closed when either the size limit or the time limit is reached, whichever happens first.
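The "whichever happens first" rule can be expressed as a small predicate. This is an illustrative model only: the function name should_roll_segment is hypothetical, and treating a missing time limit as None is an assumption of the sketch.

```python
from typing import Optional


def should_roll_segment(size_bytes: int, age_ms: int,
                        segment_bytes: int,
                        segment_ms: Optional[int] = None) -> bool:
    """Close the current segment when the size or the time limit is reached."""
    if size_bytes >= segment_bytes:
        return True
    # The time limit only applies when one has been configured.
    return segment_ms is not None and age_ms >= segment_ms


one_gb = 1024 ** 3
one_day_ms = 24 * 60 * 60 * 1000
# A small segment that is already a day old is closed by the time limit.
print(should_roll_segment(10_000, one_day_ms, one_gb, one_day_ms))  # prints True
```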

12. message.max.bytes

Specifies the maximum size of a single message. The default value is 1000000, which is about 1 MB. If a producer tries to send a message larger than this, the message is rejected and the producer receives an error from the broker. As with the other byte-related parameters, this value refers to the compressed message size: the uncompressed message can be larger than this value as long as its compressed form is smaller than message.max.bytes.
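The point about compressed size can be demonstrated with a rough sketch. It uses Python's zlib only as a stand-in for whatever compression the producer is configured with, and it ignores any protocol overhead. The function name fits_after_compression is hypothetical.

```python
import zlib

MESSAGE_MAX_BYTES = 1_000_000  # the default value quoted above


def fits_after_compression(payload: bytes, limit: int = MESSAGE_MAX_BYTES) -> bool:
    """Compare the compressed size, not the raw size, against the limit."""
    return len(zlib.compress(payload)) <= limit


# 5 MB of highly repetitive data is far above the limit before compression...
payload = b"a" * 5_000_000
print(len(payload) > MESSAGE_MAX_BYTES)   # prints True
# ...but shrinks well below it once compressed.
print(fits_after_compression(payload))    # prints True
```

Data that compresses poorly gains nothing from this, so a large message of that kind would still be rejected.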