Learn Kafka with Me: How to Operate It Efficiently

As a Kafka beginner, I need to come up to speed quickly and take on the responsibility of maintaining Kafka. I follow three steps to learn it:

  • Read about Kafka

  • Learn Kafka from the perspective of hands-on operations practice

  • Read the source code to systematically master its implementation principles

This article belongs to the second stage, learning Kafka from the perspective of operations practice. It focuses on Kafka topics: creating and updating them through the operations commands, and understanding how topics work inside Kafka through their configurable properties.

1. Basic usage of the Kafka topic operations commands


Kafka provides the kafka-topics script (${kafka_home}/bin/kafka-topics.sh) to create, modify, delete, and query topics.

| Option | Type | Description |
| --- | --- | --- |
| --create | command | Create a topic |
| --delete | command | Delete a topic |
| --alter | command | Modify a topic (partition count, configuration) |
| --describe | command | Show the details of a topic |
| --delete-config | command | Remove a configuration override from a topic |
| --list | command | List the topics |
| --bootstrap-server | option | Address of the Kafka cluster |
| --zookeeper | option | Address of the ZooKeeper cluster; exactly one of --zookeeper and --bootstrap-server must be specified |
| --partitions | option | Number of partitions of the topic; a partition can be thought of as a data shard, and each partition can have multiple replicas |
| --replication-factor | option | Replication factor of the partitions (the leader counts toward this number) |
| --topic | option | Topic name |
| --if-exists | option | When modifying or deleting a topic, execute only if the topic exists |
| --if-not-exists | option | When creating a topic, execute only if the topic does not exist |
| --replica-assignment | option | Manually assign partitions and replicas to brokers; the official documentation is rather opaque here, see Section 1.1 below |
| --disable-rack-aware | option | Disable rack-aware replica assignment |
| --exclude-internal | option | Exclude internal topics when querying topics with the --list command |
| --topics-with-overrides | option | With --describe, show only topics whose configuration overrides the defaults, omitting partition information |
| --unavailable-partitions | option | With --describe, show only partitions whose leader is unavailable; any output here deserves attention |
| --under-replicated-partitions | option | With --describe, show only partitions with replicas missing from the ISR; any output here deserves attention |
| --config | option | Set topic-level configuration properties (see Section 2) |
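To make the table concrete, here is a minimal sketch of everyday invocations; the broker address localhost:9092 and the topic name demo-topic are placeholders, not taken from the original article:

```bash
# Create a topic with 3 partitions, each replicated on 2 brokers
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic demo-topic --partitions 3 --replication-factor 2

# List all topics, hiding internal ones such as __consumer_offsets
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list --exclude-internal

# Show partitions, leaders, replicas, and ISR for one topic
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo-topic

# Grow the partition count (partitions can be added but never removed)
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic demo-topic --partitions 6 --if-exists

# Delete the topic, doing nothing if it is already gone
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --delete --topic demo-topic --if-exists
```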

Here’s a separate look at some of the less intuitive options listed above.

1.1 --replica-assignment

Manually specifies the distribution of replicas over brokers for each partition. This option cannot be used together with --partitions or --replication-factor.

The format is as follows: partitions are separated by commas, and within each partition the ids of the brokers hosting its replicas are separated by colons.

What does --replica-assignment 0:1,1:2,0:2 mean?

The number of partitions is 3 and the replication factor is 2: partition 0 (P0) is spread over brokers 0 and 1, partition 1 (P1) over brokers 1 and 2, and partition 2 (P2) over brokers 0 and 2. The first broker listed for each partition is its leader.
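Expressed as a command, the example above might look like the following sketch (again with a placeholder broker address and topic name); note that --replica-assignment takes the place of --partitions and --replication-factor:

```bash
# Create a 3-partition topic with an explicit replica layout: entry k
# describes partition k, the colon-separated broker ids are its replicas,
# and the first id listed becomes the preferred leader.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic demo-topic \
  --replica-assignment 0:1,1:2,0:2
```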

2. Kafka topic configuration items in detail



The kafka-topics script customizes a topic's properties at creation time through the --config option. Next, we explore the mechanics behind Kafka through these properties.
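As a hedged illustration of the --config option (placeholder topic name and values, assuming a local broker), overrides can be stacked by repeating the flag:

```bash
# Create a topic whose retention and segment size override the broker defaults:
# 86400000 ms = 1 day of retention, 268435456 bytes = 256 MB segments
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic demo-topic --partitions 3 --replication-factor 2 \
  --config retention.ms=86400000 \
  --config segment.bytes=268435456
```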

  • cleanup.policy The cleanup mechanism for data files. It supports a broker-wide global setting as well as per-topic customization. Optional policies: delete and compact; the default is delete. With compact, Kafka compacts the log segments by keeping only the latest message for each key. The internal topic __consumer_offsets (the topic used to store consumption progress) uses compact.

  • compression.type The compression type. Kafka currently supports the gzip, snappy, lz4, and zstd algorithms, plus two special values:

  • uncompressed Compression is disabled

  • producer The compression algorithm is specified by the sender; the broker stores the message as compressed by the client (gzip, snappy, lz4, or zstd)

    Data compression saves network bandwidth and storage space, but costs extra CPU. The best practice is therefore not to configure a compression algorithm on the broker: the algorithm is specified by the sender, the message is stored compressed on the server, and decompressed by the consumer.

  • delete.retention.ms When cleanup.policy is compact, Kafka considers a message with a null body pointless to keep and removes it during compaction. This setting controls how long such null-body messages are retained after a compaction run. The default is 24 hours.

  • file.delete.delay.ms

    Kafka deletes log files (data files) per topic. Before the deletion, the partition files under the topic are renamed with a *.deleted suffix, and after file.delete.delay.ms they are removed from the file system.

  • flush.messages Sets the flush frequency by message count. If set to 1, a flush is triggered every time a message is written. The default value is Long.MaxValue, and lowering it is not recommended in most scenarios: Kafka relies on replicas to keep data consistent and durable.

  • flush.ms Sets the flush frequency by time interval (the default is Long.MaxValue). Kafka prefers to rely on the operating system's flushing and to guarantee data reliability through the replica mechanism. (The replica mechanism cannot protect against data loss caused by a data-center power failure.)

  • index.interval.bytes Controls the density of the index file. Kafka does not index every message (message offset); instead it creates an index entry at a fixed byte interval, set by this parameter. The default value is 4096 bytes.

  • max.message.bytes The maximum number of bytes allowed for a single message sent at a time. The default value is 1,000,000, roughly 1 MB.

  • message.downconversion.enable Whether automatic message-format down-conversion is enabled. If set to false, the broker will not convert message formats, and the messages may not be consumable by older clients.

  • message.format.version Pins the topic's messages to the storage format corresponding to a particular API version.

  • message.timestamp.type Specifies how the timestamp stored in a message is obtained. Optional values:

  • CreateTime The time the message was created on the client

  • LogAppendTime The time the broker received the message

    The default is CreateTime.

  • message.timestamp.difference.max.ms When message.timestamp.type is set to CreateTime, the maximum allowed difference between the broker's time and the message's create timestamp. If the difference exceeds this value, the broker refuses to store the message. The default value is Long.MaxValue, meaning the check is disabled.

  • min.cleanable.dirty.ratio Controls the proportion of dirty data (data not yet compacted) required before a log file is compacted; the default is 0.5. If a file's dirty data falls below this value, the file will not be compacted further. Effective only when cleanup.policy is set to compact.

  • min.compaction.lag.ms Sets how long a message must stay on the broker before it may be compacted. The default value is 0, meaning this feature is disabled.

  • min.insync.replicas When a client sends messages with acks set to all, this parameter specifies the minimum number of in-sync replicas that must acknowledge the write before it is reported as successful to the client. The default value is 1.

  • preallocate Specifies whether segment files are preallocated (created in advance). The default value is false.

  • retention.bytes The maximum number of bytes retained in a log partition. The default value is -1, meaning there is no limit.

  • retention.ms The maximum retention period of a log partition. The default is 7 days.

  • segment.bytes The size of a single log segment. The default value is 1 GB.

  • segment.index.bytes The size of a log segment's index file. The default value is 10 MB.

  • segment.jitter.ms The maximum random jitter applied to the segment rolling time, to spread out segment rolls.

  • segment.ms The interval after which Kafka forces a segment to roll, even if the segment is not full of messages. The default value is 7 days.

  • unclean.leader.election.enable Whether a replica that is not in the ISR is allowed to compete to become the leader when no ISR replica is available. Enabling this may lose data; the default is false. (A sketch of changing these properties at runtime follows this list.)
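As promised above, here is a sketch of changing these properties at runtime with the kafka-configs script that ships alongside kafka-topics (placeholder topic name and values, assuming a local broker):

```bash
# Turn demo-topic into a compacted topic and tune the compaction thresholds
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name demo-topic \
  --add-config cleanup.policy=compact,min.cleanable.dirty.ratio=0.5,min.compaction.lag.ms=60000

# Remove an override so the topic falls back to the broker-level default
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name demo-topic \
  --delete-config min.compaction.lag.ms

# Verify which overrides are currently in effect
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type topics --entity-name demo-topic
```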

3. Summary


This article started from the operations commands, building a comprehensive understanding of topics at the operational level in order to glimpse some of Kafka's important internal features, and laying a solid foundation for later studying its implementation from the source code. The article closes with a partition diagram of a topic with 3 partitions and a replication factor of 3.

[Figure: partition layout of a topic with 3 partitions and a replication factor of 3]


Well, that is all for this article. A like, a follow, or a comment would be the biggest encouragement for me. You are also welcome to add my WeChat: DINGwPMz, for discussion and exchange.

Finally, let me share a RocketMQ ebook that distills operations experience from message flows at the scale of billions of messages.

How to get it: search for "Middleware Interest Circle" on WeChat and reply RMQPDF to receive it.
