What would you say if an interviewer asked you to “briefly describe partition allocation in Kafka”? There is a trap hidden in this question. Partition allocation shows up in several places in Kafka, but the way the question is phrased nudges you toward answering just one of them. So after you have answered that one perfectly, the interviewer suddenly asks: what else?

When you finish one point, the interviewer says “more.” You add another point, and he says “more” again. Even after you add a third, he still says “more.” Would you be thrown off at that point?

Today I'll show you how to answer this question so that you stay a step ahead.

Partition allocation is a very important concept in Kafka, yet it is often overlooked, even though it affects how evenly load is balanced across the whole cluster. When partition allocation comes up, remember that it happens in three places: when producers send messages, when consumers consume messages, and when topics are created. All three can be called “partition allocation,” but each involves something different.

Faced with the opening question, it is better to lead with the conclusion: say there are three places, then walk through the first, the second, and the third. By the time you have covered all three, your time will be nearly up. A sharp interviewer will notice that you summarized at the start, and at most will ask you to expand on the one you know best, because by then they may already have decided you are the right person.

Below is an explanation of these three places. This article is meant to list the relevant knowledge points with a general overview so that readers can trace each one back to its roots. It does not go into the details, because there are too many of them and space is limited; if you need them, please consult the book In-depth Understanding of Kafka.

Partition allocation on the producer side

From the user's point of view, calling the send method is all it takes to get a message to the broker. Along the way, however, the message may pass through interceptors, a serializer, and a partitioner before it is actually sent to the broker.

producer.send(record);

Before a message can be sent to the broker, its partition must be determined. If the partition field of the ProducerRecord is set, no partitioner is needed, because that field already is the number of the partition to send to. If the partition field is not set, the partitioner computes the partition from the key field. In short, the partitioner's job is to assign each message to a partition.
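The decision described above can be sketched in plain Java. This is a simplified model, not the kafka-clients code: `SimpleRecord` and `KeyPartitioner` are hypothetical stand-ins for `ProducerRecord` and `Partitioner`, and `Arrays.hashCode` is only a placeholder hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionChoiceSketch {

    // Hypothetical stand-in for ProducerRecord: only the fields used here.
    static final class SimpleRecord {
        final String topic;
        final Integer partition; // null means "not specified"
        final byte[] key;

        SimpleRecord(String topic, Integer partition, byte[] key) {
            this.topic = topic;
            this.partition = partition;
            this.key = key;
        }
    }

    // Hypothetical stand-in for the Partitioner interface.
    interface KeyPartitioner {
        int partition(String topic, byte[] keyBytes, int numPartitions);
    }

    // An explicitly set partition wins; otherwise the partitioner decides.
    static int choosePartition(SimpleRecord record, KeyPartitioner partitioner, int numPartitions) {
        if (record.partition != null) {
            return record.partition;
        }
        return partitioner.partition(record.topic, record.key, numPartitions);
    }

    public static void main(String[] args) {
        // Placeholder partitioner: Arrays.hashCode is NOT Kafka's MurmurHash2.
        KeyPartitioner byHash = (topic, key, n) -> (Arrays.hashCode(key) & 0x7fffffff) % n;
        byte[] key = "k".getBytes(StandardCharsets.UTF_8);

        SimpleRecord explicit = new SimpleRecord("orders", 2, key);
        SimpleRecord keyed = new SimpleRecord("orders", null, key);

        System.out.println(choosePartition(explicit, byHash, 4)); // 2
        System.out.println(choosePartition(keyed, byHash, 4));    // chosen by the partitioner
    }
}
```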

Kafka's default partitioner is DefaultPartitioner, which implements the Partitioner interface (users can implement this interface to write a custom partitioner). The interface's partition method contains the actual partition allocation logic:

public int partition(String topic, Object key, byte[] keyBytes,
                     Object value, byte[] valueBytes, Cluster cluster);
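A custom partitioner mostly comes down to writing that partition method. To keep the example free of the kafka-clients dependency, the sketch below codes against a hypothetical `SimplePartitioner` interface that receives the partition count directly instead of a `Cluster`. The routing rule (keys starting with `vip-` go to partition 0) is purely illustrative. In real code you would implement `org.apache.kafka.clients.producer.Partitioner` (including its `configure` and `close` methods) and get the count from `cluster.partitionsForTopic(topic).size()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class VipPartitionerSketch {

    // Hypothetical, simplified mirror of Kafka's Partitioner: the partition
    // count is passed in directly instead of being read from a Cluster.
    interface SimplePartitioner {
        int partition(String topic, Object key, byte[] keyBytes, int numPartitions);
    }

    // Illustrative rule: "vip-" keys go to partition 0; all other keys are
    // hashed over partitions 1..n-1.
    static final class VipPartitioner implements SimplePartitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes, int numPartitions) {
            if (numPartitions < 2) {
                return 0; // a single partition leaves nothing to separate
            }
            if (key instanceof String && ((String) key).startsWith("vip-")) {
                return 0;
            }
            // Placeholder hash (not MurmurHash2); the mask keeps it non-negative.
            // Note: in this sketch all null keys land on partition 1.
            int hash = keyBytes == null ? 0 : Arrays.hashCode(keyBytes) & 0x7fffffff;
            return 1 + hash % (numPartitions - 1);
        }
    }

    public static void main(String[] args) {
        SimplePartitioner p = new VipPartitioner();
        String vip = "vip-alice";
        String regular = "bob";
        System.out.println(p.partition("orders", vip, vip.getBytes(StandardCharsets.UTF_8), 4));         // 0
        System.out.println(p.partition("orders", regular, regular.getBytes(StandardCharsets.UTF_8), 4)); // 1, 2 or 3
    }
}
```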

By default, if the message key is not null, DefaultPartitioner hashes the key with MurmurHash2 (a fast algorithm with a low collision rate) and derives the partition number from the hash value, so messages with the same key are written to the same partition. If the key is null, messages are distributed round-robin across the topic's available partitions. (This describes DefaultPartitioner before Kafka 2.4; from 2.4 on, null-key messages use a sticky strategy instead.)
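The keyed path can be reproduced with the standard library alone. The hash below is written to follow the MurmurHash2 variant in Kafka's `org.apache.kafka.common.utils.Utils.murmur2`, and the `& 0x7fffffff` step mirrors `Utils.toPositive`; treat it as a sketch and compare it with the Kafka source for your version before relying on exact values.

```java
import java.nio.charset.StandardCharsets;

public class KeyedPartitionSketch {

    // 32-bit MurmurHash2, modeled on Kafka's Utils.murmur2.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            // Read 4 bytes as a little-endian int.
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the 1-3 trailing bytes (intentional fall-through).
        int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
            default:
                break;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clear the sign bit so the modulo result is never negative.
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    // Keyed messages: hash the serialized key over ALL partitions of the topic.
    static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        // Same key and same partition count always give the same partition.
        System.out.println(partitionForKey(key, 6) == partitionForKey(key, 6)); // true
    }
}
```

Because the result is taken modulo the partition count, adding partitions to a topic changes where existing keys map: the same-key-same-partition property holds only while the partition count stays fixed.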

Note: if the key is not null, the computed partition can be any partition of the topic, whether or not it is currently available. If the key is null and at least one partition is available, the computed partition is always one of the available partitions. Keep this difference in mind.
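The difference can be made concrete with a small simulation. The sketch assumes a topic with 4 partitions where partition 2 is currently unavailable. The null-key path rotates a counter over the available partitions only, modeling the round-robin behavior described above, while the keyed path maps over all partitions. Kafka seeds its counter randomly; the sketch starts at 0 for determinism, and `Arrays.hashCode` stands in for MurmurHash2 to keep it short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class NullKeyPartitionSketch {

    // Starts at 0 so the output is deterministic (Kafka seeds this randomly).
    private final AtomicInteger counter = new AtomicInteger(0);

    int partition(byte[] keyBytes, int numPartitions, List<Integer> availablePartitions) {
        if (keyBytes == null) {
            int next = counter.getAndIncrement() & 0x7fffffff;
            if (!availablePartitions.isEmpty()) {
                // Null key: rotate over partitions that are currently available.
                return availablePartitions.get(next % availablePartitions.size());
            }
            // Nothing available: fall back to all partitions.
            return next % numPartitions;
        }
        // Keyed: hash over ALL partitions, available or not (placeholder hash).
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        NullKeyPartitionSketch p = new NullKeyPartitionSketch();
        List<Integer> available = List.of(0, 1, 3); // partition 2 is unavailable
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            out.append(p.partition(null, 4, available)).append(' ');
        }
        System.out.println(out.toString().trim()); // 0 1 3 0 1 3
        // The keyed path ignores availability: key "c" maps to partition 2.
        System.out.println(p.partition("c".getBytes(StandardCharsets.UTF_8), 4, available)); // 2
    }
}
```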

Partition allocation on the consumer side

By default, within a consumer group each partition can be consumed by only one consumer. Consumer-side partition allocation means assigning the partitions of the subscribed topics to the consumers in a group.

As shown in the figure, a topic has four partitions: P0, P1, P2, and P3. Two consumer groups, A and B, subscribe to this topic. Group A has four consumers (C0, C1, C2, and C3) and group B has two (C4 and C5). Under Kafka's default rules, each consumer in group A is assigned one partition and each consumer in group B is assigned two, and the two groups do not affect each other. Each consumer consumes messages only from the partitions assigned to it.
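The scenario in the figure can be reproduced in a few lines. This is a hypothetical model, not one of Kafka's assignors: it deals the four partitions out to each group separately, round-robin, which is enough to show that every partition is owned by exactly one consumer per group and that the two groups are independent. Which concrete partitions each consumer gets depends on the assignment strategy in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignmentSketch {

    // Deal partitions 0..numPartitions-1 to the consumers of ONE group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Each group is assigned independently over the same four partitions.
        System.out.println(assign(List.of("C0", "C1", "C2", "C3"), 4)); // {C0=[0], C1=[1], C2=[2], C3=[3]}
        System.out.println(assign(List.of("C4", "C5"), 4));             // {C4=[0, 2], C5=[1, 3]}
    }
}
```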

Kafka provides three partition assignment strategies for consumers: RangeAssignor, RoundRobinAssignor, and StickyAssignor, configured through the consumer's partition.assignment.strategy parameter, with RangeAssignor as the default. For what each strategy means in detail, see related material such as In-depth Understanding of Kafka. You can also write a custom assignment strategy by implementing the PartitionAssignor interface.
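To see how the first two strategies differ, the sketch below reimplements their core arithmetic for the simple case where every consumer subscribes to every topic. It is a simplified model written from the strategies' described behavior, not the kafka-clients source: the range strategy splits each topic separately into contiguous blocks (earlier consumers get the leftover partitions), while round-robin lists all topic-partitions in order and deals them out one by one. Topic names `t0` and `t1` are illustrative, and consumers and topics are assumed to be passed in already sorted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignorSketch {

    // Range: per topic, consumer i gets a contiguous block; the first
    // (numPartitions % numConsumers) consumers get one extra partition.
    static Map<String, List<String>> range(List<String> consumers, Map<String, Integer> topics) {
        Map<String, List<String>> out = emptyAssignment(consumers);
        for (Map.Entry<String, Integer> t : topics.entrySet()) {
            int n = t.getValue();
            int per = n / consumers.size();
            int extra = n % consumers.size();
            for (int i = 0; i < consumers.size(); i++) {
                int start = per * i + Math.min(i, extra);
                int len = per + (i < extra ? 1 : 0);
                for (int p = start; p < start + len; p++) {
                    out.get(consumers.get(i)).add(t.getKey() + "p" + p);
                }
            }
        }
        return out;
    }

    // Round-robin: walk every topic-partition in order and deal them out in turn.
    static Map<String, List<String>> roundRobin(List<String> consumers, Map<String, Integer> topics) {
        Map<String, List<String>> out = emptyAssignment(consumers);
        int next = 0;
        for (Map.Entry<String, Integer> t : topics.entrySet()) {
            for (int p = 0; p < t.getValue(); p++) {
                out.get(consumers.get(next++ % consumers.size())).add(t.getKey() + "p" + p);
            }
        }
        return out;
    }

    private static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        return out;
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("C0", "C1");
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("t0", 3);
        topics.put("t1", 3);
        System.out.println(range(consumers, topics));
        // {C0=[t0p0, t0p1, t1p0, t1p1], C1=[t0p2, t1p2]}
        System.out.println(roundRobin(consumers, topics));
        // {C0=[t0p0, t0p2, t1p1], C1=[t0p1, t1p0, t1p2]}
    }
}
```

With more topics, the range strategy keeps handing every topic's leftover partition to the same leading consumers, which is why round-robin tends to balance better when all consumers share the same subscription.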

If a consumer group contains multiple consumers, they may be configured with different assignment strategies. How, then, is the strategy that is actually used finally “decided”?
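One way to picture the answer: the group coordinator on the broker only considers strategies that every member supports, and each member then votes for the first of those candidates in its own preference list (the order of its partition.assignment.strategy setting); the strategy with the most votes wins. The sketch below models that rule with plain collections. It is a simplification, not the coordinator's code, and its tie-breaking (the strategy that received a vote first wins) is this sketch's own choice. The names `range`, `roundrobin`, and `sticky` are the names the built-in assignors report.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StrategyVoteSketch {

    // memberPreferences: member id -> supported strategies, most preferred first.
    static String selectStrategy(Map<String, List<String>> memberPreferences) {
        // 1. Candidates: strategies supported by every member.
        Set<String> candidates = null;
        for (List<String> prefs : memberPreferences.values()) {
            if (candidates == null) {
                candidates = new LinkedHashSet<>(prefs);
            } else {
                candidates.retainAll(prefs);
            }
        }
        if (candidates == null || candidates.isEmpty()) {
            throw new IllegalStateException("no strategy supported by all members");
        }
        // 2. Each member votes for its most preferred candidate.
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (List<String> prefs : memberPreferences.values()) {
            for (String s : prefs) {
                if (candidates.contains(s)) {
                    votes.merge(s, 1, Integer::sum);
                    break;
                }
            }
        }
        // 3. Most votes wins; on a tie, the strategy voted for first wins (sketch-only rule).
        String winner = null;
        int best = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                winner = e.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        Map<String, List<String>> members = new LinkedHashMap<>();
        members.put("C0", List.of("range", "roundrobin"));
        members.put("C1", List.of("roundrobin", "range"));
        members.put("C2", List.of("roundrobin", "sticky"));
        // Only "roundrobin" is supported by all three members.
        System.out.println(selectStrategy(members)); // roundrobin
    }
}
```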

I'll leave you with a question: can Kafka's default rule that a partition is consumed by only one consumer in a consumer group be broken? If so, how, and what would be the benefit of breaking it?

Partition allocation on the broker side

Producer-side partition allocation determines which partition each message is sent to, and consumer-side allocation determines which partitions each consumer may consume from. Broker-side partition allocation is the replica assignment performed when a topic is created: which replicas of which partitions are created on which brokers. Whether this assignment is balanced affects the overall load balance of Kafka, and it is also tied to the concept of the preferred replica.
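To make “balanced” concrete: the first broker in each partition's replica list is its preferred replica, the broker Kafka prefers as that partition's leader. The hypothetical sketch below takes an assignment (partition to ordered broker list) and counts how many replicas and how many preferred replicas each broker holds, a quick way to see whether the load is even. The three-broker assignment in `main` is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicaBalanceSketch {

    // Number of replicas hosted by each broker.
    static Map<Integer, Integer> replicaCounts(Map<Integer, List<Integer>> assignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> replicas : assignment.values()) {
            for (int broker : replicas) {
                counts.merge(broker, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of preferred replicas (first entry of each list) per broker.
    static Map<Integer, Integer> preferredCounts(Map<Integer, List<Integer>> assignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> replicas : assignment.values()) {
            counts.merge(replicas.get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // partition -> replica brokers, preferred replica first
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        assignment.put(0, List.of(0, 1));
        assignment.put(1, List.of(1, 2));
        assignment.put(2, List.of(2, 0));
        System.out.println(replicaCounts(assignment));   // {0=2, 1=2, 2=2}
        System.out.println(preferredCounts(assignment)); // {0=1, 1=1, 2=1}
    }
}
```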

When a topic is created with the --replica-assignment parameter, partition replicas are created according to the specified scheme. Without it, Kafka's internal logic computes the assignment. When a topic is created with the kafka-topics.sh script, this internal logic follows one of two policies depending on rack information: rack-unaware or rack-aware. If none of the brokers in the cluster has the broker.rack parameter configured, or the topic is created with the --disable-rack-aware parameter, the rack-unaware policy is used; otherwise the rack-aware policy is used.
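For the rack-unaware case, the core idea can be sketched as follows. This is a Java transcription, written from memory, of the logic in Kafka's Scala `AdminUtils.assignReplicasToBrokersRackUnaware`: the first replica of each partition walks round-robin across the brokers, and the followers are placed at an offset from it that grows by one each time the partition ID wraps around the broker list. Kafka picks the start index and the initial shift at random; here they are parameters so the output is deterministic. Check the source of your Kafka version before relying on the details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RackUnawareAssignmentSketch {

    static Map<Integer, List<Integer>> assign(int numPartitions, int replicationFactor,
                                              int[] brokers, int startIndex, int initialShift) {
        if (replicationFactor > brokers.length) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int n = brokers.length;
        int shift = initialShift;
        for (int partition = 0; partition < numPartitions; partition++) {
            // Each time the partitions wrap around the broker list, shift followers by one.
            if (partition > 0 && partition % n == 0) {
                shift++;
            }
            int first = (partition + startIndex) % n;
            List<Integer> replicas = new ArrayList<>();
            replicas.add(brokers[first]); // preferred replica
            for (int j = 0; j < replicationFactor - 1; j++) {
                replicas.add(brokers[replicaIndex(first, shift, j, n)]);
            }
            result.put(partition, replicas);
        }
        return result;
    }

    // Offset in [1, n-1] from the first replica, so no follower shares its broker.
    private static int replicaIndex(int first, int shift, int j, int n) {
        int offset = 1 + (shift + j) % (n - 1);
        return (first + offset) % n;
    }

    public static void main(String[] args) {
        int[] brokers = {0, 1, 2};
        // 6 partitions, replication factor 2, start index 0, initial shift 0
        System.out.println(assign(6, 2, brokers, 0, 0));
        // {0=[0, 1], 1=[1, 2], 2=[2, 0], 3=[0, 2], 4=[1, 0], 5=[2, 1]}
    }
}
```

The growing shift is what keeps the followers from always pairing up the same brokers: in the output, partitions 0 to 2 pair each broker with its right neighbor, while partitions 3 to 5 pair it with the broker two steps away.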


Please consider supporting the author's books: “Illustrated Kafka Guide” and “Illustrated Kafka's Core Principles”


Please also consider my new books, In-depth Understanding of Kafka: Core Design and Practice Principles and RabbitMQ Field Guide, and follow my WeChat official account: Zhu Xiaosi's blog.