I use Kafka a lot at work, but I am not very familiar with some of its internal mechanisms, so I have recently been studying it. Kafka is a classic message engine known for high performance and high availability. So how does it achieve them? In what form are its messages persisted? If it writes to disk, why is it so fast? How does it keep messages from getting lost? With these questions in mind, let's lift the veil on Kafka.

Let's start with a question: why do we need a messaging engine? Why not just call the services directly over RPC? Take an order system as an example. When a user places an order, we first deduct the inventory, then charge the user, then credit the merchant's account, and finally may send a push notification or SMS to tell the user that the order was placed successfully and the merchant that a new order has come in.

If the whole ordering process runs synchronously, each step adds latency, the user waits longer, and the experience suffers. Also, the longer the chain of services the ordering process depends on, the greater the risk of failure. To speed up the response and reduce risk, we can decouple the services that do not have to sit on the main path. The core of placing an order is keeping inventory, the user's payment, and the merchant's credit consistent. The notifications can be fully asynchronous. This way, the order process is not blocked by notifying the merchant or the user, nor does the order fail because a notification failed.

The next step is how to design a message engine. From a macro point of view, a message engine supports sending, storing and receiving.

A simple message queue model emerges, as shown above: the Engine stores the sender's messages, and when a receiver asks the Engine for data, the Engine serves it from the store. Since persistent storage is involved, slow disk IO is an issue to consider.

There may also be more than one receiver. In the order example, an order-completed event is sent as a message once the order is done. The developer responsible for user-side push needs to consume that message, and so does the developer responsible for merchant-side push. The simplest approach is to keep two copies of the message, but that seems wasteful.

High availability is another consideration. If the engine has replicas, then when an engine node fails we can elect a replica to take over. Even with replicas, there may be many senders, and it seems unreasonable for all of them to send data to a single Leader (master) node, because the pressure on that node is too great. You might say: aren't there replicas? Let the receivers read directly from a replica. This leads to another problem: what if the replica lags behind the Leader? If a message cannot be read from the replica, do we read from the Leader again? That makes the engine's design more complicated, which seems unreasonable. The answer is sharding. Since the pressure on a single Leader node is too great, the data can be split across multiple Leader nodes. We only need a good load-balancing algorithm to distribute messages evenly across the shards. So we can design a producer-consumer model that looks something like this.

But these are simple ideas, and how to implement them is complicated. With these questions and ideas in mind, let’s take a look at how Kafka works.

Thinking and Realizing

Let’s start with a few terms for Kafka, including messages, topics, partitions, and consumer groups.

How do you design a message

The message is the heart of the service; the whole design exists to carry messages from one end to the other. That brings us to the structure of a message. The message body should not be too large, since large bodies drive up storage cost and network transmission overhead, so it should contain only the necessary information, preferably with no redundancy. Messages should also support compression, which reduces storage and network overhead further even when the body is already lean.

Messages must be persisted, but messages that have been consumed cannot be kept forever, and very old messages are unlikely to be consumed again. We need a mechanism to clean up old messages and free disk space. The key is how to identify old messages, so each message should carry the timestamp of when it was produced; old messages are identified by that timestamp and deleted when appropriate.

Messages also need to be numbered. The number represents the position of the message, and consumers use it to find the corresponding message.

How to store a large number of messages is another problem. Storing everything in one file makes queries inefficient and makes it hard to clean up old data, so the log is segmented: the big log file is cut into multiple smaller log files, which improves maintainability, and a new message only needs to be appended to the end of the latest segment. However, loading a whole segment into memory to search for a message would take a lot of memory, so an indexing mechanism is needed to speed up access to a given message.

Summary: A Kafka message carries its creation time and a sequence number and supports compression, and the log that stores messages is segmented and indexed.
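To make the segment and index idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka's on-disk format. The class names (`Message`, `Segment`, `SegmentedLog`), the parameters `segment_size` and `index_interval`, and counting segment size in messages rather than bytes are all assumptions made for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Message:
    offset: int        # sequence number of the message in the log
    timestamp: float   # creation time, used to find old messages
    payload: bytes


@dataclass
class Segment:
    base_offset: int                              # offset of the first message
    messages: list = field(default_factory=list)
    index: list = field(default_factory=list)     # sparse (offset, position) pairs


class SegmentedLog:
    """Hypothetical append-only log cut into fixed-size segments."""

    def __init__(self, segment_size=4, index_interval=2):
        self.segment_size = segment_size      # messages per segment (assumed unit)
        self.index_interval = index_interval  # index every Nth message
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, payload, timestamp):
        active = self.segments[-1]
        if len(active.messages) >= self.segment_size:
            # The active segment is full: roll over to a new one.
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        position = len(active.messages)
        if position % self.index_interval == 0:
            active.index.append((self.next_offset, position))
        active.messages.append(Message(self.next_offset, timestamp, payload))
        self.next_offset += 1
        return self.next_offset - 1

    def read(self, offset):
        # 1. Find the segment whose base offset is the largest one <= offset.
        bases = [s.base_offset for s in self.segments]
        seg_idx = bisect.bisect_right(bases, offset) - 1
        if seg_idx < 0:
            raise KeyError(offset)
        segment = self.segments[seg_idx]
        # 2. Find the nearest index entry at or before the target offset.
        keys = [o for o, _ in segment.index]
        i = bisect.bisect_right(keys, offset) - 1
        if i < 0:
            raise KeyError(offset)
        # 3. Scan forward from the indexed position.
        for message in segment.messages[segment.index[i][1]:]:
            if message.offset == offset:
                return message
        raise KeyError(offset)

    def delete_older_than(self, cutoff):
        # Drop whole closed segments whose newest message is older than the
        # cutoff; the active (last) segment is always kept.
        kept = [s for s in self.segments[:-1]
                if s.messages[-1].timestamp >= cutoff]
        self.segments = kept + [self.segments[-1]]
```

Appending ten messages with the defaults produces three segments. A read only touches one segment and scans at most `index_interval` messages, and cleanup removes whole segments instead of rewriting one big file.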

Why we need Topic

At a macro level, a messaging engine is a one-way street, and there is a problem: producer A wants to send messages to consumer B, but it also wants to send messages to consumer C. How do consumers B and C consume only the data they need? A simple idea is to add a tag to each message, so that consumers pick out their own messages by tag and skip the rest, but this is not very elegant and wastes CPU on filtering. The most efficient approach is to never deliver B's messages to C, or C's messages to B, in the first place, and that is what a Topic is for. Different businesses are separated by Topic. Each consumer subscribes only to the Topics it cares about, and the producer sends the messages the consumer needs through the agreed Topic. Put simply, messages are classified by Topic.

Conclusion: A Topic is a logical concept that separates messages by business. Each consumer only needs to care about its own Topics.

How do partitions ensure order

From the above we know that the purpose of partitioning is to spread the pressure of a single node. Combined with Topic and Message, the general hierarchy is Topic -> Partition -> Message. You may ask: since partitioning is meant to reduce the pressure on a single node, why not use multiple topics instead of multiple partitions? With multiple machines, we could deploy multiple topics across them, which seems to achieve distribution. At first glance this looks feasible, but on reflection it is wrong. In the end we have to serve the business, and this approach would split what was one topic for one business into several topics, breaking up the business definition.

Well, since there are multiple partitions, the distribution of messages is a problem. If the data under a topic is too concentrated in a partition, it will cause uneven distribution. To solve this problem, a good allocation algorithm is necessary.

Kafka supports a round-robin (polling) method: with multiple partitions, messages are distributed evenly across the partitions in turn. Note that the data within each partition is ordered, but the data as a whole is not guaranteed to be in order. If your business depends strongly on message order, think this through carefully. For example, if a producer sends messages A, B, and C and they land in three different partitions, the consumption order may be B, A, C.

So how do you ensure that messages are ordered? Overall, as long as the number of partitions is greater than 1, global ordering is never guaranteed. You could set the number of partitions to 1, but then throughput becomes a problem. In real business scenarios, we usually only need the messages of one user, or of one product, to be ordered. Which of user A's and user B's messages comes first does not matter, because they are unrelated, but we may want user A's own messages to stay in order. For example, if the messages describe a user's behavior, the order of that behavior must not be scrambled. In that case we can hash by key: as long as the number of partitions does not change, messages with the same user id always hash to the same partition, and since a partition is ordered internally, the messages of one user are guaranteed to be ordered. Different users can still be assigned to different partitions, so we keep the benefit of multiple partitions.

Conclusion: Kafka messages as a whole are not guaranteed to be ordered, but messages within a single partition are.
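The two ways of choosing a partition can be sketched in a few lines of Python. This only illustrates the idea and is not Kafka's partitioner: the function names are made up, and `zlib.crc32` stands in for whatever hash the real client uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a chooser that spreads messages evenly, ignoring the key."""
    counter = itertools.count()

    def choose(_key=None):
        return next(counter) % num_partitions

    return choose


def key_hash_partitioner(num_partitions):
    """Return a chooser that always maps the same key to the same partition."""

    def choose(key):
        # crc32 is an assumed stand-in hash, chosen because it is stable
        # across runs (unlike Python's built-in hash() for strings).
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose
```

With `round_robin_partitioner(3)`, consecutive messages from user A land in partitions 0, 1, 2 and can be consumed out of order. With `key_hash_partitioner(3)`, every message keyed by the same user id goes to one partition, so that user's messages stay in order while other users may still land on other partitions.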

How to design a reasonable consumer model

Once the message model is designed, consumers are essential. The simplest way to implement a consumer is to start a process or thread that pulls messages directly from the broker. This makes sense, but what if messages are produced faster than they are consumed? The first idea that comes to mind is to add another consumer and increase consumption speed with multiple consumers. But there is a problem: what if two consumers consume the same message? Locking is one solution, but it is inefficient. You might say that consumption is essentially reading, reads can be shared, and as long as the business logic is idempotent it is fine to consume a message repeatedly. But then, if 10 consumers compete for the same message, 9 of them waste their resources. So we need multiple consumers to speed up consumption while ensuring that each consumer only gets messages no one else is processing. This is the consumer group. A group can contain multiple consumers, and since a topic is partitioned, it is enough for each consumer in the group to subscribe to different partitions. Ideally, every consumer is allocated the same number of partitions. If consumers get unequal numbers of partitions, the load is skewed and some consumers are very busy while others are idle, which is unreasonable. This calls for a balanced allocation strategy.

There are three main Kafka consumer partition allocation strategies:

  1. Range: This strategy works per topic: it divides the number of partitions in a topic by the number of consumers. If there is a remainder, the partitions cannot be split evenly, and the consumers at the front of the list each get one extra partition. If consumers subscribe to more than one topic and each topic leaves a remainder, the consumers at the front of the list end up with many more partitions.

Suppose two consumers, C1 and C2, subscribe to two topics with three partitions each. Because the division is done per topic, the result is:

  • C1 consumes topic0-p0, topic0-p1, topic1-p0, topic1-p1
  • C2 consumes topic0-p2, topic1-p2

In the end, consumer C1 has two more partitions than consumer C2, even though one of C1's partitions could simply be given to C2 to balance the load.
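The per-topic division described for Range can be sketched as below. This is a simplified illustration that assumes every consumer subscribes to every topic; the function name `range_assign` and the `topic-pN` naming are my own, not Kafka's code.

```python
def range_assign(consumers, partitions_per_topic):
    """Assign partitions topic by topic; early consumers get the remainder.

    consumers: list of consumer names (all assumed to subscribe to all topics).
    partitions_per_topic: dict mapping topic name -> number of partitions.
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for topic in sorted(partitions_per_topic):
        count = partitions_per_topic[topic]
        per_consumer, extra = divmod(count, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers each take one more partition.
            size = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                f"{topic}-p{p}" for p in range(start, start + size)
            )
            start += size
    return assignment
```

Calling `range_assign(["C1", "C2"], {"topic0": 3, "topic1": 3})` gives C1 four partitions and C2 two, because the remainder of each topic goes to C1 again.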

  2. RoundRobin: This strategy sorts all consumers in the consumer group and all partitions of the topics they subscribe to lexicographically, and then assigns the partitions to the consumers one by one in round-robin fashion. Suppose there are two topics with three partitions each, and three consumers. The assignment then looks like this:

  • C0 consumes topic0-p0, topic1-p0
  • C1 consumes topic0-p1, topic1-p1
  • C2 consumes topic0-p2, topic1-p2

It seems perfect, but suppose there are now 3 topics with different numbers of partitions: topic0 has one partition, topic1 has two, and topic2 has three. Consumer C0 subscribes to topic0, consumer C1 subscribes to topic0 and topic1, and consumer C2 subscribes to topic0, topic1 and topic2. The assignment then looks like this:

  • C0 consumes topic0-p0
  • C1 consumes topic1-p0
  • C2 consumes topic1-p1, topic2-p0, topic2-p1, topic2-p2

So RoundRobin is not perfect either. Even setting aside differences in throughput between topic partitions, C2 clearly carries a heavy load, while topic1-p1 could just as well be allocated to consumer C1.
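Both RoundRobin examples can be reproduced with a short sketch. It is a simplified model of the idea rather than Kafka's implementation: the name `round_robin_assign` and the data shapes are assumptions, and a consumer that is not subscribed to a partition's topic is simply skipped.

```python
from itertools import cycle


def round_robin_assign(subscriptions, partitions_per_topic):
    """Deal sorted partitions out to sorted consumers, one at a time.

    subscriptions: dict mapping consumer name -> set of subscribed topics.
    partitions_per_topic: dict mapping topic name -> number of partitions.
    """
    assignment = {c: [] for c in sorted(subscriptions)}
    subscribed = set().union(*subscriptions.values())
    consumers = cycle(sorted(subscriptions))
    for topic in sorted(partitions_per_topic):
        if topic not in subscribed:
            continue  # nobody wants this topic, so nothing to assign
        for p in range(partitions_per_topic[topic]):
            consumer = next(consumers)
            # Skip consumers that did not subscribe to this topic.
            while topic not in subscriptions[consumer]:
                consumer = next(consumers)
            assignment[consumer].append(f"{topic}-p{p}")
    return assignment
```

With identical subscriptions the result is perfectly even. With the uneven subscriptions above, C0 and C1 are skipped for every topic2 partition, which is why C2 ends up with four partitions.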

  3. Sticky: Range and RoundRobin both have shortcomings. In some cases the assignment could be more balanced, but it is not.

One goal of Sticky is to distribute partitions as evenly as possible. In the RoundRobin example above, with 3 topics of 1, 2 and 3 partitions, C1 could consume topic1-p1 but is not given it. Under Sticky, topic1-p1 can be allocated to C1.

The second goal of Sticky is to keep the partition assignment as close as possible to the previous one. Suppose C0, C1 and C2 all subscribe to topic0, topic1, topic2 and topic3, and each topic has two partitions.

If C1 exits, only C0 and C2 are left in the group, and the partitions are rebalanced between C0 and C2.

If the rebalance reassigns every partition from scratch, partitions also move between the survivors: here C0's topic1-p1 is assigned to C2, and C2's topic1-p0 is assigned to C0. This can cause repeated consumption: if a partition is handed to a new consumer before the old one has committed its offset, the new consumer processes the same messages again. In theory, though, after C1 exits there is no need to move the partitions of C0 and C2 at all. It is enough to split C1's partitions between C0 and C2, which is what Sticky does.

Note that if the two goals of the Sticky strategy conflict (distributing partitions as evenly as possible, and keeping the assignment as close as possible to the previous one), the first goal takes priority.
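The "survivors keep what they have" idea can be sketched as follows. This is a deliberately simplified model, not Kafka's sticky algorithm: it only handles a consumer leaving, and it never moves a partition between survivors to restore balance, which the real strategy would do when evenness and stickiness conflict. The function name and the starting assignment in the usage note are assumptions, chosen to be consistent with the example above (C0 owns topic1-p1 and C2 owns topic1-p0).

```python
def sticky_reassign(previous, live_consumers):
    """Redistribute only the partitions whose owner has left the group.

    previous: dict mapping consumer name -> list of partitions it owned.
    live_consumers: iterable of consumers still in the group.
    """
    # Survivors keep every partition they already own.
    assignment = {c: list(previous.get(c, [])) for c in sorted(live_consumers)}
    # Partitions owned by consumers that are no longer in the group.
    orphaned = sorted(
        p for owner, parts in previous.items()
        if owner not in assignment
        for p in parts
    )
    for partition in orphaned:
        # Give each orphan to the consumer with the fewest partitions
        # (ties broken by name so the result is deterministic).
        target = min(assignment, key=lambda c: (len(assignment[c]), c))
        assignment[target].append(partition)
    return assignment
```

Starting from an assumed assignment of C0 = topic0-p0, topic1-p1, topic3-p0; C1 = topic0-p1, topic2-p0, topic3-p1; C2 = topic1-p0, topic2-p1, removing C1 leaves C0 and C2 with four partitions each, and neither of them gives up a partition it already owned.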

RoundRobin is better than Range, and Sticky is better than RoundRobin. I recommend using the best partition assignment strategy that your Kafka version supports.
