Question 1: The role of message queues

1. Decoupling

A courier has a lot of packages to deliver. He has to call each consignee to confirm whether they are home and which time slot works, and only then can he plan the delivery. The whole plan depends entirely on the consignees! With many deliveries to make, the courier is run ragged…

If there is a convenience store, the courier only needs to drop all the packages for the neighborhood at that store and then notify the consignees to come and pick them up. At this point the courier and the consignees are decoupled!

2. Asynchrony

Previously, after calling me, the courier had to wait downstairs until I came down for my package, and he could not deliver to anyone else in the meantime. If he leaves the package at the Xiaofang convenience store instead, he can get on with other work and does not have to sit there waiting for me to show up. Efficiency improves.

3. Peak clipping

Suppose I buy goods from several different stores on Double Eleven, and each store ships with a different courier company: Zhongtong, Yuantong, Shentong, and all kinds of others.

What's more, they all arrive at the same time! The Zhongtong courier calls me to collect a package at the north gate, the Yuantong courier calls me to the south gate, and the Shentong courier calls me to the east gate. I am torn in three directions…

Clearly, in scenarios where systems need to interact, message-queue middleware is genuinely useful. Following the same idea, there are even more specialized pieces of "middleware" than the Xiaofang convenience store, such as Fengchao lockers and Cainiao Stations.

Question 2: What are the components in Kafka?

Topic: a Kafka topic is a named category or stream of messages.

Producer: in Kafka, producers publish messages to Kafka topics.

Consumer: a Kafka consumer subscribes to one or more topics and reads and processes the messages in them.

Broker: Kafka brokers store the messages of each topic and serve producer and consumer requests.
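As a rough illustration of how these roles fit together, here is a minimal sketch using the Java client. The broker address (localhost:9092), the topic name (demo-topic), and the group id (demo-group) are all made-up assumptions, not anything prescribed by Kafka itself:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaRolesSketch {
    public static void main(String[] args) {
        // Producer: publishes a message to a topic on the broker
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        }

        // Consumer: subscribes to the topic and reads messages from it
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");                 // assumed consumer group
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```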

Question 3: A brief description of the ACK mechanism

acks: how many acknowledgments the producer must receive from the broker before a send is considered successful

  • 0 means the producer does not wait for any acknowledgment from the leader (highest throughput, lowest data reliability)
  • 1 means the leader acknowledges as soon as it has written the message to its local log
  • -1/all means the send is acknowledged only after all in-sync replicas (ISR) have written the message (lowest throughput, highest data reliability).
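A minimal sketch of how these three settings are chosen on the producer side with the Java client (broker address and topic name are assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // acks=0   : fire-and-forget, no acknowledgment (fastest, least reliable)
        // acks=1   : leader acknowledges once the record is in its local log
        // acks=all : leader acknowledges only after all in-sync replicas have the record
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```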

Question 4: How does Kafka determine whether a node is alive

(1) The node must maintain its session with ZooKeeper, which checks each node's connection through a heartbeat mechanism

(2) If the node is a follower, it must replicate the leader's writes promptly and must not fall too far behind

Question 5: Are Kafka messages in Pull mode or Push mode

The question Kafka considered initially was whether consumers should pull messages from brokers or brokers should push messages to consumers, i.e. pull versus push.

In this respect, Kafka follows the traditional design shared by most messaging systems: producers push messages to the broker, and consumers pull messages from the broker. Some other systems, such as Scribe and Apache Flume, instead use a push model that pushes data downstream to consumers.

Push has both advantages and disadvantages: the broker determines the rate at which messages are pushed, which makes it hard to handle consumers with different consumption rates.

A messaging system wants consumers to consume as fast as possible, but unfortunately, in push mode, if the broker pushes faster than the consumer can consume, the consumer can be overwhelmed and crash.

Another advantage of pull is that the consumer can decide for itself whether to fetch data from the broker in bulk. In push mode, the broker must decide, without knowing the consumption capacity and strategy of downstream consumers, whether to push each message immediately or to buffer messages and push them in batches.

If it pushes at a low rate to avoid overwhelming consumers, it wastes effort by pushing only a few messages at a time.

A disadvantage of pull is that if the broker has no messages available, the consumer keeps polling in a loop until new messages arrive. To avoid this, Kafka provides parameters that let the consumer's fetch block until new messages arrive (and, optionally, until enough data has accumulated for it to be returned in a batch).
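On the Java consumer these are the fetch.min.bytes and fetch.max.wait.ms settings; a small sketch (broker address, group id, and the specific byte/millisecond values are illustrative assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullBlockingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // The broker holds the fetch until at least this many bytes are available...
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);
        // ...but never longer than this, so an idle partition does not block forever.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // consumer.subscribe(...) and consumer.poll(...) as usual
        consumer.close();
    }
}
```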

Question 6: Can you describe the leader election process?

As we know, a ZooKeeper cluster also has an election mechanism: it elects its own leader with a Paxos-like protocol in which nodes exchange votes with one another. Kafka's leader election, however, is not nearly so complicated.

Kafka's leader (controller) election is implemented by creating an ephemeral /controller node on ZooKeeper. The broker that creates the node writes its own information to it, e.g. {"version":1,"brokerid":1,"timestamp":"1512018424988"}. Because of ZooKeeper's strong consistency, the node can only be created by one client. When the current leader loses its connection to ZooKeeper, the ephemeral node is deleted; the other brokers watch this node, are notified of the deletion, and a new leader is elected.
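To make the mechanism concrete, here is a rough sketch using the raw ZooKeeper Java client. This is not Kafka's actual controller code, just an illustration of "create ephemeral node, or watch it if it already exists"; the class and payload are invented for the example:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch implements Watcher {
    private final ZooKeeper zk;
    private final String brokerInfo; // e.g. a JSON blob identifying this broker (assumed format)

    public ControllerElectionSketch(ZooKeeper zk, String brokerInfo) {
        this.zk = zk;
        this.brokerInfo = brokerInfo;
    }

    /** Try to become leader by creating the ephemeral /controller node. */
    public void elect() throws KeeperException, InterruptedException {
        try {
            // Only one client can succeed; the node disappears if this broker's session dies.
            zk.create("/controller", brokerInfo.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("I am the controller");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else is controller: watch the node so we are told when it is deleted.
            zk.exists("/controller", this);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                elect(); // previous controller is gone, try to take over
            } catch (Exception ignored) {
            }
        }
    }
}
```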

Question 7: When does Kafka rebalance?

There are five conditions that trigger rebalance.

  • Condition 1: A new consumer joins the group
  • Condition 2: An existing consumer dies or leaves
  • Condition 3: The coordinator goes down and the cluster elects a new coordinator
  • Condition 4: Partitions are added to a subscribed topic
  • Condition 5: A consumer calls unsubscribe() and unsubscribes from the topic

When a rebalance occurs, all consumer instances in the group coordinate together, and Kafka tries to make the partition assignment as fair as possible. But the rebalance process has a serious impact on the consumer group: all consumer instances in the group stop working during the rebalance and wait for it to complete.

Question 7.1: Can you briefly describe the rebalance process?

The main process is as follows:

  1. Send a GCR (find-coordinator) request to locate the coordinator: the consumer sends this request to the least-loaded broker in the cluster; the broker returns the group's coordinator, and the consumer then connects to that coordinator
  2. Send a JGR (join-group) request to join the group: once the coordinator is located, the consumer asks to join the group, declaring itself a member. The coordinator receives the requests and designates one member (usually the first to join) as the group leader, responsible for partition assignment
  3. Send an SGR (sync-group) request: after the join-group request succeeds, if the current consumer is the leader it computes the partition assignment and sends the result to the coordinator in its sync-group request; the other members also send sync-group requests, but with an empty assignment

Question 7.2: What impact does Rebalance have

Rebalance is in itself a protection mechanism for the Kafka consumer group: it removes consumers that cannot consume or consume too slowly. Because we consume large volumes of data and do network I/O, slow third-party services we depend on can make us time out. Rebalance affects our data in the following ways:

  1. Repeated consumption: data that was already consumed but whose offset was not committed will be consumed again once the partition is assigned to another consumer, which also adds load to the cluster (see the sketch after this list)
  2. The rebalance spreads to every consumer in the ConsumerGroup: because one consumer left, the whole group rebalances and takes a relatively long time to reach a stable state again
  3. Too many rebalances slow down consumption; the group spends most of its time rebalancing instead of consuming
  4. Data is not consumed in time, lag accumulates, and messages may be discarded once Kafka's retention (TTL) expires; all of this is fatal to our system.
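One common way to reduce the repeated consumption in point 1 is to commit offsets when partitions are revoked, via a ConsumerRebalanceListener. A minimal sketch with the Java client (broker address, topic, and group id are made up for the example):

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceListenerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("enable.auto.commit", "false"); // commit manually so we control when
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("demo-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit what has been processed before the partitions move,
                // so the next owner does not re-consume the same records.
                consumer.commitSync();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // New assignment received after the rebalance completes.
            }
        });

        while (true) {
            consumer.poll(Duration.ofMillis(500)).forEach(r ->
                    System.out.println(r.offset() + ": " + r.value()));
            consumer.commitSync();
        }
    }
}
```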

Question 7.3: How do I avoid rebalance problems?

To avoid rebalances, start from the moments at which they are triggered. As noted above, there are three main triggers:

  • The number of group members changes
  • The number of subscribed topics changes
  • The number of partitions of a subscribed topic changes

The last two are usually deliberate, planned changes that can be avoided; the most common cause of rebalance is a change in the membership of the consumer group.

Starting or stopping consumer instances in the normal course of operations causes a rebalance; that is unavoidable. In some cases, however, the coordinator mistakenly decides that a consumer instance has "stopped" and kicks it out of the group, which also triggers a rebalance.

After the consumer group completes a rebalance, each consumer instance periodically sends heartbeat requests to the coordinator to show that it is still alive. If a consumer instance fails to send heartbeats in a timely way, the coordinator considers it "dead", removes it from the group, and starts a new rebalance. This window is configured with the consumer-side parameter session.timeout.ms, whose default is 10 seconds.

Alongside this parameter, the consumer also has a parameter that controls how often heartbeat requests are sent: heartbeat.interval.ms. The smaller the value, the more frequently the consumer sends heartbeats, and the sooner the coordinator can tell each consumer that a rebalance is in progress, because the coordinator notifies consumers by embedding the REBALANCE_NEEDED flag in the heartbeat response.

Besides the two parameters above, there is one more consumer-side parameter that governs how the consumer's actual processing capacity affects rebalancing: max.poll.interval.ms. It limits the maximum interval between two calls to poll() by the consumer application. The default is 5 minutes, which means that if the consumer cannot finish processing the polled messages within 5 minutes, it will send a "leave group" request and the coordinator will start a new round of rebalance.

Here is how these unnecessary rebalances can be avoided:

The first type of unnecessary rebalance happens when a consumer is kicked out of the group because it failed to send heartbeats in time. In this case, tune session.timeout.ms and heartbeat.interval.ms to avoid it. (The following values are best practices found online and have not been tested here.)

Set session.timeout.ms to 6s and heartbeat.interval.ms to 2s, so that a consumer instance can send at least 3 heartbeat requests before it is judged dead, i.e. session.timeout.ms >= 3 * heartbeat.interval.ms. Setting session.timeout.ms to 6s lets the coordinator locate genuinely failed consumers quickly and kick them out of the group.

The second type of unnecessary rebalance happens when consumers spend too long processing messages. Here the value of max.poll.interval.ms is critical: to avoid unexpected rebalances, set it somewhat higher than the maximum processing time of your downstream logic.

In short, leave plenty of headroom for the business-processing logic, so that the consumer is not rebalanced merely because processing a batch of messages took too long.
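Putting the three parameters together, a sketch of the consumer configuration described above (broker address and group id are assumptions; the timeout values are the untested "best practice" numbers mentioned earlier):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumer is declared dead if no heartbeat arrives within this window.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6000);
        // Heartbeat interval; keep session.timeout.ms >= 3 * heartbeat.interval.ms.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 2000);
        // Maximum allowed gap between two poll() calls; set it above your worst-case
        // processing time so slow batches do not trigger a rebalance.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // subscribe / poll as usual
        consumer.close();
    }
}
```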

Question 7.4: How long does a Kafka rebalance take?

In several rounds of testing, the rebalance took roughly 80 ms to 100 ms, averaging about 87 ms.

Question 8: How to ensure ordered consumption in Kafka

This strikes me as something of a false proposition: if you need strictly ordered consumption, why use Kafka at all, whose point is asynchrony and decoupling? If you must consume in order, you can, but it wastes resources, because Kafka is designed for high concurrency and high throughput. There are two options (see the sketch below):

  1. One topic, one partition, and one consuming thread
  2. One topic, n partitions, and n threads, with the records that must stay ordered sent under the same message key
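A sketch of option 2 on the producer side: records that must stay in order share the same key, so the default partitioner routes them to the same partition and they are consumed in send order. The topic name "order-events" and the key "order-42" are made up for the example:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-42"; // all events of one order use the same key
            producer.send(new ProducerRecord<>("order-events", orderId, "created"));
            producer.send(new ProducerRecord<>("order-events", orderId, "paid"));
            producer.send(new ProducerRecord<>("order-events", orderId, "shipped"));
            // Same key -> same partition -> consumed in the order they were sent.
        }
    }
}
```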

Question 9: Why is Kafka so fast?

Kafka uses zero-copy to move data quickly, avoiding unnecessary copies and context switches between kernel space and user space. Kafka also sends records in batches end to end: from the producer, to the file system (the Kafka topic log), to the consumer.

Batching allows more efficient data compression and reduces I/O latency. Kafka writes to disk sequentially, avoiding the waste of random disk seeks. For more on disk addressing, see "Hard Disk: What Programmers Need to Know about Hard Disks". In summary, there are four main points:

  • Sequential reads and writes

  • Zero copy

  • Message compression

  • Batched sends
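The batching and compression points map directly onto producer configuration; a small sketch with the Java client (broker address and the specific batch/linger/compression values are illustrative assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Accumulate records into larger per-partition batches...
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);
        // ...waiting up to this long for a batch to fill before sending.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        // Compress whole batches, which is more effective than per-message compression.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // producer.send(...) as usual
        producer.close();
    }
}
```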