Messages written to a Kafka partition are stored in the order they arrive, and within a consumer group a partition is consumed by only one consumer, so messages in a partition are consumed in sequence. Messages across different partitions, however, are not guaranteed to be ordered.
Kafka only guarantees the ordering of messages within a single partition.
The unit of distribution in Kafka is the partition. Each partition is backed by a write-ahead log, so FIFO order is guaranteed within it; order across different partitions is not. For most use cases this is sufficient, because the producer can set a message key, and messages with the same key are guaranteed to be sent to the same partition.
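Below is a minimal sketch of a keyed producer using the Java client, assuming a topic named `orders` and a broker at `localhost:9092` (both placeholders). Kafka's default partitioner hashes the key, so every record carrying the same order ID is routed to the same partition and keeps its relative order.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedOrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String orderId = "order-42"; // hypothetical order ID used as the message key
            // Records with the same key are hashed to the same partition,
            // so the events of one order keep their relative order.
            producer.send(new ProducerRecord<>("orders", orderId, "CREATED"));
            producer.send(new ProducerRecord<>("orders", orderId, "PAID"));
            producer.send(new ProducerRecord<>("orders", orderId, "SHIPPED"));
        }
    }
}
```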
When sending a message with the Kafka producer you can specify the topic, the partition, and the key; partition and key are optional. If you specify a partition explicitly, all of those messages go to that one partition and are therefore ordered, and on the consumer side Kafka guarantees that a partition is consumed by only one consumer in a group. Alternatively, you can specify a key (such as an order ID), and all messages with the same key are sent to the same partition. Ordering can still break if the consumer processes messages with multiple threads; the usual fix is to hash each message's key into one of several in-memory queues and have each worker thread drain exactly one queue, as sketched below.
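A minimal sketch of that in-memory-queue idea: the polling thread hashes each record's key into one of N `BlockingQueue`s, and N worker threads each consume exactly one queue, so per-key order is preserved even with parallel processing. The topic name, group id, and worker count are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedWorkerConsumer {
    static final int WORKERS = 4; // placeholder worker/queue count

    public static void main(String[] args) {
        // One in-memory queue per worker thread.
        List<BlockingQueue<ConsumerRecord<String, String>>> queues = new ArrayList<>();
        for (int i = 0; i < WORKERS; i++) {
            BlockingQueue<ConsumerRecord<String, String>> queue = new LinkedBlockingQueue<>();
            queues.add(queue);
            int id = i;
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        ConsumerRecord<String, String> rec = queue.take();
                        // Each key always maps to the same queue, so records for
                        // one key are processed sequentially by one thread.
                        System.out.printf("worker-%d processes %s=%s%n", id, rec.key(), rec.value());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-workers");           // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    // Hash the key to pick a queue; same key -> same queue -> same thread.
                    int idx = Math.floorMod(rec.key() == null ? 0 : rec.key().hashCode(), WORKERS);
                    queues.get(idx).put(rec);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```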
How do you avoid consuming duplicate data from Kafka? For example, with a balance deduction, we must not deduct twice.
In practice this has to be solved together with the business logic; here are a few ideas:
(1) If you are writing data to a database, first look it up by primary key; if a row already exists, do an update instead of an insert.
(2) If you are writing to Redis, there is no problem: every write is a SET, which is naturally idempotent.
(3) If you are in neither of the above scenarios, it is a bit more complicated. Have the producer attach a globally unique ID (such as an order ID) to every message. When consuming, first check Redis for that ID: if it has not been consumed yet, process the message and then write the ID to Redis; if it has already been consumed, skip it. This guarantees the same message is never processed twice (see the sketch after this list).
(4) Rely on a database unique key constraint so that duplicate data cannot be inserted more than once. Because of the constraint, a duplicate insert only raises an error and does not leave dirty data in the database.
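A minimal sketch of idea (3), assuming a Jedis client, a Redis instance at `localhost:6379`, and that the producer put the globally unique ID in the message key (all of these are placeholders). Redis SETNX records the ID only if it is absent, so a message whose ID was already recorded is skipped.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import redis.clients.jedis.Jedis;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class DeduplicatingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "deduction-service");       // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Jedis redis = new Jedis("localhost", 6379)) {                    // placeholder Redis address
            consumer.subscribe(Collections.singletonList("deductions"));      // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    String messageId = rec.key(); // assume the producer put the unique ID here
                    // SETNX stores the ID only if it is not present; 1 means "first time seen".
                    if (redis.setnx("consumed:" + messageId, "1") == 1) {
                        processDeduction(rec.value());
                    }
                    // else: already consumed, skip to avoid a double deduction
                }
            }
        }
    }

    private static void processDeduction(String payload) {
        System.out.println("deducting for " + payload); // business logic goes here
    }
}
```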
What are the two conditions Kafka uses to decide whether a node is still alive?
(1) The node must be able to maintain its session with ZooKeeper; ZooKeeper checks each node's connection through its heartbeat mechanism.
(2) If the node is a follower, it must replicate the leader's writes promptly and must not fall too far behind; a follower that lags too long is no longer considered in sync and is dropped from the in-sync replica (ISR) set, which can be inspected as sketched below.
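As a rough illustration of the second condition, the Kafka AdminClient can show which replicas the leader currently counts as in sync for each partition. This is a minimal sketch assuming a topic named `orders` and a broker at `localhost:9092` (both placeholders); a follower that lags beyond the broker's `replica.lag.time.max.ms` setting drops out of the `isr` list.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class IsrInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singletonList("orders")) // placeholder topic
                    .all().get().get("orders");
            for (TopicPartitionInfo p : desc.partitions()) {
                // Replicas listed in isr() are the ones the leader still considers
                // caught up; a lagging follower disappears from this list.
                System.out.printf("partition %d leader=%s isr=%s%n",
                        p.partition(), p.leader().id(), p.isr());
            }
        }
    }
}
```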