Sending and consuming messages with the Java API
This article briefly explains how Kafka works, to deepen your understanding of Kafka.
Partitions and replicas
As mentioned earlier, a topic has multiple partitions. The partition is in fact the smallest unit into which messages are divided, and each partition has one or more replicas (the replica count normally includes the leader replica). A message is sent by the producer, lands in a partition on a broker, and is then pulled by the consumer.
Each partition has one leader (the primary replica) and zero or more followers (secondary replicas). Each leader and follower lives on a broker, and replicas of different partitions can share the same broker. Kafka tries to distribute the partition leaders evenly across the brokers. By default, all reads and writes go through the leader; followers only replicate messages from the leader and do not serve clients.
Take a look at this diagram to help you understand
How does a producer know which broker is the leader of a partition? When configuring the producer, we set the bootstrap.servers parameter to a list of brokers. The producer contacts one of them, fetches the leader information for all partitions, and caches it so that the producer can send messages directly to the leader.
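As a minimal sketch of that configuration step, the snippet below builds the properties a producer would start from. The three localhost addresses are assumptions for a local three-broker setup, and the class and method names are hypothetical; the snippet uses only the JDK, so it shows the settings rather than an actual connection.

```java
import java.util.Properties;

class ProducerConfigSketch {
    // Builds the minimal settings a producer needs to locate the cluster.
    static Properties producerProps() {
        Properties props = new Properties();
        // Assumed addresses of three local brokers. Listing several lets the
        // client bootstrap even if one of them is unavailable.
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        // Serializer classes shipped with the kafka-clients library.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        // With kafka-clients on the classpath, these properties would be passed
        // to the KafkaProducer constructor.
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

The list only needs to contain enough brokers to reach the cluster; the producer learns about the remaining brokers from the metadata it fetches.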
Earlier we said that messages within a partition are ordered. The producer uses its partitioning algorithm to decide which partition a message belongs to, looks up that partition’s leader (a broker), and sends the message directly to that broker, which appends it to the partition’s log in arrival order. On the consuming side, a partition is read by only one consumer within a consumer group, so that consumer sees the messages in order.
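The idea behind key-based partitioning can be sketched in a few lines. This is an illustration, not Kafka’s implementation: the class and method names are hypothetical, and Arrays.hashCode stands in for the hash function the real client uses. What it shows is the property that matters for ordering: the same key always maps to the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PartitionSketch {
    // Maps a key to a partition index in the range [0, numPartitions).
    // Assumption: Arrays.hashCode is a stand-in for the client's real hash.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        // "order-1" appears twice and is assigned the same partition both times.
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the mapping depends on the number of partitions, adding partitions to a topic later can move a key to a different partition.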
Message persistence
As mentioned earlier in the overview, Kafka is also a storage system. Brokers receive messages and persist them to disk.
We again start 3 brokers, then create a topic with 3 partitions and a replication factor of 3:
> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test
Let’s look at this topic
> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test
Topic:test PartitionCount:3 ReplicationFactor:3 Configs:segment.bytes=1073741824
Topic: test Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: test Partition: 1 Leader: 2 Replicas: 2,1,0 Isr: 2,1,0
Topic: test Partition: 2 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
As expected, the topic has three partitions, each with three replicas, and the three leaders are spread across the three brokers.
The log.dirs setting specifies where Kafka stores its logs (Kafka messages are stored as logs). Data is placed in separate folders named topic-partition. For the test topic above, with its three partitions, the log.dirs directory contains:
> ls
test-0
test-1
test-2
These log files are how messages are persisted to disk. One might ask: isn’t persisting to disk much slower than keeping data in memory? In fact, Kafka relies heavily on the filesystem to achieve high throughput. Modern operating systems optimize disk access well, and sequential disk access can in some cases be faster than random memory access. They also use free main memory aggressively as a disk cache (the page cache), so much of the log is served from memory anyway. Because Kafka leaves caching to the operating system instead of holding large amounts of data on the JVM heap (Kafka is written in Scala and Java, which both run on the JVM), it avoids most garbage-collection overhead. Kafka also avoids switching between kernel and user mode and unnecessary data copies by using the sendfile system call. Finally, network bandwidth is another cost of a messaging system: Kafka can compress messages, and the compression algorithm is configurable.
The above content is excerpted from the official documentation.
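To make the sendfile point concrete, here is a minimal sketch of how Java exposes that kind of transfer through FileChannel.transferTo. The class and method names are hypothetical, and this illustrates the technique only; it is not code taken from Kafka. When the target is a socket or file channel, the JVM can let the operating system move the bytes without copying them through user space.

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class ZeroCopySketch {
    // Sends the whole file to the target channel; returns the bytes transferred.
    // Assumes a blocking target channel.
    static long transfer(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZeroCopySketch <file>");
            return;
        }
        // Hypothetical usage: stream the named file to standard output.
        long sent = transfer(Paths.get(args[0]), Channels.newChannel(System.out));
        System.err.println(sent + " bytes transferred");
    }
}
```

Writing to standard output, as in main, falls back to an ordinary copy; the zero-copy path only applies to targets the operating system can write to directly.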