1 Introduction to Kafka
1.1 Summary of Kafka
Kafka is a distributed message queue based on the publish/subscribe model. With its high throughput, Kafka is mainly used for real-time processing of big data, and it plays an important role in data collection, transmission, and storage.
- Apache Kafka, written in Scala, is an open-source messaging system developed by the Apache Software Foundation. The goal of the project is to provide a unified, high-throughput, low-latency platform for processing real-time data.
- Kafka is a distributed message queue that stores messages by Topic. A Kafka cluster consists of multiple Kafka instances; each server instance is called a broker.
- Both the Kafka cluster and the consumers rely on a ZooKeeper cluster to store metadata and ensure system availability.
1.2 Kafka advantages
- Support for multiple producers and consumers.
- Supports horizontal scaling of brokers.
- A replica mechanism provides data redundancy and ensures data is not lost.
- Categorize the data by topic.
- Data is compressed and sent in batches, reducing transmission overhead and increasing throughput.
- Supports multiple messaging patterns; messages are persisted to disk.
- Processes messages with high performance, guaranteeing sub-second latency even at big-data scale.
- A consumer can subscribe to messages on multiple topics.
- Low CPU, memory, and network consumption.
- Supports data replication and mirroring clusters across data centers.
1.3 Kafka shortcomings
- Because data is sent in batches, delivery is not truly real-time.
- Messages are ordered only within a single partition, not globally.
- Monitoring is incomplete; plug-ins need to be installed.
- Data may be lost, and (in older versions) transactions are not supported.
- Data may be repeatedly consumed and messages may arrive out of order. Order can be guaranteed within a single partition, but not across the multiple partitions of a topic.
1.4 Kafka architecture
- Broker: A Kafka server is a Broker. A cluster consists of multiple brokers. A single broker can hold multiple topics.
- Producer: Message producers, which are the clients that send messages to Kafka Broker.
- Consumer: Message consumers that pull messages from the Kafka broker to consume. Messages can be consumed at an appropriate rate based on the Consumer’s ability to consume.
- Topic: can be understood as a queue, producers and consumers are oriented to a Topic.
- Partition: For scalability, a very large topic can be distributed across multiple brokers. A topic is divided into multiple partitions, each of which is an ordered queue, and producer traffic is spread roughly evenly across them.
- Replication: To ensure that partition data on a failed node is not lost and Kafka can continue to work, Kafka provides a replication mechanism. Each partition of a topic has several replicas: one leader and several followers.
- Leader: Each partition has one leader replica; producers send data to the leader, and consumers consume from it.
- Followers: Each partition has follower replicas that synchronize data with the leader in real time. When the leader fails, a follower becomes the new leader. Note that the number of replicas in Kafka cannot exceed the number of brokers!
- Consumer Group: A Consumer Group consists of multiple consumers. Each consumer in the group is responsible for consuming data of different partitions, and a partition can only be consumed by one consumer in the group. Consumer groups do not influence each other. All consumers belong to a consumer group, that is, a consumer group is logically a subscriber.
- Offset: A consumer can specify a starting offset when consuming a message in a topic.
1.5 ZooKeeper functions
ZooKeeper plays an important role in Kafka and generally provides the following functions:
1.5.1 Broker registration
Brokers are distributed and independent of each other, but a registry system is required to manage brokers across the cluster, such as with ZooKeeper.
1.5.2 Topic registration
In Kafka, messages on the same Topic are divided into multiple partitions and distributed among multiple brokers. The Partition information and the corresponding relationship between the partitions and brokers are maintained by Zookeeper and recorded by special nodes.
1.5.3 Producer load Balancing
The same Topic message can be partitioned and distributed across multiple brokers, so producers need to properly send messages to these distributed brokers.
- Traditional four-layer load balancing maps a producer to a fixed broker based on its IP address and port. Typically one producer then corresponds to a single broker, but producers generate different message volumes and brokers store different amounts of data, so the load ends up uneven.
- ZooKeeper can be used for load balancing instead. Since every broker registers itself in ZooKeeper when it starts, producers can dynamically sense changes in the broker list through changes in this node, implementing a dynamic load balancing mechanism.
1.5.4 Consumer load balancing
In Kafka, consumers also need load balancing so that multiple consumers receive messages from the appropriate brokers in a reasonable way. Each consumer group contains several consumers; each message is delivered to only one consumer within a group, and different consumer groups consume the messages of their own subscribed topics without interfering with each other.
1.5.5 Relationship between partitions and consumers
Kafka assigns a globally unique Group ID to each Consumer Group, shared by all consumers in the group. Kafka specifies that each partition can be consumed by only one consumer in the group. The relationship between partitions and consumers is recorded in ZooKeeper: once a consumer obtains the consumption right over a message partition, its Consumer ID is written to the temporary node of that partition in ZooKeeper.
1.5.6 Record message consumption progress Offset
When a Consumer consumes a given message partition, it periodically records the partition's consumption progress (Offset) in ZooKeeper. In this way, after the Consumer restarts or another Consumer takes over the partition, consumption can continue from the previously recorded progress.
1.5.7 Consumer registration
Consumer registration drives the allocation of message partitions to consumers, so that messages from different partitions within the same Topic are consumed as evenly as possible.
- After a Consumer starts, a node is created in ZooKeeper, and each Consumer registers a listener for changes to the consumers in its Consumer Group, keeping the consumer load balanced.
- The Consumer listens on the list of brokers and performs Consumer load balancing when changes occur.
2 Kafka generation process
2.1 Write Mode
The producer uses push mode to publish messages to the broker, and each message is appended to a partition, which is a sequential write to disk. Sequential writes are at least three orders of magnitude faster than random writes.
2.2 Partition Partition
2.2.1 Partition profile
When messages are sent, they are sent to a topic, which is essentially a directory. Topics are composed of Partition Logs, as shown in the following figure:
(figure: partition log structure)
You can see that the messages in each Partition are ordered, and the produced messages are continuously appended to the Partition log, each of which is given a unique offset value.
Partitioning makes it easy to scale in a cluster and improves concurrency.
An intuitive analogy:
Kafka's design comes from life. It is like road transport: different origins and destinations need their own highways (topics), and a highway can provide multiple lanes (partitions). A high-traffic highway (topic) gets several lanes (partitions) to keep traffic flowing, while a quiet road gets few lanes to avoid waste. Tollbooths are like consumers: when there are many cars, several booths are opened to avoid congestion; when there are few, some are closed.
2.2.2 Zoning Principles
We need to encapsulate the data sent by the producer into a ProducerRecord object.
(figure: ProducerRecord data encapsulation)
- If partition is specified, the specified value is directly used as the partiton value.
- If the partition value is not specified but there is a key, mod the hash value of the key and the number of partitions of the topic to obtain the partition value.
- If neither a partition nor a key is given, the first call generates a random integer (incremented on each subsequent call), and this value modulo the total number of available partitions for the topic gives the partition value. This is the round-robin algorithm.
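The three rules above can be sketched in Python. This is a hypothetical helper, not the actual Kafka client: the real client hashes keys with murmur2 rather than Python's `hash`, and seeds its counter with a random integer instead of zero.

```python
import itertools

# Counter shared across calls; the real client seeds it with a random int.
_counter = itertools.count()

def choose_partition(num_partitions, key=None, partition=None):
    if partition is not None:
        return partition                      # rule 1: explicit partition wins
    if key is not None:
        return hash(key) % num_partitions     # rule 2: hash of key mod partitions
    return next(_counter) % num_partitions    # rule 3: round-robin counter
```

Because rule 2 depends only on the key, all messages with the same key land in the same partition, which is what makes per-key ordering possible.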
2.3 Kafka File Storage mechanism
(figure: Kafka storage structure)
- In Kafka, messages are classified by topic. Producers and consumers are topic-oriented. A topic is a logical concept, while a partition is a physical concept: each partition corresponds to a `.log` file that stores the data and an `.index` file that stores the index. The metadata in the index file points to the physical offset of each Message in the corresponding log file.
- To prevent inefficient data location in overly large log files, Kafka uses a sharding and indexing mechanism that divides each partition into multiple segments, each consisting of an `.index` file and a `.log` file. These files live in a folder named after the topic name plus the partition number. For example, if topic first has three partitions, the corresponding folders are first-0, first-1, and first-2.
```
00000000000000000000.index
00000000000000000000.log
00000000000000170410.index
00000000000000170410.log
00000000000000239430.index
00000000000000239430.log
```
Note: Each index and log file is named after the offset of the first message in its segment.
(figure: data lookup procedure)
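The first step of the lookup can be sketched as a binary search over the segment base offsets taken from the file names above (a simplified illustration, not broker code):

```python
import bisect

def locate_segment(base_offsets, target):
    """Return the base offset of the segment that should contain `target`.
    base_offsets must be sorted, as the numeric file names naturally are."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError("offset is below the earliest retained segment")
    return base_offsets[i]

# Base offsets from the example listing above.
segments = [0, 170410, 239430]
```

Within the chosen segment, the `.index` file is then searched the same way to find the physical position of the message in the `.log` file.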
2.4 How to ensure message sequence
2.4.1 Out of order
- Kafka has one topic, one partition, and one Consumer, but the Consumer consumes with multiple threads internally, so the data can end up out of order.
- Sequential data is written to different partitions and consumed by different consumers. Since each Consumer's execution time is not fixed, there is no guarantee that the Consumer that reads a message first finishes processing it first, so messages are not executed in order and the data order goes wrong.
2.4.2 Solution
- Ensure that messages of the same kind are sent to the same partition: one topic, one partition, one consumer, with single-threaded consumption inside the consumer.
  - Messages written to the same Partition are guaranteed to be ordered.
  - Specify a key for each message; messages with the same key are always written to the same partition.
  - Messages read from the same Partition are guaranteed to be ordered.
- On the basis of the first solution, the Consumer dispatches messages to different in-memory queues based on message ID, so consumption speeds up while per-ID order is preserved.
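The in-memory queue idea can be sketched as follows: messages with the same ID always land in the same queue, so per-ID order is kept while one worker thread per queue consumes in parallel (an illustrative sketch; the queue count and hashing scheme are assumptions):

```python
from queue import Queue

def make_dispatcher(num_queues):
    queues = [Queue() for _ in range(num_queues)]
    def dispatch(msg_id, payload):
        # Same msg_id -> same queue -> per-ID ordering is preserved.
        q = queues[hash(msg_id) % num_queues]
        q.put((msg_id, payload))
        return q
    return queues, dispatch
```

A worker thread attached to each queue then processes its messages strictly in arrival order.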
4 Data Reliability
4.1 Messaging semantics
Kafka ensures message delivery semantics between producers and consumers. There are three delivery guarantees:
- At most once: a message may be lost, but is never delivered again.
- At least once: a message is never lost, but may be delivered repeatedly.
- Exactly once: a message is processed once and only once, neither lost nor repeated.
Ideally you would want strictly exactly-once delivery, but it is hard to achieve. The following roughly follows the path a message travels.
4.2 From producer to broker
4.2.1 The producer sends messages to the broker
The general steps are as follows:
- Producer finds the Leader metadata of the target Partition from ZK.
- Producer sends messages to the Leader.
- The Leader receives the message and persists it, then decides how to wait for follower synchronization based on the acks configuration.
- Followers synchronize the data and send an ACK to the Leader.
- After the synchronization between the Leader and followers is complete, the Leader sends an ACK to the producer.
For when the Leader replies with an ACK, Kafka provides users with three reliability levels, chosen by trading off reliability against latency.
- request.required.acks = 0
  - The producer does not wait for the broker's ACK, which gives minimum latency. The broker returns as soon as it receives the message, before it has been written to disk, so data may be lost if the broker fails. This corresponds to the At Most Once mode.
  - Because information is lost whenever the disk write fails, this mode is generally not used in production.
- request.required.acks = 1
  - This is the default. The producer waits for the broker's ACK: the partition's leader returns an ACK after its own disk write succeeds, without waiting for the followers. If the leader fails before the followers have synchronized, data is lost, because the leader already reported success.
- request.required.acks = -1 / all
  - The producer waits for the broker's ACK: the partition's leader and all followers in the ISR must succeed before the ACK is returned.
  - However, if the leader fails after the followers have received the message but before the ACK reaches the producer, the producer resends the message, producing duplicates.
  - This corresponds to the At Least Once mode.
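A tiny hypothetical helper makes the trade-off between the three levels explicit (the key name follows the classic request.required.acks setting used above; newer clients call it simply acks):

```python
def acks_config(guarantee):
    """Map a desired delivery guarantee to the producer acks setting."""
    table = {
        "at-most-once":  "0",   # don't wait for any ACK: fastest, may lose data
        "leader-only":   "1",   # default: leader ACKs after its own write
        "at-least-once": "-1",  # leader waits for every ISR follower
    }
    return {"request.required.acks": table[guarantee]}
```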
4.2.2 How to ensure idempotency
If a business needs Exactly Once, earlier versions of Kafka could only deduplicate downstream. Kafka has since introduced idempotence, meaning that no matter how many duplicate messages the producer sends, the server persists only one copy.
At Least Once + idempotence = Exactly Once
Idempotence is enabled with enable.idempotence = true. A producer with idempotence enabled is assigned a PID during initialization, and messages sent to the same Partition carry sequence numbers; the broker judges uniqueness by the <PID, Partition, SeqNumber> combination. However, the PID changes when the producer restarts, and different partitions have different keys, so idempotence cannot guarantee Exactly Once across partitions or across sessions.
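The broker-side check can be sketched like this (simplified: a real broker keeps a small window of recent sequence numbers per producer, not just the last one):

```python
def make_idempotent_log():
    last_seq = {}   # (pid, partition) -> last sequence number accepted
    log = []
    def append(pid, partition, seq, msg):
        key = (pid, partition)
        if key in last_seq and seq <= last_seq[key]:
            return False          # duplicate resend: drop it
        last_seq[key] = seq
        log.append(msg)
        return True
    return append, log
```

A resend with the same sequence number is silently dropped, so the retry from acks = -1 no longer produces a duplicate record.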
4.3 Kafka Broker Information falls to disk
(figure: data flush process)
How the Kafka Broker handles a received message is set with producer.type, which usually takes one of two values:
- sync, the default mode: the data must be flushed to disk before the write is considered OK.
- async: the data is written to the OS's Page Cache and returns immediately; if the machine suddenly fails, the information is lost.
4.4 Consumers consume data from Kafka Broker
(figure: consuming data from the broker)
Consumer works in the form of Consumer Group, which consists of one or more consumers who jointly consume a topic. Each partition can only be read by one consumer in the group at a time, but multiple groups can consume the partition at the same time. If a consumer fails, other group members automatically load balance the partitions read by the previous failed consumer. The Consumer Group pulls messages from the Broker and consumes them in two phases:
- Get the data and submit the Offset.
- Start processing the data.
If you submit offset first and then process the data, an exception may occur during the processing of the data and the data may be lost. If you process the data before submitting the offset, a failure to submit the offset may result in repeated consumption of information.
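The two orderings can be sketched side by side (illustrative only):

```python
def consume(messages, process, commit, commit_first=False):
    """commit_first=True is at-most-once: a crash right after commit()
    loses the message. Committing after process() is at-least-once: a
    crash before commit() causes the message to be delivered again."""
    for offset, msg in enumerate(messages):
        if commit_first:
            commit(offset)
            process(msg)
        else:
            process(msg)
            commit(offset)
```

Whichever order is chosen, one of the two failure modes remains; removing both requires idempotent processing or transactions.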
PS:
The downside of the pull pattern is that if Kafka has no data, the consumer may get stuck in a loop that keeps returning empty data. For this reason, Kafka consumers pass in a timeout when consuming data. If no data is currently available for consumption, the consumer will wait a certain amount of time before returning.
5 Kafka partition allocation policy
Within the same group.id, a partition allocation strategy decides how the consumers share the multiple partitions of a topic.
Kafka has two partition allocation strategies, set through partition.assignment.strategy:
- RangeAssignor range partitioning policy, also the default mode.
- RoundRobinAssignor Allocation policy, polling partition mode.
5.1 RangeAssignor range partitioning policy
This strategy works per topic. First the partitions of a topic are sorted by number and the consumers are sorted alphabetically. Suppose there are 10 partitions and 3 consumers: the sorted partitions are P0 through P9 and the sorted consumers are C1-0, C2-0, and C3-0. Partitions divided by consumers determines how many partitions each consumer consumes; if the division is not even, the first few consumers consume one extra partition.
| Consumer | Partitions consumed |
|---|---|
| C1-0 | P0, P1, P2, P3 |
| C2-0 | P4, P5, P6 |
| C3-0 | P7, P8, P9 |
Disadvantages of Range partitioning:
As shown above, for a single topic the fact that C1-0 consumes one extra partition has little impact. But if there are N topics, C1-0 consumes one extra partition for every topic, ending up with N more partitions than the other consumers. This is the obvious drawback of range partitioning.
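The range arithmetic can be written out directly (a sketch of the per-topic math, not the real RangeAssignor class):

```python
def range_assign(partitions, consumers):
    """Assign a sorted list of partitions to sorted consumers, range-style."""
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, c in enumerate(consumers):
        size = per + (1 if i < extra else 0)   # first `extra` get one more
        out[c] = partitions[start:start + size]
        start += size
    return out
```

With 10 partitions and 3 consumers, divmod gives 3 partitions each plus 1 extra for the first consumer, reproducing the table above.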
5.2 RoundRobinAssignor Polling partition policies
The RoundRobin polling partition policy lists all partitions and all consumers, sorts them by hash code, and assigns partitions to consumers with a polling algorithm. There are two scenarios:
- Consumers in the same Consumer Group have the same subscription information
- Consumer subscription information is different within the same Consumer Group
5.2.1 Consumers in a Consumer Group have the same subscription information
If all consumers in the same consumer group subscribe to the same message, the RoundRobin policy allocates partitions evenly.
For example, in the same consumer group, three consumers C0, C1, and C2 subscribe to two topics T0 and T1, and each topic has three partitions (P0, P1, and P2), so all the subscribed partitions can be identified as T0P0, T0P1, T0P2, T1P0, T1P1, and T1P2. The final partition allocation result is as follows:
| Consumer | Partitions consumed |
|---|---|
| C0 | T0P0, T1P0 |
| C1 | T0P1, T1P1 |
| C2 | T0P2, T1P2 |
5.2.2 Subscription information of consumers in the Consumer Group differs
If messages subscribed to are not the same within the same consumer group, partition allocation is not a complete polling allocation, which can lead to uneven partition allocation. If a consumer does not subscribe to a topic within the consumer group, the consumer will not be assigned to any partitions of the topic when partitioning is allocated.
For example, the same consumer group has 3 consumers C0, C1, and C2, which among them subscribe to 3 topics T0, T1, and T2 with 1, 2, and 3 partitions respectively (T0 has partition P0; T1 has P0 and P1; T2 has P0, P1, and P2). All subscribed partitions can thus be identified as T0P0, T1P0, T1P1, T2P0, T2P1, and T2P2. Consumer C0 subscribes to topic T0, consumer C1 subscribes to topics T0 and T1, and consumer C2 subscribes to topics T0, T1, and T2. The final partition allocation is:
| Consumer | Partitions consumed |
|---|---|
| C0 | T0P0 |
| C1 | T1P0 |
| C2 | T1P1, T2P0, T2P1, T2P2 |
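The polling behaviour, including the uneven-subscription case, can be sketched as follows (names are sorted alphabetically here instead of by hash code, which is an assumption for readability):

```python
import itertools

def round_robin_assign(topic_partitions, subscriptions):
    """topic_partitions: ordered list of (topic, partition) pairs.
    subscriptions: consumer name -> set of subscribed topics."""
    consumers = sorted(subscriptions)
    out = {c: [] for c in consumers}
    cycle = itertools.cycle(consumers)
    for topic, part in topic_partitions:
        # Advance the cycle until a consumer subscribed to this topic is found.
        for _ in range(len(consumers)):
            c = next(cycle)
            if topic in subscriptions[c]:
                out[c].append((topic, part))
                break
    return out
```

Run on the subscriptions above, this reproduces the uneven allocation in the table: C2 picks up every T2 partition because no one else subscribes to T2.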
6 Efficient Kafka reads and writes
Kafka sustains throughput of millions of messages per second thanks to several design features.
6.1 Sequential Reading and Writing Data
Information is stored on the hard disk, which is composed of many platters. Under a microscope the platter surface is uneven: raised, magnetized spots represent the digit 1 and recessed, unmagnetized spots represent 0, which is how a hard disk stores text, pictures, and other information in binary.
(figure: a real hard disk)
Here is a picture of an actual hard drive. To understand the internal structure, look at the following diagram:
(figure: internal disk structure)
- The system reads data from the disk through the head, which flies over the platter at a height of about one-thousandth the diameter of a human hair.
- A platter (disc) in the hard disk looks like a CD. Each platter has two surfaces (disks), and each surface can store data.
- Each surface is divided into many concentric circles called tracks, each with a different radius.
- The same track on all surfaces constitutes a cylinder, and the same sector of the same track is called a cluster. Data is read and written cylinder by cylinder, from top to bottom; when one cylinder is full, writing moves on to the next.
- A track is divided into arcs (sectors), each storing 512 bytes plus housekeeping information. Because sectors on different tracks span the same angle but lie at different radii, the linear velocity of the outer tracks is higher than that of the inner ones.
- Reading one sector at a time is inefficient, so the operating system reads data by block. A block generally consists of multiple sectors and is 4 to 64 KB in size.
- A page is 4 KB by default. The operating system deals with two storage devices, memory and hard disk, and each needs a virtual base unit: for memory operations the minimum unit is the virtual page; when dealing with the hard disk, the minimum unit is the block.
- Sector: the smallest read/write unit of the hard disk.
- Block/cluster: the smallest unit the operating system uses to read and write the disk.
- Page: the smallest unit of operation between memory and the operating system.
The process of completing a read/write request for a disk access consists of three actions:
- Seek: the time it takes for the magnetic head to move from the beginning to the track where the data is located, about 10ms on average.
- Rotation delay: The time required for disk rotation to move the requested data sector below the read/write head. Rotation delay depends on disk rotation speed. For 5400 RPM disks, the average is about 5 ms.
- Data transfer: the time it takes for the head to read the data from the target sectors. If a 5400 rpm track has 400 sectors, reading a single sector takes only about 0.0278 ms.
As you can see, the access time is dominated by the first two components. With sequential reads, the seek and rotational delay are paid only once; random reads pay them over and over, which makes sequential access almost three orders of magnitude faster.
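The figures quoted above can be checked with a little arithmetic (the numbers are purely illustrative):

```python
def disk_access_ms(rpm, seek_ms, sectors_per_track, sectors_read=1):
    rev_ms = 60_000 / rpm                 # duration of one revolution, in ms
    rotational = rev_ms / 2               # on average, wait half a revolution
    transfer = rev_ms * sectors_read / sectors_per_track
    return seek_ms + rotational + transfer

# At 5400 rpm one revolution takes 60000/5400 ~ 11.11 ms, so the average
# rotational delay is ~ 5.56 ms and one of 400 sectors takes ~ 0.0278 ms.
```

A random single-sector read therefore costs roughly 10 + 5.56 + 0.03 ms, nearly all of it seek and rotation, which is exactly why sequential appends are so much cheaper.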
(figure: sequential vs. random read/write on disk and in memory)
6.2 Memory Mapped Files
- Virtual memory systems divide Virtual memory into fixed-size chunks called Virtual pages (VP). By default, each Virtual Page is 4KB. Similarly, Physical memory is split into Physical pages (PP), also 4KB.
- The server can directly use the Page of the operating system to map the physical memory to the file. The user’s operation and read data are directly mapped to the Page, and the operating system automatically synchronizes the operation on the physical memory to the hard disk according to the mapping. Achieve similar sequential memory reading and writing functions.
- The disadvantage, as noted earlier, is that when the Broker flushes messages this way, a flush that has not really reached the disk can result in data loss.
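A minimal Python sketch of memory-mapped file I/O, using the standard mmap module (the file name is arbitrary): writes go to the mapped pages, and the OS syncs them to disk, here forced with flush().

```python
import mmap
import os
import tempfile

def mmap_write(path, data):
    with open(path, "wb") as f:
        f.truncate(len(data))                 # size the file first
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), len(data)) as m:
            m[:] = data                       # write through the mapping
            m.flush()                         # ask the OS to sync the pages

path = os.path.join(tempfile.mkdtemp(), "segment.log")
mmap_write(path, b"hello kafka")
```

Until flush() (or the kernel's own writeback) runs, the bytes live only in the page cache, which is precisely the data-loss window mentioned above.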
(figure: memory mapping)
6.3 Zero Copy
6.3.1 Direct memory access DMA
Traditionally, the CPU issues instructions to the I/O device to carry out reads and writes. In most cases, data is first read into memory and then moved from memory to the I/O device, so the data does not actually need to pass through the CPU.
Direct Memory Access (DMA) is designed to speed up bulk data input/output: it is an interface technique that lets external devices exchange data directly with system memory without going through the CPU, so the transfer speed depends on the speed of the memory and the peripheral.
If data is transferred using DMA only, without being copied through the CPU, this is called Zero Copy. With Zero Copy technology, the transfer time and CPU cost are at least halved.
6.3.2 Kafka Read/write Comparison
(figure: Kafka read/write with and without zero copy)
As shown in the black flow, Zero Copy technology is not used:
- DMA transfer, disk read data to the operating system memory Page Cache area.
- Data is copied from the Page Cache area to the user memory area.
- The CPU moves data from user memory to Socket Cache.
- DMA transfer: data is moved from the Socket Cache to the NIC cache.
The red flow is the flow using Zero Copy technology:
- DMA transfer, disk read data to the operating system memory Page Cache area.
- DMA transfer. Data is transferred from the Page Cache area of system memory to the NIC Cache area.
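On Linux, the zero-copy step is the sendfile system call, exposed in Python as os.sendfile: the kernel moves bytes from one file descriptor to another without copying them through user-space buffers. Kafka sends from the page cache to a network socket; the sketch below uses a regular file as the destination just to show the call (Linux-specific):

```python
import os
import tempfile

# Source file: stands in for a log segment sitting in the page cache.
src = tempfile.TemporaryFile()
src.write(b"x" * 4096)
src.seek(0)

# Destination: stands in for the NIC-bound socket in Kafka's case.
dst = tempfile.TemporaryFile()

# The kernel moves the bytes fd-to-fd; user space never touches them.
sent = os.sendfile(dst.fileno(), src.fileno(), 0, 4096)
```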
6.4 Batch Deal
When consumers pull data, Kafka does not send messages one by one but in batches, which saves network overhead and increases the system's TPS. The drawback is that the data is not processed in true real time; true real-time processing is the territory of systems like Flink.
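Batching itself is simple to sketch: group records so that one network round trip carries many of them (the fixed batch size here is an assumption; the real client also bounds batches by bytes and linger time):

```python
def batches(records, max_batch):
    batch = []
    for r in records:
        batch.append(r)
        if len(batch) == max_batch:
            yield batch                # ship a full batch
            batch = []
    if batch:
        yield batch                    # ship the final partial batch
```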
7 reference
- Why Kafka partitions: www.zhihu.com/question/28…
- On disk reads: blog.csdn.net/holybin/art…
- Kafka million-level TPS: mp.weixin.qq.com/s/Fb1cW0oN7…