Today, I will share a carefully organized “Kafka interview questions summary” by Zhu Xiaozhu, divided into three parts: basic, intermediate, and advanced. There aren't many questions, but they are the ones most frequently asked in interviews.
If you can answer every question in this document, you should be well prepared for a Kafka interview. If you struggle with any of them, it’s time to study Kafka!
See the end of the article for how to get the materials!!
Kafka basics
1. What is Kafka used for? What are the usage scenarios?
Messaging systems: Like traditional messaging systems (also known as message-oriented middleware), Kafka provides system decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, scalability, and recoverability. In addition, Kafka offers message ordering guarantees (within a partition) and the ability to replay consumption from an earlier offset, both of which most messaging systems find difficult to provide.
Storage systems: Kafka persists messages to disk, which greatly reduces the risk of data loss compared with memory-based systems. Thanks to this persistence and its multi-replica mechanism, Kafka can also serve as a long-term data store: simply set the topic's data retention policy to “permanent” or enable log compaction for the topic.
Streaming platform: Kafka not only provides a reliable data source for every popular stream processing framework, but also ships a complete stream processing library (Kafka Streams) that supports operations such as windowing, joins, transformations, and aggregations.
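To make the log compaction option mentioned under storage systems concrete, here is a minimal Python sketch of what compaction keeps. It is a simplified model, not Kafka's log cleaner: the `compact` function and the record layout are hypothetical, and real compaction also leaves the active segment untouched and removes tombstones only after `delete.retention.ms`. In Kafka itself, compaction is enabled per topic with `cleanup.policy=compact`, while `retention.ms=-1` disables time-based deletion.

```python
from typing import List, Optional, Tuple

# (offset, key, value); a value of None models a tombstone (delete marker)
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record], drop_tombstones: bool = False) -> List[Record]:
    """Keep only the newest record per key, in original offset order.

    Offsets are never renumbered, so the compacted log has gaps.
    """
    latest = {}
    for offset, key, _value in log:
        latest[key] = offset  # a later offset for the same key wins
    kept = [rec for rec in log if latest[rec[1]] == rec[0]]
    if drop_tombstones:
        # Simplification: Kafka drops tombstones only after delete.retention.ms
        kept = [rec for rec in kept if rec[2] is not None]
    return kept
```

For a log in which `user1` is written at offsets 0 and 2, only offset 2 survives compaction, and its offset stays 2.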
3. What do HW, LEO, LSO, LW, etc. stand for in Kafka?
HW is short for High Watermark. It marks a specific message offset: consumers can only pull messages before this offset.
LSO here refers to LogStartOffset (note that in Kafka's own terminology, LSO usually stands for LastStableOffset, which relates to transactions). In general, a log's LogStartOffset equals the baseOffset of its first log segment, but this is not guaranteed. The value of logStartOffset can be changed by a DeleteRecordsRequest (for example, through KafkaAdminClient's deleteRecords() method or the kafka-delete-records.sh script), as well as by log cleanup and truncation operations.
The figure above shows a log file containing nine messages. The first message has offset 0 (the LogStartOffset) and the last has offset 8. The message at offset 9, drawn as a dashed box, is the next message to be written. The log's HW is 6, so consumers can only pull messages with offsets 0 through 5; the message at offset 6 is not yet visible to them.
LEO is short for Log End Offset. It identifies the offset of the next message to be written to the current log file; in the figure above, offset 9 is the LEO. LEO equals the offset of the last message in the log partition plus 1. Each replica in the partition's ISR set maintains its own LEO, and the smallest LEO in the ISR set is the partition's HW, which is why consumers can only consume messages before the HW.
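The relationship between LEO and HW can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Replica` class and helper names are hypothetical, each replica's LEO is read directly (in a real cluster the leader learns follower LEOs from their fetch requests, so its view lags), and the follower LEOs of 7 and 6 used below are made-up values chosen to reproduce the HW of 6 from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    log_start_offset: int
    messages: List[str] = field(default_factory=list)  # messages[i] has offset log_start_offset + i

    @property
    def leo(self) -> int:
        # LEO = offset of the last message + 1, i.e. the offset the next message will get
        return self.log_start_offset + len(self.messages)


def high_watermark(isr: Dict[int, Replica]) -> int:
    # HW = smallest LEO among the replicas currently in the ISR
    return min(replica.leo for replica in isr.values())


def visible_offsets(leader: Replica, hw: int) -> range:
    # Consumers can fetch offsets in [logStartOffset, HW); HW itself is excluded
    return range(leader.log_start_offset, hw)
```

With a leader holding offsets 0 to 8 (LEO 9) and followers at LEO 7 and 6, `high_watermark` returns 6 and `visible_offsets` yields offsets 0 through 5, matching the figure.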
LW is short for Low Watermark and represents the smallest logStartOffset among the replicas in the AR set. Both replica fetch requests (FetchRequest, which may cause a new log segment to be created while old segments are cleaned up, increasing logStartOffset) and delete-records requests (DeleteRecordsRequest) can cause LW to grow.
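Because LW is a minimum over all replicas in the AR set, it only advances once every replica's logStartOffset has moved. The sketch below models this with a plain dict from a hypothetical broker ID to logStartOffset; the function names are illustrative, and the assumption that followers adopt the leader's new logStartOffset one at a time is a simplification of the real fetch-based propagation.

```python
from typing import Dict


def advance_log_start(log_starts: Dict[int, int], broker_id: int, new_start: int) -> None:
    # logStartOffset never moves backwards, so a smaller value is ignored
    log_starts[broker_id] = max(log_starts[broker_id], new_start)


def low_watermark(log_starts: Dict[int, int]) -> int:
    # LW = smallest logStartOffset over every replica in the AR set
    return min(log_starts.values())
```

Starting from `{1: 0, 2: 0, 3: 0}`, deleting records before offset 4 on the leader (broker 1) leaves LW at 0; only after brokers 2 and 3 also advance to 4 does LW become 4.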
...
Kafka intermediate
1. What are the current internal Kafka topics and what characteristics do they have? What’s their role?
__consumer_offsets: stores the offsets committed by Kafka consumers
__transaction_state: stores transaction log messages
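As an illustration of one characteristic of __consumer_offsets: a consumer group's offsets live in a single partition of that topic, commonly described as `abs(groupId.hashCode()) % offsets.topic.num.partitions`, where `offsets.topic.num.partitions` defaults to 50. The Python sketch below reproduces that calculation under this assumption (verify it against the Kafka version you run, since the group coordinator has been reworked over time). `java_string_hash` mimics Java's `String.hashCode()` for strings without characters outside the Basic Multilingual Plane, and both function names are hypothetical.

```python
INT_MIN = -(1 << 31)


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() (32-bit signed overflow) for BMP-only strings."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, as Java int arithmetic does
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as a signed int


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    h = java_string_hash(group_id)
    # Assumption: absolute value with Integer.MIN_VALUE mapped to 0,
    # because Java's Math.abs(Integer.MIN_VALUE) would stay negative
    positive = 0 if h == INT_MIN else abs(h)
    return positive % num_partitions
```

Under these assumptions, every group ID maps to one fixed partition, so all offset commits for a group land in the same place.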
2. What is a preferred replica? What special function does it have?
The preferred replica is the first replica in a partition's AR (assigned replicas) list.
Ideally, the preferred replica is the partition's leader replica, so it is also called the preferred leader.
Kafka tries to distribute the preferred replicas of all topic partitions evenly across the brokers in the cluster, which in turn distributes the partition leaders evenly. This helps balance load across the cluster and is known as “partition balancing.”
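The leader imbalance that partition balancing is meant to correct can be estimated per broker, roughly the way the controller does when `auto.leader.rebalance.enable` is on and it compares the result with `leader.imbalance.per.broker.percentage` (default 10%). The Python sketch below is a hypothetical model: `imbalance_ratio` and `preferred_leader_election` are illustrative names, and the election simply moves leadership back to AR[0] whenever that replica is in the ISR.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# partition name -> (AR list, current leader, ISR list)
Assignment = Dict[str, Tuple[List[int], int, List[int]]]


def imbalance_ratio(assignment: Assignment) -> Dict[int, float]:
    """Per broker: fraction of the partitions it prefers to lead (it is AR[0])
    that are currently led by some other broker."""
    preferred = defaultdict(int)
    not_leading = defaultdict(int)
    for ar, leader, _isr in assignment.values():
        pref = ar[0]  # the preferred replica is the first one in the AR list
        preferred[pref] += 1
        if leader != pref:
            not_leading[pref] += 1
    return {broker: not_leading[broker] / preferred[broker] for broker in preferred}


def preferred_leader_election(assignment: Assignment) -> Assignment:
    """Hand leadership back to AR[0] wherever that replica is in the ISR."""
    result = {}
    for partition, (ar, leader, isr) in assignment.items():
        new_leader = ar[0] if ar[0] in isr else leader
        result[partition] = (ar, new_leader, isr)
    return result
```

For example, if broker 1 restarts and loses leadership of a partition it prefers, its ratio rises to 1.0 until a preferred leader election hands leadership back. A partition whose preferred replica has fallen out of the ISR keeps its current leader.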
...
Kafka advanced
1. How are transactions implemented in Kafka?
3. How do the HW and LEO of each replica evolve in a multi-replica partition?
...
Bookmark this article so you can review it before applying for a job.
[Get the materials here!!]
Note: The materials were collected from the internet and will be removed upon request in case of infringement.