Apache Kafka is a distributed streaming platform for building real-time data pipelines and streaming applications. It lets you publish and subscribe to streams of records, store them durably with fault tolerance, and process them as they are generated.
At its core, Kafka is a distributed publish-subscribe messaging system with the following characteristics:
1. High throughput and low latency: Kafka can process hundreds of thousands of messages per second with latency as low as a few milliseconds. Each topic can be divided into multiple partitions, which consumer groups consume in parallel.
2. Scalability: a Kafka cluster supports hot scaling, so brokers can be added without downtime.
3. Persistence and reliability: messages are persisted to local disk, and data replication is supported to prevent data loss.
4. Fault tolerance: nodes in the cluster are allowed to fail (with a replication factor of N, up to N-1 nodes can fail).
5. High concurrency: thousands of clients can read and write at the same time.
6. Support for both real-time and offline processing: messages can be processed in real time by a stream processor such as Storm, or offline by a batch system such as Hadoop.
Typical use cases:
1. Log collection: a company can use Kafka to collect logs from its various services and expose them to consumers (Hadoop, HBase, Solr, etc.) through a unified interface.
2. Messaging system: decoupling producers from consumers, buffering messages, and so on.
3. User activity tracking: Kafka is often used to record the activities of web or app users, such as browsing, searching, and clicking. These events are published by application servers to Kafka topics, which subscribers consume for real-time monitoring and analysis, or load into Hadoop or a data warehouse for offline analysis and mining.
4. Operational metrics: Kafka is also used to record operational monitoring data, collecting metrics from distributed applications and producing centralized feeds of operational data such as alerts and reports.
5. Stream processing: feeding frameworks such as Spark Streaming and Storm.
6. Event sourcing.
1. Install and configure Kafka and Zookeeper
The installation and configuration process is straightforward and is not repeated here; refer to the official quickstart: http://kafka.apache.org/quickstart
Start Kafka: bin/kafka-server-start.sh config/server.properties
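For reference, the quickstart starts Zookeeper before the broker (paths assume the default layout of the Kafka distribution):

# Start the Zookeeper instance bundled with the Kafka distribution
bin/zookeeper-server-start.sh config/zookeeper.properties
# Then start the Kafka broker in another terminal
bin/kafka-server-start.sh config/server.properties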
Here is my environment:
2. Create the Spring Boot project
Version note: this project uses Spring Boot 2.0+; earlier versions may not work correctly.
1. pom.xml references
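A minimal sketch of the relevant entries, assuming the standard spring-kafka artifact and FastJSON for serialization (the FastJSON version shown is illustrative):

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.47</version> <!-- illustrative version -->
</dependency>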
2. Define the message producer
KafkaTemplate is used to send messages and is auto-configured by Spring Boot, so there is no need to define a Kafka configuration class; writing ProducerConfig and ConsumerConfig classes with the Kafka parameters hard-coded in Java would be ugly and hard to maintain.
Define a generic class KafkaSender<T>, where T is the message object to send; it is serialized using Alibaba's FastJSON.
After the message is sent, you can handle your own business logic in the callback. The ListenableFutureCallback class has two methods, onFailure and onSuccess, which you can implement for your actual scenario, as in the sketch below.
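A minimal sketch of such a sender, assuming Spring Boot's auto-configured KafkaTemplate and an illustrative topic name "test_topic":

import com.alibaba.fastjson.JSON;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Component;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Component
public class KafkaSender<T> {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    // Serialize the message object with FastJSON and send it asynchronously
    public void send(T message) {
        String json = JSON.toJSONString(message);
        ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send("test_topic", json);
        future.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
            @Override
            public void onSuccess(SendResult<String, String> result) {
                // Handle your own business logic on success
                System.out.println("send ok, offset: " + result.getRecordMetadata().offset());
            }

            @Override
            public void onFailure(Throwable ex) {
                // Handle your own business logic on failure
                System.err.println("send failed: " + ex.getMessage());
            }
        });
    }
}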
3. Define the message consumer
Use the @KafkaListener annotation to listen for messages on the topic, which must be the same topic used in the send function.
@Header(KafkaHeaders.RECEIVED_TOPIC) gets the topic directly from the message headers, as shown below.
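A minimal sketch of the listener, again assuming the illustrative topic "test_topic":

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Component;

@Component
public class KafkaReceiver {

    // The topic here must match the one used by the sender
    @KafkaListener(topics = "test_topic")
    public void listen(String message, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic) {
        System.out.println("received from topic " + topic + ": " + message);
    }
}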
4. Configure application.yml
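A minimal sketch of the configuration, assuming a single local broker (the address and group id are illustrative):

spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: test-group
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer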
5. Send a message: use @Autowired to inject KafkaSender directly, then call its send method. Here is the code:
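A sketch of what the call might look like, e.g. inside a test class or a controller (the payload is illustrative):

@Autowired
private KafkaSender<String> kafkaSender;

public void sendDemo() {
    // Publish a simple string payload through the generic sender
    kafkaSender.send("hello kafka");
}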
The console output shows that the message was sent successfully.
Run bin/kafka-topics.sh --list --zookeeper localhost:2181 on the server to see the topic.
1. No loss of producer data
· The new-version producer sends asynchronously: KafkaProducer.send(ProducerRecord) simply puts the message into a buffer (the RecordAccumulator), and a background I/O thread continuously scans the buffer, packs qualifying messages into batches, and sends them out. This leaves a data-loss window: if the client dies before the I/O thread sends, the data accumulated in the accumulator is indeed lost. Kafka's ack mechanism: each time a message is sent, an acknowledgement mechanism ensures it is properly received.
· In synchronous mode, the ack mechanism ensures data is not lost. Setting acks to 0 carries a very high risk of loss, so 0 is generally not recommended:
producer.type=sync
request.required.acks=1
· In asynchronous mode, a buffer controls data sending, governed by two thresholds: a time threshold and a message-count threshold. If the buffer fills up before the data is sent, immediate-discard mode is very risky, so blocking mode must be set:
producer.type=async
request.required.acks=1
queue.buffering.max.ms=5000
queue.buffering.max.messages=10000
queue.enqueue.timeout.ms=-1
batch.num.messages=200
· Conclusion: the producer may lose data, but data loss can be prevented through configuration.
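For the newer Java producer, a sketch of comparable no-loss settings (the broker address and retry count are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class NoLossProducer {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");         // wait for all in-sync replicas to acknowledge
        props.put("retries", 3);          // retry transient send failures instead of dropping
        props.put("max.block.ms", 60000); // block, rather than fail, when the buffer is full
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }
}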
2. No loss of consumer data
· If the offset is committed before message processing completes, data loss can result. Since the Kafka consumer auto-commits offsets by default, it is important to ensure the message is fully processed before the offset is committed in the background, so heavy processing logic is not recommended in the consumer. If processing takes a long time, put the logic in another thread. To avoid data loss, two suggestions (see the sketch after this list):
Set enable.auto.commit=false to disable automatic offset commits
Manually commit the offset after the message has been fully processed
· If Storm is used, Storm's ack/fail mechanism should be enabled;
· If Storm is not used, update the offset after data processing completes. With the low-level API, you need to control the offset value manually. Committing the offset this way ensures data is not lost: Kafka records the offset of each consumed message, and the next consumption continues from the last committed offset.
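A minimal sketch of a manual-commit consumer using the plain Java client (the broker address, group id, and topic name are illustrative):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("enable.auto.commit", "false"); // offsets advance only after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test_topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                // Fully process the record here before its offset is committed
                System.out.println(record.offset() + ": " + record.value());
            }
            consumer.commitSync(); // commit only after the whole batch is processed
        }
    }
}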