Over the past few years, Apache Kafka’s capabilities and adoption have grown considerably. Kafka is used by a third of the Fortune 500 companies, including seven of the world’s top ten banks, eight of the top ten insurance companies, and nine of the top ten U.S. telecommunications companies.
First, let’s take a look at two core features Kafka provides:
(1) Messaging system
Messaging is commonly used in two ways:
- The queue: the consumers of a queue are a group of workers, and each message is processed by only one of them, which distributes the work efficiently. Queues are great for fault tolerance and scaling.
- Publish/subscribe: subscribers are independent of each other, and each subscriber receives every message, like a broadcast system. Publish/subscribe makes it easy to decouple systems.
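Kafka builds both patterns on consumer groups. Consumers that share a group ID split a topic’s partitions among themselves, which gives queue semantics. Each separate group receives every message, which gives publish/subscribe semantics. The sketch below models that rule in memory with plain Python. It is not the Kafka client API: the `deliver` function, the toy key hash, and the round-robin partition assignment are all illustrative assumptions.

```python
from collections import defaultdict


def deliver(messages, num_partitions, groups):
    """In-memory model of consumer-group delivery (illustrative, not the Kafka API).

    messages: list of (key, value) pairs written to one topic.
    groups:   {group_id: [consumer names]} (hypothetical names).
    Returns   {group_id: {consumer: [values received]}}.
    """
    # Route each message to a partition by key. Kafka's default partitioner
    # hashes the key with murmur2; a byte sum is a simple stand-in here.
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[sum(key.encode()) % num_partitions].append(value)

    result = {}
    for group_id, consumers in groups.items():
        # Inside a group, each partition has exactly one owner (round-robin
        # here), so every message is handled by one worker: queue semantics.
        received = {c: [] for c in consumers}
        for p in range(num_partitions):
            received[consumers[p % len(consumers)]].extend(partitions[p])
        # Every group walks all partitions, so each group sees every
        # message: publish/subscribe semantics.
        result[group_id] = received
    return result


msgs = [("user-1", "a"), ("user-2", "b"), ("user-3", "c"), ("user-4", "d")]
out = deliver(msgs, num_partitions=2,
              groups={"workers": ["w1", "w2"], "analytics": ["a1"]})
print(out["workers"])    # {'w1': ['b', 'd'], 'w2': ['a', 'c']}
print(out["analytics"])  # {'a1': ['b', 'd', 'a', 'c']}
```

A group can have more consumers than there are partitions. In that case the extra consumers receive nothing and sit idle, as they do in Kafka. This is why a topic’s partition count caps the parallelism of a single group.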
(2) Stream processing
Given a robust, scalable messaging system, it would be convenient to process message streams easily as well. Kafka’s Streams API provides this functionality: it is a Java client library that offers a higher level of abstraction than the producer and consumer APIs.
The Streams API makes it easy to implement:
- Stateless operations, such as filtering and transforming streams of messages
- Stateful operations, such as joins and aggregations within a time window
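Here is a plain-Python sketch of the stateless/stateful distinction. It is not the Kafka Streams API. It runs over a hypothetical list of `(key, timestamp_ms, value)` records. The first function filters and transforms each record on its own. The second keeps per-key state so it can sum values in fixed, non-overlapping (tumbling) time windows. Joins are stateful in the same way, since they buffer records from both sides per window, but they are omitted to keep the sketch short.

```python
from collections import defaultdict


def filter_and_transform(records):
    """Stateless: each record is handled on its own; nothing is remembered."""
    for key, ts, value in records:
        if value > 0:                    # filter: drop non-positive values
            yield key, ts, value * 2     # transform: double each value


def windowed_sum(records, window_ms):
    """Stateful: running sum per (key, tumbling-window start)."""
    state = defaultdict(int)             # plays the role of a state store
    for key, ts, value in records:
        window_start = ts - ts % window_ms
        state[(key, window_start)] += value
    return dict(state)


records = [("a", 1_000, 5), ("a", 1_500, -1), ("b", 2_500, 3), ("a", 61_000, 2)]
print(list(filter_and_transform(records)))
# [('a', 1000, 10), ('b', 2500, 6), ('a', 61000, 4)]
print(windowed_sum(records, window_ms=60_000))
# {('a', 0): 4, ('b', 0): 3, ('a', 60000): 2}
```

A chained pipeline such as `windowed_sum(filter_and_transform(records), 60_000)` is the kind of pipeline the Streams API lets you declare. In that setting the library, rather than your code, manages the state store and its recovery.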
A classic stream-processing example reads text from an input stream, counts each word, and writes the counts to an output stream.
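The word-count logic can be sketched in plain Python, again without the Kafka Streams API. For each incoming line, the sketch splits the text into lowercase words and emits each word’s updated count, much as a Kafka Streams word count emits count updates to its output topic. In the Kafka Streams Java DSL this pipeline is typically written with `flatMapValues`, `groupBy`, and `count`. The `word_count` generator below is only an illustrative stand-in for that pipeline.

```python
import re
from collections import Counter


def word_count(lines):
    """Yield (word, updated_count) for every word in the incoming lines."""
    counts = Counter()                   # running state, one entry per word
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            counts[word] += 1
            yield word, counts[word]     # an update for the output stream


print(list(word_count(["Hello Kafka", "hello streams"])))
# [('hello', 1), ('kafka', 1), ('hello', 2), ('streams', 1)]
```

Emitting one update per word mirrors a changelog of counts. The real library may coalesce some intermediate updates (record caching) before writing them to the output topic.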
Typical usage scenarios for Kafka
- Take a travel website, where hotel and airfare prices change constantly. Some components of the system (for example, price alerts and analytics) need to be notified of these changes. The changes can be published to a Kafka topic, and each component that needs them subscribes to that topic.
- You can use Kafka to track and analyze website activity (page views, searches, etc.). In fact, this is how Kafka started: LinkedIn developed it for exactly this purpose. Different types of activity data are sent to different topics for real-time analysis. This produces valuable metrics such as user engagement and page navigation paths, which inform the website’s operating strategy.
- Suppose you have a large volume of location data that must be processed in real time to track vehicle routes, distances traveled, and so on. Location data sent from devices can be written to Kafka and processed with the Streams API, for example to extract a given user’s locations over a specified period of time.
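The location scenario comes down to two steps: filter one user’s points within a time range, then sum the distances between consecutive points. Below is a plain-Python sketch of that per-user logic. It assumes a hypothetical record layout of `(user_id, timestamp, lat, lon)` and uses the haversine great-circle formula for distance. In a real deployment these records would be consumed from a Kafka topic and the logic expressed with the Streams API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def user_path(records, user_id, start_ts, end_ts):
    """One user's (ts, lat, lon) points within [start_ts, end_ts], ordered by time."""
    return sorted((ts, lat, lon) for uid, ts, lat, lon in records
                  if uid == user_id and start_ts <= ts <= end_ts)


def path_distance_km(points):
    """Sum of the distances between consecutive points on the path."""
    return sum(haversine_km(a[1], a[2], b[1], b[2])
               for a, b in zip(points, points[1:]))


records = [("u1", 10, 0.0, 0.0), ("u2", 15, 5.0, 5.0),
           ("u1", 20, 0.0, 1.0), ("u1", 99, 0.0, 2.0)]
path = user_path(records, "u1", start_ts=0, end_ts=50)
print(path)                              # [(10, 0.0, 0.0), (20, 0.0, 1.0)]
print(round(path_distance_km(path), 1))  # 111.2 (one degree of longitude at the equator)
```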
When not to use Kafka
- If the services that need to communicate with a Kafka cluster cannot or should not use Java/Scala, Kafka is not recommended, because you will not be able to use the Streams API.
- If you only need a task queue, consider RabbitMQ instead.
- If you only need to process a few thousand messages a day (well under one message per second on average), Kafka is overkill. It is designed for large-scale stream processing and is expensive to set up and maintain, so it is not worth it for small workloads.
Translated from:
https://medium.freecodecamp.org/how-to-know-if-apache-kafka-is-right-for-you-1b2e468d52b9