Preface
Kusako recently started learning Kafka and, before diving in, set out the scope of study, roughly as follows:
- Understand Kafka concepts;
- Learn the basic usage of the Kafka API;
- Understand the philosophy behind Kafka.
Related articles will be published on each of these areas. This article, the first in the Kafka series, begins with “Understanding the concepts of Kafka.” First of all, what is Kafka?
Kafka was originally developed by LinkedIn as a multi-partition, multi-replica distributed messaging system, written in Scala and coordinated by ZooKeeper. It has since been donated to the Apache Software Foundation. Today Kafka is positioned as a distributed stream processing platform, and it is widely used for its high throughput, persistence, horizontal scalability, and support for stream processing.
From this introduction, we can see that Kafka plays two roles: a messaging system and a stream processing platform. To better understand Kafka, this article first introduces messaging systems.
The messaging system
Messaging systems are also known as message-oriented middleware. The term you hear most often these days is message queue (MQ), which is commonly used to mean the same thing. So what is a messaging system?
Consider a scenario we are all familiar with: email. When we send an email, we are in effect transferring a message from our computer to the recipient's computer. But when we send it, we do not need to care whether the recipient's computer is switched on; we simply send it. The email goes to the mail server, and when the recipient's computer comes online, it fetches the mail from the mail server. The mail server plays the role of a messaging system: it temporarily stores the messages that applications send to one another. The benefit is clear. The producer that sends a message does not need to care about the state of the consumer that receives it. The producer only needs to ensure that the message is successfully delivered to the messaging system. This is an asynchronous communication mode.
This communication mode decouples the two sides and reduces the producer's responsibilities. The producer is concerned only with producing messages and sending them to the messaging system; how they are consumed is not its concern.
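The email analogy can be sketched in a few lines of Python. This is only an illustration, not Kafka's API: the standard-library `queue.Queue` stands in for the messaging system, and the names `broker`, `producer`, and `consumer` are hypothetical.

```python
import queue

# Hypothetical stand-in for the messaging system (the "mail server").
broker = queue.Queue()
received = []

def producer():
    # The producer's job ends once the broker has accepted the message;
    # it never checks whether a consumer is running.
    for i in range(3):
        broker.put(f"mail-{i}")

def consumer():
    # The consumer comes online later and fetches what was stored for it.
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            break
        received.append(message)

producer()               # the consumer is "offline" while this runs
stored = broker.qsize()  # the broker holds the messages in the meantime
consumer()               # the consumer comes online and drains the broker
print(stored, received)
```

The producer finishes before the consumer starts, yet no message is lost, because the broker stores the messages in between.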
Beyond the point-to-point (single-consumer) case described above, we can also use the publish/subscribe model, again without the producer being aware of it. A newly added consumer simply subscribes to a topic, and the messaging system broadcasts each message on that topic to all subscribed consumers, which makes the system easy to extend.
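A minimal sketch of the publish/subscribe idea follows. The `Broker` class and its `subscribe` and `publish` methods are hypothetical names for this illustration. The broker pushes messages to callbacks in memory, a simplification that is not how Kafka delivers messages.

```python
from collections import defaultdict

class Broker:
    """Hypothetical in-memory broker for illustration; not Kafka's API."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Broadcast the message to every current subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
billing_inbox, audit_inbox = [], []

broker.subscribe("orders", billing_inbox.append)
broker.publish("orders", "order-1")

# A new consumer joins; the publishing code does not change.
broker.subscribe("orders", audit_inbox.append)
broker.publish("orders", "order-2")

print(billing_inbox, audit_inbox)
```

The second consumer is added with one `subscribe` call. In this sketch it receives only messages published after it subscribed.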
Asynchrony can also speed up the system's response. For example, placing an order involves processing by the coupon, points, SMS, and other systems. If these calls are made synchronously, the order system has to wait for all of them to finish, which greatly increases its response time. With a messaging system, the order system only needs to write a message describing the order, then complete the order and respond to the user. The coupon, points, SMS, and other systems each fetch the order message from the messaging system and process it on their own.
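The order example can be sketched as follows, assuming an in-memory `queue.Queue` in place of a real messaging system. All function names (`place_order`, `coupon_system`, `points_system`, `sms_system`, `drain_events`) are hypothetical.

```python
import queue

order_events = queue.Queue()  # hypothetical stand-in for the messaging system
handled = []                  # records which downstream system handled which order

def place_order(order_id):
    """Write one order message, then respond to the user immediately."""
    order_events.put({"order_id": order_id})
    return f"order {order_id} accepted"

def coupon_system(event):
    handled.append(("coupons", event["order_id"]))

def points_system(event):
    handled.append(("points", event["order_id"]))

def sms_system(event):
    handled.append(("sms", event["order_id"]))

def drain_events():
    """Runs later, outside the request path of the order system."""
    while not order_events.empty():
        event = order_events.get()
        for system in (coupon_system, points_system, sms_system):
            system(event)

response = place_order(42)
handled_at_response = list(handled)  # nothing downstream has run yet
drain_events()
print(response, handled_at_response, handled)
```

The response is returned before any downstream system has run, so the user's wait no longer includes the coupon, points, and SMS processing.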
Consider one more scenario. When the order system receives a sudden surge of requests, the messaging system can also act as a buffer for peak shaving and rate limiting. It stores messages temporarily so that each downstream system can process them at its own pace, which prevents downstream systems from crashing and makes the overall system more stable.
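Peak shaving can be illustrated with a buffer that absorbs a burst while the downstream side drains it at a fixed rate. This is a sketch under assumed numbers: `MAX_PER_TICK` and `process_tick` are hypothetical, and a `deque` stands in for the messaging system's storage.

```python
from collections import deque

buffer = deque()  # hypothetical stand-in for the messaging system's storage
MAX_PER_TICK = 3  # assumed downstream capacity per unit of time

def process_tick():
    """Take at most MAX_PER_TICK messages from the buffer."""
    batch = []
    while buffer and len(batch) < MAX_PER_TICK:
        batch.append(buffer.popleft())
    return batch

# A sudden burst of 10 requests arrives at once.
buffer.extend(range(10))

# The downstream system works through the backlog at its own pace.
batches = []
while buffer:
    batches.append(process_tick())

print(batches)
```

However large the burst, the downstream system never handles more than `MAX_PER_TICK` messages per tick; the cost is that the backlog takes longer to clear.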
Issues of concern
With the introduction above, we now have a basic understanding of messaging systems and what they do. Let's think a step further: if we use a messaging system, what problems do we need to pay attention to?
- Availability. If the messaging system fails, every downstream system that consumes from it is affected, so its availability must be guaranteed.
- Throughput. If the upstream system produces millions of messages per second, the write throughput of the messaging system needs to match it. The consumption throughput of the downstream systems also deserves attention.
- Message loss. Along the path upstream system -> messaging system -> downstream system, messages can be lost on either hop.
- Message ordering. This problem goes hand in hand with message loss, since a lost message can leave the remaining messages out of order.
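To make the last two concerns concrete, here is one simple way a consumer could notice loss and disorder. It assumes the producer attaches consecutive sequence numbers to its messages. That assumption and the helper `inspect_sequence` are hypothetical and are not a description of how Kafka handles these problems.

```python
def inspect_sequence(received):
    """Return (missing, out_of_order) for a list of sequence numbers.

    Assumes the producer numbered messages consecutively. Losses before
    the smallest or after the largest number seen cannot be detected.
    """
    if not received:
        return [], []
    expected = set(range(min(received), max(received) + 1))
    missing = sorted(expected - set(received))
    # A message is out of order if it is smaller than the one before it.
    out_of_order = [b for a, b in zip(received, received[1:]) if b < a]
    return missing, out_of_order

missing, out_of_order = inspect_sequence([1, 2, 4, 3, 6])
print(missing, out_of_order)
```

In this run, message 5 never arrived and message 3 arrived after message 4, so the consumer sees both a gap and a reordering.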
In follow-up articles, we will look at how Kafka solves these problems.