What is Kafka
You can use it as a message queue, either in point-to-point or publish-subscribe mode.
You can also use it as a messaging engine to manage the flow of information.
In short, it is a vehicle for delivering messages.
So, the first things we need to know are:
- How it sends messages.
- What its advantages are.
- How to use it and how to code against it.
- What to watch out for when using it.
Let’s take a look.
What is a message queue
Message queuing is an asynchronous service-to-service communication pattern used in serverless and microservice architectures. Messages are stored on the queue until they are processed and deleted, and each message is processed only once, by a single consumer. Message queues can be used to decouple heavy processing, to buffer or batch work, and to smooth out peak workloads.
Anything that temporarily stores messages in a queue while one side writes and the other side reads can be considered a message queue. The writing side is called the producer and the reading side is called the consumer.
Message queues have two modes: point-to-point (one-to-one) and publish-subscribe (one-to-many or many-to-many).
- Point-to-point: Like a phone call, one-to-one, where no one else can hear.
- Publish-subscribe: Like subscribing to a newspaper. More than one producer can send information to a topic, and more than one consumer can subscribe to that topic to receive it.
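To make the difference between the two modes concrete, here is a minimal in-memory sketch. The class names `PointToPointQueue` and `PubSubTopic` are made up for illustration and are not part of Kafka or any other library.

```python
from collections import defaultdict, deque


class PointToPointQueue:
    """Each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # The message is removed once a consumer takes it, so no one else sees it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Every subscriber gets its own copy of each published message."""

    def __init__(self):
        self._inboxes = defaultdict(deque)

    def subscribe(self, name):
        self._inboxes[name]  # touching the key creates an empty inbox

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None


p2p = PointToPointQueue()
p2p.send("call for Alice")
first = p2p.receive()    # one consumer takes the message
second = p2p.receive()   # nothing is left for anyone else

topic = PubSubTopic()
topic.subscribe("reader-1")
topic.subscribe("reader-2")
topic.publish("morning paper")
copies = [topic.receive("reader-1"), topic.receive("reader-2")]
```

In the point-to-point case the second `receive` finds nothing, while in the publish-subscribe case both readers get the same paper.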
Why use message queues
- Peak shaving
This is probably the most common use case. When upstream traffic is too heavy, you can either discard it or store it so that downstream services are not overwhelmed. Storing is the better choice most of the time, though not always: in time-sensitive scenarios it can be better to drop some of the traffic.
Storing the traffic and consuming it gradually during idle time buffers the instantaneous burst between upstream and downstream, smooths the load, and increases the availability of the system. This is known as peak shaving and valley filling.
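The effect of buffering a burst can be sketched with a tiny simulation. The function `simulate` and its tick-based model are assumptions made for this example: a burst arrives, the queue absorbs it, and the consumer drains it at a fixed pace.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Return (processed_per_tick, backlog_per_tick) for a buffered consumer."""
    buffer = deque()
    processed, backlog = [], []
    for arrivals in arrivals_per_tick:
        buffer.extend(range(arrivals))          # store the burst instead of dropping it
        done = min(capacity_per_tick, len(buffer))
        for _ in range(done):
            buffer.popleft()                    # consume at the downstream's own pace
        processed.append(done)
        backlog.append(len(buffer))
    return processed, backlog


# A burst of 10 requests in the first tick, then idle time.
processed, backlog = simulate([10, 0, 0, 0], capacity_per_tick=3)
```

The downstream never handles more than 3 requests per tick, and the backlog shrinks to zero during the idle ticks.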
- Decoupling
Normally our code lives in a call chain. In a monolith you simply call a function and wait for it to return. But in a distributed, microservice system we may not know about upstream and downstream changes or how reliable those services are.
For example, service A generates an event, such as a user registration, and invokes three services B, C, and D, which send an SMS message, initialize the user cache, and so on. So here are the questions:
- If we need to add E or remove C, we need to modify A’s code.
- If B dies, do A, C, and D need to wait for it to restart? If not, how does B get the message?
- If D is slow, do A, B, and C need to wait for it?
If we use the publish-subscribe model to decouple them, A only publishes messages, B, C, and D subscribe to them, and adding E just means adding one more subscription. A does not care about the speed or even the existence of its consumers. The message queue is responsible for storing the messages, and a consumer that was down can pick them up after it restarts.
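The registration scenario can be sketched as follows. The `Broker` class, its per-subscriber offsets, and `register_user` are hypothetical names for this illustration. The point is that A's code never mentions B, C, D, or E.

```python
class Broker:
    """Stores published events and tracks how far each subscriber has read."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def subscribe(self, name):
        # A new subscriber starts from the current end of the stream.
        self._offsets[name] = len(self._events)

    def publish(self, event):
        self._events.append(event)

    def poll(self, name):
        """Return events the subscriber has not seen yet and advance its offset."""
        start = self._offsets[name]
        self._offsets[name] = len(self._events)
        return self._events[start:]


def register_user(broker, user):
    # Service A only publishes; it has no list of downstream services.
    broker.publish({"type": "registered", "user": user})


broker = Broker()
for service in ("B", "C", "D"):
    broker.subscribe(service)

register_user(broker, "alice")
c_seen = broker.poll("C")          # C consumes right away
broker.subscribe("E")              # E is added without touching register_user
register_user(broker, "bob")
b_seen = broker.poll("B")          # B was down; it catches up on both events
e_seen = broker.poll("E")          # E only sees events published after it subscribed
```

B missing the first poll does not block anyone, and it still receives both events once it comes back.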
- Asynchronous processing
This is much like the decoupling case. We can change the sequential execution of A, B, C, and D into asynchronous consumption, in which A returns as soon as it has sent a message, so that A is not slowed down by less important work.
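A minimal sketch of this idea, using a thread and the standard library `queue.Queue` as a stand-in for a real message queue. The names `register` and `sms_worker` are assumptions for the example.

```python
import queue
import threading

tasks = queue.Queue()
sent_sms = []


def sms_worker():
    # Background consumer: handles slow side effects off the request path.
    while True:
        user = tasks.get()
        if user is None:        # sentinel: stop the worker
            break
        sent_sms.append(f"welcome SMS to {user}")
        tasks.task_done()


def register(user):
    # A does the essential work, enqueues the rest, and returns immediately.
    tasks.put(user)
    return f"{user} registered"


worker = threading.Thread(target=sms_worker, daemon=True)
worker.start()

result = register("alice")
tasks.join()        # only for the demo: wait until the worker has caught up
tasks.put(None)
worker.join()
```

In a real service `register` would not wait for the worker at all; the `tasks.join()` call is here only so the demo has a deterministic result.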
Disadvantages of message queues
There is no silver bullet in the world of programming.
- Availability is reduced initially. Adding a new system is bound to bring instability, and because the message queue often becomes the core of the system, an outage can cause huge problems. But once it is built properly, overall availability increases.
- Complexity increases. Message loss, duplicate consumption, ordering, and so on need to be addressed.
- Consistency issues. This is where idempotency comes in: ideally a consumer processes each message from a producer exactly once, and if it does process a message several times, the result should be the same as if it had processed it only once.
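One common way to get this "as if consumed once" behavior is to remember which message IDs have already been applied. The sketch below assumes every message carries a unique `id` field, which is an assumption of this example and not something a queue gives you for free.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self.balance = 0
        self._seen_ids = set()

    def handle(self, message):
        if message["id"] in self._seen_ids:
            return False                    # duplicate delivery: ignore it
        self._seen_ids.add(message["id"])
        self.balance += message["amount"]
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},             # redelivered after a retry
]
applied = [consumer.handle(m) for m in deliveries]
```

The redelivered `m1` is ignored, so the balance ends at 15 rather than 25. In production the set of seen IDs would have to live in durable storage rather than in memory.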
There are three major challenges to message queuing:
- Duplicate consumption. That is, a message should be consumed only once.
- Message loss. That is, every message should be received and consumed.
- Ordered consumption. That is, messages should be consumed in order.
These three will be dealt with separately in later chapters, because each one is difficult.
There’s a simpler problem: message backlogs.
A message queue adds a reservoir to the system, like the Three Gorges Dam, and it has an upper limit. When it is overwhelmed, we have a message backlog. The solution is relatively simple:
- Urgently scale out the consumers to speed up consumption.
- Discard expired messages.
- Prioritize consumption according to the urgency of each message, so that important messages go first.
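The last two measures can be sketched in a few lines (scaling out consumers is an operational step and is not shown). The message fields `priority` and `expires_at` and the function `drain_backlog` are assumptions made for this example.

```python
import heapq


def drain_backlog(messages, now):
    """Drop expired messages, then return the rest ordered by urgency."""
    heap = []
    for index, msg in enumerate(messages):
        if msg["expires_at"] <= now:
            continue                                   # expired: discard
        # Lower priority number = more urgent; index keeps arrival order on ties.
        heapq.heappush(heap, (msg["priority"], index, msg["body"]))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


backlog = [
    {"body": "weekly digest", "priority": 5, "expires_at": 200},
    {"body": "flash sale notice", "priority": 1, "expires_at": 90},
    {"body": "payment confirmation", "priority": 0, "expires_at": 500},
    {"body": "password reset", "priority": 0, "expires_at": 300},
]
ordered = drain_backlog(backlog, now=100)
```

The flash sale notice has already expired and is dropped, the two urgent messages are consumed first in arrival order, and the digest waits until last.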
Two message delivery models: Pull vs. Push
In the Pull model the consumer fetches messages itself. The advantage is that consumers control their own pace. The disadvantage is that consumers have to keep polling even when there are no new messages, and if they are too slow, messages pile up and may put back pressure on the upstream.
The Push model is more brutal, like texting: the message queue pushes messages out directly, regardless of whether the consumer can keep up. This approach can obviously lead to message loss, but the benefit is that the load on the message queue itself is low.
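The trade-off can be sketched with two toy brokers. All class names here (`PullQueue`, `PushQueue`, `SlowConsumer`) are hypothetical, and the push broker is deliberately naive: it keeps nothing and does not retry.

```python
from collections import deque


class PullQueue:
    """The broker keeps messages until the consumer asks for them."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def poll(self, max_messages):
        # The consumer decides how many messages it takes per poll.
        count = min(max_messages, len(self._messages))
        return [self._messages.popleft() for _ in range(count)]


class PushQueue:
    """The broker hands each message to the consumer immediately and keeps nothing."""

    def __init__(self, consumer):
        self._consumer = consumer

    def publish(self, message):
        self._consumer(message)


class SlowConsumer:
    """Hypothetical consumer with room for only `capacity` messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.accepted = []
        self.dropped = []

    def __call__(self, message):
        if len(self.accepted) < self.capacity:
            self.accepted.append(message)
        else:
            self.dropped.append(message)    # overwhelmed: the message is lost


pull = PullQueue()
for i in range(5):
    pull.publish(i)
first_batch = pull.poll(max_messages=2)         # consumer takes what it can handle
empty_poll = PullQueue().poll(max_messages=2)   # polling an idle queue returns nothing

slow = SlowConsumer(capacity=2)
push = PushQueue(slow)
for i in range(5):
    push.publish(i)
```

With pull, the three messages the consumer did not take stay safely in the queue. With this naive push, the slow consumer drops them.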
Does anyone still write message queues in Redis?
The answer is yes. There are many message queue products on the market, but the simplest option is Redis.
Let’s start with the benefits:
- Simple, no doubt about it.
- Availability. Redis is probably already part of your stack and already has to be kept available, so you avoid the instability that comes with introducing a new system.
That is reason enough. In fact, it is a perfectly reasonable option when business volume is small.
But its disadvantages are also deadly:
- If messages are large, Redis memory usage explodes.
- If the consumer is slow, or the downstream fails, Redis memory usage will skyrocket.
- Once the number of messages is in the billions, do you have enough money in your wallet?
- If Redis fails and restarts, messages may be lost, depending on how persistence is configured.
- Messages cannot be replayed, so you cannot reconstruct the message flow afterwards.
Enter Kafka
Since Redis is not reliable enough for this job, we need to choose something that is. Kafka is of course only one option, and RabbitMQ next door is just as good.
Let’s take a look at its advantages:
- Kafka stores messages on disk, using append-only writes and zero-copy transfer to improve efficiency. The upside is that it is cheap, efficient, and reliable, and messages can be replayed for as long as they are retained. (Great for massive message storage and analysis, since hard drives are much cheaper than memory.)
- The order of messages on a single partition is guaranteed.
- With message partitioning and distributed consumption, scalability is easy, while multiple replicas ensure availability.
- Built-in message compression saves a great deal of bandwidth. (The bottleneck of a message queue is usually network bandwidth rather than hard disk, CPU, or memory.)
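The ideas of an append-only log, per-partition ordering, and replay can be illustrated with a small in-memory model. This is a sketch and not how Kafka is implemented: `PartitionedLog` is a made-up class, and `zlib.crc32` is used here only as a convenient stable hash for choosing a partition.

```python
import zlib


class PartitionedLog:
    """Append-only log split into partitions; records are never removed on read."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        # crc32 is a stand-in hash for this sketch.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # Reading does not delete anything, so any offset can be replayed.
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
placements = [log.append("user-1", f"event-{i}") for i in range(3)]
partition = placements[0][0]
replayed = log.read(partition, 0)
replayed_again = log.read(partition, 0)
```

All events for `user-1` land in one partition with increasing offsets, and reading from offset 0 twice returns the same records, which is what makes replay possible.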
Usage scenarios for Kafka
- Activity tracking: Kafka can be used to track user behavior. Take shopping on Taobao, for example: the moment you open Taobao, your login information is sent to Kafka as a message, and as you browse, your browsing history, search terms, and shopping interests are sent to Kafka as messages too. From these you can generate reports, make smart recommendations, infer purchase preferences, and so on.
- Message delivery: Another basic use of Kafka is message delivery. Applications send notifications to users by delivering messages. The application components can generate messages without caring about the format of the message or how it is sent.
- Metrics: Kafka is also used to record operational monitoring data. This includes collecting data for various distributed applications and producing centralized feedback for various operations, such as alarms and reports.
- Logging: The basic concept of Kafka comes from the commit log. For example, we can send database updates to Kafka to record when the database was updated. Kafka provides a unified interface for consumers such as Hadoop, HBase, and Solr.
- Stream processing: for example page views (PV), impressions, clicks, and behavioral events, feeding big data analysis (PV/UV, funnel analysis, daily/monthly active users, daily/monthly retention).
- Rate limiting and peak shaving: this is common in Internet services when too many requests arrive at once. Requests can be written to Kafka instead of going directly to the back-end application, which might otherwise crash under the load.
Conclusion
OK, that is all for now. The next chapter covers the basic concepts of Kafka. Please give me a thumbs up if you liked it. Your support is my biggest motivation.