This article is included in my GitHub repo JavaExpert, which collects my article series, interview question bank, self-study materials, e-books, and more.
Preface
Hello, everyone. I’m One.
Today I want to share a question one of my readers was asked in a second-round interview at Meituan: how do you design a lottery system for a million users?
Mind map
Recently, while chatting in our group with experts from Didi and elsewhere about how to pass knowledge on to readers more effectively, we agreed that mind maps help build a knowledge network, and reader feedback favors them too, so upcoming articles will use mind maps wherever possible.
Source map file: github.com/lbsys/JavaE…
Architecture is never designed; it evolves.
As a lottery system grows from a few hundred users to tens of thousands and then to millions, new pieces get added at every stage.
At the end I distill a set of design ideas, a universal template, so that whatever high-concurrency system an interviewer throws at you, you can reason along these directions.
[toc]
V0 — Monolithic architecture
If you were asked to build a lottery system for a few dozen people, it would be dead simple: just hit it head-on!
One Tomcat and one MySQL can roam the world: write the draw to the database, call the notification service, query the database, send the notice, perfect!
You have probably built this kind of demo while learning Java. Think about it 🤔: what problems does it have?
- A single service: one slip and the whole game is over
- Repeated draws: one user alone can act like an army
- Malicious scripts: there is no prize a programmer cannot win
How do you solve these problems?
V1 — Load balancing
The more traffic a server receives per unit of time, the greater the pressure on it; once that pressure exceeds its capacity, the server crashes.
To avoid crashes and give users a better experience, we share the load through load balancing.
Load balancing means standing up many servers to form a cluster. When a user visits the site, the request first reaches an intermediate server, which acts like a butler: it picks a lightly loaded server from the cluster and forwards the request to it.
This way every server in the cluster carries a balanced share of the load, and no single server is pushed to the point of crashing.
Load balancing is implemented on the principle of the reverse proxy; the specific algorithms and their implementations will be covered in a follow-up.
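To make the "butler" concrete, here is a minimal sketch of the simplest selection algorithm, round robin, in Java. The server list and addresses are illustrative assumptions; real deployments would put Nginx, LVS, or a cloud load balancer in front rather than hand-rolling this.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// A toy round-robin "butler": each call hands back the next server in rotation,
// spreading requests evenly across the cluster.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String next() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + lb.next());
        }
    }
}
```

Weighted round robin, least-connections, and IP hash all follow the same shape; only `next()` changes.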
Load balancing fixes the monolith's single point of failure, but server costs mean we cannot protect the system by hardware alone. We still have to think hard about how to preserve the user experience once a server does go down,
and how to soften the flood of requests at the instant the draw opens.
V2 — Rate limiting
Rate limiting mainly protects the service and data nodes behind the cluster from being overwhelmed by instantaneous traffic (for example, a mass expiry of front-end caches) and knocked unavailable.
It can also be used to smooth out bursts of requests.
In the previous section we used load balancing to keep the cluster available, but a company has to watch server costs; it cannot add machines without limit.
The point of rate limiting is that we cannot predict unknown traffic, such as what the lottery above might face:
- Repeated draws
- Malicious scripts
Some other scenarios:
- Trending events (Weibo)
- Swarms of crawlers
These situations are unpredictable; you never know when 10x or even 20x the normal traffic will arrive, and when it does, scaling out is simply too slow (elastic scaling is empty talk here: try expanding capacity within a second).
Now that the purpose of rate limiting is clear, let's look at how to implement it.
Preventing repeated draws
Repeated draws and malicious scripts can be treated together: a few hundred thousand users may produce millions of requests.
If the same user sends multiple draw requests within one minute, treat it as a malicious repeat or a script, and block that traffic directly at the load-balancing layer instead of letting it through.
You can limit per-IP access frequency in Nginx, or configure rate-limiting policies at the gateway layer with Sentinel.
The user's draw status itself can be stored in Redis, as explained later.
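As a concrete sketch of that idea, here is a per-user draw limiter backed by Redis, using the Jedis client. The key prefix `lottery:draw:` and the one-draw-per-minute threshold are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;

public class DrawRateLimiter {
    private static final int WINDOW_SECONDS = 60; // one draw per minute (assumption)

    // Returns true if this user may draw now; false flags a repeat or a script.
    static boolean allowDraw(Jedis jedis, String userId) {
        String key = "lottery:draw:" + userId;
        long count = jedis.incr(key);          // atomic per-user counter
        if (count == 1) {
            jedis.expire(key, WINDOW_SECONDS); // open the window on the first draw
        }
        return count == 1;
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(allowDraw(jedis, "42")); // true: first draw
            System.out.println(allowDraw(jedis, "42")); // false: blocked as a repeat
        }
    }
}
```

In production the same check usually lives in Nginx (`limit_req`) or a Sentinel flow rule rather than application code, so the repeat traffic never reaches the service at all.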
Intercepting invalid traffic
Prizes are limited, in a lottery just as in a flash sale, so the flood of requests that arrives after stock runs out is effectively useless.
For example, suppose 500,000 people enter a draw for 100 phones. 500,000 requests rush in at once, but the first few hundred already claim every phone; the hundreds of thousands behind them need not execute any business logic at all: intercept them and return "the draw has ended".
The front end can help too, for instance by greying out the draw button once the prizes are gone.
So think about how the system learns that the prizes are gone, i.e. how to synchronize state between inventory and orders.
Service degradation and circuit breaking
Is that enough to be safe? Not at all. So the server side adds degradation and circuit breaking.
Here is just a brief supplement; for the details, stay tuned.
Many people tend to confuse these two concepts. Let me give you a quick example:
Suppose my follower count passes one million and I land on Weibo's trending list. Fans A and B both open Weibo: one sees the press conference, the other sees "the system is busy". A little later, the second fan can see the content too.
(Allow me this fantasy 😎)
In that story, the trending event triggers a flood of requests and the circuit breaker trips; to keep the system as a whole available, some users, like fan B, are sacrificed. The "system is busy" page B sees is the service degradation (the fallback), and access recovers after a while, which is precisely how a circuit breaker behaves.
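Since Sentinel was already mentioned for the gateway layer, here is a minimal sketch of what this degradation path looks like with Sentinel's core API. The resource name is an illustrative assumption, and the rules that decide when the breaker trips are configured separately; consult the Sentinel docs for your version.

```java
import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;

public class HotSearchService {
    public String queryPressConference() {
        // Every call passes through the named resource; Sentinel tracks RT and
        // errors against the configured rules and trips the breaker when exceeded.
        try (Entry entry = SphU.entry("pressConference")) {
            return loadFromBackend(); // the real business call
        } catch (BlockException e) {
            // Breaker open (or flow rule hit): this is the page fan B sees.
            return "The system is busy, please try again later";
        }
    }

    private String loadFromBackend() {
        return "press conference content";
    }
}
```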
V3 — State synchronization
Back to the question from the previous section: how do we synchronize the lottery state?
It has to be said that Redis is used everywhere as the cache database of high-concurrency systems.
We can build this shared lottery state on Redis: it is very lightweight and well suited to state shared between the two layers of the system.
ZooKeeper works too: a ZK client in the load-balancing layer can watch a znode; the moment the draw ends, the lottery service updates the znode, and the load-balancing layer is notified.
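As a sketch of the Redis route, assume a status flag at a key such as `lottery:status` (an illustrative name): the lottery service flips it the moment the last prize goes, and the edge layer checks it before forwarding any request.

```java
import redis.clients.jedis.Jedis;

public class LotteryStatus {
    private static final String KEY = "lottery:status"; // illustrative key name

    // Called by the lottery service when the last prize is taken.
    static void markEnded(Jedis jedis) {
        jedis.set(KEY, "ended");
    }

    // Called at the edge for every request; ended draws are rejected cheaply.
    static boolean isEnded(Jedis jedis) {
        return "ended".equals(jedis.get(KEY));
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            markEnded(jedis);
            System.out.println(isEnded(jedis) ? "return 'draw over'" : "forward request");
        }
    }
}
```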
V4 — Thread optimization
In production, the number of worker threads is a critical parameter, and it has to be tuned to your own situation.
As you may know, every request entering Tomcat is handed to a dedicated worker thread, so Tomcat's thread count determines how many concurrent requests it can process.
That count, however, must be found by load testing: each thread serves a request that may call external systems such as the database, so no two systems share the same ideal parameters. You have to pressure-test your own.
As a rule of thumb, though, the thread count should not be too high: too many threads overload an ordinary server's CPU and eventually crash the machine.
Nor should it be too small: a mere 100 threads, say, would leave both Tomcat's thread resources and the machine's CPU underused.
Generally, Tomcat's thread count lands between 200 and 500, but the exact number must be measured and adjusted continuously against CPU load and request latency.
Raise the thread count as long as CPU load stays reasonable and requests execute healthily.
Once machine load climbs too high and request processing starts to slow, the machine cannot support that much concurrency, and the thread count must stop growing.
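For reference, the knob itself sits on Tomcat's Connector. A hedged starting point within the rule-of-thumb range (tune it with your own load tests):

```xml
<!-- conf/server.xml: maxThreads is the worker-thread cap discussed above.
     300 is only a starting value inside the 200-500 rule of thumb. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="300"
           acceptCount="100"
           connectionTimeout="20000" />
```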
V5 — Business logic
How does the draw itself work? OK, time to figure out how to actually run the raffle.
Say the load-balancing layer has already blocked 480,000 of the 500,000 requests; 20,000 may still reach the lottery service.
Since the lottery is a temporary activity, renting a batch of machines from Alibaba Cloud is cheap enough, and Tomcat is already tuned, so the server side is settled. What's left?
MySQL. Can your MySQL take 20,000 concurrent requests?
Hardly. So what do we do?
Easy: swap MySQL for Redis, and 20,000 concurrent requests are no problem.
Better yet, Redis has a data structure, the Set, that is ideal for lotteries: it can randomly pick and remove an element.
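A minimal sketch of a draw built on that structure, using the Jedis client and `SPOP` (which removes and returns random members). The key name and user IDs are illustrative assumptions.

```java
import redis.clients.jedis.Jedis;
import java.util.Set;

public class LotteryDraw {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Entrants are registered with SADD; a Set also deduplicates users for free.
            jedis.sadd("lottery:users", "u1", "u2", "u3", "u4", "u5");

            // SPOP removes and returns random members: each winner is drawn
            // exactly once and cannot be drawn again.
            Set<String> winners = jedis.spop("lottery:users", 2);
            System.out.println("Winners: " + winners);
        }
    }
}
```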
V6 — Peak shaving
Working top to bottom, the one part left unoptimized is the winning notification.
Consider this: if 10,000 of the 20,000 draw requests win a prize, the lottery service makes 10,000 calls to the gift service.
Does that need the same heavy treatment as the lottery service?
Actually, no. Notifications are not time-critical; those 10,000 requests can be sent out slowly. This is where message middleware comes in, for rate limiting and peak shaving.
The lottery service publishes the winning messages to MQ, and the notification service consumes them from MQ at its own pace, finally completing gift delivery. This is also why winning notices and logistics updates sometimes arrive with a delay.
If the two notification-service instances together send 100 notifications per second, 10,000 messages works out to a delay of about 100 seconds.
The pressure on MySQL drops accordingly, so the database layer can hold up as well.
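A minimal sketch of this publish-fast, consume-slowly pattern with RocketMQ (one of the brokers mentioned below); the topic, group names, and name-server address are illustrative assumptions.

```java
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;
import java.nio.charset.StandardCharsets;

public class WinNotification {
    public static void main(String[] args) throws Exception {
        // Lottery service: publish the winning event and return immediately.
        DefaultMQProducer producer = new DefaultMQProducer("lottery_producer_group");
        producer.setNamesrvAddr("localhost:9876");
        producer.start();
        producer.send(new Message("WinTopic",
                "userId=42,prize=phone".getBytes(StandardCharsets.UTF_8)));
        producer.shutdown();

        // Notification service: drain the queue at its own pace (peak shaving).
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("notify_consumer_group");
        consumer.setNamesrvAddr("localhost:9876");
        consumer.subscribe("WinTopic", "*");
        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, ctx) -> {
            msgs.forEach(m -> System.out.println(
                    "notify: " + new String(m.getBody(), StandardCharsets.UTF_8)));
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });
        consumer.start();
    }
}
```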
Take a look at the final structure:
The answer template
The so-called answer template is simply a set of directions to think along, and solutions to reach for, when facing high-concurrency problems.
Single responsibility
A basic design idea. Recall series and parallel circuits from high-school physics: in series, one break kills everything; in parallel, each branch has its own path.
Same idea here: high cohesion, low coupling.
Microservices rose precisely because complex functionality was split apart, so that even if ordering on a site crashes, browsing stays healthy, instead of every service collapsing in an avalanche-style chain reaction.
URL dynamic encryption
This guards against malicious access. Crawlers and brushing scripts fire large numbers of requests at your interface, and you never know what parameters they will send, so validate thoroughly when you define an interface: it is not only your friends who call it; your enemies may, too.
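One common reading of "URL dynamic encryption" is request signing: the client appends a timestamp and an HMAC over the parameters, and the server rejects anything stale or badly signed, so a replayed or hand-crafted URL dies quickly. A minimal sketch; the secret, parameter names, and path are illustrative assumptions.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class UrlSigner {
    private static final String SECRET = "change-me"; // shared secret (assumption)

    // HMAC-SHA256 over the query string plus a timestamp.
    static String sign(String query, long ts) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] digest = mac.doFinal((query + "&ts=" + ts).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        long ts = System.currentTimeMillis();
        String query = "activityId=1001&userId=42";
        System.out.println("/api/lottery/draw?" + query + "&ts=" + ts
                + "&sign=" + sign(query, ts));
        // Server side: recompute the HMAC, compare, and reject if |now - ts| is too large.
    }
}
```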
Static resource — CDN
A CDN (Content Delivery Network) is a distributed network built on top of the carrier network, made up of edge-node server clusters spread across regions.
In plain terms, it puts frequently accessed, bandwidth-hungry resources on a server close to the user.
Of Taobao's image traffic, 98% is served from the CDN cache; only 2% goes back to the origin site, saving enormous server resources.
However, if images change en masse during peak hours, a wave of user requests will punch through the CDN and hammer the origin.
So push static resources such as images onto the CDN as far as possible, and avoid bulk changes at peak times.
Rate limiting
As explained above, it splits into front-end and back-end rate limiting.
- Front end: disable the button, IP blacklists
- Back end: circuit breaking, service degradation, permission checks
Data preheating
You can use Elastic-Job to schedule real-time queries (for example against Druid) and load the hot data into the Redis cache.
Consider a question:
For example, suppose only one item is left in stock. Under high concurrency, four servers check at the same time, each sees one left, each believes it grabbed it, and each decrements the stock: the count becomes -3. Only one was really grabbed; the rest are oversold. What do we do?
Answer:
Use CAS plus a Lua script.
A Lua script in Redis is like a transaction: it executes atomically, and no other command can cut in while it runs. That is the key.
Put the stock check and the stock decrement together in one script and hand it to Redis. If it returns false, the stock is gone: flip a switch and block all further requests outright.
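A minimal sketch of that check-and-decrement script via Jedis; the stock key name is an illustrative assumption. Because Redis runs the whole script atomically, the four servers in the example above can no longer interleave their reads and writes.

```java
import redis.clients.jedis.Jedis;
import java.util.Collections;

public class StockDeduct {
    // Check and decrement in one atomic step: returns 1 on success, 0 if sold out.
    private static final String LUA =
            "local stock = tonumber(redis.call('GET', KEYS[1]) or '0') " +
            "if stock > 0 then redis.call('DECR', KEYS[1]) return 1 " +
            "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("lottery:stock", "1"); // the last item from the example
            Object r = jedis.eval(LUA, Collections.singletonList("lottery:stock"),
                                  Collections.emptyList());
            System.out.println(((Long) r) == 1 ? "grabbed it" : "sold out, flip the switch");
        }
    }
}
```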
Peak shaving
Mastering one piece of middleware will earn you a lot of points.
Message queues have gradually become the core means of internal communication in enterprise IT systems.
They bring low coupling, reliable delivery, broadcasting, flow control, eventual consistency, and more, and have become one of the main vehicles for asynchronous RPC.
There is plenty of mainstream message middleware on the market: the veterans ActiveMQ and RabbitMQ, the red-hot Kafka, Alibaba's home-grown RocketMQ, and so on.
In the original model, a producer posts a message into a container called a "queue", the broker takes it out of that container and forwards it to the consumer, and that is all there is to it.
More on MQ in a future installment.
That's all for today. I hope you now have a first picture of high-concurrency systems and won't be left speechless when an interviewer asks. The road of learning is long and heavy, though, so follow One and let's study together!