This article is included in the GitHub repository github.com/JavaFamily, which also collects a complete set of interview topics from first-tier tech companies, reference materials, and my article series.
Background
I wrote an article about seckill (flash sale) systems before, but it had some flaws, so I planned a second version built on it. What finally pushed me to write a second-generation seckill article is that I have recently mock-interviewed many readers, and seckill questions left them completely stumped. My main impression is that people don't know the details of a seckill system, or even which components an e-commerce company's seckill system consists of.
I used to work on promotional campaigns at e-commerce companies, so these scenarios and many of the solutions are fairly clear to me. I'll walk you through the design details of a seckill system of my own and the pros and cons of the various solutions. Below is my seckill system design, which covers almost all of the implementation details you will find on the market:
Main text
Before we design a system, we need to confirm what the business scenario looks like, so I'll walk you through a hypothetical one.
We need to sell 1,000 packs of diapers on the site. Based on our past data and experience with this kind of seckill activity, a rough estimate is that a full 100,000 people will compete for those 1,000 packs. (Nanjiren, pay up!)
You hear this and think: we're done, how could our servers survive that? Hitting the DB directly will certainly kill it. But don't worry, warm-hearted Aobing is here. Before we start designing any system, we should think about what problems will arise. Here are a few classic ones:
The problems
High concurrency:
Yes, high concurrency is the one point we don't even need to think about: if this many people arriving all at once is not high concurrency, what is?
The defining characteristic of a seckill is a very short time window with a huge number of simultaneous users.
Typical e-commerce marketing pairs very low prices with SMS and targeted app push notifications to draw a large number of users into the seckill, which is great for the business and hard on the developers.
We all know that with good marketing and an attractive price, traffic in the hundreds of thousands is entirely plausible. A single Redis instance, in my experience, can still hold up at 30,000 to 40,000 QPS, but not much beyond that, and the seckill for even one hot item can exceed that figure.
When a flood of requests comes in there is a lot to consider: cache avalanche, cache breakdown, and cache penetration, which I have covered before, can all happen. Once something goes wrong and the traffic hits the DB, it gets ugly: the campaign fails, the user experience is poor, the campaign's momentum is lost, and in the end the developers take the blame.
Overselling:
Every seckill fears overselling. I only used diapers as an example; suppose instead it were 100 MacBook Pros. The merchant's budget assumes that selling 100 units makes a little money and builds some buzz, so every unit sold beyond that is a loss. (Read Aobing's articles and you'll have nothing to fear.)
In the end the only way to calm everyone down is to sacrifice a developer to the heavens. Seckill prices are already low and barely profitable, so overselling is terrifying, which makes it another key point.
Malicious requests:
Your price is so low that if I grab the item and resell it, I make a killing, right? Even if I don't resell it I lose nothing. Users know that, you know that, and people with ulterior motives (hackers, scalpers...) certainly know it too.
That's easy then: I know when the sale starts, so I set up a few dozen machines, write some scripts, and simulate requests from tens of thousands of people. Wouldn't that give me roughly an 80% success rate?
The real situation may be far worse, because machines send requests much faster than human hands can click. Every year I try to grab high-speed rail tickets home to Guizhou and they're gone in a second. I don't know whether the scalpers deserve the credit, but I want to diss you, scalpers. I can't get tickets to Jay's concerts either, and I diss you for that too.
The scalpers' ticket-grabbing systems outclass the systems of many small companies in China, with top-notch architecture. "I'm running top-spec servers and a top-spec architecture, and you still expect to get home?"
But without scalpers I can't get home. There are too many of us from Yunnan, Guizhou, and Sichuan who, like me, travel home for the Spring Festival!
Link exposure:
The previous problems are probably easy to understand, but this one may confuse some readers: what is link exposure?
Anyone who does development will find this familiar: if you know a little, you can open the Chrome developer tools and look at the page's code. Some pages contain the URL directly. When I write Vue, the click is an event that calls an interface defined in another file, so the URL can't be seen in the page source, but I can still click and then view the request address. That said, you can apparently grey out the button before the seckill starts.
However you look at it, there is risk. And even if you block everything on the outside, what you are selling is so cheap and tempting: can you guarantee that the developers won't be tempted? They know the address, and when the seckill starts they can send their own request early... (Developer: me again?!)
Database:
With tens of thousands of QPS (queries per second) or more hitting the database directly, the database will basically go down. And your service doesn't only do seckill; it carries other business too. If you haven't set up degradation, rate limiting, and circuit breaking, everything else goes down with it, and at a small company the whole site may crash and return 404.
However the seckill fails, you don't want it to take everything else down with it, right? It's not as if sacrificing one programmer can fix that.
Programmer: My life is so damn hard!
With the problems listed, the next thing to think about is how to design the system to solve them, so that we can prescribe the right remedy for each.
I'll go from top to bottom through what a typical e-commerce seckill system does at each layer, along with the problems and difficulties of each.
We'll start with the front end:
The front end
The front end of a seckill system is generally a mall web page, an H5 page, an app, or a mini program.
There is a lot you can do on the front end, and if you use Node you can handle the whole stack, but Node services really belong to the back end, so I won't cover them here.
Static resources:
A seckill is generally for specific goods with a fixed page template. Nowadays the front end and back end are usually separated, so the page doesn't go through the back end, but the front end has its own servers too. Put everything that can go on a CDN there in advance. In short, do every step that improves efficiency ahead of time, to reduce server pressure at the actual moment of the seckill.
Salting the seckill link:
We said that if the link is exposed early, someone may hit the URL directly and grab the item ahead of time. Some readers will say: I'll just add a time check. Then let me tell you, someone who knows the link address still has a big advantage over someone clicking manually on the page.
If I know the URL, my program can keep fetching the latest Beijing time down to the millisecond and fire the request at millisecond 00. I dare say the success rate will be far higher than yours clicking by hand, and I can send N requests within one millisecond; you might put 100 items on sale and I'd take them all.
How can this be avoided?
Simple: make the URL dynamic, so that even the people who write the code don't know it. Hash a random string with a digest algorithm such as MD5 to form the URL; the front end obtains the URL through its code, and the back end verifies it before letting the request pass.
This only stops the hackers who lack the patience to keep cracking; patient ones will study it and crack it anyway. There are many bonus-hunting scenarios like this in the market, so what do we do?
I'll come back to that later.
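To make the salted-link idea concrete, here is a minimal sketch in Python. It assumes the server keeps a secret random salt and mixes in the item and user ids; the names `SALT`, `make_seckill_path`, and `verify_seckill_path`, and the `/seckill/...` path layout, are all made up for illustration and are not from any real system.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret: generated per sale and never shipped to the client.
SALT = secrets.token_hex(16)


def make_seckill_path(item_id: int, user_id: int, salt: str = SALT) -> str:
    """Build the dynamic path the server hands out once the sale opens."""
    digest = hashlib.md5(f"{salt}:{item_id}:{user_id}".encode()).hexdigest()
    return f"/seckill/{item_id}/{digest}"


def verify_seckill_path(path: str, item_id: int, user_id: int, salt: str = SALT) -> bool:
    """Recompute the digest on the back end and compare in constant time."""
    expected = make_seckill_path(item_id, user_id, salt)
    return hmac.compare_digest(path.encode(), expected.encode())
```

Because the digest depends on a salt that only the server knows, a path guessed or copied from another user fails verification. MD5 is used here only because the text names it; any keyed digest would play the same role.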
Rate limiting:
I think rate limiting here should be split into front-end and back-end rate limiting.
Physical control:
Have you noticed that before a seckill starts the button is usually greyed out, and only becomes clickable when the time arrives?
That's out of fear that everyone will frantically hit the server in the last few seconds before the start, so that the server is basically down before the seckill even begins.
Here you need to work with the front end: it periodically asks your back-end server for the latest Beijing time, and enables the button only when the start time arrives.
After the button is clicked it should be greyed out again for a few seconds, otherwise users will keep clicking once the sale starts.
Are you telling me it wasn't like this when you took part in a seckill?
Front-end rate limiting: this is very simple. A seckill generally won't let you click non-stop; after a click or two you have to wait a few seconds before you can click again. This is also a way to protect the server.
Back-end rate limiting: a seckill certainly involves the subsequent order generation and payment, but only the lucky winners get to that stage. Once the stock is sold out, return false; the front end ends the seckill right away, and the back end also stops admitting the subsequent, now pointless, requests.
Tip: real rate limiting also involves rate-limiting components such as Sentinel and Hystrix. I won't expand on them here; I'm only talking about physical rate limiting.
We are selling 1,000 items and there are 100,000 requests. We don't need to let all 100,000 in; we can let 10,000 in. Because a seckill is a black box to users, they can't tell what you do behind the scenes. As for why we let in 10,000 rather than just 1,000: some of those users will be bonus hunters, and how we tell them apart is something I'll cover under risk control.
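The "let 10,000 in, turn the rest away" rule can be sketched as a simple counter guarded by a lock. This is only an in-process illustration under my own assumptions: `AdmissionGate` is a hypothetical name, and a real deployment would keep the counter somewhere shared by all instances.

```python
import threading


class AdmissionGate:
    """Lets the first `quota` requests through and rejects the rest."""

    def __init__(self, quota: int) -> None:
        self._remaining = quota
        self._lock = threading.Lock()

    def try_enter(self) -> bool:
        with self._lock:
            if self._remaining <= 0:
                return False  # quota used up: fail fast, do no further work
            self._remaining -= 1
            return True


gate = AdmissionGate(quota=10_000)  # 10,000 admitted for 1,000 items
results = [gate.try_enter() for _ in range(100_000)]
admitted = sum(results)
```

Every request after the quota is used up gets an immediate `False`, which is the cheap "return a false" path described above.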
Nginx:
Nginx will be familiar to everyone: it is a high-performance web server that can casually handle tens of thousands of concurrent connections, whereas our Tomcat can only handle a few hundred. So it's simple: load balance. If one service instance handles a few hundred, deploy more instances, and rent some extra machines for the seckill.
As far as I know, one big Chinese tech company rented all the servers available in Asia during last year's Spring Festival. Small companies also like to buy extra capacity for Double 11 to withstand the load.
Put that way, doesn't it feel like your cluster can handle a lot more?
Nginx is also needed to intercept malicious requests. When a single user's request count is exaggerated and doesn't look human, it should be intercepted at the gateway layer. Otherwise, whether or not he gets the item, the server load goes up, and he may eat the network bandwidth, bring the server down, cause cache breakdown, and so on.
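The per-user interception above can be pictured as a sliding-window counter keyed by user. The sketch below is my own illustration in Python, not Nginx configuration; `PerUserLimiter` and its parameters are hypothetical, and in Nginx itself this job is normally done with its built-in request-limiting directives.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerUserLimiter:
    """Allows at most `max_requests` per user within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # user id -> timestamps of accepted requests

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # looks like a script: intercept at the gateway
        hits.append(now)
        return True
```

For example, with `PerUserLimiter(max_requests=3, window_seconds=1.0)` a user who fires five requests within half a second gets the first three through and the last two rejected, while other users are unaffected.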
Risk control
I can tell you plainly that none of the previous measures will stop many bonus hunters, because they are professional teams. They can register many accounts to fleece you, and they don't even need scripted requests: they use device farms (group control), and their actions look almost exactly like those of real users.
So what do we do? Is there no solution?
This is where the risk control team comes in. Before a request reaches the back end, risk control can use behavioral analysis of the account to estimate how likely it is to be a bot. In the system I'm currently responsible for at my company, every user's behavior is delivered to our big data team for analysis, and the data is tagged with corresponding labels.
Hackers have a countermeasure too: account farming.
They buy accounts on the black market that belong to real users and have a long history, and after buying them they don't leave them idle but use them to shop, so the system can't tell whether an account is a black-market one or a real user's.
What then?
Block them anyway! Yes, there is no other way. If risk control finds that the probability of this user being real is lower than that of other users, treat him as a machine and discard his request.
Earlier we let 10,000 requests through the rate limit, but our real stock is only 1,000. So we pick the 1,000 requesters most likely to be real users for the seckill and discard the rest. Because a seckill is a black box and users can't see inside it, this lets real users buy the goods and also lowers the chance of being fleeced.
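The "keep the 1,000 most likely real users" step boils down to a top-N selection over risk scores. Here is a rough sketch, assuming risk control has already given every admitted request a probability of being a real user; the function name and the scores are invented for the example.

```python
import heapq
from typing import Dict, List


def pick_likely_humans(scores: Dict[str, float], stock: int) -> List[str]:
    """Return the `stock` request ids with the highest real-user probability."""
    return heapq.nlargest(stock, scores, key=scores.__getitem__)


# Hypothetical scores produced by the risk control system.
scores = {"r1": 0.95, "r2": 0.10, "r3": 0.80, "r4": 0.55}
winners = pick_likely_humans(scores, stock=2)
```

With these made-up scores and a stock of 2, `winners` is `["r1", "r3"]`; the other requests are simply dropped.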
Risk control can be called the last gate on the traffic entry path, which is why many companies invest heavily in it. If you know about Ant Financial's risk control, you'll understand why they can afford to compensate you in full if your money is stolen from Alipay.
The back end
Single responsibility per service:
To design a system that can withstand high concurrency, I think single responsibility still matters.
What does that mean? Everyone knows that today's designs are all about microservices and distributed deployment.
We have an order service for orders and a user service for login and user management, so why not open a separate service for seckill and put its business logic together there?
The advantage of single responsibility is that even if the seckill service can't hold up and crashes, it doesn't affect other services. (High availability)
Redis cluster:
Didn't we say earlier that a single Redis instance can't hold up? Simple: find it a few brothers. A seckill is inherently read-heavy and write-light, so this should immediately remind you of what I've covered before: Redis cluster, master-slave replication, and read/write separation. Add Sentinel, turn on persistence, and you get rock-solid high availability!
Inventory preheating:
The essence of a seckill is grabbing inventory. If every user who joins makes you query the database to check the stock and then deduct it, then performance aside, don't you find that tedious? It is unfriendly to the developers, and the database can't take it either.
Developer: For once you're thinking of me.
So what?
We all know the database can't cope, but its non-relational cousin Redis can!
That's easy then: before the seckill starts, load the inventory into Redis in advance, via a scheduled task or with help from the ops team, so that the whole process runs in Redis. Once the seckill is over, update the database inventory asynchronously.
However, using Redis has a problem. As described above we use a master-slave setup: we read the stock, check it, and deduct only if stock remains. Under normal conditions that's fine, but under high concurrency it is a big problem.
Read this part a few times! Say only 1 unit is left and, under high concurrency, 4 servers query at the same moment and all see 1. Each thinks it has won and goes to deduct, and the result is -3. Only one of them really got the item; the other three are oversold. What do we do?
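The interleaving is easy to replay by hand. The snippet below is a deterministic toy, not real concurrent code: a plain dict stands in for the Redis key, and the "four servers" are just four reads that all happen before any write.

```python
stock = {"diapers": 1}

# Step 1: four servers read the stock before any of them writes.
seen = [stock["diapers"] for _ in range(4)]  # every server sees 1

# Step 2: each server passes its own "stock > 0" check and deducts.
for value in seen:
    if value > 0:
        stock["diapers"] -= 1

oversold = -stock["diapers"]  # units promised beyond the real stock
```

After this runs, `stock["diapers"]` is -3 and `oversold` is 3: the check and the deduction were two separate steps, so every server acted on a stale read.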
Transactions:
Redis itself supports transactions and has many atomic commands. You can also use Lua scripts or pipelining, and optimistic locking (via WATCH) is supported as well.
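What all of these options buy you is that "check the stock" and "deduct the stock" happen as one indivisible step. The sketch below shows that idea without a Redis server: a lock stands in for the atomicity that a Lua script would give you inside Redis. The `Inventory` class is hypothetical, and the Lua text is only an illustration of what such a script might look like; it is not executed here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# What the same logic could look like as a Redis Lua script (shown for reference only).
DEDUCT_LUA = """
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
if stock <= 0 then return 0 end
redis.call('DECR', KEYS[1])
return 1
"""


class Inventory:
    """In-memory stand-in for the Redis key; the lock plays the role of script atomicity."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()

    def try_deduct(self) -> bool:
        with self._lock:  # check and write happen as one step
            if self.stock <= 0:
                return False
            self.stock -= 1
            return True


inventory = Inventory(stock=1)
with ThreadPoolExecutor(max_workers=4) as pool:
    wins = list(pool.map(lambda _: inventory.try_deduct(), range(4)))
```

Four workers race for one unit, exactly one of them gets `True`, and the stock ends at 0 instead of -3.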
Rate limiting, degradation, circuit breaking, and isolation:
Why bother with these? Better safe than sorry: what if you really can't hold up? Rate limiting: if you can't take the load, turn part of the traffic away, but don't refuse service altogether. Degradation: shed the non-essential features. Circuit breaking: if you still go down after degrading, trip the breaker so that at least other systems aren't affected. Isolation: you are independent, but you still call other systems, so when you go down, don't drag your brothers down with you.
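To show how circuit breaking and degradation fit together, here is a deliberately tiny breaker. It is a sketch under my own assumptions, not how Sentinel or Hystrix are implemented: `CircuitBreaker` is a made-up class, and the recovery ("half-open") state that real libraries have is left out.

```python
class CircuitBreaker:
    """Stops calling a failing dependency after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int, fallback) -> None:
        self.max_failures = max_failures
        self.fallback = fallback
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, func, *args):
        if self.open:
            return self.fallback(*args)  # circuit open: skip the sick dependency
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            return self.fallback(*args)  # degrade instead of propagating the error
        self.failures = 0  # a success resets the consecutive-failure count
        return result
```

The fallback is the degraded answer (for example a "sold out, try again later" page), and once the breaker is open the broken dependency is no longer called at all, so its problems stop spreading.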
Message queue (peak shaving):
Mention this term and many readers will know it: yes, MQ. When few things are being bought, 100 requests modifying the database directly is no problem. But what about 10,000, or 100,000, within a second? The server goes down, and the programmers are on the hook again.
A seckill is exactly the scenario where traffic is very high for an instant and nearly absent the rest of the time. A message queue fits it perfectly: shaving the peaks and filling the valleys.
Some readers may say their business will never reach that level, so there's no need. But I'd say that when we write code we shouldn't leave logic holes in it. At the very least, when the company grows, whoever reads the code later should see that nothing needs to be rewritten and think: this person knew what they were doing!
You can put the requests in a message queue and then update the inventory at a steady pace. For a single item, updating directly is actually enough; what I'm describing is many items being seckilled at the same moment, such as midnight on Double 11.
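Peak shaving can be sketched with nothing more than a queue and one consumer. In this toy, Python's in-process `queue.Queue` stands in for a real MQ and a dict stands in for the database table; both, along with the order format, are assumptions made purely for illustration.

```python
import queue
import threading

orders = queue.Queue()
db_stock = {"diapers": 1000}  # stand-in for the inventory table


def consumer() -> None:
    """Drain orders one at a time, so the database sees a steady trickle."""
    while True:
        order = orders.get()
        if order is None:  # sentinel: no more orders
            break
        db_stock[order["item"]] -= order["qty"]


worker = threading.Thread(target=consumer)
worker.start()

# The burst: 1,000 winning requests are enqueued at once and return immediately.
for user_id in range(1000):
    orders.put({"user": user_id, "item": "diapers", "qty": 1})

orders.put(None)
worker.join()
```

The producers finish as fast as they can enqueue, while the database is only ever touched by the single consumer at its own pace.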
The database
For MySQL, as long as the connection pool is configured sensibly the problem is not big. But big companies generally aren't short of money and run such campaigns frequently; the company I used to work for was like that, with one campaign after another.
So set up a separate database just for the seckill service. The table design can also be kept simple; Internet architectures today are all deployed with database sharding anyway.
As for the tables, it depends on the design: add the indexes that should be added, and afterwards use EXPLAIN to check the SQL execution plan. (If you don't know how, go look at my MySQL articles.)
Distributed transactions
Why did I put this at the very end rather than in the back-end section?
Because any of the steps above can go wrong, and the errors happen in different services, which means distributed transactions are involved. But does a distributed transaction here have to guarantee that everything succeeds or everything fails? Same as before: losing a few requests is acceptable, as long as timeliness and service availability and reliability are guaranteed.
So TCC and eventual consistency are not really a good fit. TCC has a high development cost: every interface has to be written three times, because TCC has three phases (Try, Confirm, Cancel).
Eventual consistency basically relies on polling to make sure an operation eventually succeeds, so timeliness suffers greatly.
Two-phase commit (2PC) and three-phase commit (3PC), usually regarded as less reliable, come in handy here. They may not guarantee eventual consistency of the data, but their efficiency is acceptable.
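For readers who have not seen 2PC spelled out, the control flow is short enough to sketch. This is a toy coordinator with in-memory participants: the `Participant` class and the service names are hypothetical, and timeouts, crashes, and retries (the hard parts in practice) are not modeled.

```python
class Participant:
    """A service taking part in the transaction, e.g. orders or stock."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants) -> bool:
    # Phase 1: ask every participant to vote.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

If the order service and the stock service both vote yes, both commit; if either refuses, both are rolled back and the request is simply lost, which matches the "a few lost requests are acceptable" stance above.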
Summary
At this point I think I've covered the main points to consider and the corresponding solutions. There may be things I haven't thought of, but even so, I think my design can support a complete seckill flow.
Finally, you may now see the seckill system in a new light: is a system really as simple as you thought? And I have certainly missed some details.
Writing this chapter on seckill cost me a lot of brain cells. I considered many points and finally got it out, and I can't help giving myself a thumbs-up!
Closing remarks
Jokes are jokes, but don't treat the interview as a joke.
Not every interviewer asks about seckill, at least not as often as Redis basics, but once it comes up you must answer to the point.
At the least you should be able to name the situations that can arise, what needs attention, and the corresponding solutions. This is basic literacy for a coder, and it's hard to improve without thinking these things through.
Finally, you need to be familiar with the whole pipeline, and I mean the complete pipeline: the front-end design, the role of the gateway, how to resolve concurrent contention in Redis, how data is synchronized, the role of MQ, and so on. I believe you'll gain a lot.
I don't know whether this second attempt is a success or a failure. Every technical detail mentioned here has a corresponding article of mine, so you can look through my past articles. It's late; I'm off.
I'm Aobing. The more you know, the more you don't know. See you next time!