One, Overview
A seckill (flash sale) system is difficult because a huge number of requests flood into the system within a very short time, all competing for the same limited resource. This puts the system under heavy load and can even crash its services and bring them down. This article introduces the pain points of a seckill system and ideas for optimizing them.
Two, What is a seckill system
Typical examples: 12306 selling train tickets for the Spring Festival rush, or the regular flash-sale promotions run by the major e-commerce sites, such as online Xiaomi phone releases. Anyone who has tried to grab a train ticket knows that within less than a second of the tickets being released, they may already be snapped up.
Three, Difficulties of a seckill system
- High concurrency within a very short time puts the system under heavy load
- The contested resource is limited, so database lock contention is severe
- The seckill traffic must not impact other services
Four, Overall seckill flow chart
Five, Common layered architecture of the Internet
- Client layer: the page the user operates on a mobile phone or PC. DNS resolution routes the domain name to Nginx
- Reverse proxy layer: Nginx is generally used as a reverse proxy to distribute client requests evenly across the back-end site servers. Nginx itself can also be scaled out to multiple instances, each deployed in a master/standby high-availability setup
- Site layer: the site (web) layer can be scaled out to multiple instances to absorb the high concurrent load coming from clients. Session information shared across the web servers can be stored centrally in a distributed cache (Redis, Memcached); see the sketch after this list
- Service layer: the service layer can likewise be scaled out across multiple instances, which is the now-common microservice style of deployment
- Database layer: the usual database deployment patterns apply here, such as read/write separation and splitting databases and tables (sharding)
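To make the site-layer point concrete, here is a minimal sketch of centralized session storage in Redis using the Jedis client. The key prefix, the 30-minute TTL and the idea of storing a pre-serialized session string are illustrative assumptions, not something prescribed by this article.

```java
import redis.clients.jedis.Jedis;

public class RedisSessionStore {
    private static final int SESSION_TTL_SECONDS = 1800; // assumed 30-minute session lifetime

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis instance

    // Store the serialized session under a namespaced key so any web server can read it
    public void save(String sessionId, String serializedSession) {
        jedis.setex("session:" + sessionId, SESSION_TTL_SECONDS, serializedSession);
    }

    // Load the session by id; returns null when the session has expired or never existed
    public String load(String sessionId) {
        return jedis.get("session:" + sessionId);
    }
}
```

Because the session lives in Redis rather than in any single web server's memory, requests from the same user can be routed to any site-layer instance.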
Six, Seckill system architecture principles
(1) Try to intercept requests upstream
For a seckill system the bottleneck is usually the database layer, because the contested resource is limited: with, say, 10,000 tickets in stock and 1,000,000 concurrent requests arriving at the same instant, 990,000 of those requests are ultimately useless. To better protect the limited database resources at the bottom, requests should be intercepted as far upstream as possible.
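One simple way to intercept traffic before it ever reaches the database is to rate-limit at the site or service layer. The sketch below uses Guava's RateLimiter as a per-instance limiter; the 1,000 permits per second, the method names and the rejection message are assumptions chosen only for illustration.

```java
import com.google.common.util.concurrent.RateLimiter;

public class SeckillEntryFilter {
    // Allow roughly 1,000 requests per second into the seckill logic; the rest are rejected early
    private final RateLimiter limiter = RateLimiter.create(1000.0);

    public String handleRequest(String userId) {
        if (!limiter.tryAcquire()) {
            // Intercepted upstream: this request never touches the cache or the database
            return "Sold out, please try again later";
        }
        return doSeckill(userId); // only a small, controlled flow reaches the inventory logic
    }

    private String doSeckill(String userId) {
        // placeholder for the real inventory deduction and order creation
        return "OK";
    }
}
```

Note that a per-instance limiter only caps a single node; for a cluster-wide limit the counter would typically live in Redis instead.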
(2) Make full use of the cache
Caching not only greatly speeds up data access, it also shields the underlying database from access pressure, so for read-heavy, write-light business scenarios the cache should be used as much as possible.
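A typical pattern for such read-heavy data is a read-through cache: check Redis first, fall back to the database only on a miss, then populate the cache for later readers. The sketch below uses Jedis; the key name, the TTL and the placeholder database call are assumptions.

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    private static final int TTL_SECONDS = 60; // assumed cache lifetime for product details

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis instance

    // Read-mostly data: try the cache first and only fall through to the database on a miss
    public String getProductDetail(long productId) {
        String key = "product:detail:" + productId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached; // cache hit: the database is not touched at all
        }
        String detail = loadFromDatabase(productId);  // slow path, taken only on a miss
        jedis.setex(key, TTL_SECONDS, detail);        // populate the cache for later readers
        return detail;
    }

    private String loadFromDatabase(long productId) {
        // placeholder for the real DAO call
        return "{\"id\":" + productId + "}";
    }
}
```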
(3) Hotspot isolation
Business isolation: for example, 12306 releases tickets in time slots, which spreads the hotspot traffic out over time and reduces the load on the system
System isolation: isolate the seckill system at both the software and the hardware level, so that the high-concurrency risk created by the seckill is contained as far as possible
Data isolation: use a separate cache cluster or database to store the hotspot data
Seven, Optimization plan
(1) Page-level optimization, for example:
- Grey out the button after it is clicked: prevent the user from submitting the request repeatedly
- Control via JS: allow only one request to be submitted within a given time window
(2) Web Server layer optimization, such as:
- Static/dynamic separation: serve pages that rarely change directly from Nginx or a CDN, so that only dynamically changing content has to be requested from the web server
- Page caching
- Nginx reverse proxy implements horizontal scaling on the Web server side
(3) Back-end service layer optimization
- Use a cache (Redis, Memcached): put read-heavy, write-light business data into the cache. In the seckill business, the frequently updated product inventory can also be handled in the Redis cache
Note: when storing the inventory in Redis, it is best to split it into several copies held under different keys. For example, a stock of 100,000 can be split into 10 copies stored under 10 different keys, so that the data is spread out and higher read/write throughput is achieved (see the sketch after this list)
- Queue processing: Requests are queued to access the underlying DB at a controlled rate
- Asynchronous processing: For example, the order notification is processed asynchronously through message queues (RabbitMQ and Kafka)
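Putting the split inventory keys and queue-based order processing together, a minimal sketch with Jedis might look like the following. The shard count, key names and the Redis-list queue are illustrative assumptions; in practice the decrement-and-rollback would often be a single Lua script, and the order queue could just as well be RabbitMQ or Kafka.

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class ShardedStockService {
    private static final int SHARDS = 10; // e.g. a 100,000-unit stock split into 10 keys of 10,000 each

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed local Redis instance

    // Called once before the sale starts to spread the stock across the shard keys
    public void initStock(long productId, long totalStock) {
        long perShard = totalStock / SHARDS;
        for (int i = 0; i < SHARDS; i++) {
            jedis.set(stockKey(productId, i), String.valueOf(perShard));
        }
    }

    // Deduct one unit from a randomly chosen shard; returns true when the deduction succeeded
    public boolean tryDeduct(long productId, String userId) {
        int shard = ThreadLocalRandom.current().nextInt(SHARDS);
        long remaining = jedis.decr(stockKey(productId, shard));
        if (remaining < 0) {
            // This shard is exhausted: roll our decrement back and treat the request as sold out
            jedis.incr(stockKey(productId, shard));
            return false;
        }
        // Hand the actual order creation to a queue so the DB is written at a controlled rate
        jedis.lpush("seckill:order:queue", productId + ":" + userId);
        return true;
    }

    private String stockKey(long productId, int shard) {
        return "seckill:stock:" + productId + ":" + shard;
    }
}
```

One caveat of this sketch: a request that happens to hit an empty shard is told "sold out" even if other shards still hold stock, so a real implementation would usually retry on another shard before giving up.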
(4) DB layer optimization
- Read/write separation
- Splitting databases and tables (sharding); see the sketch after this list
- Database cluster
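At the application level, read/write separation and table sharding can be wired together with a small routing helper like the sketch below. The class name, the four-way table split and the user-id modulo routing rule are assumptions for illustration only, not a prescribed design.

```java
import javax.sql.DataSource;

public class RoutingDataSource {
    private final DataSource primary;      // receives all writes
    private final DataSource[] replicas;   // read-only copies kept in sync by replication
    private final int tableShards = 4;     // assumed: order table split into order_0 .. order_3

    public RoutingDataSource(DataSource primary, DataSource[] replicas) {
        this.primary = primary;
        this.replicas = replicas;
    }

    // Read/write separation: writes go to the primary, reads are spread over the replicas
    public DataSource route(boolean isWrite, long userId) {
        if (isWrite) {
            return primary;
        }
        return replicas[(int) (userId % replicas.length)];
    }

    // Sharding by user id: the same user always lands in the same physical table
    public String orderTableFor(long userId) {
        return "order_" + (userId % tableShards);
    }
}
```

Routing by user id keeps all of one user's orders in a single table, which makes per-user queries cheap at the cost of slightly uneven shard sizes.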