A brief introduction to high concurrency
In Internet business architecture, high concurrency is one of the most difficult problems to handle. Common scenarios include seckills (flash sales), panic buying, and booking systems. A highly concurrent process raises many complex issues, mainly in the following areas:
- Traffic management, absorbing and shaving the peak step by step;
- Gateway control: request routing and interface circuit breaking;
- Concurrency control mechanisms and resource locking;
- Distributed architecture, isolating services and databases;
The core of high-concurrency business is traffic control: managing how fast traffic sinks into the system, or sizing the containers that absorb it, is a fairly complicated process. The second concern is multi-threaded concurrent access to shared resources, which requires a locking mechanism to prevent disordered writes.
Two, seckill scenarios
1. Pre-purchase business
Before the activity officially starts, users make reservations, which lets the system collect and shape part of the traffic in advance. By the actual start time, much of the data has already been preprocessed, which greatly reduces system pressure. Knowing the reservation volume also lets the inventory system prepare ahead of time, killing two birds with one stone.
Scene: activity reservations, deposit pre-orders, high-speed rail ticket pre-purchase.
2. Batch buying
The mechanism of batch buying is the same as a one-shot sale, but it relieves a great deal of traffic pressure. A system selling 100,000 units of inventory and one selling 100 units face loads on entirely different levels: selling 100,000 units in one second means the system absorbs a traffic spike several times larger than 100,000 requests, while selling 100 units in one second may only mean handling hundreds or thousands of requests. The traffic peak-shaving section below details the strategies involved.
Scene: timeshare buying in multiple rounds, high-speed rail tickets released in batches.
3. Real-time seckill
The most difficult scenario is the on-time, real-time seckill. If 10,000 products go on sale exactly at 10 o'clock, highly concurrent traffic floods in around that moment, refreshing the page or hitting the purchase interface; this is the most complex scenario to handle.
- First of all, the system must absorb the influx of traffic;
- Pages are refreshed continuously and must load in real time;
- Highly concurrent requests need flow control, locking, and so on;
- Service isolation and database design must protect the system;
Scene: 618 on-time buying, Double 11 on-time seckills, e-commerce promotion seckills.
Three, traffic peak shaving
1. Nginx proxy
Nginx is a high-performance HTTP and reverse-proxy web server, often used in cluster deployments as a unified proxy layer with load balancing. It can also serve as a traffic-control layer, providing two kinds of rate limiting: controlling the request rate and controlling the number of concurrent connections.
Based on the leaky-bucket algorithm, Nginx can limit the request-processing rate, for example by restricting the access frequency of each IP address; when traffic suddenly spikes, the excess requests are rejected. It can also cap the number of concurrent connections.
After the various rate-limiting policies at the Nginx layer, traffic under high concurrency can be kept in a relatively stable state.
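The leaky-bucket idea that Nginx's request-rate limiting is based on can be illustrated with a minimal single-node sketch in Java (this illustrates the algorithm only; it is not Nginx's actual implementation, and the class name is hypothetical):

```java
// Simplified leaky bucket: requests "drip out" at a fixed rate; arrivals that
// would overflow the bucket's capacity are rejected immediately.
public class LeakyBucket {
    private final long capacity;      // max requests the bucket can hold (cf. "burst")
    private final double leakPerMs;   // drain rate, in requests per millisecond
    private double water = 0;         // current bucket level
    private long lastLeakMs;

    public LeakyBucket(long capacity, double leakPerMs) {
        this.capacity = capacity;
        this.leakPerMs = leakPerMs;
        this.lastLeakMs = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drain the bucket according to the time elapsed since the last call.
        water = Math.max(0, water - (now - lastLeakMs) * leakPerMs);
        lastLeakMs = now;
        if (water + 1 > capacity) {
            return false;             // bucket full: reject the request
        }
        water += 1;
        return true;
    }
}
```

Here the drain rate plays the role of Nginx's configured request rate, and the capacity plays the role of the allowed burst.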
2. CDN nodes
CDN nodes proxy static files, which matches an operational characteristic of seckill services: before the activity countdown ends, large numbers of users keep refreshing the page. Serving the static page from the CDN layer shares the pressure that would otherwise fall on the data service interfaces.
Rate limiting can also be applied at the CDN level, with a strategy built into the page itself. Suppose 100,000 users click to buy: the page logic might release only 10,000 requests and directly show the rest an "activity ended" prompt. This is also one of the commonly used methods.
In other words: when you take part in a buying event, your request may never even reach the data interface layer; you instantly get a "sold out" response and were effectively just along for the ride.
3. Gateway control
The gateway layer handles service interface routing and some validation; most importantly, various strategies can be integrated into the gateway. After the layer-upon-layer traffic control above, the requests arriving here are close to the core data interfaces, so the gateway applies built-in policy control. For example, if the activity aims to re-activate old customers, the gateway quickly checks user attributes and releases requests from old users; if the activity aims to attract new users, it releases more new users instead.
After these layers of control, not much traffic remains, and only then do the real snap-up data operations begin.
In other words: if 10 million people participate in a buying event, only a small fraction of that traffic actually sinks to the bottom layer and is handled by the distributed service cluster.
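A minimal sketch of such a gateway-level admission rule (the class, enum, and method names are hypothetical, not taken from any gateway framework):

```java
// Hypothetical gateway admission policy: release a request only when the user's
// attribute matches the campaign's target audience (old vs. new users).
public class AdmissionPolicy {
    public enum Target { OLD_USERS, NEW_USERS }

    private final Target target;

    public AdmissionPolicy(Target target) {
        this.target = target;
    }

    /** Decide at the gateway whether this request may sink to the data layer. */
    public boolean admit(boolean isExistingUser) {
        // OLD_USERS campaigns admit existing users; NEW_USERS campaigns admit new ones.
        return (target == Target.OLD_USERS) == isExistingUser;
    }
}
```

A real gateway would combine several such predicates with rate limits before forwarding the request downstream.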
4. Concurrent circuit breaking
In distributed service interfaces there is one final, most fine-grained layer of control: limiting the number of requests an interface processes per unit of time, taking that interface's response time into account. The faster the response, the more requests can be handled per unit of time.
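A per-interface cap on in-flight requests can be sketched with a plain Java Semaphore (a simplified stand-in for what circuit-breaker components provide; the class name is illustrative):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Per-interface concurrency cap: at most N requests may be in flight at once;
// requests over the limit fail fast with a fallback instead of queueing.
public class ConcurrencyLimiter {
    private final Semaphore permits;

    public ConcurrencyLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    public <T> T call(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;          // over the limit: degrade immediately
        }
        try {
            return action.get();
        } finally {
            permits.release();        // free the slot for the next request
        }
    }
}
```

Because fast interfaces release their permit sooner, a fixed number of permits naturally allows more requests per unit of time, matching the observation above.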
After layer upon layer of traffic control, not much pressure reaches the data interface layer. What remains is the locking problem inside the seckill service itself.
Four, distributed locks
1. Pessimistic lock
Mechanism to describe
All requesting threads must acquire the lock before performing database operations, so execution is serialized. Threads that fail to acquire the lock wait, with a retry mechanism that tries again after a unit of time, or they simply return.
Process diagram
Redis basic command
SETNX: set key to value only if key does not exist; if the key already exists, SETNX does nothing. An expiration time can also be set on the key, after which other threads can try to acquire the lock again.
These Redis commands are used to simulate the lock-acquisition action.
Code implementation
This is a lock acquisition and release mechanism implemented on top of Redis.
import org.springframework.stereotype.Component;
import redis.clients.jedis.Jedis;
import javax.annotation.Resource;

@Component
public class RedisLock {

    @Resource
    private Jedis jedis;

    /**
     * Try to acquire the lock: SET key value NX EX expire.
     * Succeeds only if the key does not exist yet; the key expires after `expire` seconds.
     */
    public Boolean getLock(String key, String value, long expire) {
        try {
            String result = jedis.set(key, value, "NX", "EX", expire);
            return result != null;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Note: in production each operation should borrow its own connection
            // from the pool rather than closing a shared instance.
            if (jedis != null) jedis.close();
        }
        return false;
    }

    /**
     * Release the lock by deleting the key.
     */
    public Boolean unLock(String key) {
        try {
            Long result = jedis.del(key);
            return result > 0;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (jedis != null) jedis.close();
        }
        return false;
    }
}
The implementation uses the Jedis API; a configuration class is provided here.
@Configuration
public class RedisConfig {

    @Bean
    public JedisPoolConfig jedisPoolConfig() {
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
        jedisPoolConfig.setMaxIdle(8);
        jedisPoolConfig.setMaxTotal(20);
        return jedisPoolConfig;
    }

    @Bean
    public JedisPool jedisPool(@Autowired JedisPoolConfig jedisPoolConfig) {
        return new JedisPool(jedisPoolConfig, "127.0.0.1", 6379);
    }

    @Bean
    public Jedis jedis(@Autowired JedisPool jedisPool) {
        return jedisPool.getResource();
    }
}
Problem description
The following can occur in actual system operation: thread 01 acquires the lock, then its process hangs and it stops executing; the lock expires; thread 02 then acquires the lock and updates the database; thread 01 recovers and, still believing it holds the lock, continues executing. This can easily lead to disordered data.
At this point the concept of a lock version needs to be introduced. Suppose thread 01 acquires lock version 1; if it fails to execute in time, thread 02 then acquires lock version 2.
CREATE TABLE `dl_data_lock` (
  `id` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `inventory` INT(11) DEFAULT '0' COMMENT 'inventory',
  `lock_value` INT(11) NOT NULL DEFAULT '0' COMMENT 'lock version',
  PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='data lock table';
Note: lock_value records the lock version and is used as a condition for controlling data updates.
<update id="updateByLock">
    UPDATE dl_data_lock SET inventory=inventory-1, lock_value=#{lockVersion}
    WHERE id=#{id} AND lock_value &lt; #{lockVersion}
</update>
Note: the update operation requires not only that the thread holds the lock, but also that the thread's lock version is newer than the version currently stored in the record.
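The effect of that version condition can be simulated in plain Java (an illustrative analogue of the versioned UPDATE; the class is hypothetical): a stale lock holder whose version has been superseded cannot write.

```java
// Plain-Java analogue of the versioned update: the write is applied only when
// the caller's lock version is strictly newer than the version stored in the record.
public class VersionedRecord {
    private long lockValue = 0;   // mirrors the lock_value column
    private int inventory;        // mirrors the inventory column

    public VersionedRecord(int inventory) {
        this.inventory = inventory;
    }

    /** Mirrors "UPDATE ... WHERE lock_value < #{lockVersion}". */
    public synchronized boolean deduct(long lockVersion) {
        if (lockVersion <= lockValue) {
            return false;         // stale holder (e.g. thread 01 resuming after expiry)
        }
        lockValue = lockVersion;
        inventory -= 1;
        return true;
    }

    public synchronized int getInventory() {
        return inventory;
    }
}
```

In the race described above, thread 02 writes with version 2 first; when thread 01 resumes with version 1, its write is rejected instead of corrupting the data.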
2. Optimistic lock
Mechanism to describe
Optimistic locking is mostly controlled through the data records themselves. When updating the database, the values read by a preceding query are used as the update condition: if the data has not been modified since it was read, the update succeeds; if the pre-read values no longer match, the write fails.
Process diagram
Code implementation
The business process first queries the record to be updated, then uses the columns just read as the update condition.
@Override
public Boolean updateByInventory(Integer id) {
    DataLockEntity dataLockEntity = dataLockMapper.getById(id);
    if (dataLockEntity != null) {
        return dataLockMapper.updateByInventory(id, dataLockEntity.getInventory()) > 0;
    }
    return false;
}
For example, to update the inventory, the inventory value just read is used as the update condition. If the read inventory is 100 and the inventory changes before the update executes, the condition no longer holds and the update fails.
<update id="updateByInventory">
    UPDATE dl_data_lock SET inventory=inventory-1
    WHERE id=#{id} AND inventory=#{inventory}
</update>
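The compare-and-set idea behind this optimistic update can also be shown in plain Java, with an AtomicInteger standing in for the inventory column (an illustrative sketch, not the mapper code above):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic decrement: read the current value, then attempt a compare-and-set
// with the read value as the condition; retry a few times if another writer won.
public class OptimisticInventory {
    private final AtomicInteger inventory;

    public OptimisticInventory(int initial) {
        this.inventory = new AtomicInteger(initial);
    }

    /** Returns true if one unit was deducted, false if sold out or contended out. */
    public boolean deduct(int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            int current = inventory.get();          // the "pre-query"
            if (current <= 0) {
                return false;                       // sold out
            }
            // Analogue of "UPDATE ... WHERE inventory = #{inventory}":
            if (inventory.compareAndSet(current, current - 1)) {
                return true;
            }
            // Another thread changed the inventory in between: loop and retry.
        }
        return false;
    }

    public int remaining() {
        return inventory.get();
    }
}
```

As with the SQL version, a concurrent writer invalidates the condition and the loser simply retries against the fresh value instead of blocking on a lock.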
Five, distributed services
1. Service protection
When handling highly concurrent seckill scenarios, the seckill service itself does often go down. It is common for an app's marketing page to show a "page unavailable due to a popular activity" prompt while the rest of the application keeps running; this is the service isolation and protection mechanism at work.
With a distributed service architecture, the high-concurrency seckill service can be isolated, so that a failure of the seckill service does not drag down other services and cause a service avalanche.
2. Database protection
Database protection and service protection complement each other. In a distributed service architecture, services and databases correspond to each other: in theory, the seckill service has its own seckill database, so a failure of the seckill database will not bring down the entire database layer.