Description:

  • These are just some of my own views and thoughts; if anything is wrong, corrections are welcome

  • The Spring Boot starters mentioned in this article are self-written encapsulations; source: gitee.com/itopener/sp…

The decoupling workhorse: MQ

MQ is the decoupling workhorse of distributed architectures and is very widely used; some distributed transactions are also implemented with MQ. Because of its high throughput, in complex business scenarios you can validate the basic data first, put the request into MQ, and let a consumer process the complex business logic asynchronously, which greatly improves response time and user experience. If consumer-side processing is heavy, the consumer can be deployed as an independent cluster with as many nodes as its actual workload requires. A few points to note:

  • Verify that the message was successfully sent to MQ

For example, RabbitMQ supports publisher confirms, which acknowledge via callback that a message reached the broker. Confirms cannot prevent message loss completely, but they catch some extreme failure cases; MQ transactions can be used to cover more of them.

  • Message persistence

Take care to configure message persistence, to avoid losing large numbers of messages if the MQ cluster fails

  • Idempotency of message consumption

Normally a message is delivered only once, but some edge cases can cause it to be delivered to consumers repeatedly. The usual approach is to attach a globally unique serial number to each message and use it to check whether the message has already been consumed
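A minimal sketch of that serial-number check, assuming each message carries a unique identifier (the in-memory set here stands in for what would be a Redis SET or a database uniqueness constraint in production; all names are illustrative):

```python
# Dedup store: in production this would live in Redis or a DB table,
# not in process memory.
processed_ids = set()

def consume(message):
    """Process a message at most once, keyed by its globally unique serial number."""
    msg_id = message["msg_id"]
    if msg_id in processed_ids:
        return "duplicate-skipped"   # already consumed: do nothing
    processed_ids.add(msg_id)
    # ... real business logic would run here ...
    return "processed"

first = consume({"msg_id": "SN-001"})
second = consume({"msg_id": "SN-001"})   # same serial number, skipped
```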

  • Pay attention to user experience

Asynchronous processing is adopted to improve system throughput; compared with returning a result to the user in real time, the user experience inevitably suffers somewhat, but it is still considered a good trade-off. So at the start of a design you need to evaluate whether asynchronous processing is appropriate, and if it is, make sure to provide friendlier prompts and guidance to the user. Because asynchronous processing is a technical solution that must be combined with the actual business situation, the product side cannot be expected to think of it on its own. Technical staff should therefore point out the asynchronous steps in the flow as early as possible, so the user-facing design can be worked out during requirements analysis. If asynchrony is only introduced during development, the user-facing screens will likely need major changes, leading to requirement changes, design changes, and then finger-pointing, bickering, and delays

Inventory deduction

There are many ways to implement inventory deduction, and the right scheme has to be chosen with the actual business scenario in mind. Besides deducting the stock itself, some business data usually needs to be recorded as well. Databases easily become the bottleneck in high-concurrency applications, so consider using Redis + MQ to handle the request and let the MQ consumer run the follow-up business logic. This lets you respond to requests faster and avoids the problems caused by blocked requests

  • Use Redis to make inventory deductions

Redis's INCR/DECR commands can be used to implement inventory deduction. Redis has had a built-in Lua interpreter since version 2.6.0, and Lua scripts execute atomically, so this feature can be used for stock deduction; for a concrete implementation refer to [stock-spring-boot-starter]. The starter mainly provides initializing/resetting stock, deducting stock, and restoring stock
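The check-and-decrement that such a Lua script performs can be sketched as follows (a Python stand-in for illustration; in Redis the whole function body would be one atomic Lua script, so no other client can interleave between the read and the write — the dict and key names are made up):

```python
# Stands in for a Redis key holding the remaining stock for a SKU.
stock = {"sku:1001": 5}

def deduct(key, amount):
    """Check-and-decrement. In Redis this whole body runs as one Lua
    script, which is what makes the deduction safe under concurrency."""
    remaining = stock.get(key, 0)
    if remaining < amount:
        return -1                    # insufficient stock, nothing changed
    stock[key] = remaining - amount
    return stock[key]                # new remaining stock

left = deduct("sku:1001", 3)         # 5 -> 2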

A Redis cluster is already very efficient and can support a fair level of concurrent stock deduction, and the atomicity of Lua script execution avoids overselling. If one Redis cluster cannot meet the business load, consider splitting the inventory: divide the stock into several shares and place them in different Redis clusters, polled round-robin, which roughly keeps the remaining stock of each cluster in balance. Exact evenness cannot be guaranteed, though, so some strategy is needed when a deduction returns "insufficient stock"; for example, continue polling to the next Redis cluster. To avoid polling every cluster on every request, a sold-out flag can be kept inside the application node, or in some shared location, once a cluster runs dry.
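The sharded round-robin scheme with sold-out flags can be sketched like this (plain dicts stand in for the Redis clusters; in real code each shard's check-and-decrement would be the atomic Lua script, and the sold-out flags would live in each application node):

```python
import itertools

# Three "Redis clusters", each holding a share of the total stock.
shards = [{"stock": 2}, {"stock": 2}, {"stock": 2}]
sold_out = [False] * len(shards)     # local flags to skip empty shards
rr = itertools.cycle(range(len(shards)))

def deduct_sharded(amount=1):
    """Try each shard at most once, round-robin, skipping known-empty ones."""
    for _ in range(len(shards)):
        i = next(rr)
        if sold_out[i]:
            continue
        if shards[i]["stock"] >= amount:   # atomic via Lua in real Redis
            shards[i]["stock"] -= amount
            return i                       # index of the shard that served us
        sold_out[i] = True                 # remember this shard is empty
    return -1                              # every shard is exhausted
```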

  • Idempotency of inventory deduction

Since the Redis INCR/DECR commands are used to deduct stock, the request source cannot be stored there, so the idempotency of the deduction has to be guaranteed by the application, e.g. with client tokens or serial numbers

  • MQ processes business data asynchronously

Stock deduction comes with business data that needs to be recorded, and recording it to the database in real time easily becomes a bottleneck, so you can put the relevant information into MQ and let the MQ consumer process the follow-up business logic asynchronously. Of course, if sending the MQ message fails, the stock in Redis needs to be restored. The consistency of the Redis operation and the MQ operation cannot be guaranteed completely, so on top of ensuring consistency in the normal path, the deducted stock and the actual stock need to be checked with something like reconciliation. But before that, I think rate limiting deserves higher priority: load-test the application's performance bottlenecks in advance and configure limits based on the results, so the application will not crash under high concurrency. Accepted requests can then be processed by the normal code path, which reduces the chance of stock inconsistencies in the first place
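The "deduct first, publish to MQ, restore on send failure" flow can be sketched as follows (the MQ send is a fake stand-in toggled by a flag; real code would use a RabbitMQ/Kafka producer with confirms, and the dict stands in for Redis — all names are illustrative):

```python
stock = {"sku:1": 10}                # stands in for the Redis stock key

def send_to_mq(payload, fail=False):
    """Fake MQ publish; the flag simulates a broker outage."""
    if fail:
        raise ConnectionError("broker unreachable")
    return True

def place_order(sku, qty, mq_fail=False):
    if stock.get(sku, 0) < qty:
        return "out-of-stock"
    stock[sku] -= qty                # fast path: atomic Redis deduction
    try:
        send_to_mq({"sku": sku, "qty": qty}, fail=mq_fail)
        return "accepted"            # consumer records business data later
    except ConnectionError:
        stock[sku] += qty            # compensate: restore the deduction
        return "retry-later"
```

Even with this compensation, the two operations are not transactional, which is why the article falls back on reconciliation for the residual inconsistencies.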

Rate limiting

Many developers have probably seen a system crash under a surge of concurrency, so it is advisable to load-test the system's performance bottlenecks in advance, covering application interfaces, the database, the cache, and MQ, and then configure rate limits according to the results. This largely prevents the application from going down under a flood of requests. Of course, it also brings other issues, such as the following two:

  • Monitor and expand capacity in time

After rate limiting is applied, the application can only handle a certain number of requests. An application in its growth phase generally hopes to handle more user requests, since more users mean more profit. So you need to monitor application traffic and scale out based on the actual situation, raising the processing capacity of the whole system to serve more users

  • User experience

When the application hits the rate limit, users need good prompts and guidance, which should be considered in the requirements analysis phase

  • Front-loading rate limiting

In a real system architecture, a user request may pass through several layers before reaching an application node, e.g. nginx –> gateway –> application. Where conditions permit, apply the rate limit as early in this chain as possible, so users get feedback sooner and fewer resources are wasted in the layers behind. On the other hand, adding a limiting configuration inside the application is cheaper to develop and may be more flexible, so it depends on the team's situation. In nginx, rate limiting can be done with Lua + Redis; inside the application, Guava's RateLimiter can be used, and dynamic limit configuration can be implemented through encapsulation, as in ratelimiter-spring-boot-starter.
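The in-application limiter is typically a token bucket; a minimal sketch similar in spirit to Guava's RateLimiter (the rate and capacity values are arbitrary for illustration):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: tokens refill continuously at `rate`
    per second, up to `capacity`; each request consumes one token."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity       # start full, allowing an initial burst
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # over the limit: caller should reject

limiter = TokenBucket(rate=1, capacity=5)
results = [limiter.try_acquire() for _ in range(8)]  # burst of 8 requests
```

The first five requests pass on the initial burst; the rest are rejected until tokens refill.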

Caching

In high-concurrency applications, frequent data reads and writes are inevitable. At this time, cache can play a great role. Generally, high-performance cache such as Redis cluster is used to reduce frequent database reads and improve data query efficiency

  • Multi-level caching

Although a Redis cluster cache is already very fast, it cannot avoid network overhead, which can have serious consequences in high-concurrency systems and should be minimized, so multi-level caching is worth considering. Data that changes very rarely can be put in an in-application cache and served directly within the process, which still beats a centralized cache under high concurrency. For a two-level cache implementation refer to [cache-redis-caffeine-spring-boot-starter], or to J2Cache, which supports a variety of second-level caches. Note that the level-1 cache must be cleared when the cached data is invalidated: since level 1 lives inside each application process and the nodes of a clustered deployment cannot evict each other's caches directly, some other channel has to notify them, e.g. Redis can be used for communication between nodes of the same application
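A two-level cache read path can be sketched like this (dicts stand in for the in-process Caffeine cache and the remote Redis cluster; the key name is made up):

```python
# Level 1: small in-process cache.  Level 2: shared "remote" cache.
l1 = {}
l2 = {"config:site-name": "demo"}    # pretend this is the Redis cluster
l1_hits = 0

def get(key):
    global l1_hits
    if key in l1:                    # served with no network hop at all
        l1_hits += 1
        return l1[key]
    value = l2.get(key)              # one network round trip in real life
    if value is not None:
        l1[key] = value              # populate level 1 for next time
    return value

def evict(key):
    """On update, clear both levels. In a cluster every node's L1 must be
    notified (e.g. via Redis pub/sub), since L1 lives inside each process."""
    l1.pop(key, None)
    l2.pop(key, None)

v1 = get("config:site-name")         # misses L1, hits L2, fills L1
v2 = get("config:site-name")         # served entirely from L1
```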

  • CDN

A CDN is also a kind of cache, but mainly for static resources such as CSS, JS, and PNG images, and it is used more on the front end. In some scenarios, combined with dynamic/static separation and front-end/back-end separation, front-end resources can be pushed to a CDN, which greatly improves access efficiency. Note that static front-end resources may be updated, and the CDN cache must be refreshed when they are. An alternative strategy is to append a version-number-like tag to each static resource's URL, so every modification produces a different path and the CDN goes back to the origin application to fetch the latest file and cache it. Using a CDN requires a fairly complete set of automated deployment tools, otherwise every release becomes troublesome

  • Front-end caching

The front-end HTML can be configured so the browser caches static resources. Once configured, as long as the user does not force-refresh, the browser does not need to fetch static resources over the network again, which improves page response speed to some extent

  • Cache penetration

With a cache in place, a query that misses the cache falls back to the database. But if the data does not exist in the database either, then without any handling every such request falls through to the database. If someone maliciously exploits such non-existent data to bombard the system with requests, a flood of queries reaches the database. This situation is called cache penetration, and it matters especially in high-concurrency scenarios

Prevention: if the database lookup finds nothing, put a designated sentinel value in the cache; when that value is read from the cache, return null directly. That way subsequent requests are answered from the cache, avoiding cache penetration. Depending on the objects being cached, a two-level cache can also be used to reduce load on the cache servers. Redis is a common cache but cannot store a null value, so Spring's cache module defines a NullValue object to represent null values. Spring Boot's Redis-backed Spring Cache has some defects in this area (as of Spring Boot 1.5.x); see the flaws in the RedisCache implementation discussed at [my.oschina.net/dengfuwei/b…]

  • Cache avalanche

A cache avalanche is when a large number of requests reach the database because the cache fails, crushing the database under the pressure. Besides the cache-penetration causes above, it can also happen at the instant a cache entry expires: many concurrent requests find no data in the cache and all query the database directly. This too is a common problem in high-concurrency scenarios

Prevention: when a cache entry expires, the query back to the database needs some protection, such as a mutex lock, so that a large number of requests cannot all reach the database at the same moment; method-level rate limiting, e.g. with Hystrix or RateLimiter, also helps. Another option is to wrap the cache so entries are refreshed automatically shortly before they expire. The Spring Cache annotations have a sync attribute indicating whether the query back to the source should be synchronized. Spring Cache only defines the standard and does not provide the cache implementations, so it merely calls different methods on the cache interface depending on the value of sync; honoring it is up to the Cache interface implementation
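The mutex-protected rebuild can be sketched like this: when the hot key expires, only one thread loads from the database while the others wait, then everyone reads the refreshed value (the loader function and key name are made up for illustration):

```python
import threading

cache = {}
lock = threading.Lock()
db_loads = 0                          # counts trips to the database

def load_from_db(key):
    global db_loads
    db_loads += 1
    return f"value-of-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:                        # only one thread rebuilds at a time
        if key in cache:              # double-check: another thread may have
            return cache[key]         # filled it while we waited for the lock
        cache[key] = load_from_db(key)
        return cache[key]

# Simulate a burst of concurrent requests for an expired hot key.
threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Despite eight concurrent requests, the database is queried exactly once.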

All kinds of complex situations arise when using caches, so it is worth cataloguing the scenarios and continually refining the list, both as a reference for later cache work and to avoid failures caused by overlooked cases; it is also useful for training staff

Data preprocessing

In some business scenarios, data can be preprocessed in advance so the results can be used directly at request time, reducing the processing logic on the request path. For example, to restrict which users are eligible to participate, users can be tagged beforehand so eligibility can be checked directly when a request arrives. If the data volume is large, the data can also be distributed according to some rule, and requests routed by the same rule to the service holding the user's data for the eligibility check. This reduces the data volume and pressure on any single node and service, improving overall processing capacity and response speed

Front-loading resources

Today's distributed microservice architectures can produce long invocation chains, so basic checks should be moved as far forward as possible, such as the user-eligibility check and the front-loaded rate limiting mentioned above; some resources can also be requested by the front end directly from their destination instead of being relayed through the server. For probabilistic high-concurrency requests, a portion of users can even be given a random "failed to participate" result directly at the front end. In short, push work as far forward as you can, so that requests which do not meet the conditions never cause useless work at the nodes further down the call chain

Circuit breaking and degradation

In a microservice architecture there are many interface calls. When some services take too long or cannot provide service, requests can pile up, slowing responses and reducing throughput. At that point the service needs to be degraded: when a call exceeds a specified time or the service is unavailable, fall back to an alternative and continue the flow, instead of letting the request block for a long time. For example, for a probabilistic request (such as a lottery draw), a result that takes too long can simply be treated as invalid (not winning). Things to note:

  • In normal cases the system should respond quickly; when processing times out or a service is unavailable, monitoring must raise alarms promptly so service can be restored as soon as possible

  • When circuit breaking or degradation is triggered, corresponding mechanisms such as retry and rollback are needed, and the code logic must keep business data consistent

Hystrix can be used for circuit breaking and degradation
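A degradation call with a deadline and a fallback, Hystrix-style, can be sketched in a few lines (this is a minimal stand-in using a thread pool, not Hystrix itself; the timeouts and results are arbitrary):

```python
import concurrent.futures
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_fallback(fn, timeout, fallback):
    """Run fn with a deadline; return the fallback instead of blocking
    the request when fn is too slow."""
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return fallback              # e.g. treat a slow lottery draw as a miss

fast = call_with_fallback(lambda: "won", timeout=1.0, fallback="lost")
slow = call_with_fallback(lambda: (time.sleep(0.5) or "won"),
                          timeout=0.05, fallback="lost")
```

Note the slow call's worker keeps running after the timeout; real circuit breakers also stop sending traffic to the failing dependency, which this sketch omits.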

Compensation mechanism

A compensation mechanism is required for service processing failures, such as retry and rollback

  • Retry. Limit the number of retries to avoid infinite loops; when the limit is exceeded, raise an alarm for manual handling or other follow-up. Retried operations must be idempotent to avoid the inconsistencies caused by repeated processing

  • Rollback. When retries are exhausted or certain processing failures occur, data must be rolled back to avoid inconsistency
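The two compensation steps above can be sketched as a bounded retry that falls back to a rollback when attempts run out (the flaky operation and the exception type are made up for illustration; real code would also raise the alarm mentioned above):

```python
def with_retry(fn, rollback, max_attempts=3):
    """Retry an idempotent operation up to max_attempts times; on
    exhaustion, run the compensating rollback instead of looping forever."""
    for attempt in range(1, max_attempts + 1):
        try:
            return ("ok", fn(), attempt)
        except RuntimeError:
            continue                 # transient failure: try again
    rollback()                       # out of attempts: compensate (and alert)
    return ("rolled-back", None, max_attempts)

calls = {"n": 0}
def flaky():
    """Fails twice, then succeeds — a typical transient fault."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "done"

events = []
result = with_retry(flaky, rollback=lambda: events.append("rollback"))
```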

Idempotency

In practice, all sorts of situations can lead to duplicate processing, so processing must be made idempotent. Generally a globally unique serial number is used to establish uniqueness and avoid double processing, mainly in MQ message handling, interface invocation, and similar scenarios. For globally unique serial numbers, Twitter's Snowflake algorithm can be used; see [sequence-spring-boot-starter]. Where exactly the number is generated depends on the actual business scenario; the main thing is to consider the various extreme failure cases
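The core of a Snowflake-style generator can be sketched as follows (the standard layout is a 41-bit millisecond timestamp, 10-bit worker id, and 12-bit per-millisecond sequence; this sketch omits clock-rollback handling, which a production generator must address):

```python
import threading
import time

class SnowflakeId:
    """Snowflake-style unique id: timestamp | worker id | sequence."""
    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:        # exhausted this millisecond:
                    while now <= self.last_ms:  # spin to the next one
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeId(worker_id=1)
ids = [gen.next_id() for _ in range(1000)]
```

Ids are unique without any coordination between nodes, as long as every node gets a distinct worker id.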

Monitoring and alarms

A high-concurrency system has many users, and once a problem occurs its impact is wide, so monitoring and alarms must surface system problems promptly so service can be restored quickly. A sound incident-response process must be established, and it is worth building an experience base recording common problems, both to prevent them and to locate them in a timely manner

Automated operations should be invested in heavily, as it greatly speeds up responding to and resolving production problems. In addition, full-link (distributed) tracing is needed to make production troubleshooting easier and faster; options include Pinpoint, Zipkin, and OpenCensus

Personnel management

  • The division of labor should be clear, and there should be people ready to receive and deal with problems

  • Information should be transparent: team members need to know the system well enough, and need to be able to act on their own initiative

  • Knowledge base: organize common technical and business questions and lessons learned, so new members can get up to speed and integrate quickly

  • Sharing: regularly share technical and business knowledge so team members progress quickly together; sharing the system's achievements appropriately can also boost team morale

  • Communicate appropriately with the business side to understand front-line needs and usage, both for continuous improvement and for longer-term considerations in system design

  • Use project management tools appropriately and quantify some of the work where it makes sense; people who do not fit the team need to be weeded out

Avoid over-designing

  • Avoid overdoing it for a few extreme cases

  • Avoid over-splitting microservices and avoid distributed transactions as much as possible

Code structure and specification

  • Attention should be paid to the design of the code structure to improve the code reuse rate

  • Strict adherence to code specifications makes it easier for new members to understand and for team members to understand each other

Reference: my.oschina.net/dengfuwei/b…

The above is a summary of some personal views and thinking; if you have any questions, feel free to point them out