Preface

1. Catching fire

1.1 Microservices through a farmer's eyes

In recent years, microservices have spread like a prairie fire through the entire technology community. The microservices architecture is widely seen as the target toward which enterprise IT software is evolving as a service architecture. Why are microservices so popular, and what value can they bring to an enterprise?

1.1.1 Understanding microservices from the perspective of planting crops

Let’s take farming as an example of how to make the best use of a field:

1. Rows of corn were planted in the field.

2. Later, the clear ground at the foot of the corn was put to use, and beans were planted at intervals. As the beans grew, they climbed the corn stalks and eventually wrapped themselves tightly around them.

3. Still later, potatoes were planted in the space between the rows of corn, and their vines intertwined at the foot of the corn and competed for nutrients.

The land appears to be fully utilized, but in fact the crops no longer receive adequate light and nutrients, which increases the cost of weeding, loosening the soil, fertilizing, irrigating, and harvesting later on.

Would the following approach to cultivation be better? Divide the whole field into plots of different sizes as needed, with clear boundaries between them, so that you have a corn field, a potato field, a bean field, and whatever other plots you want to plant.

This approach has many advantages. Corn, beans, and potatoes need different nutrients, so each plot can be fertilized by a specialist. Keeping the corn, beans, and potatoes apart prevents the bean vines from climbing the corn and stunting its growth, and stops the potatoes from taking the nutrients the corn needs, and so on.

In fact, software systems are implemented much the way crops are planted. Traditional monolithic applications fall short in scalability, reliability, and maintenance cost, and making full use of large amounts of system resources while managing and monitoring service life cycles is a headache. Software design urgently needs the "divided-plot planting method" described above, and this is where the microservices architecture comes in: in a microservice system, business systems interact with each other through REST APIs, which handle messages (character sequences) in a very friendly way. In this way, the various business systems are unified into an organic whole based on the RESTful architectural style.

1.2 The iceberg beneath the microservices architecture

The Titanic, once the world's largest passenger ship and described at the time as "unsinkable," sank after hitting an iceberg in the North Atlantic. A microservices architecture is similar: we often see only the part above the surface, while the infrastructure underwater, such as resource planning, service registration and discovery, deployment and upgrades, grayscale releases, and so on, must all be taken into account.

1.2.1 Advantages

1. Decomposition of complex applications: Complex business scenarios can be decomposed into multiple business systems, and each service within a business system has a clearly defined boundary exposed through message-driven APIs.

2. Contract-driven: Each business system can freely choose its technology stack; with mocks, service providers and consumers can be developed in parallel, ultimately decoupling their dependencies.

3. Free scaling: Each system can be scaled independently according to business needs.

4. Independent deployment: Service systems are independent of one another and can be deployed on suitable hardware as required.

5. Good isolation: A resource leak in one service system will not bring down the whole system, so fault tolerance is good.

1.2.2 Challenges

1. Service governance: After repeated agile iterations, the number of microservices keeps growing and the interactions between business systems multiply. Designing an efficient cluster communication scheme becomes a problem.

2. Application management: Each deployed service system is a process that can be started or stopped. Switching over seamlessly after a power failure or host crash requires a strong deployment management mechanism.

3. Load balancing: To cope with heavy traffic and improve reliability, the same service system is deployed in a distributed manner, that is, service instances run on multiple machines. When an instance fails, a distributed solution for automatic on-demand scaling also needs to be considered.

4. Fault location: The logs of a single application are centralized, which makes faults easy to locate. In a distributed environment, however, locating faults and analyzing logs is difficult.

5. The avalanche problem: In a distributed system, no service is 100% available because the network is unreliable. When the network is unstable, a service provider can itself be dragged down, its callers then block, and the blockage can spread into an avalanche effect.

Michael T. Nygard's excellent book Release It! outlines many patterns for improving system availability, two of which are especially important: the timeout policy and the circuit-breaker (fuse) mechanism.

6. Timeout policy: If a service is frequently invoked by other parts of the system, the failure of one part may lead to cascading failures. For example, an operation that invokes a service can be configured with a timeout: if the service fails to respond within that time, a failure message is returned. However, this strategy can cause many concurrent requests to the same operation to block until the timeout period expires. These blocked requests may hold critical system resources such as memory, threads, and database connections, and when those resources become exhausted, other systems that need the same resources also fail. In such a case it would be preferable for the operation to fail immediately. Setting a short timeout may help, but the time between sending a request and receiving a success or failure response is uncertain. (A minimal timeout sketch appears after this list.)

7. Circuit-breaker (fuse) mode: A circuit breaker detects whether the fault has been resolved, preventing repeated attempts to perform an operation that is likely to fail and thus reducing the time spent waiting for fault recovery. This is more flexible than the timeout policy.
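As a minimal illustration of the timeout policy described in item 6 (this is plain Java, not Hystrix; the remoteCall() method and the 500 ms budget are assumptions for the example), the sketch below wraps a hypothetical remote call in a Future and fails fast when the deadline is exceeded:

```java
import java.util.concurrent.*;

public class TimeoutPolicyDemo {
    private static final ExecutorService POOL = Executors.newFixedThreadPool(10);

    // Wraps a (hypothetical) remote call and fails fast after the given budget.
    static String callWithTimeout(long timeoutMillis) {
        Future<String> future = POOL.submit(TimeoutPolicyDemo::remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);              // try to free the worker thread
            return "fallback: service timed out";
        } catch (InterruptedException | ExecutionException e) {
            return "fallback: service failed";
        }
    }

    // Placeholder for a slow dependency; in reality this would be an HTTP/RPC call.
    private static String remoteCall() throws InterruptedException {
        Thread.sleep(2_000);                  // simulate a slow provider
        return "real response";
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(500)); // prints the timeout fallback
        POOL.shutdown();
    }
}
```

Note that even with this pattern, the blocked worker thread is only released if the remote call responds to interruption, which is exactly the limitation the timeout policy cannot fully solve on its own.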

The rest of this article uses the avalanche effect of a Tomcat container on a shopping site under high concurrency to introduce Hystrix's thread pool isolation and circuit-breaker mechanism.

2. Protecting an application from avalanches

2.1 The nature of the avalanche problem: Servlet Container crashes under high concurrency

Let's start with a simplified model that is common in distributed systems: the Servlet container inside a web server. When the container starts, it initializes a dispatcher thread in the background to accept HTTP requests, then pulls a worker thread from a thread pool for each request, thereby controlling concurrency.

The Servlet container is our container, such as Tomcat. A single user request may depend on several external services. Since the number of threads in the application container is essentially fixed (Tomcat's default thread pool holds 200 threads, for example), under high concurrency a single external dependency (a third-party or in-house system that has failed) blocking on timeouts can occupy the entire main thread pool and drive up memory consumption. This is the long-request congestion anti-pattern: a degradation pattern in which system performance deteriorates, or the system even crashes, as the latency of individual requests grows.

Furthermore, once the thread pool is full, the entire service becomes unavailable, and the same thing then happens one level up in its callers, so the whole system collapses like an avalanche. The sketch below illustrates the saturation.
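To make the saturation concrete, here is a minimal sketch (not Tomcat itself, just an analogy under assumed numbers): a fixed pool of 4 "worker threads" all block on a slow dependency, so later requests pile up in the bounded queue or are rejected outright.

```java
import java.util.concurrent.*;

public class PoolSaturationDemo {
    public static void main(String[] args) {
        // A tiny "container" pool: 4 workers, a bounded queue of 2, reject the rest.
        ThreadPoolExecutor container = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy());

        for (int i = 1; i <= 10; i++) {
            final int reqId = i;
            try {
                container.execute(() -> slowDependencyCall(reqId));
            } catch (RejectedExecutionException e) {
                // Once workers + queue are exhausted, new requests fail immediately.
                System.out.println("request " + reqId + " rejected: pool saturated");
            }
        }
        container.shutdown();
    }

    // Simulates a downstream service that hangs; every worker that calls it is stuck.
    private static void slowDependencyCall(int reqId) {
        try {
            Thread.sleep(5_000);
            System.out.println("request " + reqId + " finally completed");
        } catch (InterruptedException ignored) { }
    }
}
```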

2.2 Several scenarios that trigger the avalanche effect

1. Traffic surges: for example, abnormal traffic or user retries increase the system load.

2. Cache refresh: Suppose A is the client and B is the server. When the cache is refreshed, all requests from system A flow directly to system B and exceed B's carrying capacity.

3. Program bugs: circular-call logic in the code, memory leaks caused by unreleased resources, and similar problems.

4. Hardware faults: for example, a host goes down, the machine room loses power, or a fiber-optic cable is cut.

5. Thread synchronous waiting: Systems often call each other synchronously, with core and non-core services sharing the same thread pool and message queue. If a core business thread calls a non-core service, and that non-core service hands the work to a third-party system, then when the third-party system has problems the core thread blocks and stays in a waiting state. Inter-process calls have a timeout limit, and the thread is eventually cut off, which can also trigger an avalanche.

2.3 Common solutions to avalanche effects

There are many options for dealing with these avalanche scenarios, but there is no one-size-fits-all model.

1. For traffic surges, use automatic scaling, or install a rate-limiting module on the load balancer.

2. For cache refresh, refer to service-overload case studies of cache usage.

3. For hardware faults: multi-data-center disaster recovery, cross-data-center routing, and geo-distributed active-active deployment.

4. For thread synchronous waiting, use Hystrix's fault isolation and circuit-breaker mechanism to handle unavailable dependent services.

In practice, thread synchronous waiting is the most common trigger of the avalanche effect, so this article focuses on solving the avalanche problem with Hystrix; solutions for traffic surges and cache refresh will be shared separately.

3. Isolation and fusing

Hystrix is an open-source library released under the Apache License 2.0 and hosted on GitHub. It is designed to address latency and fault tolerance in complex distributed systems.

Hystrix uses the command pattern: the client extends the abstract HystrixCommand class and implements its methods. Why the command pattern? Anyone who has used an RPC framework knows that a remote interface can define more than one method, and the command pattern is ideal here because it protects individual method calls at a finer granularity.

The essence of the command pattern is to separate method invocation from method implementation; by wrapping interface methods in subclasses of HystrixCommand, control descends to the method level and we gain protection there.

Hystrix’s core design concept is based on the command pattern. The command pattern UML is shown below:

As you can see, Command is an intermediate layer added between Receiver and Invoker. Command encapsulates the Receiver. So how does Hystrix fit into the picture above?

The API can play the role of either Invoker or Receiver. By extending the Hystrix core class HystrixCommand to encapsulate these APIs (for example remote interface calls or database CRUD operations, which may be slow), you give the API elastic protection.
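As a minimal sketch of such a wrapper (the "ProductService" group name, the product lookup, and the fallback string are illustrative assumptions, not part of the original text), a command might look like this:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Hypothetical wrapper around a remote product lookup.
public class ProductQueryCommand extends HystrixCommand<String> {

    private final String productId;

    public ProductQueryCommand(String productId) {
        super(HystrixCommandGroupKey.Factory.asKey("ProductService"));
        this.productId = productId;
    }

    @Override
    protected String run() throws Exception {
        // The protected call: a remote interface invocation that may be slow or fail.
        return remoteProductLookup(productId);
    }

    @Override
    protected String getFallback() {
        // Returned when run() throws, times out, or the circuit breaker is open.
        return "product-unavailable";
    }

    private String remoteProductLookup(String id) {
        return "product:" + id;   // placeholder for a real HTTP/RPC call
    }
}
```

A caller runs it synchronously with new ProductQueryCommand("42").execute() or asynchronously with queue(); each command instance is used for a single invocation.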

3.1 Resource Isolation Mode

The essence of Hystrix's ability to prevent avalanches is resource isolation, which can be explained with a reservoir analogy. A large reservoir is divided into smaller pools, so that if the water in one pool is polluted, the others are unaffected; if there were only one pool, polluting it would make all the water unusable. Software resource isolation works the same way: if calls to a remote service are isolated in a separate thread pool and the service provider becomes unavailable, only that thread pool is affected.

(1) Thread pool isolation mode: a dedicated thread pool holds the current requests, processes them, and applies a timeout to each task; requests that pile up wait in the thread pool's queue. A thread pool must be allocated for each dependent service, which costs some resources, but the benefit is the ability to absorb bursts of traffic (when a traffic peak arrives, requests queue up in the pool and are processed gradually). This can be implemented with Java's ThreadPoolExecutor and its work queue. Thread pool isolation is shown in the figure below, and a configuration sketch follows the two lists after it.

Advantages of thread isolation:

1. The requesting thread can be completely isolated from the execution thread of the dependent code;

2. When a failed dependency recovers, its thread pool clears out and the service immediately becomes healthy again;

3. The thread pool size caps the amount of concurrency; when the pool is saturated, requests can be rejected up front, preventing the dependency's problems from spreading.

Disadvantages of thread isolation:

1. Extra processor overhead: each command execution involves queuing (the default SynchronousQueue avoids actual queuing) and scheduling;

2. Added complexity for code that relies on thread state such as ThreadLocal, which must now be passed and cleaned up manually.
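Here is the minimal configuration sketch for thread pool isolation mentioned above; the group/pool names, pool size, queue size, and timeout are illustrative assumptions rather than values from the original text:

```java
import com.netflix.hystrix.*;

// Hypothetical command showing thread-pool isolation settings.
public class InventoryQueryCommand extends HystrixCommand<String> {

    protected InventoryQueryCommand() {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("InventoryService"))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("InventoryPool"))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        .withCoreSize(10)                    // dedicated pool: 10 threads for this dependency
                        .withMaxQueueSize(20)                // bounded queue absorbs short bursts
                        .withQueueSizeRejectionThreshold(20))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionTimeoutInMilliseconds(1000)));   // per-call timeout
    }

    @Override
    protected String run() throws Exception {
        return remoteInventoryLookup();    // the protected remote call (assumed)
    }

    @Override
    protected String getFallback() {
        return "inventory-unavailable";    // returned on timeout, error, or pool rejection
    }

    private String remoteInventoryLookup() {
        return "in-stock";                 // placeholder for a real HTTP/RPC call
    }
}
```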

(2) Semaphore isolation mode: an atomic counter records how many threads are currently running. Each request first checks the counter: if the configured maximum is exceeded, the new request of this type is discarded; otherwise the counter is incremented by 1 when the request starts and decremented by 1 when it returns. This is strict concurrency control with immediate rejection, so it cannot absorb bursts of traffic (once the number of in-flight requests exceeds the limit, additional requests are returned immediately instead of being sent on to the dependent service). It mirrors the usage of Java semaphores; a small sketch follows.
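A minimal sketch of this rejection behavior using java.util.concurrent.Semaphore (the limit of 15 and the method names are assumptions for illustration):

```java
import java.util.concurrent.Semaphore;

public class SemaphoreIsolationDemo {
    // At most 15 concurrent calls to the dependency; excess calls are rejected immediately.
    private static final Semaphore PERMITS = new Semaphore(15);

    static String callDependency() {
        if (!PERMITS.tryAcquire()) {      // counter check: no blocking, fail fast
            return "fallback: too many concurrent requests";
        }
        try {
            return remoteCall();          // counter is effectively +1 while we run
        } finally {
            PERMITS.release();            // counter -1 on return
        }
    }

    private static String remoteCall() {
        return "real response";           // placeholder for the dependent service
    }
}
```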

Hystrix uses thread pool isolation by default, but the user can switch the isolation strategy to ExecutionIsolationStrategy.SEMAPHORE through HystrixCommandProperties, as sketched below.
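A minimal configuration sketch for switching a command to semaphore isolation; the "LocalCacheService" group name and the limit of 15 are illustrative assumptions:

```java
import com.netflix.hystrix.*;
import com.netflix.hystrix.HystrixCommandProperties.ExecutionIsolationStrategy;

// Hypothetical command configured for semaphore isolation instead of the default thread pool.
public class LocalCacheQueryCommand extends HystrixCommand<String> {

    protected LocalCacheQueryCommand() {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("LocalCacheService"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE)
                        .withExecutionIsolationSemaphoreMaxConcurrentRequests(15)));
    }

    @Override
    protected String run() {
        return fastLocalLookup();   // runs on the caller's thread under semaphore isolation
    }

    @Override
    protected String getFallback() {
        return "cache-unavailable"; // returned when the semaphore limit is exceeded
    }

    private String fastLocalLookup() {
        return "cached-value";      // placeholder for a fast, trusted client call
    }
}
```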

Characteristics of semaphore isolation:

1. The biggest difference between semaphore isolation and thread isolation is that the dependent code is still executed on the request thread, which must first acquire the semaphore;

2. If the client is trusted and returns quickly, use semaphore isolation instead of thread isolation to reduce overhead.

The difference between thread pool isolation and semaphore isolation is shown in the following figure. With thread pool isolation, the 15 user request threads hand work off to dependency pools: 10 requests go to thread pool A and 5 to thread pool B. With semaphore isolation, if the semaphore for client C is set to 15, then the 10 requests on the left of the figure and the 5 requests on the right are counted against that threshold: if the count is within the threshold the call is executed; otherwise it returns immediately.

Recommended usage: partition business thread pools into different tiers according to the service level of the requests, and even deploy core business on separate servers.

3.2 Circuit-breaker (fuse) mechanism

A circuit breaker works like the fuse in our homes: when the current is too high, the fuse blows automatically to protect the appliances. Without such protection, endless retries would only add pressure to the struggling server, creating a vicious circle; but if retries are simply turned off, how do we resume calls once the server becomes available again?

The circuit breaker fits this scenario perfectly: when the ratio of failed requests (failed/total) reaches a threshold, the breaker opens and sleeps for a period of time; after that it enters a half-open state in which it tentatively lets a portion of traffic through (Hystrix allows a single request). If that trial call succeeds, the breaker closes again; otherwise it stays open and enters the next sleep window. A configuration sketch follows.
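A minimal sketch of the circuit-breaker thresholds described above; the "OrderService" group name and the specific numbers are assumptions for illustration:

```java
import com.netflix.hystrix.*;

// Hypothetical command whose circuit breaker opens at a 50% failure ratio and sleeps 5 s.
public class OrderSubmitCommand extends HystrixCommand<String> {

    protected OrderSubmitCommand() {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("OrderService"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withCircuitBreakerEnabled(true)
                        .withCircuitBreakerRequestVolumeThreshold(20)       // min requests per window before tripping
                        .withCircuitBreakerErrorThresholdPercentage(50)     // open when >= 50% of requests fail
                        .withCircuitBreakerSleepWindowInMilliseconds(5000)  // stay open 5 s, then try half-open
                        .withExecutionTimeoutInMilliseconds(1000)));        // timeouts count as failures
    }

    @Override
    protected String run() throws Exception {
        return remoteOrderSubmit();      // the protected remote call (assumed)
    }

    @Override
    protected String getFallback() {
        return "order-queued-for-retry"; // served while the circuit is open or on failure
    }

    private String remoteOrderSubmit() {
        return "order-accepted";         // placeholder for a real HTTP/RPC call
    }
}
```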

Recommended usage scenarios: a client directly invoking a remote server (when the server is unavailable for some reason, each client request ties up system resources such as memory and database connections from the moment it is sent until the response times out), or access to shared resources.

The following scenarios are not recommended:

1. Applications accessing local data, such as data held in memory; the circuit breaker only adds overhead here.

2. Using it as a substitute for exception handling in business logic.

Concluding thoughts

This article started from the avalanche effect observed in the distributed architecture of my previous projects and led into Hystrix (which also offers many other excellent features such as caching, batch processing of requests, master-slave sharing, and so on; this article mainly covered resource isolation and the circuit breaker). It was organized in three parts:

1. Part one introduced the microservices architecture through the field-farming analogy, briefly covered its advantages, and focused on one challenge: the avalanche problem.

2. Part two took the collapse of a Tomcat container under high concurrency as an example to show how an avalanche develops, summarized several avalanche-inducing scenarios and their solutions, and introduced the Hystrix framework for the synchronous-waiting case.

3. Part three introduced Hystrix's background, resource isolation (summarizing the characteristics of thread pool and semaphore isolation), and the circuit breaker, and summarized their usage scenarios.

As Martin Fowler points out in his article, we can be cautiously optimistic: we are already on the path of transformation toward a microservices architecture. It will take time, but the path is worth exploring.
