
As you all know, high-concurrency systems have three big axes: caching, circuit breaking, and rate limiting. But there is a fourth axe, often left forgotten and sulking in the corner: warm-up (preheating).

Examples of the phenomenon

Let me start with two phenomena that only show up in high-concurrency systems. Both have caused multiple production failures.

1. After the DB restarts, it dies instantly

In a high-concurrency environment, the DB process dies and is restarted. Because the upstream load-balancing policy is reallocated during the traffic peak, the freshly started DB instantly receives a third of the traffic, and its load climbs wildly until it stops responding altogether.

The cause is that the freshly started DB has none of its caches ready, so its state is completely different from that of a node in normal operation. Perhaps even a tenth of the usual traffic would be enough to kill it.

2. Requests fail after a service restart

Another common problem: one of my servers had an issue, and thanks to load balancing the remaining machines immediately absorbed its requests and ran fine. But when the node rejoined the cluster, it produced a large number of slow requests, and under heavy traffic, a large number of failures.

The causes can probably be attributed to:

1. After the service starts, the JVM is not fully ready and JIT compilation has not kicked in yet.
2. The various resources the application depends on are not ready.
3. Load balancing triggers a rebalance.


Both problems come down to a lack of proper warm-up.

Warm-up means cold start / preheating. When a system has been sitting at a low water level for a long time and traffic suddenly surges, pulling it straight up to a high water level can crush it instantly. With a cold start, traffic is increased slowly over a configured window until it reaches the threshold ceiling, giving the cold system time to warm up and preventing it from being overwhelmed.
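As a rough, off-the-shelf illustration of this ramp-up idea (not the article's own mechanism), Guava's RateLimiter can be created with a warm-up period, during which the permitted rate climbs gradually toward its ceiling instead of jumping straight to it. The numbers below are arbitrary.

```java
// Minimal sketch: a rate limiter that warms up instead of allowing full speed at once.
import com.google.common.util.concurrent.RateLimiter;

import java.util.concurrent.TimeUnit;

public class WarmUpLimiterDemo {
    public static void main(String[] args) {
        // Up to 100 requests/second at steady state, reached gradually over 10 seconds.
        RateLimiter limiter = RateLimiter.create(100.0, 10, TimeUnit.SECONDS);

        for (int i = 0; i < 1_000; i++) {
            limiter.acquire();   // blocks longer while the system is still "cold"
            handleRequest(i);
        }
    }

    static void handleRequest(int i) {
        // placeholder for real work
    }
}
```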

I want a curve like this.

The truth is more complicated

Traffic is unpredictable. Unlike naturally growing traffic or a deliberate attack, a restart is a from-zero-to-full process, and even some vaunted ultra-fast components, such as LMAX's Disruptor, have crashed under this kind of sudden surge.

The most appropriate place to apply warm-up is the gateway. In the figure, node4 is the newly started node; the load-balancing component integrated into the gateway recognizes the new instance and gradually shifts traffic onto it until it can truly handle high-speed traffic.

But the gateway is not the only place where traffic gets distributed:

1. Your application may fetch the service list directly from the registry and distribute traffic through a client-side component.
2. Your application may pass through complex middleware and routing rules before finally landing on a DB.
3. Your terminal may connect directly to an MQTT server over the MQTT protocol.

Abstracting a little, all of this traffic-distribution logic, gateways included, can be called the client. In other words, all warm-up logic lives on the client side, tightly coupled with load balancing.

The solution

Ramping up traffic in the client

As the analysis above suggests, the problem can be solved in code by controlling every client call.

A simple weighted round-robin approach:

1. I need to be able to obtain the set of all resources to be called, along with their startup times, cold-start configuration, and so on.
2. Assign a weight to each of these resources, as sketched in the code after this list. For example, the maximum weight is 100 and the cold start completes after 100 seconds. If the new node is at its 15th second, the total weight is 100 * (n - 1) + 15.
3. Based on the calculated weights, traffic to the new node grows gradually as time passes until it equals that of the other nodes.
4. In an extreme case, my backend has only one instance, and it cannot come up at all.
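Here is a minimal sketch of that weight math, assuming a made-up Node record that carries its startup timestamp; the constants (max weight 100, 100-second warm-up) follow the example above.

```java
// Sketch of linear warm-up weighting. Node is a hypothetical descriptor, not a real API.
import java.util.List;

public class WarmUpWeights {

    static final int MAX_WEIGHT = 100;         // weight of a fully warmed node
    static final long WARMUP_MILLIS = 100_000; // cold start completes after 100 s

    record Node(String address, long startUpMillis) {}

    // Weight grows linearly from 1 to MAX_WEIGHT over the warm-up window.
    static int weightOf(Node node, long now) {
        long uptime = now - node.startUpMillis();
        if (uptime >= WARMUP_MILLIS) {
            return MAX_WEIGHT;
        }
        return Math.max((int) (MAX_WEIGHT * uptime / WARMUP_MILLIS), 1);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two fully warmed nodes plus one node that started 15 seconds ago.
        List<Node> nodes = List.of(
                new Node("10.0.0.1:8080", now - 500_000),
                new Node("10.0.0.2:8080", now - 500_000),
                new Node("10.0.0.3:8080", now - 15_000));
        int total = nodes.stream().mapToInt(n -> weightOf(n, now)).sum();
        // total = 100 * (n - 1) + 15 = 215, so the cold node gets roughly 15/215 of the traffic
        System.out.println("total weight = " + total);
    }
}
```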

In the case of Spring Cloud, we need to change the behavior of these components:

1. The load-balancing policy in Ribbon.
2. The load-balancing policy of the gateway.

Fortunately, they are shared base components, so you don't have to copy the same code back and forth.
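For Ribbon, one rough way to hook this in is a custom IRule, sketched below under assumptions: startupTimeOf() is a hypothetical lookup (for example, instance metadata written to the registry at startup), and the weighting mirrors the math above.

```java
// Sketch of a warm-up-aware Ribbon rule: new nodes get proportionally less traffic.
import com.netflix.client.config.IClientConfig;
import com.netflix.loadbalancer.AbstractLoadBalancerRule;
import com.netflix.loadbalancer.Server;

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class WarmUpRule extends AbstractLoadBalancerRule {

    static final int MAX_WEIGHT = 100;
    static final long WARMUP_MILLIS = 100_000;

    @Override
    public void initWithNiwsConfig(IClientConfig clientConfig) {
        // no extra configuration in this sketch
    }

    @Override
    public Server choose(Object key) {
        List<Server> servers = getLoadBalancer().getReachableServers();
        if (servers.isEmpty()) {
            return null;
        }
        long now = System.currentTimeMillis();
        int total = servers.stream().mapToInt(s -> weightOf(s, now)).sum();
        int ticket = ThreadLocalRandom.current().nextInt(total);
        // Weighted random pick: a warming node holds a smaller share of the tickets.
        for (Server s : servers) {
            ticket -= weightOf(s, now);
            if (ticket < 0) {
                return s;
            }
        }
        return servers.get(servers.size() - 1);
    }

    private int weightOf(Server server, long nowMillis) {
        long uptime = nowMillis - startupTimeOf(server);
        if (uptime >= WARMUP_MILLIS) {
            return MAX_WEIGHT;
        }
        return Math.max((int) (MAX_WEIGHT * uptime / WARMUP_MILLIS), 1);
    }

    // Hypothetical: in a real system this would come from registry metadata
    // recorded when the instance registered itself.
    private long startupTimeOf(Server server) {
        return 0L;
    }
}
```

Registering such a rule as a bean in the client's Ribbon configuration would make it the balancing policy for that service; the gateway side would need a similar change.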

Touring the interfaces

As the name implies, this means visiting all the interfaces in advance so that the system prepares its resources before real traffic arrives. For example, iterate over all HTTP endpoints and send a request to each. This method is only partially effective: some lazily loaded resources get initialized in this phase, but not all of them. JIT compilation and similar effects can make the warm-up process very long, so a cursory tour only helps to a certain extent.

For example, some databases, right after starting, execute special SQL so that the page cache loads the hottest data first.
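For the HTTP case, a rough sketch of such a tour might look like this, assuming the endpoint list is known up front (the URLs below are made up).

```java
// Sketch: hit every known endpoint once so lazy resources are touched before real traffic.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

public class EndpointTour {
    public static void main(String[] args) {
        List<String> endpoints = List.of(               // hypothetical endpoint list
                "http://localhost:8080/api/users",
                "http://localhost:8080/api/orders");

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();

        for (String url : endpoints) {
            try {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                        .timeout(Duration.ofSeconds(5))
                        .GET()
                        .build();
                int status = client.send(request, HttpResponse.BodyHandlers.discarding())
                        .statusCode();
                System.out.println(url + " warmed, status " + status);
            } catch (Exception e) {
                // a failed warm-up call is not fatal; log and move on
                System.err.println("warm-up failed for " + url + ": " + e.getMessage());
            }
        }
    }
}
```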

State preservation

The system takes a snapshot before it dies and, when it boots up, restores itself intact.

This sounds a bit magical, because in an abnormal shutdown the system rarely gets a chance to say anything, so in practice you have to take snapshots of the running system on a regular schedule.

When the node starts, the snapshot is loaded back into memory. This is widely used in some in-memory components; a toy sketch follows.
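The sketch below assumes the state fits in a single serializable map; the file name and the 30-second interval are arbitrary, and real components (think Redis RDB) are far more careful about consistency and durability.

```java
// Toy sketch: periodically serialize in-memory state, reload the latest snapshot at startup.
import java.io.*;
import java.util.Map;
import java.util.concurrent.*;

public class SnapshotStore {

    private final ConcurrentMap<String, String> state = new ConcurrentHashMap<>();
    private final File snapshotFile = new File("state.snapshot");

    @SuppressWarnings("unchecked")
    void loadSnapshotOnStartup() throws IOException, ClassNotFoundException {
        if (!snapshotFile.exists()) {
            return;                                      // first boot: nothing to load
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(snapshotFile))) {
            state.putAll((Map<String, String>) in.readObject());
        }
    }

    void scheduleSnapshots(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(() -> {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(snapshotFile))) {
                out.writeObject(new ConcurrentHashMap<>(state));   // copy, then write
            } catch (IOException e) {
                System.err.println("snapshot failed: " + e.getMessage());
            }
        }, 30, 30, TimeUnit.SECONDS);
    }
}
```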

End

Comparing the options, the most sensible approach is to code the warm-up logic into the client side. The work can be painful and long, but it ends well. You can also go the manual route: remove the node from Nginx, change the weights, then reload Nginx. That is sometimes very effective but not always, usually reassuring but not always. Whatever you prefer. After all, skipping the foreplay altogether is just reckless.
