My blog is at studyidea.cn, where you can find more original articles.

0x00. Scene of the accident

It was a dark and windy night. I had just released the new version to production and carefully checked the application logs, and the follow-up verification by our tester passed without a hitch.

Yep, steady as ever ~

Then I went downstairs to the canteen for a midnight snack. Halfway through the meal, the operations team sent several alerts: the server's connection count was too high and had soared to tens of thousands.

Good grief. I put down the chicken leg I was gnawing on and rushed back to my desk to look into the problem.

0x01. Going through the wringer to dig deeper

I turned on the computer and first confirmed that production transactions were all running normally. The logs for that period showed nothing abnormal and were being written as usual. With no other lead, I went through the changed code again and found that it was all business code, with no changes related to network connections.

The problem was really strange. Unable to think of a fix on the spot, we fell back on a restart. After the restart the connection count dropped back below the normal threshold, but it soon started climbing again and before long was back in the tens of thousands.

A restart could not solve the problem, so we had to start from the application itself to find out what was wrong.

The application is a routing service that distributes transactions to downstream subsystems according to the routing code specified by the upstream system. The architecture diagram is as follows:

As mentioned in my earlier article on the evolution of the routing system, the routing service uses the Dubbo API, with the following code:

We have another system that also deploys this application, and the connection count on its production machines is very low. Comparing the configuration values of the two systems side by side, only the connections setting differed: the problematic system was set to 1000 and the other system was set to 10.

Having found the cause, we set connections to 10 and restarted the application, and the connection count on the production machines returned to normal.

0x02. Peeling back the layers to find the truth

First, let's look at what the connections setting actually does. See the official documentation: dubbo.apache.org/zh-cn/docs/… .

The following is taken from the dubbo:reference documentation:

The connections parameter can be configured in three places: dubbo:reference, dubbo:consumer, and dubbo:provider.

Note: the value marked ① in the figure differs from the source code. As of Dubbo 2.7.3, the dubbo:consumer documentation shows a default of 100 at ①, while the actual default in the source code is 0. Also note that the connections parameter currently applies mainly to the dubbo protocol; it is not used for short-connection protocols such as HTTP.

The connections value on dubbo:reference is the service-level setting. If it is not set, the connections value on dubbo:consumer is used. Setting connections on dubbo:provider has no effect on the provider itself; the value is passed to consumers through the registry and becomes their default. The actual order of precedence among the three is as follows:
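To make the fallback concrete, here is a minimal Java sketch. It assumes the order is reference first, then consumer, then the provider-side default, as the paragraph above suggests. The class and method names are hypothetical and this is not Dubbo's own code.

```java
/** Simplified model of the fallback order; illustrative only, not Dubbo source. */
final class ConnectionsResolver {

    /**
     * Returns the effective connections value.
     * A null or non-positive value at one level means "not configured there".
     * Returns 0 when no level configures a positive value.
     */
    static int resolve(Integer reference, Integer consumer, Integer providerDefault) {
        if (reference != null && reference > 0) {
            return reference;          // service-level setting wins
        }
        if (consumer != null && consumer > 0) {
            return consumer;           // consumer-wide setting comes next
        }
        if (providerDefault != null && providerDefault > 0) {
            return providerDefault;    // default passed from the provider via the registry
        }
        return 0;                      // nothing configured
    }
}
```

For example, `ConnectionsResolver.resolve(null, 1000, 10)` yields 1000, because the consumer-wide value takes effect before the provider default.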

The connections value is used in DubboProtocol#getClients.

By default, the Dubbo protocol uses Netty to establish long connections with the service provider.

As shown in the figure above, the connections value is read first, and if it is greater than zero, that many long connections are established.

Suppose a provider exposes 10 interfaces and runs on two nodes. If the consumer references all of the provider's services and sets connections=1000, then when the consumer starts, 1000 × 2 × 10 = 20000 connections are created immediately. This is the root cause of the surge in connections on the production machines.

The routing service, however, is written against the Dubbo API. After the service starts, Dubbo establishes connections to a downstream provider only when an upstream system actually calls the routing service, which is why the connection count appeared to climb gradually.
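The arithmetic above can be written as a tiny Java helper. The class and method names are made up for illustration; the formula is just the product described in the text.

```java
/** Back-of-the-envelope estimate for dedicated connections; illustrative only. */
final class ConnectionEstimate {

    /** One set of `connections` long connections per interface, per provider node. */
    static long dedicated(int connections, int providerNodes, int interfaces) {
        return (long) connections * providerNodes * interfaces;
    }

    public static void main(String[] args) {
        // The scenario from the article: connections=1000, 2 nodes, 10 interfaces.
        System.out.println(dedicated(1000, 2, 10)); // prints 20000
    }
}
```

With the other system's setting of connections=10, the same formula gives 200.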

If the connections parameter is not set, Dubbo creates shared connections (shareconnections). When a consumer calls the same service provider (identified by IP + port), all of its service interfaces share these connections.

shareconnections can be set in the dubbo:consumer configuration or as a JVM startup parameter:
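The two branches, dedicated versus shared, can be modeled with a toy Java class. This is a simplified sketch of the behavior described here, not the real DubboProtocol#getClients; every name in it is hypothetical, and it only counts connections rather than opening any.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: counts how many connections would be opened; not Dubbo source. */
final class ClientPoolModel {
    // Shared connections already created, keyed by provider address (ip:port).
    private final Map<String, Integer> sharedByAddress = new HashMap<>();
    private int openedConnections = 0;

    /**
     * Called once per referenced service interface.
     * connections > 0: open that many dedicated connections for this interface.
     * connections == 0: reuse the shared connections for this address,
     * creating them on first use.
     * Returns the number of connections this interface will use.
     */
    int getClients(String providerAddress, int connections, int shareConnections) {
        if (connections > 0) {
            openedConnections += connections;
            return connections;
        }
        Integer existing = sharedByAddress.get(providerAddress);
        if (existing == null) {
            sharedByAddress.put(providerAddress, shareConnections);
            openedConnections += shareConnections;
            return shareConnections;
        }
        return existing;
    }

    int openedConnections() {
        return openedConnections;
    }
}
```

Referencing 10 interfaces on one address with connections=3 opens 30 connections in this model, while leaving connections unset with one shared connection opens just 1.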

-Dshareconnections=10

If the consumer needs to call 10 service interfaces of the same provider application, the provider runs two nodes, and shareconnections=1000, only 1000 × 2 = 2000 connections are created after the consumer starts.

For the same setting, shareconnections produces an order of magnitude fewer connections than connections.
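A short Java sketch of the shared case, set against the dedicated figure for the same scenario. The names are hypothetical and the formulas simply restate the article's arithmetic.

```java
/** Shared-connection estimate next to the dedicated one; illustrative only. */
final class SharedConnectionEstimate {

    /** Shared connections are created per provider node, regardless of interface count. */
    static long shared(int shareConnections, int providerNodes) {
        return (long) shareConnections * providerNodes;
    }

    public static void main(String[] args) {
        int nodes = 2;
        int interfaces = 10;
        long sharedTotal = shared(1000, nodes);            // 2000
        long dedicatedTotal = 1000L * nodes * interfaces;  // 20000
        // The gap equals the number of interfaces referenced from that provider.
        System.out.println(dedicatedTotal / sharedTotal);  // prints 10
    }
}
```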

2.1 How connections are used

When the consumer invokes a service, it picks one connection from the connection array for each call, taking them in turn; the code is in DubboInvoker#doInvoke.
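Below is a minimal Java sketch of picking one connection per call from a fixed array. It cycles through the array with an atomic counter, which is my recollection of what the 2.7.x source does; treat that detail as an assumption. The class name and the generic element type are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Picks a connection per call by cycling through a fixed array; illustrative only. */
final class ConnectionPicker<T> {
    private final T[] clients;
    private final AtomicInteger index = new AtomicInteger();

    ConnectionPicker(T[] clients) {
        if (clients.length == 0) {
            throw new IllegalArgumentException("at least one connection is required");
        }
        this.clients = clients;
    }

    T next() {
        if (clients.length == 1) {
            return clients[0]; // nothing to choose from
        }
        // floorMod keeps the slot non-negative even after the counter overflows.
        return clients[Math.floorMod(index.getAndIncrement(), clients.length)];
    }
}
```

With three connections, successive calls to `next()` return the first, second, third, and then the first again, so load spreads evenly across the array.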

2.2 How to set the number of connections correctly

First, let's look at the performance of a single long connection. Documentation: dubbo.apache.org/zh-cn/docs/…

For scenarios with only a few consumers, the default configuration is fine, that is, leave the connections parameter unset. If you call many services from the same provider, consider increasing shareconnections moderately. Finally, if one particular service interface is called very heavily, consider setting connections for that service alone.

0x03. A few other configuration items

Dubbo has many other configuration items; a few notable ones are covered below.

3.1 dubbo.provider.executes

This parameter limits the maximum number of concurrent executions of each method on the provider. With a value of 10, once a service method has 10 requests in progress, the 11th request fails with an exception. New requests succeed again only after earlier calls complete and the number of requests in progress drops below 10.

Once executes > 0 is set, Dubbo enables ExecuteLimitFilter through its SPI mechanism. The source code is fairly simple.
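The behavior can be sketched in plain Java as a fail-fast concurrency limit. This is not the ExecuteLimitFilter source; the class name, the use of Supplier, and the exception type are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Fail-fast limit on calls in progress; a sketch, not Dubbo's ExecuteLimitFilter. */
final class ExecuteLimiter {
    private final int max;
    private final AtomicInteger active = new AtomicInteger();

    ExecuteLimiter(int max) {
        this.max = max;
    }

    <T> T invoke(Supplier<T> call) {
        // Count this call first; if that exceeds the limit, undo and reject at once.
        if (active.incrementAndGet() > max) {
            active.decrementAndGet();
            throw new IllegalStateException("executes limit " + max + " exceeded");
        }
        try {
            return call.get();
        } finally {
            active.decrementAndGet(); // free the slot whether the call succeeded or failed
        }
    }
}
```

One limiter instance would be kept per service method, so a slow method cannot use up the slots of the others.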

3.2 dubbo.reference.actives

This parameter limits the consumer's maximum concurrency for each method of each service. It can also be set for an individual method with dubbo:method actives. With a value of 10, once the number of concurrent requests to a service method exceeds 10, the 11th request waits. If other requests finish within the timeout and the count drops below the threshold, the 11th request proceeds; otherwise an error is thrown.

This value can also be configured on dubbo:provider, in which case it is passed to consumers in the same way as connections.

Setting actives enables ActiveLimitFilter.

Note the difference between timeouts caused by actives and server-side timeouts.
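Unlike the provider-side limit, this one waits before giving up. A minimal Java sketch of that idea, using a Semaphore with a timed acquire, follows. It is not the ActiveLimitFilter source; the class name and exception choice are assumptions.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Waits up to a timeout for a free slot; a sketch, not Dubbo's ActiveLimitFilter. */
final class ActiveLimiter {
    private final Semaphore permits;

    ActiveLimiter(int actives) {
        this.permits = new Semaphore(actives);
    }

    <T> T invoke(Supplier<T> call, long timeoutMillis) {
        boolean acquired;
        try {
            // Block until a slot frees up or the timeout elapses.
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a free slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException(
                    "actives limit reached, waited " + timeoutMillis + " ms");
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}
```

The failure here is raised on the consumer before any request is sent, which is why it should not be confused with a timeout reported for a slow provider.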

3.3 dubbo.protocol.accepts

This is the maximum number of connections a service provider accepts. With accepts=10, once the number of connections to the provider exceeds 10, the extra connections are rejected.

The method source code is as follows:
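In place of the original source, here is a small Java model of the check. It is a sketch under my own assumptions, not Dubbo's server code: the class name is hypothetical, channels are represented by plain string IDs, and a non-positive accepts value is taken to mean no limit.

```java
import java.util.HashSet;
import java.util.Set;

/** Models a provider that refuses connections beyond a limit; illustrative only. */
final class AcceptLimiter {
    private final int accepts;
    private final Set<String> channels = new HashSet<>();

    AcceptLimiter(int accepts) {
        this.accepts = accepts;
    }

    /** Returns true if the connection is kept, false if it is closed right away. */
    synchronized boolean onConnected(String channelId) {
        if (accepts > 0 && channels.size() >= accepts) {
            return false; // already at the limit: reject the newcomer
        }
        channels.add(channelId);
        return true;
    }

    /** Frees a slot when an existing connection goes away. */
    synchronized void onDisconnected(String channelId) {
        channels.remove(channelId);
    }
}
```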

When the service provider closes a connection, the consumer prints a disconnection log. Consumers also check periodically whether their long connections are usable and reconnect if they are not. So on the consumer side you will see the connection being closed, re-established, and then closed again by the service provider.

0x04. Summary

This article started from a production incident with too many connections and traced it back to its root cause. As competent developers, we should not only know how to use an open source framework but also understand its underlying implementation and its parameter settings. An unreasonable parameter value can lead to a production incident.

Monitoring is also very important for production systems. Had monitoring not caught this problem, I might not have known about it for a while; after all, I do not usually pay much attention to connection count metrics.


Time flies. I've been home for two weeks now: can't go out, can't go back. 2020 is really turning out to be an unforgettable year… All right, it's back to work on the 10th.

Follow my official account (procedures) to get daily technical posts pushed to you. If you are interested in my topics, you can also follow my blog: studyidea.cn