Introduction: Nacos 2.0 improves performance by about 10 times by upgrading the communication protocol, framework, and data model, addressing the performance problems that gradually surfaced after the release of Nacos 1.0. This article pressure-tests Nacos 1.0, walks through the upgrade from Nacos 1.0 to Nacos 2.0, and compares performance at each stage to show the improvement that Nacos 2.0 brings.
Author | Cion
Pressure test preparation
Environment preparation
To make Nacos easy to deploy and upgrade, and to display the core performance metrics, we purchased a three-node Nacos cluster with the 2-core CPU + 4 GB memory specification from Alibaba Cloud Microservices Engine (MSE) (_cn. Aliyun.com/product/ali…).
Pressure test model
To show how the system performs at different scales, we applied pressure in steps: the load was split into three batches that were started one after another, and we observed how the cluster behaved under each batch. In addition to the pressure cluster, a Dubbo demo service was deployed and called continuously by JMeter at 100 TPS, to simulate the possible impact on real business calls at each pressure level.
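The step-up schedule can be sketched in a few lines of Python. This is only an illustration: the batch sizes (6,000, then 4,000, then 4,000 providers) are the ones reported in the test sections of this article, while the function and constant names are hypothetical and are not part of any Nacos or MSE tooling.

```python
from itertools import accumulate

# Providers added by each pressure batch, as reported in this article.
BATCH_SIZES = [6000, 4000, 4000]


def cumulative_load(batch_sizes):
    """Return the total number of registered providers after each batch starts."""
    return list(accumulate(batch_sizes))


if __name__ == "__main__":
    for step, total in enumerate(cumulative_load(BATCH_SIZES), start=1):
        print(f"after batch {step}: {total} providers")
```

Each value in the result is the load level at which the cluster is observed before the next batch is started.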
During the pressure test, the server and the clients are upgraded at appropriate points. The server is upgraded with the one-click upgrade function provided by MSE, and the clients are upgraded by restarting them in rolling batches.
Pressure test process
Nacos1.X Server + Nacos1.X Client
We start the first pressure cluster and apply pressure to MSE Nacos 1.2.1. With 6,000 providers, CPU usage is about 25% once the cluster is stable, and all 6,000 instances are maintained.
The second pressure cluster is then started, adding 4,000 providers for a total of 10,000. Cluster CPU usage peaks at 60% and settles at about 45% in steady state, indicating that the cluster can still run stably.
Under the first two batches the cluster had no stability problems, so Dubbo calls remained normal and no errors occurred.
When the third pressure cluster started, the total pressure reached 14,000 providers. The cluster briefly registered 13,000 instances, then the instance count dropped and the CPU was saturated. Zooming in on the time range shows that the instance count kept fluctuating within a small range after the drop.
At the same time, Dubbo calls began to fail. The consumer log shows that the Dubbo provider was removed because the server could not sustain this level of pressure, so calls failed with a No Provider error.
Nacos2.X Server + Nacos1.X Client
During the upgrade, the number of instances stored on the server is twice the actual number of instances. Based on the test results above, before upgrading you need to either roll the load back to the first batch of 6,000 instances, or upgrade the machine configuration and expand capacity. In this article we roll back the pressure by stopping pressure clusters, which are started again later, and let the cluster recover before performing the upgrade.
As the monitoring graph shows, after the last two batches of pressure were stopped, the cluster quickly returned to stable operation and Dubbo calls returned to normal. We then upgraded with the MSE upgrade function. During the upgrade, CPU usage jitters heavily because of the performance overhead of dual-write. Because dual-write doubles the number of stored instances, the load is effectively equivalent to 12,000 instances, so the server still shows some jitter, which leads to some Dubbo errors. An upgrade performed under non-extreme pressure is not affected in this way.
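The dual-write arithmetic can be written as a quick pre-upgrade check. The sketch below assumes the factor of 2 stated in this article and treats 10,000 instances, the largest load that ran stably on the 1.x server in this test, as the capacity limit. That limit and all function names are assumptions for illustration, not values or APIs provided by Nacos or MSE.

```python
# Assumption from this article: stored instances double while dual-write is active.
DUAL_WRITE_FACTOR = 2


def stored_instances_during_upgrade(actual_instances: int) -> int:
    """Instances the server has to hold while dual-write is active."""
    return actual_instances * DUAL_WRITE_FACTOR


def fits_during_upgrade(actual_instances: int, stable_capacity: int) -> bool:
    """True if the doubled instance count stays within the assumed stable capacity."""
    return stored_instances_during_upgrade(actual_instances) <= stable_capacity


if __name__ == "__main__":
    # 10,000 is the largest load observed to be stable on 1.x here (assumed limit).
    for actual in (6000, 5000):
        stored = stored_instances_during_upgrade(actual)
        print(actual, stored, fits_during_upgrade(actual, stable_capacity=10_000))
```

Under these assumptions, 6,000 actual instances count as 12,000 stored instances, which is above the 10,000 that ran stably on 1.x and is consistent with the jitter observed during the upgrade.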
After the server upgrade is complete, dual-write is stopped and its performance overhead disappears. CPU usage drops and stabilizes, the instance count no longer jitters, and Dubbo calls fully recover. As with the 1.x server, the pressure clusters are then started in two batches so that the two versions can be compared under the same pressure.
Because the clients are still 1.x clients, resource usage on the server remains very high. After all the pressure is started, CPU usage reaches almost 100%. Although there is no large-scale instance drop as with the 1.x server, a small number of instances still fluctuate after running for a while. This indicates that upgrading the Nacos server to version 2.0 improves performance but does not completely solve the performance problem.
Nacos2.X Server + Nacos2.X Client
To fully unlock the performance of Nacos 2.0, the clients of the pressure cluster must also be upgraded to version 2.0. The replacement is again done in three batches. Because providers are restarted during this period, it is normal for instances on the server to drop and then recover. As the pressure cluster is upgraded, CPU usage declines markedly. Once the cluster is stable, CPU usage has dropped from nearly 100% to 20%, and the cluster runs 14,000 instances stably.
Pressure test results
From the tests above, we can compare the performance of the three-node cluster with the 2-core CPU + 4 GB memory specification under different versions:
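As a rough text summary, the figures stated in the test sections above can be collected in a small Python sketch. The record type and field names are hypothetical, the CPU values are the approximate figures quoted in this article, and 14,000 providers is simply the highest load tested, not a measured upper limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    server: str
    client: str
    providers: int
    cpu_percent: int  # approximate CPU usage reported in this article
    outcome: str


# Figures taken from the test sections of this article.
RESULTS = [
    StageResult("1.x", "1.x", 10_000, 45, "stable"),
    StageResult("1.x", "1.x", 14_000, 100, "instances dropped, Dubbo No Provider errors"),
    StageResult("2.x", "1.x", 14_000, 100, "small number of instances jitter"),
    StageResult("2.x", "2.x", 14_000, 20, "stable"),
]


def format_row(r: StageResult) -> str:
    """Render one result as a single summary line."""
    return (f"server {r.server} + client {r.client}: "
            f"{r.providers} providers, ~{r.cpu_percent}% CPU, {r.outcome}")


if __name__ == "__main__":
    for result in RESULTS:
        print(format_row(result))
```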
Nacos 2.0 clearly improves performance substantially. New users are advised to adopt Nacos 2.0 outright, while existing users are advised to upgrade the server first and then gradually upgrade the clients to realize the full gain. Finally, the complete pressure test monitoring view gives an intuitive picture of how the different versions performed at each stage:
This article is original content from Alibaba Cloud and may not be reproduced without permission.