How can IT systems hold up? Is there a way to anticipate problems and build a “defensive fortress” in advance? On June 23, Ruixiang Cloud CTO He Yipeng gave an online live talk, “Best Practices for Enterprise Performance Evaluation in the Outbreak Era.” He analyzed in depth the abnormal events commonly seen on high-concurrency sites and shared how to build a standardized, normalized cloud stress testing system from 0 to 1, looking for the best answer to that question.
“The importance of performance testing is self-evident, and if it is not done well, it can lead to catastrophic problems.”
Common abnormal events on high-concurrency sites include:
· An instant surge of user visits;
· Server traffic is saturated;
· System resources are occupied for a long time;
· Service access exceeds the maximum limit and service capacity is too narrow;
· The website is accessible, but latency is extremely high.
High CPU load is also very common. Generally speaking, in a complex business system, performance monitoring alone can consume about 20% of a single machine, so usually only 70% to 80% of that machine remains available. Frequent access in complex scenarios can push CPU usage instantaneously above 90%. If these scenarios are not covered during performance testing, the result can be a serious disaster for the server.
If the upper limit of PPS (packets per second) connections is not estimated well, subsequent users get connection errors once the limit is reached, typically HTTP 503 errors.
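The failure mode described above can be sketched in a few lines. This is a minimal illustration, not any real load balancer's behavior: `ConnectionLimiter` and `max_connections` are hypothetical names, and the cap is modeled as a count of concurrent requests rather than a true packets-per-second limit.

```python
import threading
from typing import Callable


class ConnectionLimiter:
    """Admit at most `max_connections` concurrent requests (hypothetical helper)."""

    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    def handle(self, work: Callable[[], None]) -> int:
        """Run `work` and return an HTTP-style status code."""
        # Non-blocking acquire: when every slot is taken, reject immediately
        # instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            return 503  # Service Unavailable
        try:
            work()
            return 200
        finally:
            self._slots.release()
```

A stress test that ramps concurrency past the assumed cap should see the share of 503 responses rise sharply at that point, which is one way to check whether the estimated limit matches reality.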
Rapid service takeover can only be completed within 90 seconds. Therefore, hot-switch and hot-deployment scenarios need to be exercised during the capacity assessment in early performance testing. Once these scenarios are in place, services can be taken over quickly through horizontal scaling, and some complex performance problems can be resolved quickly.
For every 100 milliseconds of added latency, turnover drops by at least 1%. In scenarios such as 618 and Double 11, if the user experience is poor and payments cannot go through, the losses are easy to imagine.
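To make the cost of latency concrete, here is a rough sketch of the arithmetic. It reads the figure above as "each additional 100 ms of latency costs at least 1% of turnover" and assumes the effect scales linearly; both the linear model and the function name `min_turnover_loss` are assumptions for illustration, not something the talk specifies.

```python
def min_turnover_loss(turnover: float, added_latency_ms: float,
                      loss_per_100ms: float = 0.01) -> float:
    """Lower-bound estimate of turnover lost to added latency.

    Assumes a linear relationship (illustrative only): every 100 ms of
    extra latency removes `loss_per_100ms` of turnover, capped at 100%.
    """
    if turnover < 0 or added_latency_ms < 0:
        raise ValueError("turnover and added_latency_ms must be non-negative")
    steps = added_latency_ms / 100.0
    return turnover * min(1.0, steps * loss_per_100ms)
```

Under these assumptions, 300 ms of extra latency on a turnover of 1,000,000 corresponds to a loss of at least about 30,000.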
“In the era of mobile Internet, how can enterprises conduct effective and accurate performance tests for frequent marketing activities and rapid product iterations?”
The architecture of IT systems is also evolving rapidly: from a single host to 1,000 application hosts, more than 4,000 distributed CDN nodes, link nodes spanning more than 10 device layers, and the now prevalent distributed microservice architecture. In this context, traditional performance testing faces many problems:
· Building a test environment for 10,000 concurrent users requires 10 physical hosts;
· Deploying the test environment takes more than 5 days, and the environment reuse rate is low;
· The license cost for 10,000 concurrent users exceeds one million;
· Tool scripts, data and reports are managed in scattered places, which creates significant security risks;
· Test tools such as LoadRunner and JMeter are complicated to operate and costly to learn, making them difficult for ordinary testers to master.
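The first item in the list implies roughly 1,000 simulated users per physical host. A small sketch of the resulting sizing arithmetic follows; the per-host figure is derived from that item, and `hosts_needed` is a hypothetical helper, since real capacity depends on the script and the hardware.

```python
def hosts_needed(target_users: int, users_per_host: int = 1000) -> int:
    """Number of load-generator hosts for a target concurrency.

    `users_per_host` defaults to the ratio implied above
    (10 hosts for 10,000 users); treat it as an assumption.
    """
    if target_users < 0 or users_per_host <= 0:
        raise ValueError("target_users must be >= 0 and users_per_host > 0")
    # Integer ceiling division: a partially filled host is still a host.
    return (target_users + users_per_host - 1) // users_per_host
```

At that ratio, a test with 100,000 concurrent users would need 100 physical hosts, which shows why fixed on-premises test environments scale poorly.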
The concept of cloud stress testing was born in 2005. With the rapid development of cloud computing, cloud resources are now used to generate load in an elastic, freely scalable, distributed way. Using cloud resources, cloud stress testing offers one-stop performance testing and can simulate a variety of abnormal scenarios. Users do not need to buy their own resources, such as servers and machine rooms, which saves substantial resource and labor costs. At present, cloud stress testing products from vendors such as SOASTA abroad and Ruixiang Cloud in China have become the strongest rivals of traditional performance testing platforms.
Four advantages:
· Simple and easy to use: a cloud stress test script can be generated in 3 minutes. Because all test resources are deployed in the cloud, tests start within seconds, test data is returned within seconds, and performance problems can be located while the test runs.
· Full-stack monitoring: cloud stress testing products are built on distributed cloud computing services and can respond quickly by location. They also support synchronized backtracking of monitoring data, so data is collected across the full stack, covering the network layer, server layer, operating system layer and application layer.
· Large-scale deployment: the test nodes of most cloud stress testing vendors cover the globe, support location-based, on-demand customization and full-link testing from real nodes, and can reach tens of millions of concurrent requests.
· High cost performance: SaaS services are naturally flexible. Cloud stress testing products are billed on demand and need no hardware deployment. Integrated test management is easy to achieve, and teams can collaborate, which greatly improves work efficiency.
“What performance issues can the cloud stress test platform help users solve, and how does it solve them?”
In IDC deployments, the convergence ratio of switches needs to be considered. At the operating system level, the typical problem is parameter standardization, such as sysctl settings and the configuration of some network parameters. On the server, CPU monitoring should identify which processes consume too much CPU, and any process with excessive usage needs to be analyzed. If disk I/O usage is too high, you need to determine whether SSDs are available.
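Parameter standardization at the operating system level can be checked mechanically. The sketch below parses text in the `key = value` form printed by `sysctl -a` and reports keys that differ from a baseline. The helper names and any baseline values you pass in are illustrative assumptions, not recommended settings.

```python
from typing import Dict, Optional, Tuple


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse `key = value` lines (the format printed by `sysctl -a`)."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' separator
            params[key.strip()] = value.strip()
    return params


def find_drift(actual: Dict[str, str],
               baseline: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {key: (actual_value, expected_value)} for keys off baseline.

    A key missing on the host is reported with an actual value of None.
    """
    return {
        key: (actual.get(key), expected)
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }
```

Running the same check against every host before a stress test helps rule out configuration differences as the cause of uneven results.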
The CPN configuration is incorrect.
From long-term performance testing experience at our company, Ruixiang Cloud, there are four methods for keeping stress test data separate from business data and easy to clean up:
· Pre-seeded data. Attach a test database to the production application. Even if the measured performance is slightly lower, the test still largely reflects the real access path; the data is essentially isolated and does not pollute production data, which makes later cleanup easy.
· Non-interface identifier changes. A common example is marking requests through the User-Agent field in the HTTP request header. You can also use an uncommon request header as the marker and have the back end parse it to identify stress test data, which speeds up data cleanup.
· Bypass data routing. When the business flow is clear, normal business data and stress test data can be separated and processed independently, and the stress test tables can then be tracked and cleaned in a targeted way. If only query transactions are run online, Ruixiang Cloud mainly cleans the transaction journal tables and record tables, which does not affect normal business.
· Interface field identifier changes. An identifier field for stress testing is reserved in key data tables, so the marker can be filled in directly during the test and later data cleanup can be based on it.
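The header-based marking, bypass routing and reserved-field approaches above can be combined in a small sketch. Everything named here is hypothetical: the `X-Stress-Test` header, the `stress-test-bot` User-Agent token, the `_stress` table suffix and the `is_stress` column are placeholders for whatever convention a team agrees on.

```python
from typing import Any, Dict

STRESS_HEADER = "x-stress-test"        # hypothetical custom request header
STRESS_UA_TOKEN = "stress-test-bot"    # hypothetical User-Agent marker


def is_stress_request(headers: Dict[str, str]) -> bool:
    """Detect stress traffic from a custom header or a User-Agent token."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get(STRESS_HEADER) == "1":
        return True
    return STRESS_UA_TOKEN in normalized.get("user-agent", "").lower()


def target_table(base_table: str, headers: Dict[str, str]) -> str:
    """Bypass routing: send stress writes to a separate table."""
    return f"{base_table}_stress" if is_stress_request(headers) else base_table


def build_row(payload: Dict[str, Any], headers: Dict[str, str]) -> Dict[str, Any]:
    """Reserved-field marking: tag each row so cleanup can filter on it."""
    row = dict(payload)  # copy so the caller's payload is not mutated
    row["is_stress"] = is_stress_request(headers)
    return row
```

With either the separate table or the reserved field in place, cleanup after a test reduces to dropping the bypass tables or deleting rows where the marker is set.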
Having looked in depth at the various aspects of stress testing and the points that need attention, we return to the original question: what performance problems can the cloud stress testing platform help enterprises solve? There are mainly 4 points:
· Real business traffic simulation. Cloud stress testing can simulate real access by hundreds of thousands of users, with flexible user behavior simulation and rapid scaling of the user count. It can also quickly verify network traffic quality, validating full network traffic against a baseline of normal traffic. If the enterprise uses a hardware load balancer such as F5, you can also verify whether the PPS value of the hardware meets high-concurrency requirements.
· Resource monitoring. Besides quickly checking CPU, memory and disk, the platform can monitor database resource usage and some middleware resources.
· Operating system and application optimization. Throughout the test, the cloud stress testing platform provides a sound basis for configuring limit parameters, and the Tomcat and JBoss connection counts can be tuned in real time.
· Locating performance problems. Combined with common APM tools, slow transactions can be traced quickly, common application and database problems can be analyzed, and scenarios such as slow transactions or high network throughput can be simulated.
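The core loop behind user simulation and slow-transaction detection can be sketched with the Python standard library alone. This is a toy, single-machine illustration and not how any cloud platform is implemented: `run_load`, `percentile` and `slow_transactions` are hypothetical helpers, and `target` stands in for a real request function.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_load(target: Callable[[], None], users: int,
             requests_per_user: int) -> List[float]:
    """Call `target` from `users` concurrent threads; return latencies in seconds."""
    def one_user(user_index: int) -> List[float]:
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            target()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [latency for user in per_user for latency in user]


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def slow_transactions(samples: List[float], threshold_s: float) -> List[float]:
    """Return the latencies that exceed the threshold."""
    return [sample for sample in samples if sample > threshold_s]
```

In practice the tail percentiles (such as the 95th or 99th) and the list of slow transactions are usually more informative than the average, because they show what the unluckiest users experience under load.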
The market is a “battlefield,” and performance testing is a sharp weapon for opening up that market.