Performance testing
Concept
- Based on the system's performance indicators, establish a performance test model, formulate a performance test scheme and a monitoring strategy, execute the performance scenarios under the defined conditions, and analyze and optimize performance bottlenecks. Finally, evaluate the results to judge whether the system's performance indicators meet the target values.
Performance indicators
- Indicators include time indicators, capacity indicators, and resource usage indicators
- Time indicators refer to interface response time and business response time
- Capacity indicators refer to interface capacity and business capacity
- Resource usage indicators cover the operating system (CPU, I/O, memory, disk, network, system) and the JVM
- Origin of indicators: they can be derived from business scenarios and agreed with the team, or left unset so that the load test simply probes the system for its bottlenecks
Model
- A model is an abstraction of a real scenario; it tells the performance tester what the business model looks like
- Put simply, the model lets the tester know the concurrency of each business, so that a specific load ratio can be designed from it
- Model data is generally obtained from production statistics (e.g. the logs of each node, from which the traffic distribution can be analyzed)
Performance Test scheme
- The scheme includes: test environment, test data, test model, performance indicators, stress strategy, entry/exit criteria, and schedule risks
Performance monitoring
- Monitoring must be layered and segmented, combining global monitoring with targeted monitoring
Performance testing should have predetermined conditions
- The conditions include software and hardware environment, test data, test execution strategy, pressure compensation, etc
Performance testing must have scenarios
- Execute the performance scripts in the established environment (including dynamic scaling strategies), with the established data (including data changes during scenario execution), the established execution strategies, and the established monitoring; meanwhile, observe the performance parameters at every level of the system and judge in real time whether the scenario meets expectations
- Scenario types: baseline, capacity, stability, and exception performance scenarios
Analytical tuning of performance tests
- Performance Item Classification
- New-system performance test class: such projects typically test the maximum capacity of the system
- New-version-of-old-system performance test class: such projects are generally compared against the old version; as long as performance does not decline, capacity can be estimated from historical data, and the tuning requirement is usually modest
- New-system performance test and tuning class: such systems are not only tested to maximum capacity but also tuned as far as possible
Performance testing must have results reported
- Contents include: TPS, response time, and resource-usage comparison charts before and after tuning
What is the relationship between TPS and response time (RT)?
- In an actual performance test, assume the number of concurrent users increases in steps. At first TPS rises slowly and RT stays low for a while. As the pressure keeps increasing, TPS continues to rise and RT also climbs slowly. Once TPS reaches its limit and the pressure still increases, RT rises rapidly until requests finally time out
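The curve shape described above can be sketched with a toy model based on Little's Law (concurrent users = TPS × RT). The 400 TPS capacity ceiling and 50 ms base response time below are made-up illustration numbers, not values from the source:

```python
def tps_and_rt(threads, base_rt=0.05, max_tps=400):
    """Model TPS and RT for a given number of load threads.

    Below saturation each thread completes 1/base_rt requests per second,
    so TPS grows linearly with threads and RT stays at base_rt. At
    saturation TPS is capped at max_tps and extra threads only queue,
    so by Little's Law RT = threads / TPS starts growing linearly.
    """
    tps = min(threads / base_rt, max_tps)
    rt = threads / tps
    return tps, rt

for n in (5, 10, 20, 40, 80):
    tps, rt = tps_and_rt(n)
    print(f"threads={n:3d}  TPS={tps:6.1f}  RT={rt * 1000:6.1f} ms")
```

Running it shows RT flat at 50 ms while TPS rises, then TPS flat at the ceiling while RT doubles with each thread step, matching the description.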
Performance indicators
Demand indicators
- Business indicators
- For example, a business indicator may require 10 million online users; it can be decomposed into n performance scenarios, each with a specified business proportion
- Technical indicators
- RT: Response Time, which includes the request time and the response time
- HPS Hits Per Second
- Transactions Per Second (TPS)
- QPS: Queries Per Second; in MySQL it refers to the number of SQL queries executed per second
- RPS Requests per second
- CPS: HTTP response Codes Per Second
- PV: Page Views
- UV: Unique Visitors
- IP: number of independent IP addresses
- Throughput
- IOPS: I/O operations per second, usually used to describe disks
Performance Indicator Concept
- TPS
- The granularity of TPS must be defined per scenario: in an interface-level performance test, the T is an interface call; in a business-level test, T can be defined for each business step or for the complete business flow
- Number of concurrent users
- Absolute concurrency: The number of concurrent requests at the same time
- Relative concurrency: the number of concurrent requests in a period of time
- Use TPS to carry the concept of concurrency
- Online users vs. concurrent users
- The number of online users refers to the number of users on the system at a certain time. These users do not necessarily perform actions
- The number of concurrent users is the number of online users performing operations on a business within a given period (number of concurrent users = number of online users × concurrency rate)
- The formula
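The concurrency relationships above can be worked through numerically. The 100,000 online users, 5% concurrency rate, and 0.2 s average RT are hypothetical figures for illustration:

```python
def concurrent_users(online_users, concurrency_rate):
    """Number of concurrent users = number of online users x concurrency rate."""
    return online_users * concurrency_rate

def required_tps(concurrent, avg_rt_seconds):
    """Little's Law rearranged: TPS = concurrent users / average RT."""
    return concurrent / avg_rt_seconds

online = 100_000   # hypothetical online users
rate = 0.05        # hypothetical concurrency rate: 5% act at once
c = concurrent_users(online, rate)
print(c, required_tps(c, 0.2))   # 5000 concurrent users -> 25000 TPS at 0.2 s RT
```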
Performance Analysis
- Overall approach
- Accurate judgment of bottleneck
- Thread increment strategy
- The process of performance decay
- Response time splitting
- Build an analysis decision tree
- Scenario comparison
- Accurate judgment of bottleneck
- TPS curve
- Assuming threads increase proportionally step by step, a bottleneck can already show on the second step of the TPS ladder: in theory the second step's TPS should be twice that of the first step; when it is not, a performance bottleneck exists
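This check can be automated: compare each step's measured TPS with the linear extrapolation from the first step. The thread counts, TPS values, and the 0.9 tolerance are all hypothetical:

```python
def bottleneck_steps(threads, tps, tolerance=0.9):
    """Flag load steps whose TPS scaled worse than `tolerance` x linear.

    The expected TPS of each step is the first step's TPS scaled by the
    thread ratio; a measured value below tolerance * expected suggests
    a bottleneck has appeared by that step.
    """
    base_threads, base_tps = threads[0], tps[0]
    flagged = []
    for n, t in zip(threads[1:], tps[1:]):
        expected = base_tps * n / base_threads
        if t < tolerance * expected:
            flagged.append(n)
    return flagged

# Hypothetical ladder: the second step already falls short of 2x
print(bottleneck_steps([10, 20, 30], [100, 150, 160]))
```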
- Meaning of TPS (Information from TPS curves)
- Bottlenecks: in fact every system has a performance bottleneck; which one you find depends on the level at which you are testing
- Whether the bottleneck is related to pressure: if the TPS curve shows the same problematic trend whether or not the pressure increases, the bottleneck is unrelated to pressure
- Response time curve
- The emphasis of TPS curve and response time curve
- Response time is used to determine how fast the business is, while TPS is used to determine how large the capacity is.
- Thread increment strategy
- Two thread increment scenarios
- conclusion
- For a given system, if only the pressure policy changes (other conditions such as environment, data, and hardware/software configuration staying the same), the upper limit of maximum TPS is fixed
- For a flash-sale (seckill) scenario, warm-up is essential: keep a certain amount of traffic running first and then raise the pressure sharply, which resembles the real scene, rather than pouring a large amount of traffic into the system all at once
- The process of performance decay
- As soon as the TPS per thread starts to shrink, a performance bottleneck has already appeared. That does not mean the server's processing capacity (measured here as TPS) immediately declines: total TPS can keep rising, and it only reaches its ceiling as per-thread performance continues to decay
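The decay signal can be computed directly: divide each step's total TPS by its thread count and look for the first drop. The data points below are hypothetical; note that total TPS is still rising when decay begins:

```python
def decay_starts_at(threads, tps):
    """Return the thread count at which per-thread TPS first drops.

    Total TPS can still be rising at that point; the falling TPS/thread
    ratio is the early signal that performance has started to decay.
    """
    per_thread = [t / n for n, t in zip(threads, tps)]
    for i in range(1, len(per_thread)):
        if per_thread[i] < per_thread[i - 1]:
            return threads[i]
    return None

# Total TPS keeps rising (100 -> 300), yet decay begins at 20 threads
print(decay_starts_at([10, 20, 30, 40], [100, 190, 260, 300]))
```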
- Response time splitting
- Build an analysis decision tree
- It sorts out the architecture, the system, the problem, and the process of searching for the chain of evidence; it provides a global view and strategic guidance for the analysis
- Scenario comparison
- When you suspect that some link in the system is the problem but cannot pin it down analytically, change that link directly and compare the scenario results before and after
Parameterized data
- Parameterization logic
- Analyze the business scenario
- List the data to be parameterized and the relationships between the fields
- Retrieve the parameterized data from the database, or design corresponding generation rules
- Keep the parameterized data in separate files
- Set the corresponding parameter combinations in the load tool so as to simulate the real scene
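The steps above can be sketched for the "generation rules" case. The file name `params.csv`, the column names, and the generation rule are all hypothetical; in practice the rows would come from a database query or a rule agreed with the business side:

```python
import csv
import random

def build_parameter_file(path, rows):
    """Write parameterized data to a standalone CSV file that a load
    tool (e.g. JMeter's CSV Data Set Config) can read back."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "password", "product_id"])
        for i in range(rows):
            # A simple generation rule stands in for data pulled from the DB
            writer.writerow([f"user{i:06d}", f"pw{i:06d}",
                             random.randint(1, 1000)])

build_parameter_file("params.csv", 10_000)
```

Keeping the data in its own file (step 4 above) lets the same parameter set be shared across scripts and load generators.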
Performance scenarios: Things to consider before doing parameterization
- Data to watch
- Parameterized data, monitoring data, and base (seed) data
Parameterized data
- Possible scenarios for parameterized data
- Data imbalance
- Insufficient amount of parameterized data
The question of parameterized data
- How much data should be used for parameterization?
- Where does parameterized data come from?
- Parameterized data falls into two main categories
- Data the user enters that already exists in the backend database, such as user data
- Data features: stored in the backend database; requires active user input; the entered data is compared against the data in the database
- Such data must be queried from the database before being parameterized into the tool
- Data the user enters that does not exist in the backend database; during the business flow this data is inserted or updated into the data store
- Data features: the data does not exist in the database beforehand; it is inserted or updated into the database after the script executes successfully; each user may enter the same or different data, depending on the business
- Such data must be generated by the load tool and must also satisfy the business rules
- Conditions the data must meet: match the data distribution of the production environment; the data volume must satisfy the performance scenario
- How does the number of parameter values affect the pressure on the system?
- With too many parameter values the pressure on the system becomes large; with too few, the data volume does not match the real scene and the system's true pressure cannot be tested
- Is the parameterized data evenly distributed in the database?
- This refers to whether each user's data distribution matches the business scenario; for example, giving user A hundreds of thousands of rows while user B has only a few is obviously unreasonable
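A crude balance check can catch the skew described above before the scenario runs. The user names, row counts, and the 10x threshold are hypothetical; a real check would compare against the production distribution:

```python
def is_balanced(rows_per_user, max_ratio=10):
    """Rough balance check on parameterized data: the busiest user may
    hold at most `max_ratio` times the rows of the quietest user."""
    counts = rows_per_user.values()
    return max(counts) <= max_ratio * min(counts)

print(is_balanced({"userA": 300_000, "userB": 5}))    # badly skewed
print(is_balanced({"userA": 1_200, "userB": 900}))    # acceptable
```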
Performance Scenario Design
Preliminary work
- Determine the services to be covered and their proportions (obtained from logs)
- Determine the target TPS of each business
- Determine the target response time of each business
Baseline Performance Scenario
- Purpose
- Test the maximum capacity of each single service, and determine which service most affects the overall capacity in the mixed capacity scenario
Capacity Performance Scenario
- Key points
- Keep the scenario continuous (do not interrupt it)
- Control the business proportions
- Capacity TPS calculation method
- Add the TPS of each service
Stability performance scenario
- Key points
- Stability generally means the system runs stably for a period of time; for example, 20 million (2000W) transactions must run safely online for a week
- Minimum test duration = 20,000,000 / capacity TPS (the value depends on how the capacity scenario's TPS was calculated)
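The duration formula above is just an arithmetic division; the 1000 TPS capacity below is a hypothetical figure:

```python
def min_stability_hours(total_transactions, capacity_tps):
    """Minimum stability-run duration in hours: the transaction volume
    the system must survive divided by its sustained capacity TPS."""
    return total_transactions / capacity_tps / 3600

# 20 million transactions at a hypothetical capacity of 1000 TPS
print(round(min_stability_hours(20_000_000, 1000), 2), "hours")
```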
Abnormal Performance Scenario
- Test method
- In general, deliberately take a component down; for example, kill the Redis master and check whether the Redis failover causes functional problems
Performance Monitoring design
Monitoring design steps
- Analyze system architecture; Monitor each component
- Monitoring should be layered and staged: first global monitoring, then targeted quantitative analysis
- Analyze the global, targeted, and layered monitoring data, decide from the results what information to collect next, and so build the complete chain of evidence
Global monitoring design
OS layer
- Note the following parameters: CPU, I/O, Memory, Network, System, and Swap
CPU parameter | Meaning
---|---
idle | Percentage of time the CPU is idle
iowait | Percentage of CPU time spent waiting for I/O
irq | Percentage of CPU time spent servicing hardware interrupts
nice | Percentage of CPU time consumed by processes with an adjusted (niced) priority
softirq | Percentage of CPU time spent servicing soft interrupts
steal | Percentage of CPU time stolen by the hypervisor for other virtual machines
system | Percentage of CPU time consumed by kernel (system) processes
user | Percentage of CPU time consumed by user processes
CPU queue | Length of the run queue (processes waiting for a CPU)
I/O (disk) parameter | Meaning
---|---
tps | Total number of I/O transfers per second issued to the physical device
rrqm/s | Number of read requests merged per second
wrqm/s | Number of write requests merged per second
r/s | Number of read I/Os completed per second
w/s | Number of write I/Os completed per second
bi | Blocks received from a block device per second (disk reads)
bo | Blocks sent to a block device per second (disk writes)
r_await | Average response time of reads, including queue time
w_await | Average response time of writes, including queue time
Memory parameter | Meaning
---|---
total | Total physical memory
free | Free memory
used | Used memory
buff/cache | Memory used by buffers and page cache
available | Memory actually available to new processes
Network parameter | Meaning
---|---
TX | Transmitted (outgoing) traffic
RX | Received (incoming) traffic
Send-Q / Recv-Q | Send queue / receive queue
Full connection queue | Queue of fully established connections waiting to be accepted
Half-open connection queue | Queue of half-open (SYN received) connections
System parameter | Meaning
---|---
interrupts (in) | Number of device interrupts per second over the interval
context switches (cs) | Number of context switches per second
Swap parameter | Meaning
---|---
total | Total swap space
free | Free swap space
used | Used swap space
si | Amount of memory swapped in from disk to memory per second
so | Amount of memory swapped out from memory to disk per second
Middleware
- Message queues
- Indicators include: production rate and consumption rate
- Suppose you find RabbitMQ messages piling up; possible solutions:
- Increase the number of consumers (when the consumption rate is not keeping up with the production rate)
- If adding consumers does not solve the problem, there may be a bug on the server side that blocks consumption
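The backlog reasoning above can be sketched as a simple rate model: the queue depth grows whenever the production rate exceeds the consumption rate, and adding consumers raises the consumption rate. All rates and durations below are hypothetical:

```python
def backlog_over_time(produce_rate, consume_rate, seconds, start=0):
    """Queue depth after each second: the backlog grows whenever the
    production rate outpaces the consumption rate, and never goes
    below zero once consumers catch up."""
    depth, series = start, []
    for _ in range(seconds):
        depth = max(0, depth + produce_rate - consume_rate)
        series.append(depth)
    return series

# 500 msg/s in vs 400 msg/s out: 100 messages pile up every second
print(backlog_over_time(500, 400, 5))
# Adding consumers (800 msg/s out) drains an existing backlog of 300
print(backlog_over_time(500, 800, 3, start=300))
```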
- Redis
- MySQL
Operating system common counters
- Command modules
- CPU parameter meanings:

CPU parameter | Meaning
---|---
us | Percentage of CPU consumed by user-mode processes
wa | Percentage of CPU consumed by I/O read/write waits
sy | Percentage of CPU consumed by the kernel
si | Percentage of CPU consumed by soft interrupts