The phenomenon
Can throughput still be high when latency is high (distributed systems)? And can throughput be low even when latency is low (GC and the JVM)?
What is throughput, and how is it measured?
Generally speaking, system throughput refers to a system's capacity to withstand load: the maximum number of requests it can handle per second. Throughput is usually characterized by QPS (or TPS) together with the number of concurrent requests. Every system has a practical ceiling for both values; once either one reaches its maximum, the system's throughput stops increasing.
Concurrency, on the other hand, can be understood as the number of requests/transactions that the system can process simultaneously.
Calculation: QPS = concurrency / RT; equivalently, concurrency = QPS × RT.
Suppose every employee in a company needs to use the bathroom once between 9 am and 10 am each day. The company has 3,600 employees, and each bathroom visit takes 10 minutes on average. Let's calculate:
QPS = 3600 / (60 × 60) = 1
RT = 10 × 60 = 600 seconds
Concurrency = QPS × RT = 1 × 600 = 600
This means that to give everyone the best possible experience, the company needs 600 toilet stalls; otherwise there will be queues.
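A minimal sketch of the arithmetic above in Java (the class name and variable names are made up for illustration):

```java
public class ThroughputMath {
    public static void main(String[] args) {
        double totalRequests = 3600;    // 3,600 employees, one visit each
        double windowSeconds = 60 * 60; // one hour
        double rtSeconds = 10 * 60;     // each visit takes 10 minutes

        double qps = totalRequests / windowSeconds; // 1 request per second
        double concurrency = qps * rtSeconds;       // concurrency = QPS * RT

        System.out.printf("QPS = %.0f%n", qps);                  // QPS = 1
        System.out.printf("Concurrency = %.0f%n", concurrency);  // Concurrency = 600
    }
}
```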
1. Distributed systems: the performance impact of microservices
What is the relationship between latency and throughput? Is latency the same as response time?
Let's start with latency versus response time. Latency is a property of the service itself, while response time is what the caller observes (see Designing Data-Intensive Applications for more detail):
Latency = the time a request spends inside the system, from entry to exit
Response time = the time from when the client sends a request until it receives the response = latency + network time
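A toy simulation of the two definitions, where fixed sleep times stand in for network hops and service work (all numbers here are made up):

```java
public class LatencyVsResponseTime {
    public static void main(String[] args) throws InterruptedException {
        long networkMs = 5;   // assumed one-way network delay
        long serviceMs = 50;  // assumed time the request spends inside the service

        long start = System.currentTimeMillis();
        Thread.sleep(networkMs);   // request travels to the server
        long enter = System.currentTimeMillis();
        Thread.sleep(serviceMs);   // server processes the request (latency)
        long exit = System.currentTimeMillis();
        Thread.sleep(networkMs);   // response travels back
        long end = System.currentTimeMillis();

        System.out.println("Latency       ~ " + (exit - enter) + " ms"); // server-side view
        System.out.println("Response time ~ " + (end - start) + " ms");  // client-side view = latency + network time
    }
}
```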
Ideally, the lower the latency, the higher the throughput. That holds for a single machine (a single-threaded system), but it does not hold for distributed systems.
For example, run a water pipe from the Miyun reservoir to Guomao, and it takes the water 1 hour to arrive; run another pipe to Shunyi, and it takes 20 minutes. If you open a tap in Guomao, the amount of water you can collect per unit of time has nothing to do with where the tap is; it depends only on how much water, and at what pressure, the reservoir feeds into the pipe per unit of time. But if you drop a small ball into the pipe, it takes three times as long to travel from Miyun to Guomao as to Shunyi. So the pipe system to Guomao has much higher latency, yet its throughput is the same as the pipe to Shunyi.
Similarly, suppose a monolithic system is split into 10 services and a business flow passes through 5 of them. As long as the throughput (TPS/QPS) of each of those 5 services is not lower than that of the original monolith, the throughput of the whole microservice system is unchanged. Better yet, the split gives us smaller services, simpler relationships, simpler databases, smaller transactions, and so on; if all five systems end up with higher throughput than the original, the overall throughput after the transformation is higher than before.
So what is the side effect of this process? In short, higher latency: what used to be one local call is now 5 remote calls. Assuming each call adds 1-10 milliseconds of network latency (a physical data center with 10 GbE NICs can be at the low end; cloud environments are usually higher), overall latency increases by 5-50 milliseconds compared with before. That holds provided the requests are handled asynchronously, with non-blocking streaming or message processing; with synchronous blocking calls the latency grows even more and throughput suffers.
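A sketch of the difference, using a hypothetical call stub in place of real remote services (the names s1..s5 and the 10 ms cost are made up): chaining the hops with CompletableFuture frees the caller thread immediately, whereas the synchronous version ties it up for the whole chain.

```java
import java.util.concurrent.CompletableFuture;

public class CallChain {
    // Hypothetical remote-call stub: pretend each hop costs ~10 ms of network + work.
    static String call(String service, String input) {
        try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return input + "->" + service;
    }

    public static void main(String[] args) {
        // Synchronous: the calling thread is blocked for all 5 hops (~50 ms).
        String sync = call("s5", call("s4", call("s3", call("s2", call("s1", "req")))));

        // Asynchronous: the caller thread submits the chain and is free at once;
        // each stage is scheduled on the common ForkJoinPool when the previous one
        // completes, so the caller can keep accepting other requests.
        CompletableFuture<String> async = CompletableFuture
                .supplyAsync(() -> call("s1", "req"))
                .thenApplyAsync(r -> call("s2", r))
                .thenApplyAsync(r -> call("s3", r))
                .thenApplyAsync(r -> call("s4", r))
                .thenApplyAsync(r -> call("s5", r));

        System.out.println(sync);
        System.out.println(async.join());
    }
}
```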
Fortunately, systems with hard low-latency requirements are rare, and for the average business system the ability to scale horizontally matters more than a few extra milliseconds of latency. For example, when we buy clothes on Taobao or JD.com, transaction steps that complete within seconds are acceptable; for air tickets, hotels, and movie tickets, even minutes are acceptable.
Another practical example: a company had been doing microservice transformation since 2016. The R&D team was not large, the business was growing rapidly, and the infrastructure, including automated testing and deployment, had not kept up. Meanwhile the company's core business was a low-latency, high-concurrency trading system. Splitting it into microservices further increased system latency and lowered customer satisfaction. The team soon realized that the finely-split services were harder to maintain than the monolith, and consolidated some of the microservices to improve maintainability and keep latency under control. For details, see: Microservices Architecture In-depth Analysis and Best Practices - Part 5: Seven Coping Strategies for Performance, Consistency, and High Availability.
2. GC and the JVM: the impact of low latency on throughput
How does a collector achieve low latency? By processing a portion of the garbage at a time and doing most of the work while the business threads are still running; only a few threads do the collection, so the business does not need to be suspended.
The number of objects the business system creates per unit of time is fixed. To keep latency low, the collector cannot stop the world and throw every thread at the garbage the way Parallel GC does, so it naturally cannot match Parallel GC's efficiency at clearing garbage: its throughput is lower.
Take cleaning as an example: a floor full of staff at work needs to be cleaned.

1. Serial GC: everyone leaves, one cleaner tidies up, and when she is done everyone comes back to work.
2. Parallel GC: everyone leaves, a large group of cleaners tidies up, and when they are done everyone comes back to work.
3. CMS: each time, 1/4 of the people step out and a handful of cleaners come in, while the other 3/4 of the staff keep working; most of the time the business is not suspended by GC (everyone only pauses briefly while the garbage is counted and confirmed). But with fewer cleaners working at once, overall cleaning efficiency actually drops.
4. G1: building on the CMS idea, the whole floor is divided into many small work areas, and one subset of areas is handled per pass, so management is finer-grained and the estimate for each pass is more accurate. Each pass still uses only a few cleaners, so efficiency is still no higher than Parallel GC.
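To see the tradeoff yourself, a made-up allocation loop like the sketch below can be run under different collectors with GC logging enabled, e.g. java -Xlog:gc -XX:+UseParallelGC GcDemo versus java -Xlog:gc -XX:+UseG1GC GcDemo; Parallel GC tends to show fewer but longer pauses, G1 more frequent, shorter ones.

```java
import java.util.ArrayList;
import java.util.List;

public class GcDemo {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            // Allocate 1 KB; most chunks become garbage immediately,
            // a small fraction is retained to create longer-lived objects.
            byte[] chunk = new byte[1024];
            if (i % 100 == 0) {
                retained.add(chunk);
            }
            if (retained.size() > 10_000) {
                retained.clear(); // release retained chunks so they become garbage too
            }
        }
        System.out.println("done, retained " + retained.size() + " chunks");
    }
}
```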
Conclusion
When measuring website performance, regarding concurrency versus throughput: if you are talking about network devices, throughput = concurrency × packet length, so it depends on both. If you are talking about a server or overall system performance, you need to pick a concrete measure of throughput, say QPS. Then if the number of concurrent requests is high but the average response time rises along with it, QPS is not necessarily high.