A high-performance program uses CPU, memory, network, and disk resources efficiently to process a large number of requests in a short period of time. So how do you measure "fast and high volume"? There are two key metrics: response time and transactions per second (TPS).
So what is efficient use of resources? I think there are two principles:
- Reduce waste of resources. For example, avoid thread blocking where possible: a blocked thread triggers a thread context switch, which costs CPU. Likewise, in network communication, data copied from kernel space into Java heap memory has to pass through native memory, so every extra copy wastes CPU and memory bandwidth.
- When one resource becomes the bottleneck, trade another resource for it. Caching and object pooling, for example, trade memory for CPU; compressing data before transmission trades CPU for network bandwidth.
Tomcat and Jetty use many high-performance, high-concurrency designs. I summarize a few here: I/O and threading models, reduced system calls, pooling, zero copy, and efficient concurrent programming. I'll describe these designs in detail below, and I hope you can apply these techniques in your own work.
I/O and threading models
The essence of the I/O model is to mitigate the speed difference between the CPU and peripherals. When a thread makes an I/O request, such as reading or writing network data, and the NIC data is not ready, the thread blocks and gives up the CPU, causing a thread context switch. That switch is pure overhead, and while a thread is blocked it holds on to its memory without releasing it: the more blocked threads, the more memory is consumed. Therefore, the goal of the I/O model is to minimize thread blocking. Both Tomcat and Jetty have moved away from traditional synchronous blocking I/O in favor of non-blocking or asynchronous I/O, so that business threads don't have to block on I/O waits.
In addition to the I/O model, the threading model is also a key factor affecting performance and concurrency. The general handling principles of Tomcat and Jetty are:
- Connection requests are processed by a dedicated Acceptor thread group.
- I/O event detection is also handled by a dedicated group of Selector threads.
- The specific protocol parsing and business processing may be handed over to either a thread pool (Tomcat) or a Selector thread (Jetty).
The advantage of separating these roles is decoupling: you can size the number of threads for each part independently. Note that more threads are not always better; the number of CPU cores is limited, and too many threads simply produce a large number of thread context switches.
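To make the division of labor concrete, here is a minimal Java NIO sketch of the same idea: one acceptor thread accepts connections, one poller thread runs a Selector to detect I/O readiness, and a worker pool does the actual reading and processing. This is not Tomcat's or Jetty's code; all class and method names (MiniServer, pollerLoop, and so on) are invented for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.*;

// Hypothetical skeleton, NOT Tomcat or Jetty code: all names are invented.
public class MiniServer {

    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(8);
    private final Selector selector;

    MiniServer() throws IOException {
        selector = Selector.open();
    }

    // Acceptor: only accepts connections; no I/O detection, no business logic.
    void acceptorLoop(ServerSocketChannel server) throws IOException {
        while (true) {
            SocketChannel ch = server.accept();   // blocking accept
            ch.configureBlocking(false);
            pending.add(ch);                      // hand the channel to the poller
            selector.wakeup();                    // unblock select() so it can register
        }
    }

    // Poller: detects readiness with a Selector and dispatches; no business logic.
    void pollerLoop() throws IOException {
        while (true) {
            selector.select();
            SocketChannel ch;
            while ((ch = pending.poll()) != null) {
                ch.register(selector, SelectionKey.OP_READ);
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    key.interestOps(0);           // stop selecting this key while a worker runs
                    workers.submit(() -> handle(key));
                }
            }
        }
    }

    // Worker: does the actual read and processing without blocking the poller.
    private void handle(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            int n = ch.read(buf);
            if (n < 0) { ch.close(); return; }
            // ... parse the protocol and run business logic here ...
            key.interestOps(SelectionKey.OP_READ);   // re-arm read interest
            key.selector().wakeup();
        } catch (IOException e) {
            try { ch.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) throws IOException {
        MiniServer s = new MiniServer();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        new Thread(() -> {
            try { s.pollerLoop(); } catch (IOException e) { e.printStackTrace(); }
        }, "poller").start();
        s.acceptorLoop(server);   // run the acceptor on the main thread
    }
}
```

Real servers add connection limits, write-interest handling, and graceful shutdown, but the structural point stands: accepting, readiness detection, and business processing run on separately sized thread groups.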
Reduce system calls
A system call is an expensive operation that involves the CPU switching from user mode to kernel mode, so we should consciously avoid unnecessary system calls when writing programs. In Tomcat and Jetty, the most common system calls are network operations: each write on a Channel is a system call. The most direct way to reduce the number of system calls is to buffer the output and only flush once it reaches a certain size. Accordingly, both Tomcat's and Jetty's channels have input and output buffers.
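As a hedged illustration of this buffering idea (plain Java, not Tomcat or Jetty internals; the class and method names are invented), compare writing many small chunks straight to a socket with writing them through a user-space buffer:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class BufferedWrites {

    // Each write() can turn into its own write system call.
    static void sendUnbuffered(Socket socket, byte[][] chunks) throws IOException {
        OutputStream out = socket.getOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk);
        }
    }

    // Writes accumulate in an 8 KB user-space buffer; the kernel is only
    // entered when the buffer fills or flush() is called.
    static void sendBuffered(Socket socket, byte[][] chunks) throws IOException {
        OutputStream out = new BufferedOutputStream(socket.getOutputStream(), 8192);
        for (byte[] chunk : chunks) {
            out.write(chunk);
        }
        out.flush();   // one system call for whatever remains in the buffer
    }
}
```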
It is also worth noting that Tomcat and Jetty both use a delayed parsing strategy for HTTP data: the HTTP request body is not parsed until it is actually used. That is, when Tomcat calls the Servlet's service method, it has only read and parsed the HTTP request headers, not the request body.
Tomcat does not read and parse the data in the HTTP request body until your web application calls the getInputStream or getParameter method of the ServletRequest object. This means that if your application never calls either method, the request body is never read or parsed, saving I/O system calls.
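A hypothetical servlet makes the effect visible (the class and its logic are invented; note that Tomcat 10 and later use jakarta.servlet instead of javax.servlet):

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HeaderOnlyServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Headers are already parsed by the time doPost runs; reading them is free.
        String type = req.getContentType();

        if (type != null && type.startsWith("application/x-www-form-urlencoded")) {
            // This getParameter call is what triggers reading and parsing the body.
            String name = req.getParameter("name");
            resp.getWriter().println("hello " + name);
        } else {
            // No getParameter()/getInputStream() call on this path, so the
            // request body is never read, saving the extra I/O system call.
            resp.sendError(HttpServletResponse.SC_UNSUPPORTED_MEDIA_TYPE);
        }
    }
}
```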
Pooling, zero copy
With regard to pooling and zero copy: the essence of pooling is to trade memory for CPU, while zero copy avoids redundant data copies and thereby reduces resource waste.
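A sketch of zero copy in plain Java: FileChannel.transferTo can delegate to the operating system's sendfile, so file bytes move from the page cache to the socket without ever being copied into the Java heap. Tomcat applies the same idea when serving static files with sendfile-capable connectors; the helper class below is invented for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {

    static void sendFile(Path file, SocketChannel socket) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long remaining = fc.size();
            while (remaining > 0) {
                // transferTo may move fewer bytes than requested; loop until done.
                long sent = fc.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```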
Efficient concurrent programming
We know that locks are required to synchronize multiple threads' access to a shared variable under concurrency. But locks are relatively expensive: acquiring a contended lock can involve a system call, a thread that fails to acquire the lock is blocked, and a thread context switch follows. When many threads compete for the same lock at once, a lot of system resources are wasted. Therefore, as a programmer, make a conscious effort to avoid locks, for example by using atomic CAS classes or concurrent collections instead. If you must use a lock, minimize its scope and its strength. Let's look at how Tomcat and Jetty practice efficient concurrent programming.
Narrow the scope of locking
Narrowing the scope of locking means using fine-grained object locks instead of synchronizing an entire method.
```java
protected void startInternal() throws LifecycleException {

    setState(LifecycleState.STARTING);

    // Lock the engine member variable
    if (engine != null) {
        synchronized (engine) {
            engine.start();
        }
    }

    // Lock the executors member variable
    synchronized (executors) {
        for (Executor executor : executors) {
            executor.start();
        }
    }

    mapperListener.start();

    // Lock the connectors member variable
    synchronized (connectorsLock) {
        for (Connector connector : connectors) {
            // If it has already failed, don't try and start it
            if (connector.getState() != LifecycleState.FAILED) {
                connector.start();
            }
        }
    }
}
```
For example, start Tomcat’s StandardService component by following the following steps: engine, executors, and connectors. Instead of locking the method directly, it uses three fine-grained locks for each of the three member variables. If synchronized is added directly to a method, multiple threads will queue up to execute the method; With synchronized at the object level, multiple threads can execute this method in parallel, only queuing when accessing a member variable.
Replace locks with atomic variables and CAS
The following code is the startThreads method of Jetty's thread pool; its job is to start the requested number of threads based on the parameter passed in.
```java
private boolean startThreads(int threadsToStart) {
    while (threadsToStart > 0 && isRunning()) {
        // Get the current number of started threads
        int threads = _threadsStarted.get();
        if (threads >= _maxThreads)
            return false;

        // Use CAS to increment the thread count by one.
        // If the CAS fails, another thread changed the value first, so retry.
        if (!_threadsStarted.compareAndSet(threads, threads + 1))
            continue;

        boolean started = false;
        try {
            Thread thread = newThread(_runnable);
            thread.setDaemon(isDaemon());
            thread.setPriority(getThreadsPriority());
            thread.setName(_name + "-" + thread.getId());
            _threads.add(thread);                 // _threads is a concurrent set
            _lastShrink.set(System.nanoTime());   // _lastShrink is an atomic variable
            thread.start();
            started = true;
            --threadsToStart;
        } finally {
            // If the thread failed to start, decrement the count by one
            if (!started)
                _threadsStarted.decrementAndGet();
        }
    }
    return true;
}
```
You can see that the entire function is a while loop and is lock-free. _threadsStarted records how many threads the pool has started; it is an AtomicInteger. Its value is first retrieved with get, and if the number of threads has already reached the maximum, the method returns. Otherwise, the code tries to add one to _threadsStarted with a CAS operation. If the CAS succeeds, no other thread changed the value in the meantime and the current thread can continue; if it fails, control goes to the continue branch and the loop retries until it succeeds. Of course you could implement this with a lock, but the goal here is to be lock-free.
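Distilled to its skeleton, the pattern is the hypothetical bounded counter below (names invented): read the current value, check the bound, attempt the CAS, and retry on failure.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCounter {

    private final AtomicInteger count = new AtomicInteger();
    private final int max;

    public BoundedCounter(int max) { this.max = max; }

    public boolean tryIncrement() {
        while (true) {
            int current = count.get();
            if (current >= max)
                return false;                       // limit reached, give up
            if (count.compareAndSet(current, current + 1))
                return true;                        // we won the race
            // CAS failed: another thread changed count first; loop and retry
        }
    }
}
```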
Use of concurrent containers
CopyOnWriteArrayList is suitable for read-mostly scenarios, that is, many reads and few writes. Tomcat, for example, uses it to store event listeners: listeners are generally registered once during initialization, while the listener list has to be iterated every time an event fires, so the scenario fits the read-mostly profile.
```java
public abstract class LifecycleBase implements Lifecycle {

    // The event listener list, stored in a CopyOnWriteArrayList
    private final List<LifecycleListener> lifecycleListeners = new CopyOnWriteArrayList<>();

    // ...
}
```
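A hypothetical sketch of the same read-mostly pattern in isolation (class and method names invented): registration copies the array, while iteration reads a stable snapshot without locking.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EventBus {

    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    // Rare: called a few times at initialization. Each add copies the array.
    public void addListener(Runnable l) {
        listeners.add(l);
    }

    // Frequent: called on every event. Iteration is lock-free and never
    // throws ConcurrentModificationException, even if a listener is added
    // concurrently (the iterator sees the snapshot taken when it started).
    public void fireEvent() {
        for (Runnable l : listeners) {
            l.run();
        }
    }
}
```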
Use of the volatile keyword
Take LifecycleBase in Tomcat, for example, where the component's lifecycle state is declared volatile. The purpose of volatile is to guarantee visibility: when one thread changes the variable, other threads read the new value. The lifecycle state needs to stay up to date across threads, so the volatile modifier is used.
```java
public abstract class LifecycleBase implements Lifecycle {

    // The current lifecycle state, declared volatile for cross-thread visibility
    private volatile LifecycleState state = LifecycleState.NEW;
}
```
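A minimal, hypothetical demonstration of what volatile buys (names invented): one thread flips the flag, and another observes the change promptly without any lock.

```java
public class VolatileStateDemo {

    private volatile boolean started = false;

    void markStarted() {
        started = true;            // the write becomes visible to readers
    }

    void awaitStart() throws InterruptedException {
        while (!started) {         // without volatile, this loop might never
            Thread.sleep(1);       // observe the other thread's write
        }
    }
}
```

Note that volatile guarantees visibility, not atomicity; compound updates such as state transitions still need CAS or a lock.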
Summary
High-performance programs use system resources efficiently. First, they reduce resource waste, for example by reducing thread blocking, since blocking leaves resources sitting idle and causes thread context switches.
Tomcat and Jetty use buffering and lazy parsing to minimize system calls, as well as zero-copy techniques to avoid redundant data copies.
Another aspect of efficient resource use is that in system design we often trade one resource for another. For example, the object pooling used in Tomcat and Jetty trades memory for CPU, and compressing data before transmission trades CPU for network bandwidth.
In addition, efficient concurrent programming is very important. Multithreading improves concurrency but brings the cost of locking, so in practice we should try to avoid locks, for example by replacing them with atomic variables and CAS operations. If locking is unavoidable, minimize the scope and strength of the lock, for example with fine-grained object locks or read-write locks. The Tomcat and Jetty code is a good example of this philosophy.