This article is the sixth part of the series "High Performance Network Programming Travel Notes": "Performance Chase: a 10,000-word, 30+ diagram deep dive into the threading models of 8 mainstream server programs".

I haven't taken many photos recently and couldn't think of a fitting cover image, so I drew one myself. It will have to do; feel free to laugh.

In this article, we will explore the network processing models of various mainstream server programs and see how they design their network handling. I will walk through Node.js, Apache, Nginx, Netty, Redis, Tomcat, MySQL, Zuul, and other commonly used server programs one by one, analyzing the concurrency performance of each, so that with these techniques in hand, performance problems no longer scare you.

Although this touches on a lot of fundamentals and principles of the various frameworks, I will do my best to pair the text with straightforward diagrams to make it easy to follow.

For more quality articles, come find us at IT House (itzhai.com) and on the Java Architecture Chat public account.

First, let's start with JavaScript, which can write anything these days, and take a look at the inner workings of Node.js server concurrency.

1. Node.js

In the last article, we looked at how JavaScript works in the browser. Now let's look at how Node.js works and uncover the mechanisms behind its high performance.

1.1 The Node.js running model

Here is a widely circulated Node.js system diagram from Totally Radical Richard [1]:

Node.js runs a single-threaded Event Loop [2]:

  • The V8 engine parses JS scripts and calls the Node API.
  • The libuv library executes the Node API call: it first encapsulates the request as an event and puts it into the event queue. When the Event Loop thread is idle, it iterates over the events in the queue and handles them:
    • If the task is non-blocking, the result is processed directly and returned to V8 through the callback function;
    • If it is a blocking task, a thread is taken from the Worker thread pool to process it. That thread then sets the result on the event's result attribute, puts the event back into the event queue, and waits for the Event Loop thread to execute the callback and return the result to the V8 engine.

A requested task is encapsulated into a structure like the following [3]:

var event = createEvent({
  params: request.params,   // Pass the request parameters
  result: null,             // Store the request result
  callback: function() {}   // Specify the callback function
});

(Figure: how the single-threaded non-blocking IO model works in Node.js [10])

Of course, when a client connects to the Node.js server, there must be a step that creates the connected socket and associates the connected socket descriptor with the code handling it, so that once asynchronous processing completes, Node.js knows which client to respond to.

1.2 A Node.js asynchronous example

The running model described above is easier to understand with an example.

Without asynchronous processing via the callback function, we might write code like this:

var result = db.query("select * from t_user");
// do something with result here...
console.log("do something else...");

Here the query may be slow, and since everything runs on a single thread, the console.log call cannot execute until the query returns.

But that's not how Node.js works. In Node.js there is only one Event Loop thread, and if it blocks, no new requests can be accepted. To avoid this, we rewrite the code in Node.js callback style:

db.query("select * from t_user".function(rows) {
  var result = rows;
  // do something with result here...
});
console.log("do something else...");

Node.js can now process the query asynchronously, delegating it to a Worker thread. When the Worker thread gets the query result, it packages the result and the anonymous callback as an event and publishes it to the event queue, waiting for the Event Loop thread to execute the callback. This way the console.log line executes immediately, without being blocked by the query.

1.3 Advantages and disadvantages of the Node.js concurrency model

Node.js is event-driven and throws blocking IO tasks into a thread pool for asynchronous processing, which means Node.js is well suited to IO-intensive tasks.

However, when a CPU-intensive task arrives, the Event Loop thread in Node.js processes it itself; while that task hogs the thread, the events queued behind it cannot run, so subsequent requests respond slowly.

As shown in the following figure, socket2 and socket3 could be processed quickly, but because the socket1 task occupies the CPU, neither can be handled in time.

On a single-core CPU this is unavoidable, but on a multi-core CPU it leaves the other cores idle, wasting resources.

Therefore, Node.js is not suitable for CPU-intensive tasks.

Node.js suits scenarios where requests and responses are small and do not require much computation, which play to the strengths of its running model. Chat applications are a typical example.

That said, Node.js can exploit multiple cores through the cluster and child_process APIs to create child processes, but multi-process means sacrificing shared memory, and communication between processes must pass serialized messages such as JSON [4].

Starting with Node.js v10.5.0, worker_threads are available, giving Node.js multiple worker threads (the Event Loop thread plus threads you start yourself), which is useful for CPU-intensive JavaScript operations.

It is important to note that worker_threads are a solution for CPU-intensive problems in Node.js. IO is already well served by Node.js's native mechanism (the Event Loop thread plus the libuv thread pool), so there is no need to start threads yourself for IO work (see the first diagram in this section).

Well, I can't go too deep into front-end territory; go too deep and the flaws show. After all, there are front-end experts lurking in the IT House Java Architecture Chat public account, secretly studying Java.

Next, we begin with the story of a feather, the Apache logo.

2. Apache

Apache was first released in 1995 and quickly conquered the market, becoming the most popular web server in the world. Paired with "the best language in the world", PHP, to build websites, it was unbeatable in its day.

Here we explore two working models used by the Apache Web server:

  • Apache MPM Prefork: implements the multi-process model;
  • Apache MPM Worker: implements the multi-threaded model.

Apache uses Multi-Processing Modules (MPMs) to implement multi-process or multi-threaded processing.

2.1 Apache MPM Prefork

In a word: Prefork is a non-threaded, pre-forking MPM.

Prefork spawns multiple processes, each handling only one connection at a time, which is efficient per connection but consumes a lot of memory.

This is a one-process-per-request model: a parent process creates many child processes that wait for requests to arrive and handle them, with each request served by a separate process.

Note that each child is a full process consuming its own RAM and CPU, and the children are all roughly the same size in RAM.

If many requests arrive at the same time, Apache spawns many child processes, which leads to heavy resource consumption.
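To make this concrete, here is a sketch of typical prefork tuning in httpd.conf. The directive names come from the Apache 2.4 mpm_prefork documentation; the values are purely illustrative:

<IfModule mpm_prefork_module>
    StartServers             5      # child processes created at startup
    MinSpareServers          5      # keep at least this many idle children
    MaxSpareServers         10      # kill idle children above this count
    MaxRequestWorkers      250      # upper bound on simultaneous connections
    MaxConnectionsPerChild   0      # recycle a child after N connections (0 = never)
</IfModule>

Every simultaneous connection beyond the idle pool forces another fork of a full-weight process, which is exactly where the memory cost comes from.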

2.2 Apache MPM Worker

In a word: Worker is an MPM that mixes multi-threading with multi-processing, as shown in the diagram below:

Each child process serves requests with a fixed number of internal threads, specified by the ThreadsPerChild directive in the configuration file.

The model generally runs multiple child processes, each with multiple threads, and each thread handles only one connection at a time, so it consumes less memory. This model can absorb a large number of requests with fewer system resources, because a limited number of processes services many requests.
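As a rough illustration, worker tuning in httpd.conf might look like this. Again, the directive names come from the Apache 2.4 mpm_worker documentation, and the values are only examples:

<IfModule mpm_worker_module>
    ServerLimit         16      # maximum number of child processes
    ThreadsPerChild     25      # fixed thread count inside each child
    MinSpareThreads     75      # idle-thread floor across all children
    MaxSpareThreads    250      # idle-thread ceiling across all children
    MaxRequestWorkers  400      # total concurrent connections = processes x threads
</IfModule>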

Why can't MPM Worker be used with mod_php?

Because of problems in the mod_php module (notably thread safety), it cannot be used with MPM Worker. MPM Worker is usually paired with Apache mod_fcgid, with PHP run by php-cgi.

As a Java code warrior, I won't go into details here; the best language in the world is one I'm sure you will want to study on your own.

Even with one thread per request, Apache is inefficient under high concurrency, because when a request needs data from the database or files on disk, its thread just waits. As a result, some threads (Worker mode) or processes (Prefork mode) in Apache simply sit blocked waiting for some task to complete, and those threads or processes still eat system resources.

Let's move on to a protagonist built for concurrent scenarios: Nginx. Fundamentally, Apache and Nginx are very different: Nginx was created to solve the C10K problem that Apache suffered from.

Imagine a herd of pigs charging out of the pigsty. Could Apache Server withstand it? Probably not, but Nginx could. That is the power of Nginx.

3. Nginx

Nginx is an open source Web server that, since its initial success as a Web server, now also serves as a reverse proxy, HTTP cache, and load balancer.

Nginx is designed to provide low memory usage and high concurrency. Instead of creating a new process for each Web request, Nginx uses an asynchronous, event-driven approach to process requests in a single thread.

Let’s talk about it in detail.

3.1 Nginx process model

3.1.1 Number of Nginx processes

After you start Nginx on any operating system, you will find several Nginx processes running, as shown in the following figure:

There is one master process and three worker processes.

Nginx has three worker processes here because we specified that number in the configuration file:

worker_processes  3;

3.2 Process model

Nginx uses a multi-process model. After startup, Nginx runs in the background as a daemon consisting of one master process and multiple worker processes.

(CM: Cache Manager; CL: Cache Loader)

The master process mainly manages the Worker processes and is responsible for the following functions:

  • Receiving external signals;
  • Send signals to the Worker process;
  • Monitor the running status of Worker processes;
  • If a Worker process exits abnormally, automatically create a new Worker process.

Worker processes mainly handle network events. We generally set the number of Worker processes equal to the machine's CPU core count to make the most effective use of the hardware, which can be done with the following configuration:

worker_processes auto;

Through shared memory caches, the Worker processes implement cross-process caching, session persistence, rate limiting, session logging, and so on.

3.3 Working principle

Roughly speaking, the Master process performs the following steps:

socket();
bind();
listen();
fork();

It then forks several Worker processes, each of which performs the following steps:

accept();             // guarded by the accept_mutex lock
register IO handler;
epoll() or kqueue();
handle_events();
...

The accept_mutex lock ensures that only one Worker process is accepting at any given time; when a client connection arrives, only the process that successfully acquires the lock executes accept [6].

Thundering herd problem [5]: a program spawns N child processes that each call accept and are put to sleep by the kernel. When the first client connection arrives, all N children are woken up, because the listening descriptors of all the children point to the same socket structure. Although N children wake, only the one that runs first gets the connection; the remaining N-1 go back to sleep.

When the Nginx server is active, only Worker processes are busy, and each Worker process processes multiple connections in a non-blocking manner.

Each Worker process is single-threaded and runs independently, grabbing connections and processing them. The processes communicate through shared memory to share cache data, session persistence data, and other shared resources.

Let’s focus on how Worker processes work.

3.3.1 Working principle of Worker process

Each Worker process runs on a non-blocking, event-driven Reactor model.

The rough processing flow of a client request on the server side is shown in the following figure:

We covered the single-threaded version of the Reactor model in detail in our last article:

The basic processing logic in Worker processes is shown in the figure above:

  • When a request arrives at the Worker process, accept receives the new connection and registers the connection's IO read and write events with the synchronous event demultiplexer;

  • Execute dispatch and call the demultiplexer to block and wait for IO events;

  • Distribute events to specific handlers for processing;

Think about it: to forward a request upstream, the Worker must establish a new connection with the Backend and exchange the request over it. Will the Worker process block on this? If you were designing a similar event-driven program, how would you handle this scenario?

Clearly, the request to the Backend must also be made asynchronously so that the process is not blocked; the Worker simply maintains the mapping between the connected-socket fd toward the Backend and the connected-socket fd of the originating client request.

3.3.2 How to deal with heavy work?

Like Node.js, Nginx sometimes faces heavy work, for example blocking calls inside third-party modules, whose developers may not even be aware of the drawback. If such a call runs directly in the Worker process, the whole event processing loop blocks, and everything else has to wait for the operation to finish before subsequent processing can continue. Obviously, this is not the desired effect.

To solve this problem, Nginx implemented thread pools in version 1.7.11.

Thread pools address the performance problems caused by heavy workloads and blocking third-party operations.

The following operations can cause Nginx to block:

  • Lengthy processing that consumes a lot of CPU;
  • Blocking access to resources, such as disk, a mutex, certain system calls, or fetching data synchronously from a database;
  • ...

In this case, Nginx places tasks that require a long time to execute in a thread pool processing queue, and processes these tasks asynchronously through the thread pool:

By introducing thread pools and freeing Worker processes from blocking, Nginx's performance reached a new level. More importantly, third-party libraries that were previously incompatible with Nginx can now be used relatively easily without hurting Nginx's performance.
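For reference, the feature is driven by configuration like the following. The thread_pool and aio directives are from the nginx thread-pool documentation, while the pool name, sizes, and location are illustrative:

# nginx.conf: define a named thread pool in the main context
thread_pool io_pool threads=32 max_queue=65536;

http {
    server {
        location /big_files/ {
            # offload blocking file reads to the pool instead of the Worker
            aio       threads=io_pool;
            sendfile  on;
            directio  8m;   # files above 8m bypass the page cache and use aio
        }
    }
}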

3.4 Graceful configuration reload [8]

After updating the configuration of Nginx, we usually execute the following command:

nginx -s reload

This command validates the configuration on disk and sends a SIGHUP signal to the master process.

When the master process receives the SIGHUP signal, it performs the following operations:

  • Reloads the configuration and spawns a new set of worker processes, which immediately begin accepting connections and handling traffic;
  • Signals the old worker processes to exit gracefully: they stop accepting new connections, finish their in-flight requests, and each exits once all of its connections are closed.

Reloading the configuration this way may cause a small spike in CPU and memory usage, but compared to downtime, the cost is well worth it.

3.5 Graceful binary upgrade

Nginx's binary upgrade process likewise achieves zero-downtime service.

The new Nginx master process runs in parallel with the original one, sharing the listening sockets; both are active and their respective Worker processes handle traffic. The old master and its Workers can then be instructed to exit gracefully.

3.6 Advantages of Nginx

In the one-process-per-request, blocking connection approach, each connection requires significant additional resource overhead and leads to frequent context switches.

Nginx's model consumes as little memory as possible, adds almost no per-connection overhead, lets the number of worker processes match the CPU core count, and causes relatively few context switches.

So here is the question: when we write our own network programs, is there a framework that can help us reach this level of network performance? Yes: the famous Netty. Let's talk about it next.

4. Netty

4.1 Netty's master/slave Reactor model

Netty is no exception: it is designed and built on the Reactor model.

Netty adopts the master/slave Reactor model. The master Reactor only establishes the connection, obtains the connected socket, and forwards the I/O events of the connected socket to the slave Reactor thread for processing.

Let's review the master-slave pattern again; for more details, see the article on my blog IT House (itzhai.com) or the Java Architecture Chat public account: The network programming paradigm: the essentials of high-performance servers | C10K, Event Loop, Reactor, Proactor [9].

Let’s start by talking about a few abstractions related to Reactor in Netty:

  • Selector: can be interpreted as a Reactor thread, which internally senses the occurrence of events through I/O multiplexing, and then transfers the events to a Channel for processing.
  • Channel: the object registered with a Selector that represents the event that the Selector listens for, such as socket read and write events;

Specifically, Netty abstracts the following model to realize Reactor master-slave model:

  • The Boss Group, or primary Reactor, serves the listening socket; its NioEventLoop runs on the primary Reactor thread, where:
    • Selector is an IO demultiplexer that senses Accept events on listening sockets.
    • ServerSocketChannel corresponds to the listening socket, which is bound to the OP_ACCEPT connection event;
    • An Acceptor is the handler of a connected event. After receiving a connected event, the Acceptor processes it.
  • The Worker Group (slave Reactor) serves connected sockets. Multiple NioEventLoops can be opened in the Worker Group; each NioEventLoop runs on a slave Reactor thread, where:
    • Selector is an IO demultiplexer, used to sense I/O read and write events of connected sockets.
    • SocketChannel corresponds to the connected socket. After the listening socket obtains the connected socket, it will be packaged as SocketChannel and registered in the Selector of NioEventLoop of Worker Group for listening.
    • Handler is the processor of IO read and write events; it is passed in through the API together with the custom business logic, and IO events are ultimately handled by this Handler.

Netty handles Channel events with the Pipeline pattern, as is evident from Netty's API.

4.2 Netty Reactor+Worker thread pool model

To reduce the impact of business logic on the Reactor, we can push business processing into a separate thread pool, so that neither listening-socket event handling nor connected-socket event handling is blocked by business processors, as shown in the figure below. For more details, see the article on my blog IT House (itzhai.com) or the Java Architecture Chat public account: The network programming paradigm: the essentials of high-performance servers | C10K, Event Loop, Reactor, Proactor [9]:

We can create a DefaultEventExecutorGroup thread pool to process business logic.

The general program framework is shown in the figure below:

// Declare a bossGroup as the primary Reactor; essentially a thread pool, each thread is an EventLoop
EventLoopGroup bossGroup = new NioEventLoopGroup();
// Declare a workerGroup as the slave Reactor; essentially a thread pool, each thread is an EventLoop
EventLoopGroup workerGroup = new NioEventLoopGroup();
// Build a business processing group
DefaultEventExecutorGroup defaultEventExecutorGroup =
        new DefaultEventExecutorGroup(10, new ThreadFactory() {
            private AtomicInteger threadIndex = new AtomicInteger(0);
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "BusinessThread-" + this.threadIndex.incrementAndGet());
            }
        });
try {
    // Create a server startup class
    ServerBootstrap bootstrap = new ServerBootstrap();
    bootstrap.group(bossGroup, workerGroup)
            .channel(NioServerSocketChannel.class)
            ...
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) throws Exception {
                    // Add the custom handler to the pipeline, bound to the business thread pool
                    ch.pipeline().addLast(defaultEventExecutorGroup, new BizHandler());
                }
            });
    ChannelFuture future = bootstrap.bind(port).sync();
    future.channel().closeFuture().sync();
} finally {
    bossGroup.shutdownGracefully();
    workerGroup.shutdownGracefully();
}

Netty builds on NIO, which Java has provided since JDK 1.4 and which is itself implemented on top of IO multiplexing. We won't detail NIO's internals here; underneath is IO multiplexing technology, with Channels and Buffers used to handle the sensed IO events.

Why use Netty when we already have NIO?

NIO is just an IO class library implementing synchronous non-blocking IO, while Netty is a high-performance network framework built on NIO with a master-slave Reactor design.

The NIO class library's API is complex to use: you must handle multi-threaded programming, write your own Reactor model, and deal with client disconnects and reconnects, half-packet reads and writes, failure caching, network congestion, and abnormal code paths, all of which are very hard to get right. Netty encapsulates these NIO pain points well. Its main advantages:

  • The API is simple to use;
  • High level of encapsulation and rich features, with a variety of built-in codecs that solve TCP packet splitting and sticking;
  • Based on the Reactor model and high-performing, with no need to implement the Reactor yourself;
  • Proven in many commercial projects, with an active community...

5. Redis

I'm sure you have heard the line "Redis is single-threaded" countless times. If Redis really is single-threaded, how can it support so much concurrency and power so many Internet applications?

In fact, "single-threaded" means that Redis has one main processing thread, running a Reactor architecture that takes full advantage of a non-blocking, IO-multiplexing model. In some cases, though, Redis spawns threads or child processes to perform heavy tasks.

Redis includes a compact asynchronous event library called ae, which wraps the IO multiplexing facilities of different operating systems, such as epoll, kqueue, and select.

5.1 Redis threading model

It is the same Reactor model yet again, but every domain has its own vocabulary, so a new name appears here.

Redis implements its network event handler based on the Reactor model and calls it the file event handler. Whatever the name, the principle is the same. Here is the Redis threading model:

This diagram basically covers the main things that Redis processes do:

  • Client A sends a request to establish a connection; the listening server socket accepts it, generating an AE_READABLE event;
  • The event is picked up by IO multiplexing, put into the event queue, and finally dispatched by the file event dispatcher to the connection reply handler:
    • The connection reply handler processes the new connection and associates the AE_READABLE event of the connected socket fd1 with the command request handler;
  • Client A now holds the connected socket fd1 on its side;
  • Client A sends a command request, generating an AE_READABLE event on fd1, which is picked up by IO multiplexing, queued, and dispatched by the file event dispatcher to the command request handler:
    • The command request handler executes the command from the fd1 socket, obtains the result, and writes it into the socket's reply buffer, preparing the response for the client;
    • It also associates the AE_WRITABLE event of the fd1 socket with the command reply handler;
  • When the fd1 socket becomes writable, an AE_WRITABLE event is generated, picked up by IO multiplexing, queued, and dispatched by the file event dispatcher to the command reply handler:
    • The command reply handler writes the response out to client A through the fd1 connected socket;
    • The AE_WRITABLE event of the fd1 socket is then disassociated from the command reply handler.

And that is roughly how one interaction completes. Simple, isn't it?
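Redis implements this flow in C with its ae library, but as a rough analogy, here is a minimal single-threaded sketch of the same handler-swapping flow in Java NIO; the class and method names are mine, not Redis's:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread plays the file event dispatcher and all three handlers.
public class AeLoopSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6379));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT); // "connection reply handler"
        while (true) {
            selector.select(); // wait for AE_READABLE / AE_WRITABLE style events
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    // associate the new socket's read event with the "command request handler"
                    SocketChannel fd1 = server.accept();
                    fd1.configureBlocking(false);
                    fd1.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // "command request handler": read and run the command, stage the reply,
                    // then associate the writable event with the "command reply handler"
                    ByteBuffer reply = execute((SocketChannel) key.channel());
                    key.attach(reply);
                    key.interestOps(SelectionKey.OP_WRITE);
                } else if (key.isWritable()) {
                    // "command reply handler": flush the reply, then disassociate the
                    // writable event and go back to waiting for the next command
                    ((SocketChannel) key.channel()).write((ByteBuffer) key.attachment());
                    key.interestOps(SelectionKey.OP_READ);
                }
            }
        }
    }

    private static ByteBuffer execute(SocketChannel ch) {
        // parse the command from ch, run it against the in-memory data set,
        // and return the encoded response (all elided in this sketch)
        return ByteBuffer.wrap("+OK\r\n".getBytes());
    }
}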

The idea is always the same, but implementations bloom in every color. It is a bit like readers who enjoy my articles: all supportive, but each in a different way. Some quietly freeload, some leave a like, some tap "looking", and some actively share. Follow the Java Architecture Chat public account and visit IT House (itzhai.com) for further reading.

There is no single best implementation of the Reactor model for every server; but for me, your likes, follows, comments, and shares are the best support.

5.2 Why is single-threaded Redis so efficient?

Having talked so much about the benefits of the Reactor model, we already have a good sense of the answer. Roughly summarized:

  • Redis operates in memory, so processing is very fast; its efficient data structures help too, but this article focuses on networking, so we won't expand on them;
  • Redis's bottleneck is not the CPU but memory and the network; the Reactor model with non-blocking IO multiplexing uses the CPU as efficiently as possible and avoids unnecessary blocking;
  • Single-threading actually avoids the overhead of context switching.

What developers appreciate most is that single-threading reduces development complexity: there are no race conditions to handle, and operations that are not thread-safe by nature, such as the lazy rehash of hashes or lpush, can be implemented lock-free.

With a few fewer hairs to lose, no wonder the Redis author can say Redis keeps getting stronger; as for me, I'm not going bald, just getting more handsome.

5.3 Is Redis really single-threaded?

One more question for you: is Redis really single-threaded? Judging by the Reactor model alone, a single-threaded application clearly has a ceiling. If you are not sure why, go back to my IT House (itzhai.com) or Java Architecture Chat articles.

Consider deletion: if the memory to be freed is large, the operation takes a long time, and in a single-threaded model it blocks the entire Redis service. This is why the non-blocking deletion operations UNLINK, FLUSHALL ASYNC, and FLUSHDB ASYNC exist.

To this end, Redis introduced multi-threading.

Redis 4.0: multi-threading makes its first appearance

In Redis 4.0, Redis began to use more threads, limited to deleting objects in the background, i.e. the non-blocking deletion operations above.

The UNLINK operation removes the key from the keyspace metadata immediately but does not free the value right away; the actual reclamation is performed asynchronously on a background thread.

Redis 6.0: multi-threading arrives in earnest

Although a single thread running the Reactor model can support large concurrency, when there are too many IO reads and writes, too many connected sockets to process, and too many commands to execute, the single thread still becomes the bottleneck. At that point we need a multi-threaded Reactor, or even Reactor plus a Worker thread pool.

Again, if you are behind on this topic, check the article on my blog IT House (itzhai.com) or the Java Architecture Chat public account: The network programming paradigm: the essentials of high-performance servers | C10K, Event Loop, Reactor, Proactor [9].

In Redis 6.0, if you want to enable multithreading, you can set:

# By default, IO threads are only used for writes. Set this to yes to also use IO
# threads for reads and protocol parsing, although the Redis team notes it helps less.
io-threads-do-reads yes

However, to avoid thread-safety problems, Redis still executes commands sequentially on a single thread; multi-threading is used only for network reads and writes and protocol parsing.

To learn more about this feature, we can read the description in the redis.conf [12] configuration file, where it is named THREADED I/O. Below is a translation of those notes.

THREADED I/O

Redis is mostly single-threaded, but some operations, such as UNLINK or slow IO accesses, are performed on background threads.

Now, reads from and writes to Redis client sockets can also be handled by separate IO threads. Because writes in particular are slow, Redis users normally resort to pipelining to speed up per-core performance and spawn multiple instances to scale out. With IO threads, Redis writes can easily be sped up about two times.

IO threads are disabled by default. Enable them only on machines with at least 4 cores, and leave at least one core spare; that is, on a 4-core CPU, try 2 or 3 IO threads.

io-threads 4

Setting io-threads to 1 simply uses the traditional single-threaded mode.

Using more than 8 threads rarely helps. It is also recommended to enable IO threads only when you have a real performance problem; otherwise there is no need.

With IO threads enabled, they are used only for writes, that is, for threading the write(2) system call that flushes the client buffers to the sockets. Reads and protocol parsing can also be threaded with the following configuration directive:

io-threads-do-reads yes

In general, thread reads are not very helpful.

So Redis uses something close to a single Reactor plus an IO thread pool, similar to, but not the same as, the single-threaded Reactor + Worker thread pool model I described earlier.

IO read events are sensed by the Reactor thread (the main thread), which hands the sockets to the IO thread pool to be read and parsed in batch. Once reading completes, the main thread executes all the commands in order, and the responses are then written back to the sockets in one batch, as shown in the following figure:

The sockets waiting in the queue are divided evenly among the IO threads, which are responsible only for reading and parsing IO data (and writing responses), making full use of the CPU's multiple cores to speed up IO. To avoid thread concurrency problems, Redis still executes the commands themselves sequentially on the single main thread; only network reads/writes and protocol parsing are multi-threaded.

So, is Redis 6.0 really single-threaded?

As you can see, Redis did not simply bolt multi-threading on; it improved the design under the premise of avoiding the complexity of concurrent command execution, thereby protecting its developers' hair from falling out too fast. These are the things programmers must care about...

If someone really challenges you on whether Redis is single-threaded, remember: even in Redis 6.0, command execution is still single-threaded.

6. Tomcat

What Java programmer doesn't know Tomcat? So what is Tomcat's threading model? Without reading further we could guess that Tomcat uses the Reactor model to optimize network processing, though the optimization evolved gradually along with the technology.

6.1 Overall architecture of Tomcat [13]

Tomcat is an HTTP server and a servlet container that executes Java servlets and converts JavaServer Pages (JSP) and JavaServer Faces (JSF) into Java servlets.

Let's start with the overall architecture of Tomcat's components. Tomcat uses a layered, modular architecture, like a set of nesting dolls, as shown below; this is also the hierarchy of Tomcat's server.xml configuration file:

Server is the top-level component, representing a Tomcat instance, as seen in the configuration file:

<Server port="8005" shutdown="SHUTDOWN">.</Server>
Copy the code

A Server can contain multiple services, each with its own Container and Connector.

Note that port here is the TCP/IP port on which the server waits for the shutdown command; set it to -1 to disable the shutdown port.

Connector and Container are the two main components of a Service.

The entire Tomcat life cycle is controlled by the Server.

6.1.1 Container

Container manages servlets and processes Request requests sent by connectors.

The inside of the Container seems to hide some secrets. Indeed, I collapsed the interior of the Container interface above; expanding it, we see the following structure:

  • The top layer of the Container is the Engine; a Service has exactly one Engine, which manages multiple sites;
  • A Host represents a site; an Engine can contain multiple Hosts;
  • A Context represents one application; a Host can contain multiple applications;
  • A Wrapper encapsulates a servlet; every application has many servlets, the layer we are most familiar with.

6.1.2 Connector

The Connector processes requests at the socket level: it wraps the raw network data into a Request object for the Container to process, and encapsulates the output as a Response object written back to the socket.

As shown in the figure above, a Service can have multiple connectors, each implementing a different connection protocol and providing services through different ports.

Connector is a key module for network processing. The efficiency of this module directly determines the performance of Tomcat.

Next, let’s open up the Connector’s Pandora’s box and find out what’s hidden inside.

Without further ado, here is the diagram. Not many blogs attach diagrams this generously, but Java Architecture Chat from IT House (itzhai.com) is one of them. Below is the component diagram of the traditional BIO running model:

As shown above, a Connector consists of three main components:

  • ProtocolHandler: the Connector uses a ProtocolHandler to receive the request and parse it according to the protocol. Different protocols have different ProtocolHandler implementations:
    • Http11Protocol: the HTTP/1.1 processor implemented with blocking IO; it uses traditional IO operations and creates one thread per request;
    • Http11NioProtocol: the HTTP/1.1 processor implemented with synchronous non-blocking IO; Tomcat 8 uses this mode by default;
    • Http11Nio2Protocol: the HTTP/1.1 processor implemented with asynchronous IO, supported since Tomcat 8;
    • Http11AprProtocol: APR (Apache Portable Runtime) is a highly portable library at the heart of Apache HTTP Server 2.x. APR has many uses, including access to advanced IO features (such as sendfile, epoll, and OpenSSL), OS-level features (random number generation, system status, and so on), and native process handling (shared memory, NT pipes, and Unix sockets);
  • Adapter: the Adapter finally hands the Request object to the Container for concrete processing;
  • Mapper: with the Mapper, the corresponding servlet can be found from the request address.

The main components of ProtocolHandler are:

  • EndPoint: directly responsible for interfacing with the socket API, handling socket connections and reading and writing socket data, i.e. the TCP-layer work;
  • Processor: encapsulates the TCP data according to the protocol; for HTTP, the IO data received by the EndPoint is packaged into a Request object according to the HTTP specification.

Since the EndPoint interfaces directly with the socket API, we know that the key to network performance lies in the EndPoint component, where the various IO programming paradigms can be applied to optimize network performance. The EndPoint has several abstractions:

  • Acceptor: for handling listening sockets, establishing connections, and listening for requests;
  • Handler: used to process received socket requests;
  • AsyncTimeout: used to detect timeout of asynchronous requests.

Since the EndPoint component is the key performance for network processing, let’s focus on the design here.

6.2 Performance analysis of the Tomcat Connector

Let’s start with the traditional BIO thread model.

6.2.1 Tomcat's BIO threading model

In the BIO threading model, a new connected socket is obtained in the traditional multi-threaded way and thrown into a thread pool, where a single thread handles everything: reading the IO data, processing the business logic, and writing the IO response. As shown below (only the relevant components are drawn):

As shown in the figure above, an Acceptor thread acquires a new connected socket and passes it directly to the Executor thread pool for processing.

This mode is limited by the number of threads that can be created, so it cannot support very high concurrency; and the more threads block on IO, the more context switching occurs, wasting system resources.
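A minimal sketch of this pattern in plain Java (not Tomcat's actual code; the class name and pool size are illustrative):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BioServerSketch {
    public static void main(String[] args) throws IOException {
        // the worker pool, playing the role of Tomcat's Executor
        ExecutorService executor = Executors.newFixedThreadPool(200);
        ServerSocket serverSocket = new ServerSocket(8080);
        while (true) {
            Socket socket = serverSocket.accept(); // the Acceptor thread blocks here
            // one worker thread then does everything for this connection:
            // blocking read, business processing, blocking write
            executor.execute(() -> handle(socket));
        }
    }

    private static void handle(Socket socket) {
        try {
            // read the request, run the business logic, write the response
        } finally {
            try { socket.close(); } catch (IOException ignored) { }
        }
    }
}

Once all 200 pool threads are blocked on IO, new connections simply queue up, which is exactly the ceiling described above.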

Next we look at the NIO threading model, which follows the Reactor + Worker thread pool network programming paradigm.

6.2.2 Tomcat's NIO threading model

The corresponding implementation class is Http11NioProtocol, the HTTP/1.1 processor implemented with synchronous non-blocking IO; Tomcat 8 uses this mode by default.

Here is a component architecture diagram of the model:

A Selector object is maintained in the Poller thread to implement NIO-based network event handling.

The general working principle is as follows:

  • The Acceptor thread receives the connected socket; a SocketChannel object is still obtained with the traditional ServerSocket.accept() method, wrapped into a NioChannel object, which is in turn wrapped into a PollerEvent object. The PollerEvent is pushed onto the event queue, and the Selector inside the Poller is woken up so that the socket's read event can later be registered with it (org.apache.tomcat.util.net.NioEndpoint.Poller.addEvent(PollerEvent event));
  • The PollerEvent's run method is executed, registering the connected socket channel's SelectionKey.OP_READ event with the Poller's Selector (org.apache.tomcat.util.net.NioEndpoint.Poller.run());
  • The Poller thread calls Selector.select() to sense IO read events; once a read event is sensed, the NioEndPoint wraps the socket into a SocketProcessor and hands it to a Worker thread for processing;
  • The Worker thread executes the SocketProcessor's doRun method, which ultimately delegates to Http11NioProcessor for further processing.

NIO-based Tomcat avoids IO blocking, reduces thread overhead and thread context-switching overhead, and can thus support greater concurrency.
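This is not Tomcat's source code, and for brevity it folds the Acceptor and the Poller into one loop, but a minimal Java sketch of the same Selector-plus-Worker-pool pattern looks like this:

import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PollerSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(8); // Worker pool
        Selector selector = Selector.open();                       // the Poller's Selector
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            selector.select(); // sense IO events, like the Poller's run loop
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    // accept the new connection and register its read event
                    SocketChannel ch = ((ServerSocketChannel) key.channel()).accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // stop selecting this key while a Worker owns the socket;
                    // the Worker re-registers interest when it is done
                    key.interestOps(0);
                    workers.execute(() -> process((SocketChannel) key.channel()));
                }
            }
        }
    }

    private static void process(SocketChannel ch) {
        // read the request, decode HTTP, run the servlet, write the response
    }
}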

Tomcat also supports asynchronous IO for network reads and writes, via the implementation class Http11Nio2Protocol.

6.2.3 Tomcat's NIO2 threading model

Http11Nio2Protocol is the HTTP/1.1 processor implemented with asynchronous IO, supported since Tomcat 8 and built on Java's AIO API.

On Windows, Nio2 provides asynchronous IO based on IOCP; on Linux, the asynchronous IO is simulated in user space on top of epoll multiplexing. Either way, the programming interface presents itself as asynchronous IO, which is easier to code against.

The architecture diagram of related components is as follows:

The corresponding asynchronous I/O processing class is Nio2EndPoint, and the class that obtains the connected socket is Nio2Acceptor.

Because IO is asynchronous, the Poller class from the NIO model is no longer needed: both accept and IO reads/writes become asynchronous, and when an IO operation completes, the Java asynchronous IO framework calls the CompletionHandler registered for that operation to continue processing.

The SocketProcessor still implements the Runnable interface, and its run method is still handed to a Worker thread; but the IO reads and writes themselves no longer happen there, so the Worker is spared the system-call overhead of the extra IO operations.
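To show what programming against the Java AIO API that Nio2 builds on feels like, here is a generic asynchronous echo sketch; it is not Tomcat's Nio2EndPoint code:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class Nio2EchoSketch {
    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress(8080));
        // asynchronous accept: no Poller needed, the framework calls us back
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel ch, Void att) {
                server.accept(null, this); // keep accepting further connections
                ByteBuffer buf = ByteBuffer.allocate(1024);
                // asynchronous read: this handler runs when the data has already been read
                ch.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer b) {
                        b.flip();
                        ch.write(b); // echo the bytes back
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer b) { /* close and log */ }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { /* log */ }
        });
        Thread.currentThread().join(); // keep the JVM alive for the callbacks
    }
}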

6.2.4 Tomcat's APR threading model

APR (Apache Portable Runtime/Apache Portable Runtime) is a highly Portable library that is at the heart of Apache HTTP Server 2.x.

Using the APR library, Tomcat reads files and transmits them over the network through JNI, which greatly improves Tomcat's performance on static files; if HTTPS is enabled, SSL processing performance also improves.

7. MySQL

We have talked about MySQL's storage architecture before, but never about its threading model, so consider this a free bonus lesson. Surely that deserves a thumbs-up.

What is MySQL's threading model? Better to let a professional explain; it is more authoritative.

After listening to my good friend Geir Hoydalsvik [14] of the MySQL server team (though he may not know me), let me summarize.

7.1 MySQL thread model

MySQL uses the thread_handling [15] parameter to control its connection threads:

  • no-threads: MySQL handles connection requests with the main thread and creates no extra threads;
  • one-thread-per-connection: MySQL creates one thread for each client connection request;
  • loaded-dynamically: set when the Thread Pool plugin is initialized, handling connection requests through a thread pool.
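For example, to pin the default down explicitly in my.cnf (thread_handling and thread_cache_size are real server variables; the values here are illustrative):

[mysqld]
# one OS thread per client connection (the default for the community server)
thread_handling = one-thread-per-connection
# reuse finished threads for new connections instead of creating fresh ones
thread_cache_size = 100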

MySQL does not use Reactor or Proactor to optimize network I/O efficiency.

Let's look at how MySQL works internally under the traditional one-request-one-thread model. Here is how the threading model operates:

  • Connection requests: the client connects to the MySQL server, which by default listens on TCP port 3306; incoming connection requests are queued;
  • Receiver thread: the receiver thread processes the queued connection requests; for each request it creates (or reuses) a user thread, which handles all subsequent logic;
    • Thread cache: if a thread can be found in the thread cache, it is reused; otherwise a new thread is created. When OS threads were expensive to create, the thread cache helped connection speed considerably; today creating OS threads is relatively cheap, so this optimization matters less. If the number of connections is modest, it still makes sense to size the thread cache so that threads are reused as much as possible;
  • User thread: the user thread handles the connection phase [16] (connection protocol, THD allocation, capability negotiation, and authentication, with the user credentials stored in the THD) and, if nothing goes wrong, moves on to the command phase [17];
  • THD: the thread/connection descriptor. For every client connection there is a separate thread and a THD data structure serving as its descriptor, one THD per connection thread. The THD is created when the connection is established and deleted when the connection is closed. It is a large data structure used to track all kinds of execution state, and its memory grows significantly during query execution. For memory planning, Geir Hoydalsvik recommends budgeting an average of about 10MB per connection.

7.2 Factors limiting MySQL's concurrency

The main factors that limit the concurrency efficiency of MySQL are mutex, database lock, or IO.

  • Mutexes: to protect shared internal data structures, mutexes ensure that only one thread operates on a structure at any moment, but they force other threads to queue. To reduce their impact on concurrency, the protected resources can be split into finer-grained pieces, or lock-free algorithms can be used, so that different threads contend on different mutexes instead of one global resource;
  • Database locks: these are tied to SQL semantics and are harder to avoid (InnoDB avoids many of them thanks to its multi-version concurrency control). They fall roughly into:
    • Data locks caused by SQL DML, such as row locks, which usually protect data being updated by one thread from being read or written by another thread;
    • Metadata locks caused by SQL DDL, which protect the database structure from concurrent incompatible updates. Here performance has to be sacrificed to preserve the more important database semantics;
  • Disk and network IO: because MySQL data is stored on disk, executing SQL inevitably loads data pages from disk, during which the thread waits; thread concurrency is therefore limited by IO capacity.

Why didn’t MySQL use the Reactor pattern to optimize IO?

I see several reasons. MySQL's architecture means that looking up data through an index keeps loading data pages, and bolting on a Reactor model would add coding complexity for little gain. Moreover, MySQL's concurrency depends heavily on the user workload: deadlocks, full table scans caused by poor indexing, and badly written SQL (SELECT *, deep LIMIT a,b paging, large-table joins, and so on) all limit concurrency long before threads do; in the worst case, the number of useful connections may even be lower than the number of CPU cores. This is exactly why mastering MySQL tuning matters: an SQL statement optimized for the business can perform tens of times better.

Writing good SQL is not easy: indexes must be designed sensibly around the business and are hard to adjust once live; all kinds of sharding middleware come into play, and sharding in turn creates distributed transaction problems to solve; and to work around MySQL's limited concurrency we introduce caches, which then raise cache-database consistency problems...

As Lu Xun said: have you really not considered other databases?

How InnoDB defends against concurrent traffic

Given the performance issues above, the InnoDB storage engine mounts some defenses. InnoDB uses a variety of techniques to limit the number of concurrently executing threads while minimizing context switching between them. When InnoDB receives a new request from a user session and the number of concurrently executing threads exceeds a predefined limit, the new request sleeps briefly and then retries. A request that still cannot be scheduled after sleeping is placed in a first-in/first-out queue and eventually processed. Threads waiting for locks are not counted toward the concurrency limit.

Parameters involved:

  • innodb_thread_concurrency: the maximum number of concurrent threads inside the InnoDB storage engine; 0 means no limit.

  • innodb_thread_sleep_delay: when the maximum is exceeded, a requesting thread must wait innodb_thread_sleep_delay microseconds before trying again.

  • innodb_concurrency_tickets: once a requesting thread enters InnoDB, it receives innodb_concurrency_tickets "tickets", the number of times it may re-enter InnoDB without being checked.

This design lets a query request complete as quickly as possible (a join query, for example, may consist of multiple InnoDB row requests) without incurring frequent InnoDB thread context switches.
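A my.cnf sketch of the throttling described above (the variable names are from the MySQL manual; the values are illustrative, and 0 for innodb_thread_concurrency disables the limit entirely):

[mysqld]
innodb_thread_concurrency = 16      # admit at most 16 threads into InnoDB at once
innodb_thread_sleep_delay = 10000   # sleep 10,000 microseconds before retrying
innodb_concurrency_tickets = 5000   # re-entries allowed before being checked again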

This article was first published as "Performance Chase: a 10,000-word, 30+ diagram deep dive into the threading models of 8 mainstream server programs" on the Java Architecture Chat public account. Please do not reproduce it without permission.

8. Zuul

In this section, let’s talk about Zuul’s performance.

Since Zuul is Netflix's open-source microservices gateway, let's consult the Netflix Tech Blog [18] for the details of Zuul's performance.

First, the architecture of Zuul 1, shown below, from the Zuul tech blog [18]:

(Image from the Zuul tech blog [18])

This is a multi-threaded architecture. Zuul 1 is built on the Servlet framework and is characterized by blocking IO calls and multi-threading, with each connection request handled by one thread. IO operations are performed by taking a thread from the thread pool to execute the IO, and the requesting thread blocks while the IO runs.

The Hystrix circuit breaker provides overload protection: when backend delays or error-triggered retries increase the number of active connections and request threads, the service load can surge, and the breaker limits the damage.

To address Zuul 1's blocking IO problem, let's look at Zuul 2's architecture, shown below, again from the Zuul tech blog [18]:

(Image from the Zuul tech blog [18])

Zuul 2 runs an event loop internally. In asynchronous mode there is generally one thread per CPU core handling all requests and responses, which are processed entirely through events and callbacks.

Because no new thread is created per connection, a connection costs only a file descriptor and a listener, which is cheap. In the blocking model, every connection costs a new thread plus considerable memory and system overhead.

In asynchronous mode, the marginal cost of another connection or queued event is far lower than the cost of another thread stack. But if the backend cannot keep up, response times will inevitably grow.

Does the asynchronous model have downsides? Certainly. A system built on blocking calls is easy to debug: a thread's stack is an exact snapshot of a request's progress. Asynchronous code is callback-based and driven by the event loop, and a stack trace of the event loop tells you nothing, which makes requests hard to trace.


That's it for this article. If you haven't had enough, bookmark IT House (itzhai.com) and star the Java Architecture Chat public account.

By the way, don't take the title too seriously. The 30+ images do include a few emoticons, but it is definitely over 10,000 words...


A reader: 30+ pictures or not, more than 10,000 words really does leave a shadow...


Java: java.lang.OutOfMemoryError

Overall, today's mainstream server programs all use the same handful of routines to achieve high concurrency. The previous articles in this series cover the underlying fundamentals thoroughly; if anything here was unclear, go back and read them.

High Performance Network Programming Travel Notes

  • High Performance Network Programming Travel Notes: the opening miscellany
  • Network programming basics: a diagram of Socket core internals and the five IO models
  • A quick look at signal-driven IO, which isn't as perfect as it seems
  • Thoroughly understanding IO multiplexing: the killer tool of IO processing, with a deep dive into select, poll, and epoll
  • Asynchronous IO: the new tool for IO processing
  • The network programming paradigm: the essentials of high-performance servers | C10K, Event Loop, Reactor, Proactor
  • Performance Chase: a 10,000-word, 30+ diagram deep dive into the threading models of 8 mainstream server programs

Most importantly, all of this content is completely free on the IT House Java Architecture Chat public account! Don't forget to give it a thumbs-up.

References


This article is published simultaneously on my blog IT House (itzhai.com) and my public account (Java Architecture Chat).

Author: arthinking | Public account: Java Architecture Chat

Blog links: www.itzhai.com/articles/de…

Copyright notice: copyright belongs to the author; reproduction without permission is prohibited and infringement will be pursued! Please contact the author through the public account.


  1. NodeJS System diagram. Retrieved from https://twitter.com/TotesRadRichard/status/494959181871316992
  2. How the JavaScript engine works: revisiting the Event Loop. Retrieved from http://www.ruanyifeng.com/blog/2014/10/event-loop.html
  3. The Node.js event loop mechanism. Retrieved from https://www.cnblog…
  4. How the single threaded non blocking IO model works in Node.js. Retrieved from https://stackoverflow.com/questions/14795145/how-the-single-threaded-non-blocking-io-model-works-in-node-js
  5. Understanding multithreading in Node.js. Retrieved from https://zhuanlan.zhihu.com/p/74879045
  6. UNIX Network Programming, Volume 1: The Sockets Networking API (3rd ed.). Posts and Telecommunications Press. p. 659
  7. UNIX Network Programming, Volume 1: The Sockets Networking API (3rd ed.). Posts and Telecommunications Press. p. 657
  8. Inside NGINX: How We Designed for Performance & Scale. Retrieved from https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/
  9. The network programming paradigm: the essentials of high-performance servers | C10K, Event Loop, Reactor, Proactor. Retrieved from https://www.itzhai.com/articles/high-performance-network-programming-paradigm.html
  10. RedisConf17 - Redis Community Updates - Salvatore Sanfilippo. Retrieved from https://www.youtube.com/watch?v=U7J33pd3hLU
  11. redis.conf. Retrieved from https://github.com…
  12. Liu Guangrui. Tomcat Architecture Analysis. Posts and Telecommunications Press.
  13. MySQL Connection Handling and Scaling. Retrieved from https://mysqlserverteam.com/mysql-connection-handling-and-scaling/
  14. Dev.mysql.com/doc/refman/…
  15. Connection phase. Retrieved from https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase.html
  16. Command phase. Retrieved from https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_command_phase.html
  17. Zuul 2: The Netflix Journey to Asynchronous, Non-Blocking Systems. Retrieved from https://netflixtechblog.com/zuul-2-the-netflix-journey-to-asynchronous-non-blocking-systems-45947377fb5c