
I recently read a good article on IO multiplexing. It reminded me that when I first studied the differences between select, poll, and epoll, I never wrote a timely summary, and the details faded. Learning often feels like wandering between remembering and forgetting: the spark in the mind either goes out or starts a prairie fire, a game played between memory and forgetting

Socket and IO go hand in hand: where there is a socket, there is IO, and most IO data comes from sockets

socket

Socket is the software abstraction layer between the application layer and the TCP/IP protocol family; it is a group of interfaces. In design-pattern terms, Socket is a facade: it hides the complexity of the TCP/IP protocol family behind the Socket interface. For the user, a simple set of interfaces is all there is; the Socket layer organizes the data so that it conforms to the specified protocol

Building on TCP protocol knowledge (the three-way handshake and four-way wave), let's look at the corresponding socket API methods

The three-way handshake is associated with two methods: listen() on the server and connect() on the client

Both methods have something in common: neither performs the TCP three-way handshake itself. The handshake is done by the kernel; these calls merely tell the kernel to complete the handshake automatically

Difference: connect() blocks until the handshake completes, while listen() returns immediately (it is non-blocking)

Details of the three-way handshake:

  • First handshake: the client sends a SYN packet and enters the SYN_SENT state, waiting for the server's confirmation.
  • Second handshake: after receiving the SYN packet, the server must both acknowledge the client's SYN and send a SYN of its own, so it replies with a single SYN + ACK packet and enters the SYN_RCVD state.
  • Third handshake: the client receives the SYN + ACK packet, sends an ACK to the server, and enters the ESTABLISHED state. Once the server receives that ACK, it also enters the ESTABLISHED state, completing the three-way handshake
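
To make the mapping concrete, here is a minimal sketch in Java (the class name and port are made up for illustration). The ServerSocket constructor performs bind() and listen() under the hood, the client-side Socket constructor performs the blocking connect(), and accept() pops an already-established connection off the kernel's accept queue:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class HandshakeDemo {
    public static void main(String[] args) throws IOException {
        // Server side: the constructor binds and listens; from this point on
        // the kernel completes three-way handshakes on our behalf.
        try (ServerSocket server = new ServerSocket(9000)) {
            // Client side: the constructor issues a blocking connect(); it
            // returns only after the kernel has finished the handshake.
            try (Socket client = new Socket("127.0.0.1", 9000);
                 // accept() merely retrieves a connection the kernel has
                 // already established -- no handshake happens here.
                 Socket accepted = server.accept()) {
                System.out.println("connected: " + accepted.getRemoteSocketAddress());
            }
        }
    }
}
```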

io

The IO terms you hear most often are synchronous blocking IO, synchronous non-blocking IO, and asynchronous non-blocking IO; that is, combinations of the four words synchronous, asynchronous, blocking, and non-blocking. But the names confuse many people: if something is synchronous, shouldn't it block? How can there be such a thing as synchronous non-blocking IO? Who knows which sage's study notes spread these names far and wide

Others confuse non-blocking IO with Java's NIO

For IO models, the most authoritative source is Richard Stevens' "UNIX Network Programming, Volume 1, Third Edition: The Sockets Networking API", Section 6.2 "I/O Models". Stevens actually describes five models; the fifth, signal-driven I/O, is rarely used in practice, which leaves these four common ones:

  • Blocking I/O
  • Non-Blocking I/O
  • I/O Multiplexing
  • Asynchronous I/O

Before digging into these four common models, let's briefly review the relevant Linux mechanics so the IO models are easier to understand. In the earlier article "Off-heap Memory" we covered the Linux IO process and zero-copy technology, which goes deeper into the IO model

An IO operation initiated by an application actually consists of two phases:

  • 1. IO call phase: the application process makes a system call to the kernel
  • 2. IO execution phase: the kernel performs the I/O operation and returns
    • 2.1. Data preparation phase: the kernel waits for the I/O device to prepare the data
    • 2.2. Data copy phase: the data is copied from the kernel buffer to the user-space buffer

Blocking vs. non-blocking is about the switching between user processes/threads and the kernel: with a blocking call, the user process has to hang while the kernel data is not ready; with a non-blocking call, it does not

Synchronous vs. asynchronous is about whether the result comes back with the call: for IO, this means whether a read or send returns its result immediately or delivers it later

Roughly speaking, the synchronous/asynchronous and blocking/non-blocking pairs interleave a macro and a micro view: from the program's point of view, obtaining the result is synchronous or asynchronous, while inside the IO operation itself we further distinguish blocking from non-blocking

An I/O operation has two phases: 1. wait for the data to be ready (read into the kernel cache); 2. copy the data from the kernel to user space (the process's address space). In general, phase 1 takes much longer than phase 2. Blocking in both phase 1 and phase 2 is synchronous blocking IO (BIO); non-blocking in phase 1 but blocking in phase 2 is synchronous non-blocking IO (NIO, the Reactor model); non-blocking in both phase 1 and phase 2 is asynchronous non-blocking IO (AIO, the Proactor model).

Blocking IO

The user-mode process blocks until the kernel data is complete; the result is returned synchronously
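
A minimal sketch of blocking IO in Java (host and request are illustrative). InputStream.read() holds the calling thread through both phases, returning only after the data has been copied into our buffer:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class BlockingIoDemo {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("example.com", 80)) {
            socket.getOutputStream().write("GET / HTTP/1.0\r\n\r\n".getBytes());
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[1024];
            // read() blocks here: the thread hangs until the kernel has
            // received data AND copied it into buf (both IO phases).
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```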

Synchronous non-blocking IO

The user process no longer blocks in the kernel; instead it polls constantly for the result, wasting CPU, which is arguably no more pleasant than BIO
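
A sketch of that busy polling with Java NIO channels in non-blocking mode (host and request are again illustrative):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class NonBlockingPollDemo {
    public static void main(String[] args) throws Exception {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);          // switch the channel to non-blocking mode
        ch.connect(new InetSocketAddress("example.com", 80));
        while (!ch.finishConnect()) { /* spin until the connection completes */ }
        ch.write(ByteBuffer.wrap("GET / HTTP/1.0\r\n\r\n".getBytes()));
        ByteBuffer buf = ByteBuffer.allocate(1024);
        int n;
        // Busy polling: read() returns 0 immediately when no data is ready,
        // instead of suspending the thread -- this loop burns CPU.
        while ((n = ch.read(buf)) == 0) {
            // could sleep here, trading latency for CPU
        }
        System.out.println("read " + n + " bytes");
    }
}
```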

IO multiplexing

Reactor stands for event response: an operating-system callback/notification can be interpreted as an event, and a process/thread reacts to that event when it occurs. The Reactor pattern, also known as the Dispatcher pattern, uses I/O multiplexing to monitor events and dispatches them to a process or thread

For IO multiplexing, the remaining details are an optimization story: select, poll, epoll
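
A minimal multiplexing sketch with Java NIO's Selector, which sits on top of select/poll/epoll depending on the platform (the port is illustrative). One thread waits on many connections at once:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            selector.select();                    // one thread blocks for ALL connections
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {         // a new connection is ready
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {    // some connection has data
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) < 0) client.close();
                }
            }
        }
    }
}
```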

AIO

Proactor model. Reactor can be understood as "when an event comes, I notify you and you deal with it", while Proactor is "when an event comes, I deal with it and notify you afterwards". Here "I" is the operating system and "you" is the user process/thread
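
A hedged sketch using Java's AIO API (AsynchronousSocketChannel; host and request are illustrative). The read is handed to the runtime, which completes the whole operation and then invokes the callback. Note that on Linux the JDK simulates this on top of epoll rather than using true kernel AIO:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AioDemo {
    public static void main(String[] args) throws Exception {
        AsynchronousSocketChannel ch = AsynchronousSocketChannel.open();
        // connect() and write() also have callback forms; get() keeps the demo short
        ch.connect(new InetSocketAddress("example.com", 80)).get();
        ch.write(ByteBuffer.wrap("GET / HTTP/1.0\r\n\r\n".getBytes())).get();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        // Proactor style: "I deal with it, then I notify you" -- the whole
        // read completes before our CompletionHandler runs.
        ch.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer b) {
                System.out.println("read " + n + " bytes");
            }
            @Override public void failed(Throwable exc, ByteBuffer b) {
                exc.printStackTrace();
            }
        });
        Thread.sleep(3000);  // keep the JVM alive long enough for the callback
    }
}
```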

Comparison of the four models

The optimization path across these IO models runs along two axes: one is operating-system support, reducing system calls and switches between user mode and kernel mode; the other is a shift in mechanism from imperative to reactive.


High-performance Architecture

It’s boring to just brush up on Socket/IO, so let’s take it to the next level and talk about it from an architectural perspective

The general flow of service processing is: request -> process -> response

From an architect’s perspective, of course, special attention needs to be paid to the design of high-performance architectures. High-performance architecture design focuses on two aspects:

  1. Try to improve the performance of a single server and maximize the performance of a single server.
  2. If a single server cannot support performance, design a server cluster.

Beyond these two points, whether the final system achieves high performance also depends on the concrete implementation and coding. But architecture design is the foundation of high performance: if the architecture does not support it, the room for improvement through implementation and coding is limited. Figuratively speaking, architecture design determines the upper limit of system performance, and implementation details determine the lower limit.

One of the keys to the high performance of a single server is the concurrency model adopted by the server. The concurrency model has two key design points:

  • How does the server manage connections
  • How does the server handle requests

Both of these design points ultimately relate to the operating system's I/O model and process model.

  • I/O model: blocking, non-blocking, synchronous, asynchronous
  • Process model: single process, multi-process, multi-thread

Traditional modes: PPC & TPC

PPC, or Process Per Connection, creates a process for each connection to handle. This mode is simple to implement and suits scenarios with few server connections, such as database servers

TPC, or Thread Per Connection, creates a thread for each connection to handle. Threads are cheaper to create than processes, and communication between threads is simple
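
A minimal thread-per-connection echo server in Java (class name and port are illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class TpcServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                Socket conn = server.accept();            // one connection...
                new Thread(() -> handle(conn)).start();   // ...one thread
            }
        }
    }

    private static void handle(Socket conn) {
        try (Socket c = conn) {
            byte[] buf = new byte[1024];
            int n;
            // echo back whatever the client sends
            while ((n = c.getInputStream().read(buf)) > 0) {
                c.getOutputStream().write(buf, 0, n);
            }
        } catch (IOException ignored) {
        }
    }
}
```

The upper limit is obvious: tens of thousands of connections would mean tens of thousands of threads, which is exactly the problem the Reactor discussion below addresses.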

Both are traditional concurrency modes, used in constant-connection scenarios such as databases (constant connections, massive requests) and internal enterprise systems (constant connections, constant requests).

Whether to use processes or threads mostly depends on language characteristics. Java uses threads (e.g., Netty), since the JVM is a single process in which threads are easy to manage. C can use either: Nginx uses processes, Memcached uses threads.

There are three metrics to consider when choosing a concurrency model: response time (RT), concurrency, and throughput (TPS). Throughput = concurrency / average response time; for example, 100 concurrent requests with an average response time of 50 ms gives 2000 TPS. Different types of systems weight these three metrics differently.

Systems high on all three metrics, such as seckill systems and instant messaging, cannot use the traditional modes.

Systems low on all three, such as ToB systems and operations or management systems, generally can.

High-throughput systems generally can if they are memory-computation based, but generally cannot if they are network-IO based.

Reactor & Proactor

The traditional modes are clearly only suitable for constant-connection, constant-request scenarios; they cannot adapt to Internet-scale scenarios

For example, the massive connections and massive requests of a Double Eleven sale, or the massive connections with constant requests of a portal site

Introducing a thread pool is also a remedy, but not a fundamental solution. For example, in middleware scenarios with constant connections and massive requests, threads, though lightweight, still consume resources and eventually hit an upper limit

To recap: Reactor stands for event response. An operating-system callback/notification can be understood as an event, and a process/thread reacts when the event occurs. The Reactor pattern, also known as the Dispatcher pattern, uses I/O multiplexing to monitor events and dispatches them to a process or thread.

As you can see, I/O multiplexing is the core of Reactor. In essence, I/O operations are separated from the specific business processes/threads and managed uniformly; select/epoll is used to manage the I/O connections synchronously.

The Reactor pattern has two core parts: the Reactor and the processing resource pool. The Reactor listens for and dispatches events; the resource pool handles them

Where does the high performance come from? Combining I/O multiplexing with processes/threads yields:

  • Single Reactor, single process/thread
  • Single Reactor, multiple threads
  • Multiple Reactors, single process/thread (compared with "single Reactor, single process", this implementation is complex and has no performance advantage, so it is seldom used in practice)
  • Multiple Reactors, multiple processes/threads

Single Reactor Single thread

In this pattern, a Reactor, Acceptor, and Handler all run in a single thread
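
A skeletal version in the style of Doug Lea's "Scalable IO in Java" (see References); class names are illustrative and the business logic is elided:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Reactor, Acceptor and Handler all live on this one thread.
public class SingleReactor implements Runnable {
    final Selector selector;
    final ServerSocketChannel server;

    SingleReactor(int port) throws Exception {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public void run() {
        try {
            while (!Thread.interrupted()) {
                selector.select();                         // Reactor: wait for any event
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    dispatch(it.next());                   // Reactor: dispatch it
                    it.remove();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    void dispatch(SelectionKey key) throws Exception {
        if (key.isAcceptable()) {                          // Acceptor role
            SocketChannel c = server.accept();
            c.configureBlocking(false);
            c.register(selector, SelectionKey.OP_READ);
        } else if (key.isReadable()) {                     // Handler role
            // read + decode + compute + encode + send all happen here;
            // while this runs, no other connection can be served
        }
    }
}
```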

The advantage of a single-reactor single-process model is that it is very simple. There is no inter-process communication, no process competition, and everything is done within the same process.

But its disadvantages are also obvious, specifically:

  • With only one process, the performance of a multi-core CPU cannot be exploited. The only way to use multiple cores is to deploy multiple instances of the system, but that adds operational complexity: instead of maintaining one system, you maintain several on one machine.
  • While a Handler is processing business on one connection, the whole process cannot handle events on other connections, which easily becomes a performance bottleneck

Therefore, the single-Reactor single-process scheme has few practical applications and only suits scenarios where business processing is very fast. The best-known open-source software using it is Redis

In Redis, if a value is large, QPS drops significantly; sometimes a single big key can drag down the whole instance

Since version 6.0, Redis has adopted a multi-threaded model (threads handle network IO), and performance when deleting large values has improved

Single Reactor multithreading

In this model, the Reactor and Acceptor run on one thread; the Handler runs on that same thread only for the read and write phases, while the business processing between read and write is dispatched to a worker thread pool
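
A hedged sketch of the Handler side (class names are made up; the surrounding Reactor loop is as in the previous section). Read and write stay on the Reactor thread; the computation moves to a pool:

```java
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledHandler {
    static final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    void handleRead(SelectionKey key) throws Exception {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        ch.read(buf);                        // read on the Reactor thread
        buf.flip();
        pool.submit(() -> {                  // compute on a worker thread
            ByteBuffer reply = process(buf);
            // Hand the result back for the Reactor thread to write out.
            // This hand-off is exactly the shared-data problem described
            // in the list below and needs proper synchronization.
            key.attach(reply);
            key.interestOps(SelectionKey.OP_WRITE);
            key.selector().wakeup();
        });
    }

    ByteBuffer process(ByteBuffer in) {      // placeholder business logic
        return in;
    }
}
```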

The single-Reactor multithreading scheme can make full use of multi-core CPU processing power, but it also has the following problems:

  • Multithreaded data sharing and access are complicated. For example, when a worker thread finishes a piece of business processing, it must pass the result back to the main (Reactor) thread to be sent, which requires mutual exclusion and protection mechanisms for the shared data. In Java NIO, for instance, the Selector itself is thread-safe, but the key set returned by Selector.selectedKeys() is not; processing of the selected keys must be single-threaded or properly synchronized.
  • The Reactor monitors and responds to all events and runs only on the main thread, which can become a performance bottleneck under high concurrency

Multi-Reactor multithreading

To solve the problems of single-Reactor multithreading, the most intuitive approach is to turn the single Reactor into multiple Reactors

Among well-known open-source systems, Nginx adopts multi-Reactor multi-process, while Memcached and Netty implement multi-Reactor multi-thread
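
As an illustration, Netty's boss/worker event-loop groups map directly onto the multi-Reactor multi-thread structure; this minimal echo server is a sketch (handler and port are made up), not Netty's canonical setup:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyMultiReactor {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);    // main Reactor: accepts connections
        EventLoopGroup workers = new NioEventLoopGroup();  // sub Reactors: handle read/write
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, workers)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg);       // echo the message back
                         }
                     });
                 }
             });
            b.bind(9000).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```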

Let's use a 5-Why root cause analysis (also known as the 5 Whys or the Toyota five-question method: ask "why" five times) to check how well this knowledge has sunk in

Question 1: Why is the Netty network processing performance high?

A: Because Netty uses the Reactor model

Question 2: Why is the Reactor model used for high performance?

A: The Reactor pattern is an event-driven pattern based on IO multiplexing.

Question 3: Why is the IO multiplexing performance high?

A: Because IO multiplexing does not suspend worker threads without data, as blocking IO does, nor does it require polling for data, as non-blocking IO does.

Question 4: Why does IO multiplexing require neither suspending worker threads nor polling?

A: Because IO multiplexing lets one monitoring thread watch many connections; only that monitoring thread is suspended when there is no IO activity. Whenever some connection can perform IO, the operating system wakes the monitoring thread to handle it.

Question 5: The monitoring thread still gets suspended; why is that high performance?

A: First, if every connection blocked its own worker thread, a Web system with tens of thousands of concurrent connections would need a thread per connection, which the system cannot support; with a thread pool, blocked threads cannot be reused to process other connections, so requests end up waiting for threads. Second, a single system can easily have hundreds or thousands of worker threads, and frequently switching among that many threads hurts performance, while the switching cost of a single monitoring thread is negligible. Third, worker threads can do other work when there is no IO to perform, which greatly improves overall system performance.

References

Five IO models are thoroughly analyzed

IO model

Scalable IO in Java

Learning Architecture from Scratch