Netty
The most popular Java NIO framework, provided by JBoss; it supports protocols such as FTP, SMTP, and HTTP
- The API is simple
- Mature and stable
- Active community
- Proven at large scale (Internet, big data, online games, telecom); Elasticsearch, the Hadoop sub-project Avro, and Alibaba's open-source framework Dubbo all use Netty
BIO
Advantages: simple model, simple coding
Disadvantages: performance bottleneck; requests and threads have an N:N relationship, and under high concurrency the CPU pays heavily for thread context switching
Case: Tomcat used BIO before version 7 and NIO from version 7 on
Improvement: pseudo-NIO, using a thread pool to process the logic
IO model
Synchronous blocking: throw the clothes in -> wait by the washing machine until it finishes -> hang the clothes
Synchronous non-blocking: throw the clothes in -> do other things, checking periodically whether the wash is done -> hang the clothes yourself when it finishes
Asynchronous non-blocking: throw the clothes in -> do other things and forget about it; the clothes are hung automatically and you are notified when it is done
Five I/O models
There are five I/O models: blocking I/O, non-blocking I/O, multiplexing I/O, signal-driven I/O, and asynchronous I/O. The first four are synchronous I/O: they block while kernel data is copied into user space
Blocking IO
Non-blocking IO
IO multiplexing
Core idea: handle multiple connections at the same time by calling the system functions select and recvfrom. Each socket is set to non-blocking, so blocking happens in the select call rather than on the sockets themselves. For a small number of connections, performance is not necessarily better than multithreading + blocking IO (there is one extra select call)
Signal-driven I/O
Asynchronous I/O
The future-listener mechanism is used
I/O operations are divided into two steps:
- Initiate an I/O request and wait for the data to be prepared
- Perform the actual IO operation. Blocking vs. non-blocking IO is about whether the I/O request blocks; synchronous vs. asynchronous IO is about whether the actual I/O read/write blocks the requesting process. Asynchronous IO does not require the application to actively read or write the data. Synchronous and asynchronous IO describe the interaction between the user application and the kernel
IO multiplexing
I/O refers to network I/O; "multi" refers to multiple TCP connections, and "plexing" (reuse) refers to one or a few threads. Simply put: use one or a few threads to handle multiple TCP connections. The biggest advantage is reduced system overhead: there is no need to create a large number of threads or processes, nor to maintain them
select
select monitors sets of file descriptors (writefds, readfds, exceptfds) and blocks until a descriptor becomes ready (readable, writable, or exceptional) or a timeout expires. After select returns normally, iterate over the fd_set to find which handles have events, locate the ready descriptors, and perform the corresponding IO operations. Almost all platforms support it, so cross-platform support is good. Disadvantages:
- Select scans all file descriptors in polling mode. As the number of FD file descriptors increases, the performance deteriorates.
- Each call to select() copies the fd set from user space to kernel space and traverses it (results are passed back from kernel to user space).
- The biggest drawback: there is a limit on how many fds a single process can open, 1024 by default
poll
The basic flow is similar to select: it also polls multiple descriptors and processes them according to their state, and it likewise copies the fd set from user space to kernel space and traverses it. The difference is that poll has no maximum file descriptor limit (it stores fds in a linked list)
epoll
No descriptor limit, and the fd set is copied from user space to kernel space only once. It uses event notification: epoll_ctl registers an fd, and once the fd is ready the kernel uses a callback mechanism to activate it. Advantages:
- No fd limit: the maximum number of fds is the operating system's maximum number of file handles. Roughly 100,000 handles per 1 GB of memory, so a 16 GB machine can support millions of connections
- High efficiency: it uses callback notification rather than polling, so efficiency does not drop as the number of fds grows
- Kernel and user space share the same block of memory (mmap) together with the callback notification mechanism. Drawback: the programming model is more complex than select/poll. Linux kernel core functions:
- epoll_create(): requests an epoll instance from the Linux kernel (registered fds are kept in a balanced tree, internally a red-black tree) and returns an epoll object, which is itself an fd
- epoll_ctl(): operates on the epoll object, adding and removing the corresponding connection fds and binding a callback function
- epoll_wait(): waits for ready events and performs the corresponding I/O operations. Comparison: with 1,000,000 connections of which 10,000 are active, poll still traverses all 1,000,000 fds and copies the fd set between user and kernel space on every call; epoll needs neither the traversal nor the copy. If 950,000 of the 1,000,000 connections are active, poll and epoll perform similarly
Java I/O
- Before JDK 1.4, Java used the synchronous blocking model (BIO); large services were usually written in C/C++, which can invoke the operating system's asynchronous IO (AIO) directly
- JDK 1.4 introduced NIO, which supports non-blocking IO; NIO 2.0 (JDK 7) added AIO, asynchronous IO for files and network sockets
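A minimal sketch of JDK NIO multiplexing, where one thread serves many channels (the port is illustrative; on Linux the Selector is typically backed by epoll):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioSelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);                   // non-blocking mode is required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            selector.select();                             // blocks in select/epoll, not on any one socket
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {                  // a new connection is ready
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {             // data is ready to read
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) == -1) client.close();
                }
            }
        }
    }
}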
Netty threading model and Reactor model
The Reactor design pattern is an event-driven pattern that demultiplexes and dispatches service requests delivered concurrently to an application by one or more clients; the requests are received and dispatched synchronously and in order. Advantages of synchronous non-blocking IO (Reactor):
- Fast response: not blocked by any single synchronous operation, even though the reactor itself is synchronous
- Relatively simple programming: avoids complex multithreading and synchronization problems, and avoids the overhead of thread/process switching
- Scalability: CPU resources can be fully utilized by increasing the number of Reactor instances
Disadvantages:
- Relatively complex, not easy to debug
- The Reactor pattern needs support from the underlying system, such as Java's Selector and the operating system's select system call
Reactor single-threaded model
- As the NIO server, it accepts TCP connections from the client, and as the NIO client, it initiates TCP connections to the server
- The server reads request data and responds; the client writes requests and reads responses. Applicable scenario: small services. The code is simple, but it is not suitable for heavy load or high concurrency: a single NIO thread processing too many requests becomes overloaded and responds slowly, causing large numbers of requests to time out, and if the thread hangs the whole service becomes unavailable
Reactor multithreaded model
One Acceptor thread plus a group of NIO threads, typically its own thread pool consisting of a task queue and multiple worker threads. This is sufficient for most scenarios, but performance problems can still arise when the Acceptor has to perform expensive operations such as time-consuming authentication
Reactor master-slave threading model
The Acceptor is no longer a single thread but a group of NIO threads, and the IO threads are another group of NIO threads, so two thread pools handle connection acceptance and I/O processing respectively. This meets most current scenarios and is also the threading model Netty recommends: a BossGroup handles connections and a WorkerGroup handles the business I/O
Why Netty uses NIO instead of AIO
On Linux, AIO's underlying implementation still uses epoll, the same as NIO, so it has no clear performance advantage. Netty's overall architecture is the Reactor model, using the epoll mechanism and IO multiplexing. Netty is an asynchronous communication framework based on the Java NIO class library, featuring asynchronous non-blocking IO, event-driven design, high performance, high reliability, and high customizability.
Echo service
An echo service writes back whatever it receives; it is used for debugging and detection
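A minimal echo-server sketch (assuming Netty 4.x; the port and handler are illustrative, not the article's original code):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class EchoServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
        EventLoopGroup worker = new NioEventLoopGroup();  // handles I/O; CPU cores * 2 threads by default
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, worker)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg); // echo the received bytes back
                         }
                     });
                 }
             });
            ChannelFuture f = b.bind(8007).sync();
            f.channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}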
Source analysis
EventLoop and EventLoopGroup
A high-performance RPC framework has three elements: the IO model, the data protocol (HTTP, Protobuf/Thrift), and the thread model.
- An EventLoop is like a thread: one EventLoop can serve multiple Channels, while a Channel has only one EventLoop; one Channel corresponds to one connection. Multiple EventLoops can be created to improve resource utilization
- An EventLoopGroup is responsible for managing EventLoops
- NIO: a single thread handles multiple Channels; BIO: one thread handles one channel
- Events: accept, connect, read, write
- An EventLoopGroup creates CPU cores * 2 threads by default
Bootstrap
- group: sets the thread model; the Reactor thread models map onto EventLoopGroup configurations: 1) single thread
EventLoopGroup g = new NioEventLoopGroup(1);
ServerBootstrap strap = new ServerBootstrap();
strap.group(g);
2) multi-threaded 3) master/slave threads, as sketched below
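Minimal sketches of the remaining two configurations (assuming Netty 4.x):

// 2) multi-threaded: one group whose threads handle both accept and I/O
EventLoopGroup group = new NioEventLoopGroup(); // CPU cores * 2 threads by default
new ServerBootstrap().group(group);

// 3) master/slave: bossGroup accepts connections, workerGroup handles I/O
EventLoopGroup bossGroup = new NioEventLoopGroup(1);
EventLoopGroup workerGroup = new NioEventLoopGroup();
new ServerBootstrap().group(bossGroup, workerGroup);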
channel
NioServerSocketChannel, OioServerSocketChannel, EpollServerSocketChannel, KQueueServerSocketChannel
childHandler
Used for data processing in each channel
childOption
Applies to the accepted connection
option
Sets TCP parameters on the server channel itself (the listening NioServerSocketChannel)
- ChannelOption.SO_BACKLOG
The maximum length of the wait queue to hold requests that have completed the three-way handshake
TCP connection queues on a Linux server
SYN queue: the half-open connection queue; the target of SYN flood attacks (sending first-handshake packets with forged source IPs); sized by tcp_max_syn_backlog (modify via vi /etc/sysctl.conf)
Accept queue: the fully-established connection queue; sized by net.core.somaxconn, the maximum for the current machine
The system's default somaxconn parameter matters because if the application's backlog is larger than somaxconn, the kernel uses somaxconn (the smaller value wins), so both may need tuning
- ChannelOption.TCP_NODELAY
TCP_NODELAY disables the Nagle algorithm (which allows only one un-ACKed packet on the network at a time). The default is false; set it to true for latency-sensitive traffic so data is sent as soon as it is available
(tcp_synack_retries = 0: do not retransmit SYN+ACK for half-open connections if no ACK arrives; the default is 5. tcp_syn_retries also defaults to 5: a client that receives no SYN+ACK retries five times.)
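A sketch tying the options above together (bossGroup, workerGroup, and MyChannelInitializer are assumed/hypothetical names):

ServerBootstrap b = new ServerBootstrap();
b.group(bossGroup, workerGroup)
 .channel(NioServerSocketChannel.class)
 .option(ChannelOption.SO_BACKLOG, 1024)        // accept-queue length for the server channel
 .childOption(ChannelOption.TCP_NODELAY, true)  // disable Nagle on each accepted connection
 .childHandler(new MyChannelInitializer());     // hypothetical initializer for per-channel handlers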
Channel
- Channel A connection Channel established between a client and a server
- ChannelHandler is responsible for the logical processing of a Channel
- ChannelPipeline is the ordered container that manages channelHandlers
A Channel contains one ChannelPipeline, and all ChannelHandlers are added to the pipeline in order. When the Channel's state changes, the corresponding event is triggered
Status:
- channelRegistered: the Channel is registered with an EventLoop and bound to a Selector
- channelUnregistered: the Channel has been created but is not registered with an EventLoop, i.e. not bound to a Selector
- channelActive: the Channel is active, connected to the remote host, and can send and receive data
- channelInactive: the Channel is inactive and not connected to a remote host
ChannelHandler and ChannelPipeline
ChannelHandler lifecycle:
- handlerAdded: called when the ChannelHandler is added to a ChannelPipeline
- handlerRemoved: called when the ChannelHandler is removed from a ChannelPipeline
- exceptionCaught: called when an exception is thrown during processing
ChannelHandler has two main subinterfaces:
- ChannelInboundHandler (inbound): handles input data and Channel state changes; adapter: ChannelInboundHandlerAdapter (adapter design pattern); SimpleChannelInboundHandler is commonly used
- ChannelOutboundHandler (outbound): handles output data; adapter: ChannelOutboundHandlerAdapter
ChannelPipeline: a chain of ChannelHandler instances that intercepts the inbound and outbound events passing through a Channel. ChannelPipeline implements an advanced form of the intercepting filter pattern, giving the user full control over how events are handled and how the ChannelHandlers interact
ChannelHandlerContext
- ChannelHandlerContext is the bridge connecting a ChannelHandler to its ChannelPipeline. Some ChannelHandlerContext methods overlap with Channel and ChannelPipeline, such as write(): calling write() on the Channel or the ChannelPipeline propagates the operation through the entire pipeline, whereas calling it on the ChannelHandlerContext propagates only through the subsequent handlers
- AbstractChannelHandlerContext is a doubly linked list structure, with next/prev pointing to the successor and predecessor nodes
- DefaultChannelHandlerContext is the concrete implementation class, but most of the work is done in its parent class; it implements only a few simple methods, mainly checking the Handler type. fire* methods invoke the next Handler; without a fire* call, the next handler is not invoked
Handler Execution sequence
InboundHandler Executes in sequence, and OutboundHandler executes in reverse order
channel.pipeline().addLast(new OutboundHandler1());
channel.pipeline().addLast(new OutboundHandler2());
channel.pipeline().addLast(new InboundHandler1());
channel.pipeline().addLast(new InboundHandler2());
Execution order: InboundHandler1 -> InboundHandler2 -> OutboundHandler2 -> OutboundHandler1. InboundHandler1 passes the message on with its fireChannelRead() method; an inbound handler hands data to the outbound handlers via ctx.write(msg). For ctx.write(msg) to reach them, the inbound handlers must be placed at the end, after the outbound ones; alternatively call channel.write(msg) or pipeline.write(msg), which always start from the tail of the pipeline, so ordering no longer matters (the propagation mechanism). Client side: a response is received (inbound) before the next request is sent (outbound). See the sketch below
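A sketch of the two inbound handlers under these rules (a minimal illustration, not the article's original code):

class InboundHandler1 extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ctx.fireChannelRead(msg);  // pass the message to the next inbound handler (InboundHandler2)
    }
}

class InboundHandler2 extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ctx.write(msg);  // heads outbound: reaches only the outbound handlers added BEFORE this one
        // ctx.channel().write(msg) would instead start from the pipeline tail
    }
}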
ChannelFuture
All I/O operations in Netty are asynchronous: any I/O call returns immediately, and a ChannelFuture provides information about the result or status of the operation.
- Uncompleted: when the I/O operation begins, a new future object is created; it is initially uncompleted, having neither succeeded, failed, nor been cancelled, because the operation has not finished
- Completed: when the I/O operation completes, whether by success, failure, or cancellation, the future is marked completed and carries specific information, such as the cause of a failure; note that failure and cancellation are also completed states
Note: do not call a Future's sync() or await() methods in an IO thread, i.e. never inside a ChannelHandler
ChannelPromise
Extends ChannelFuture so that the outcome of an I/O operation can be set (marked succeeded or failed)
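A sketch of the future-listener usage described above (bootstrap, host, and port are assumptions):

ChannelFuture future = bootstrap.connect("localhost", 8007); // returns immediately, uncompleted
future.addListener((ChannelFutureListener) f -> {
    if (f.isSuccess()) {
        f.channel().writeAndFlush(Unpooled.copiedBuffer("hello", CharsetUtil.UTF_8));
    } else {
        f.cause().printStackTrace(); // the failure cause is available once the future completes
    }
});
// Never call future.sync()/await() from inside a ChannelHandler (IO thread): risk of deadlock.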
codec
Common codecs: Java serialization/deserialization, URL encoding, Base64. Drawbacks of Java native serialization:
- Inability to cross languages
- The serialized stream is too large and the packet is too large
- Serialization and deserialization performance is poor
Other codec frameworks in the industry: Protobuf (PB), Thrift, Marshalling, Kryo
Netty codecs:
- Decoder: mainly handles inbound data (an InboundHandler)
- Encoder: mainly handles outbound data (an OutboundHandler)
Netty ships with default codecs and also supports custom ones: Encoder, Decoder, and combined Codec
Netty Decoder
A decoder is a ChannelInboundHandler that essentially converts byte arrays into message objects. Methods:
- decode: the commonly used decode method
- decodeLast: called for the last few bytes when the channel closes, producing the final message
- ByteToMessageDecoder is used to convert bytes into messages, requiring a check to see if the buffer has enough bytes
- ReplayingDecoder inherits ByteToMessageDecoder and is slightly slower than ByteToMessageDecoder without checking whether the buffer has enough data
- MessageToMessageDecoder A common decoder used to decode one message to another (e.g., POJO to POJO).
- DelimiterBasedFrameDecoder: decoder that splits messages on a delimiter
- LineBasedFrameDecoder: decoder that ends with a newline character
- FixedLengthFrameDecoder: a fixed length decoder
- LengthFieldBasedFrameDecoder: message = header + body; a general-purpose decoder that decodes based on a length field
- StringDecoder: a text decoder that converts the received message into a string, typically combined with the above, followed by a handler for the business
Netty coder Encoder
An Encoder is a ChannelOutboundHandler that converts a message object into a byte array. Encoders:
- MessageToByteEncoder: converts a message into bytes and invokes the write method; it first checks whether the encoder supports the outgoing message type, and unsupported messages are passed through unchanged
- MessageToMessageEncoder: encodes one message into another message type
Netty combined codec: Codec
Advantages: encoder and decoder come in pairs, with encoding and decoding done in one class. Disadvantages: coupling; poorer extensibility
- ByteToMessageCodec
- MessageToMessageCodec
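A minimal sketch of a combined codec, here for a hypothetical fixed-size integer protocol (not from the article):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageCodec;
import java.util.List;

public class IntCodec extends ByteToMessageCodec<Integer> {
    @Override
    protected void encode(ChannelHandlerContext ctx, Integer msg, ByteBuf out) {
        out.writeInt(msg);              // outbound: message -> bytes
    }
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        if (in.readableBytes() >= 4) {  // inbound: wait until a whole int has arrived
            out.add(in.readInt());
        }
    }
}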
TCP packet sticking and unpacking
TCP unpacking: a complete packet may be split by TCP into multiple packets for sending. TCP sticky packets: small packets are bundled into one large packet; several packets sent by a client may arrive stuck together in one packet at the server. Causes: TCP uses the Nagle algorithm by default; the receiver buffers incoming data and the application reads from the buffer slowly. UDP has no sticky-packet or unpacking problem, because the protocol preserves message boundaries
TCP half packet read and write solution
Sender: disable the Nagle algorithm. Receiver: TCP is an unbounded byte stream with no mechanism for dealing with sticky packets, and the protocol itself cannot avoid them, so half-packet reads and writes must be handled at the application layer. Ways to solve half-packet reads and writes:
- Use fixed-length messages, e.g. 10 characters each: abcdefgh11abcdefgh11abcdefgh11
- Set a message boundary and split on a delimiter, e.g. $: dfdsfdsfdf$dsfsdfdsf$dsfdsfsdf
- Using a protocol with a header, the header stores the start id of the message and the length of the message. Header + Body
Netty provides a solution for reading and writing TCP half-packets
- DelimiterBasedFrameDecoder: specify message separator decoder
- LineBasedFrameDecoder: decoder that ends with a newline character
- FixedLengthFrameDecoder: fixed length decoder
- LengthFieldBasedFrameDecoder: message = header + body; a general-purpose decoder that decodes based on a length field
Half-packet read/write in practice
LineBasedFrameDecoder: a decoder that splits on newline characters; StringDecoder converts the bytes into a String. A pipeline sketch follows
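A pipeline sketch for this combination (maxLength 1024 and BusinessHandler are assumptions):

ch.pipeline().addLast(new LineBasedFrameDecoder(1024)); // split the byte stream on \n or \r\n
ch.pipeline().addLast(new StringDecoder());             // ByteBuf -> String
ch.pipeline().addLast(new BusinessHandler());           // hypothetical business handler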
Custom delimiters to solve TCP sticky/half-packet reads and writes
DelimiterBasedFrameDecoder parameters:
- maxLength: the maximum line length; if no custom delimiter is detected within this length, a TooLongFrameException is thrown
- failFast: if true, TooLongFrameException is thrown immediately once maxLength is exceeded, without decoding further; if false, it is thrown only after the whole over-long message has been decoded
- stripDelimiter: whether to remove the delimiter from the decoded message
- delimiters: the delimiter(s), of type ByteBuf
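A usage sketch with a $ delimiter (the values are illustrative):

ByteBuf delimiter = Unpooled.copiedBuffer("$", CharsetUtil.UTF_8);
ch.pipeline().addLast(new DelimiterBasedFrameDecoder(
        1024,        // maxLength: beyond this with no delimiter -> TooLongFrameException
        true,        // stripDelimiter: remove the "$" from each decoded frame
        true,        // failFast: throw as soon as maxLength is exceeded
        delimiter)); // delimiters: ByteBuf delimiter(s)
ch.pipeline().addLast(new StringDecoder());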
LengthFieldBasedFrameDecoder: the custom-length half-packet decoder
- maxFrameLength: maximum length of a packet
- lengthFieldOffset: offset of the length field
- lengthFieldLength: number of bytes in the length field
- lengthAdjustment: compensation value added to the length field; if the length field counts the whole frame (header + body), Netty must subtract the corresponding number of bytes
- initialBytesToStrip: number of bytes to strip from the front of the decoded frame; after a full packet is obtained, the specified number of leading length bytes is skipped, so the application receives the packet without the length header
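A sketch for a hypothetical "4-byte length field + body" protocol where the length counts only the body:

ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(
        1024 * 1024, // maxFrameLength: maximum packet length
        0,           // lengthFieldOffset: the length field starts at byte 0
        4,           // lengthFieldLength: the length field is 4 bytes
        0,           // lengthAdjustment: length covers the body only, so no correction
        4));         // initialBytesToStrip: drop the 4 length bytes before handing off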
ByteBuf
A byte container:
- The JDK's native ByteBuffer shares a single index for reads and writes, requires flip() on every switch between them, and easily wastes memory after expansion
- Netty's ByteBuf uses separate indexes for read and write operations and expands capacity automatically
ByteBuf creation and common usage modes
ByteBuf: creates a container for passing byte data:
- ByteBufAllocator: Netty 4.x uses PooledByteBufAllocator by default to improve performance and minimize memory fragmentation; the unpooled UnpooledByteBufAllocator returns a new instance each time
- Unpooled: provides a static method to create an Unpooled ByteBuf, which can create heap memory and direct memory buffers
ByteBuf usage mode:
- Heap buffer. Advantages: stored in the JVM heap, so it can be allocated and released quickly. Disadvantages: it must be copied into a direct buffer (off-heap memory) before each socket I/O
- Direct buffer. Advantages: lives outside the JVM heap and does not occupy heap memory. Disadvantages: allocation and release are more expensive than for heap buffers
- Composite buffer: combines multiple different ByteBufs into one, but only as a view, without copying
Rule of thumb: use direct buffers for heavy IO reads and writes, heap buffers for business message codecs
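A small sketch of the separate read/write indexes (Unpooled heap buffer for brevity):

ByteBuf buf = Unpooled.buffer(16);        // heap buffer; Unpooled.directBuffer(16) for off-heap
buf.writeBytes("abc".getBytes());         // advances writerIndex only
byte b = buf.readByte();                  // advances readerIndex only; no flip() needed
System.out.println(buf.readerIndex() + " " + buf.writerIndex()); // prints: 1 3
buf.release();                            // ByteBuf is reference-counted; release when done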
Netty design patterns
- Builder pattern: ServerBootstrap
- Chain of Responsibility pattern: event propagation through the pipeline
- Factory pattern: Channel creation
- Adapter pattern: HandlerAdapter
Netty in practice: a million connections on a single machine
- Network IO model
- Linux file descriptors: there is a per-process maximum file descriptor (handle) limit and a global file handle limit; each has a default that varies by system
- How do I determine a unique TCP connection
TCP quad: source IP address, source port, destination IP address, destination port
Server port range (1024 to 65535)
Optimization:
- sudo vim /etc/security/limits.conf: change the per-process (local) fd limit; restart for the change to take effect; ulimit -n shows the current user's per-process maximum fd count
root soft nofile 1000000
root hard nofile 1000000
* soft nofile 1000000
* hard nofile 1000000
- sudo vim /etc/sysctl.conf: change the global fd limit
fs.file-max=1000000
sysctl -p applies the parameters; cat /proc/sys/fs/file-max queries the global fd limit
- reboot for everything to take effect
JVM options: -Xms5g -Xmx5g -XX:NewSize=3g -XX:MaxNewSize=3g
The data link
Browsers limit the number of concurrent requests per domain, so it is advisable to serve different resources from different domains. Request path: browser kernel scheduling -> local DNS resolution -> remote DNS resolution -> IP -> multi-hop routing -> destination server -> server kernel -> application