Netty-related questions

1. What are the differences between BIO, NIO and AIO?

BIO (blocking I/O): one thread per connection. When a client connects, the server starts a dedicated thread to handle it. Threads are expensive.
Pseudo-asynchronous I/O: incoming connections are handed to a thread pool (many connections served by fewer threads), but threads are still a scarce resource.
NIO (non-blocking I/O): one thread per request rather than per connection. All client connections are registered with a multiplexer (Selector), which polls them and only starts a thread when a connection has an I/O request ready.
AIO (asynchronous I/O): one thread per valid request. The client's I/O request is completed by the OS first, which then notifies the server application to start a thread to process it.
BIO is stream-oriented while NIO is buffer-oriented; BIO streams are blocking while NIO is non-blocking; a BIO stream is one-way, while an NIO channel is bidirectional.
NIO features: an event-driven model; a single thread handling multiple tasks; non-blocking I/O, where reads and writes no longer block but return 0 when nothing can be transferred; block-based transfer, which is more efficient than stream-based transfer; and more advanced I/O features such as zero-copy. I/O multiplexing greatly improves the scalability and practicality of Java network applications. NIO is based on the Reactor thread model.
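The "returns 0 instead of blocking" behaviour can be observed with nothing but the JDK. The sketch below (the class name NonBlockingReadDemo is hypothetical) uses a java.nio Pipe as a convenient local channel pair, switches the read end to non-blocking mode, and reads once before and once after any data is written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Sketch: in non-blocking mode, read() returns 0 instead of blocking when no data is available.
final class NonBlockingReadDemo {
    /** Returns {bytes read before any write, bytes read after writing "hi"}. */
    static int[] run() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);        // switch the read end to non-blocking mode
                ByteBuffer buf = ByteBuffer.allocate(16);
                int beforeWrite = source.read(buf);     // nothing written yet: returns 0 immediately
                sink.write(ByteBuffer.wrap("hi".getBytes(StandardCharsets.US_ASCII)));
                int afterWrite;
                do {                                     // spin until the bytes show up (normally at once)
                    afterWrite = source.read(buf);
                } while (afterWrite == 0);
                return new int[] {beforeWrite, afterWrite};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```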
In the Reactor model, an event demultiplexer waits for events, i.e. states in which an operation can proceed (for example, a socket becoming readable). The dispatcher then passes each event to a pre-registered event handler or callback, which does the actual reading and writing. For example, a read in the Reactor model works like this: register a read-ready event and its handler; the demultiplexer waits for events; when the event arrives, the demultiplexer is activated and calls the handler registered for that event; the handler performs the actual read, processes the data, registers new events, and then returns control.
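The read flow above can be sketched with a JDK Selector acting as the demultiplexer and the SelectionKey attachment acting as the pre-registered handler. This is a minimal illustration, not Netty's code; MiniReactor and Handler are hypothetical names, and a Pipe stands in for a socket so the example needs no network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class MiniReactor {
    /** Event handler registered in advance for a channel (the "callback" in the Reactor pattern). */
    interface Handler { void handle(SelectionKey key) throws IOException; }

    /** Sends one message through a Pipe and lets the reactor dispatch the read event. */
    static List<String> run() {
        List<String> received = new ArrayList<>();
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);  // channels registered with a Selector must be non-blocking
                // 1. Register interest in READ and attach the handler that performs the actual read.
                Handler readHandler = key -> {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    int n = ((ReadableByteChannel) key.channel()).read(buf);
                    if (n > 0) {
                        buf.flip();
                        received.add(StandardCharsets.UTF_8.decode(buf).toString());
                    }
                };
                source.register(selector, SelectionKey.OP_READ, readHandler);

                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                // 2. Event loop: wait for ready events, then dispatch each to its attached handler.
                while (received.isEmpty()) {
                    selector.select(1000);
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();                  // selected keys must be removed manually
                        ((Handler) key.attachment()).handle(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }
}
```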

2. What is NIO made of?

Buffer: interacts with a Channel. Data is read from a Channel into a Buffer and written from a Buffer into a Channel.
- flip(): flips this buffer: the limit is set to the current position and the position is set to 0, switching from writing to reading
- clear(): clears this buffer: the position is set to 0 and the limit is set to the capacity (the data itself is not erased)
- rewind(): rewinds this buffer: the position is set to 0 and the limit is unchanged, so the data can be read again
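A small sketch of how the three methods move position and limit on a java.nio ByteBuffer; the class name is hypothetical and the trace string is just a convenient way to show the values.

```java
import java.nio.ByteBuffer;

final class BufferModesDemo {
    /** Returns "position/limit" after flip(), rewind() and clear(), separated by spaces. */
    static String run() {
        ByteBuffer buf = ByteBuffer.allocate(8);          // position=0, limit=8, capacity=8
        buf.put((byte) 1).put((byte) 2).put((byte) 3);    // write mode: position=3
        StringBuilder trace = new StringBuilder();

        buf.flip();                                       // limit=3, position=0: ready to read
        trace.append(buf.position()).append('/').append(buf.limit()).append(' ');

        buf.get();
        buf.get();                                        // position=2
        buf.rewind();                                     // position=0, limit stays 3: re-read
        trace.append(buf.position()).append('/').append(buf.limit()).append(' ');

        buf.clear();                                      // position=0, limit=capacity; bytes untouched
        trace.append(buf.position()).append('/').append(buf.limit());
        return trace.toString();
    }
}
```

Calling BufferModesDemo.run() yields "0/3 0/3 0/8".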
A DirectByteBuffer saves one copy: with a heap buffer, the JVM first copies the data into a temporary native buffer before the system call. However, direct buffers are more expensive to create and destroy, and their memory is not directly controlled by the GC, so memory pools are often used to improve performance. Direct buffers are best allocated for large, long-lived buffers that are subject to the underlying system's native I/O operations. For small applications with small data volumes, heap buffers managed by the JVM can be used.
Channel: represents a bidirectional connection between an I/O source and a target. It cannot access data directly and can only interact with a Buffer. Note that FileChannel's read and write methods both copy the data twice.
Selector: allows a single thread to manage multiple channels. The open method creates a Selector, and the register method registers a Channel with the multiplexer, specifying the events to listen for: read, write, connect and accept. Registering produces a SelectionKey, which represents the registration relationship between a SelectableChannel and a Selector. The wakeup method makes the first selection operation that has not yet returned return immediately. Typical reasons to call it: a channel has been closed and its registration cancelled, or a higher-priority event (such as a timer event) has fired and must be handled promptly.
On Linux, the Selector implementation class is EPollSelectorImpl, which delegates to EPollArrayWrapper; its three native methods wrap epoll. In EPollSelectorImpl.implRegister, events are registered with the epoll instance by calling epoll_ctl, and the mapping between the registered file descriptor (FD) and its SelectionKey is added to fdToKey, a map that maintains FD-to-SelectionKey mappings.
fdToKey can sometimes become very large, either because a huge number of channels are registered with the Selector (millions of connections) or because expired or invalid channels are not closed in time. fdToKey is always read serially, and the reading happens in the select method, which is not thread-safe.
Pipe: a one-way data connection between two threads. Data is written to the sink channel and read from the source channel.
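A minimal sketch of a Pipe connecting two threads: a writer thread fills the sink end and closes it, and the calling thread reads from the source end until end-of-stream. PipeDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

// Sketch: one thread writes into the sink end, another reads from the source end.
final class PipeDemo {
    static String transfer(String message) {
        try {
            Pipe pipe = Pipe.open();
            Thread writer = new Thread(() -> {
                try (Pipe.SinkChannel sink = pipe.sink()) {          // write end, closed when done
                    ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (out.hasRemaining()) {
                        sink.write(out);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Pipe.SourceChannel source = pipe.source()) {        // read end, blocking by default
                ByteBuffer in = ByteBuffer.allocate(256);
                while (source.read(in) != -1) {                      // -1 once the writer closed the sink
                    in.flip();
                    received.write(in.array(), in.position(), in.remaining());
                    in.clear();
                }
            }
            writer.join();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```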
Typical steps: Selector.open() opens a Selector; ServerSocketChannel.open() creates the server-side channel; bind() binds it to a port; configureBlocking(false) switches it to non-blocking mode; register() registers the channel and the events of interest with the Selector; select() polls for events that are ready.
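The steps above, put together in a runnable sketch: it binds to an ephemeral loopback port (port 0, an assumption so the example does not clash with other services), connects one client, and waits for the Selector to report OP_ACCEPT. AcceptDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.*;

final class AcceptDemo {
    /** Returns true once the Selector has reported OP_ACCEPT and the connection was accepted. */
    static boolean acceptOne() {
        try (Selector selector = Selector.open();                           // 1. open a Selector
             ServerSocketChannel server = ServerSocketChannel.open()) {     // 2. server-side channel
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // 3. bind (0 = any free port)
            server.configureBlocking(false);                                 // 4. non-blocking mode
            server.register(selector, SelectionKey.OP_ACCEPT);               // 5. register interest
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) { // client connects
                while (true) {
                    if (selector.select(1000) == 0) continue;                // 6. poll for ready events
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isAcceptable()) {
                            try (SocketChannel accepted = ((ServerSocketChannel) key.channel()).accept()) {
                                if (accepted != null) return true;
                            }
                        }
                    }
                    selector.selectedKeys().clear();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```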

3. What are Netty's features?

A high-performance, asynchronous, event-driven NIO framework that supports TCP, UDP and file transfer
Uses a more efficient socket layer and internally works around the CPU spikes caused by epoll empty polling, avoiding the pitfalls of using NIO directly and simplifying NIO programming
Provides a variety of decoders/encoders that handle TCP packet sticking/splitting automatically
Can use separate accept and processing thread pools to improve connection efficiency; offers simple support for reconnection and heartbeat detection
The number of I/O threads and TCP parameters are configurable; TCP receive and send buffers use direct memory instead of heap memory, and ByteBufs are recycled through a memory pool
Reference counting releases objects as soon as they are no longer referenced, which reduces GC frequency
Uses an efficient Reactor threading model with single-threaded serial processing
Makes extensive use of volatile, CAS and atomic classes, thread-safe classes, and read-write locks

4. What is Netty's thread model?

Netty receives and processes user requests using the Reactor model and a multiplexer. Internally it uses two thread pools: the boss pool and the worker pool. The boss pool handles accept events: when it accepts a connection, it wraps the corresponding socket in a NioSocketChannel and hands it to the worker pool, which handles the read and write events on that connection and dispatches them to the corresponding Handler.
Single-threaded model: all I/O operations are done by one thread, i.e. multiplexing, event dispatch and processing all happen on a single Reactor thread. That thread must accept client connections, initiate connections to servers, and send/read requests and responses. One NIO thread handling hundreds of connections at once cannot keep up and becomes slow; if the thread enters an infinite loop, the whole program becomes unavailable. This model is not suitable for high-load, high-concurrency scenarios.
Multithreaded model: one NIO thread (the Acceptor) only listens on the server port and accepts TCP connection requests from clients; a NIO thread pool handles network I/O, i.e. reading, decoding, encoding and sending messages. One NIO thread can serve N connections at the same time, but each connection is bound to exactly one NIO thread, which prevents concurrent operations on it. However, a single Acceptor thread can become a bottleneck when millions of clients connect or when security authentication is required.
Master-slave multithreaded model: the Acceptor thread binds the listening port and accepts client connections; the resulting SocketChannel is removed from the main Reactor thread's multiplexer and re-registered with a sub-Reactor thread, which performs its I/O operations. The mainReactor is thus responsible only for access authentication, handshakes and similar operations.

5. What causes TCP packet sticking/splitting, and how is it solved?

TCP transmits data as a byte stream. A complete application packet may be split by TCP into several segments, or several small packets may be combined into one larger segment.
Causes of TCP packet sticking/splitting:
- If the bytes written by the application are larger than the socket send buffer, the packet is split. If the data written by the application is smaller than the socket send buffer, the network adapter may send several writes to the network together, which causes packet sticking.
- When the TCP segment length minus the TCP header length is greater than the MSS, the packet is split.
- When the payload of an Ethernet frame exceeds the MTU (1500 bytes), IP fragmentation occurs.
Solutions:
- Fixed-length messages: FixedLengthFrameDecoder
- A special delimiter at the end of each message: line separators with LineBasedFrameDecoder, or a custom delimiter with DelimiterBasedFrameDecoder
- Split each message into a header, which carries the length, and a body: LengthFieldBasedFrameDecoder. Depending on its parameters, it handles a header that is only the length field, a header with extra bytes before the length field, and headers with multiple extended fields.
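To make the header/body approach concrete, here is a plain-Java sketch of what a length-field decoder has to do: prepend a 4-byte big-endian length, accumulate bytes across reads, and emit only complete frames. It roughly corresponds to LengthFieldBasedFrameDecoder configured with a 4-byte length at offset 0 and the header stripped, but it is an illustrative re-implementation, not Netty's code; LengthFieldFraming, Decoder and the 1 MiB limit are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LengthFieldFraming {
    static final int MAX_FRAME = 1 << 20;  // hypothetical upper bound to reject corrupt lengths

    /** Prepends a 4-byte big-endian length header to the UTF-8 body. */
    static byte[] encode(String msg) {
        byte[] body = msg.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    /** Accumulates bytes across reads and emits complete frames only. */
    static final class Decoder {
        private ByteBuffer cumulation = ByteBuffer.allocate(0);

        List<String> decode(byte[] chunk) {
            // Append the new chunk to whatever was left over from previous reads.
            ByteBuffer merged = ByteBuffer.allocate(cumulation.remaining() + chunk.length);
            merged.put(cumulation).put(chunk).flip();
            List<String> frames = new ArrayList<>();
            while (merged.remaining() >= 4) {        // enough bytes for the length header?
                merged.mark();
                int length = merged.getInt();
                if (length < 0 || length > MAX_FRAME) throw new IllegalStateException("bad length " + length);
                if (merged.remaining() < length) {   // half packet: wait for more bytes
                    merged.reset();
                    break;
                }
                byte[] body = new byte[length];      // sticky packets: loop emits every full frame
                merged.get(body);
                frames.add(new String(body, StandardCharsets.UTF_8));
            }
            cumulation = merged.slice();             // keep the unread tail for the next call
            return frames;
        }
    }
}
```

Splitting two encoded messages at an arbitrary byte offset shows both cases: the first call returns nothing (half packet), the second returns both messages (sticky packet).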

6. What serialization protocols do you know?

Serialization (encoding) converts an object into binary form (a byte array), mainly for network transmission and data persistence. Deserialization (decoding) restores byte arrays read from the network, disk, etc. into the original objects, mainly so that objects transferred over the network can be decoded to complete a remote call.
The key factors affecting serialization performance are the size of the serialized stream (network bandwidth consumption) and the speed of serialization (CPU consumption). Another factor is whether it works across languages (integrating heterogeneous systems and switching development languages).
Java's built-in serialization: it cannot be used across languages, the serialized stream is too large, and serialization performance is poor.
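A quick way to see the "stream is too large" point: serialize the same small object with ObjectOutputStream and with a hand-written DataOutputStream encoding. The User class and both helpers are hypothetical; the JDK stream is larger mainly because it embeds the class descriptor (class name, field names and types).

```java
import java.io.*;

final class SerializationSizeDemo {
    /** A hypothetical message type used only for this comparison. */
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        User(int id, String name) { this.id = id; this.name = name; }
    }

    /** Size in bytes produced by the JDK's built-in serialization. */
    static int jdkSize(User u) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(u);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size of a hand-written encoding: 4-byte id + 2-byte length + name bytes. */
    static int compactSize(User u) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(u.id);
            out.writeUTF(u.name);  // modified UTF-8 with a 2-byte length prefix
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```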
XML. Advantages: human- and machine-readable; element or attribute names can be specified. Disadvantages: serialized data contains only the data itself and the structure of the class, excluding type identification and assembly information; only public properties and fields can be serialized; methods cannot be serialized; files are large and the format is complex, so transmission consumes bandwidth. Use cases: storing data as configuration files, real-time data conversion.
JSON, a lightweight data-interchange format. Advantages: high compatibility, a simple data format, easy to read and write, small serialized data, good extensibility; compared with XML its protocol is simpler and it parses faster. Disadvantages: less descriptive than XML, not suitable for millisecond-level performance requirements, and high extra space overhead. Use cases (as an alternative to XML): cross-firewall access, high debuggability requirements, Ajax requests from web browsers, relatively small data transfers, and relatively relaxed real-time requirements (e.g. seconds).
Fastjson uses an "assume ordered, fast matching" algorithm. Advantages: easy-to-use API; currently the fastest JSON library for Java. Disadvantages: focuses too much on speed at the expense of standards compliance and functionality; poor code quality; incomplete documentation. Use cases: protocol interaction, web output, Android clients.
Thrift is not only a serialization protocol but also an RPC framework. Advantages: small serialized size, fast, supports many languages and rich data types, tolerant of adding and removing fields, supports binary compressed encoding. Disadvantages: few users; insecure and unreadable when accessed across firewalls; code is relatively hard to debug; cannot be used with other transport protocols (such as HTTP); cannot read and write data directly to the persistence layer, so it is not suitable as a persistence serialization protocol. Use case: RPC for distributed systems.
Avro, a Hadoop subproject, addresses JSON's verbosity and lack of an IDL. Advantages: rich data types, easy integration with dynamic languages, self-describing, faster data parsing, a fast and compressible binary format, support for RPC, and cross-language implementations. Disadvantages: not intuitive for users accustomed to statically typed languages. Use cases: persistent data format for Hive, Pig and MapReduce in Hadoop.
Protobuf describes data structures in a .proto file; the code generator produces the corresponding POJOs together with Protobuf-related methods and properties. Advantages: small serialized output, high performance, a structured data storage format (like XML and JSON), forward compatibility through numbered field identifiers, and structured documents that are easier to manage and maintain. Disadvantages: depends on generated code; relatively few languages are supported (officially only Java, C++ and Python). Use cases: RPC calls with high performance requirements, with good cross-firewall access, and application-layer object persistence.
Others:
- Protostuff is based on the Protobuf protocol but does not require a .proto file
- JBoss Marshalling can serialize Java classes directly, without requiring them to implement java.io.Serializable
- MessagePack: an efficient binary serialization format
- Hessian: a lightweight remoting-over-HTTP tool that uses a binary protocol
- Kryo is based on the protobuf protocol, supports only Java, requires class registration, and then serializes via Output and deserializes via Input

7. How do you choose a serialization protocol?

By scenario:
- For calls between systems of different companies, if the service's performance requirement is above 100 ms, XML-based SOAP is worth considering.
- For Ajax from web browsers and communication between mobile apps and servers, JSON is the first choice. JSON is also a good choice when performance requirements are modest, dynamically typed languages dominate, or the data payload is small.
- When the debugging environment is poor, using JSON or XML can greatly improve debugging efficiency and reduce development costs.
- When performance and simplicity matter most, Protobuf, Thrift and Avro compete with one another.
- For persisting terabytes of data, Protobuf and Avro are the first choices. If the persisted data is stored in a Hadoop subproject, Avro is the better choice.
- For non-Hadoop persistence projects, Protobuf suits engineers working in statically typed languages better. Because Avro's design philosophy favors dynamically typed languages, Avro is the better choice for applications dominated by dynamic languages.
- Thrift is a good choice if you need a complete RPC solution.
- Protobuf is preferred if the serialized data must be carried over different transport-layer protocols, or if high-performance scenarios require cross-firewall access.
Protobuf data types include bool, double, float, int32, int64, string, bytes, enum and message. Field qualifiers (proto2): a required field must have a value; an optional field may or may not be set; an enum field can only take one value from a specified set of constants.
Basic protobuf rules: a message may contain required fields and zero or more optional fields; a repeated field can contain zero or more values. Field numbers in [1, 15] take one byte when encoded (use them for frequently used fields), and field numbers in [16, 2047] take two bytes. Field numbers must not be repeated. Message types can be nested to any depth, and nested message types can be used instead of groups.
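The one-byte/two-byte rule follows from the wire format: each field key is (fieldNumber << 3) | wireType, encoded as a base-128 varint with 7 payload bits per byte. The sketch below only computes sizes; the class and method names are hypothetical.

```java
final class ProtobufTagSize {
    /** Number of bytes a non-negative value takes as a base-128 varint (7 payload bits per byte). */
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {   // more than 7 significant bits remain
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** A field key is (fieldNumber << 3) | wireType, encoded as a varint. */
    static int tagSize(int fieldNumber, int wireType) {
        return varintSize(((long) fieldNumber << 3) | wireType);
    }
}
```

Field 15 gives a key of at most 127, which fits in one byte; field 16 gives 128, which needs two; field 2047 gives at most 16383, still two bytes, while 2048 needs three.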
Protobuf message upgrade rules: do not change the numeric identifier of any existing field; existing required fields cannot be removed; optional and repeated fields can be removed, but their identifiers must not be reused; newly added fields must be optional or repeated, because older programs cannot read or write a newly added required field.
The compiler generates Java code for each message type, including a special Builder class used to create message instances, for example: UserProto.User.Builder builder = UserProto.User.newBuilder(); builder.build();
Using it with Netty: ProtobufVarint32FrameDecoder decodes half packets; ProtobufDecoder(UserProto.User.getDefaultInstance()) decodes bytes into UserProto.User objects; ProtobufVarint32LengthFieldPrepender prepends a varint32 length field to each protobuf message to mark its length; ProtobufEncoder is the encoder.
To convert a StringBuilder into a ByteBuf, use the copiedBuffer() method (Unpooled.copiedBuffer).

8. How does Netty implement zero-copy?

Netty receives and sends data using direct buffers, which use off-heap direct memory for socket reads and writes, so no second copy of the byte buffer is needed. With heap memory there is one extra copy: the JVM copies the heap buffer into direct memory before writing it to the socket. Buffers are allocated through the ByteBufAllocator obtained from ChannelConfig, which uses direct buffers by default.
The CompositeByteBuf class combines multiple ByteBufs into one logical ByteBuf, avoiding the traditional approach of copying several small buffers into one large buffer. For example, the addComponents method combines a header and a body into one logical ByteBuf; the two ByteBufs remain separate inside the CompositeByteBuf and are combined only logically.
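The JDK has a related idea, shown here as an analogue rather than Netty's CompositeByteBuf itself: a gathering write hands the header and body to the channel as separate buffers in one call, so user code never merges them into a new array. GatheringWriteDemo is a hypothetical name, and a Pipe stands in for a socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

final class GatheringWriteDemo {
    /** Writes header and body as two separate buffers in one gathering call; returns what the reader sees. */
    static String send(String header, String body) {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                ByteBuffer[] parts = {                      // two independent buffers, never merged
                    ByteBuffer.wrap(header.getBytes(StandardCharsets.UTF_8)),
                    ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8))
                };
                long total = parts[0].remaining() + parts[1].remaining();
                long written = 0;
                while (written < total) {
                    written += sink.write(parts);           // GatheringByteChannel.write(ByteBuffer[])
                }
                ByteBuffer in = ByteBuffer.allocate((int) total);
                while (in.hasRemaining()) {
                    source.read(in);
                }
                in.flip();
                return StandardCharsets.UTF_8.decode(in).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```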
The FileChannel.transferTo method, wrapped by FileRegion, transfers file data directly to the target Channel, avoiding the memory copies caused by the traditional read-and-write loop.
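A JDK-only sketch of the underlying call that FileRegion wraps: copying one file to another with FileChannel.transferTo. Because transferTo may move fewer bytes than requested, the loop keeps going until everything is transferred. TransferToDemo and its temp-file helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToDemo {
    /** Copies src to dst with FileChannel.transferTo; returns the number of bytes transferred. */
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            while (position < size) {                       // transferTo may do a partial transfer
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    /** Writes content to a temp file, copies it with transferTo and reads the copy back. */
    static String roundTrip(String content) {
        try {
            Path src = Files.createTempFile("src", ".txt");
            Path dst = Files.createTempFile("dst", ".txt");
            try {
                Files.write(src, content.getBytes(StandardCharsets.UTF_8));
                copy(src, dst);
                return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```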
The wrap methods (e.g. Unpooled.wrappedBuffer) wrap byte[] arrays, ByteBufs, ByteBuffers, and so on into a Netty ByteBuf object without copying them.
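The JDK's ByteBuffer.wrap and slice behave the same way and make the "no copy" claim easy to check: the wrapper and the slice are views over the original array, so a change made through any of them is visible through the others. WrapDemo is a hypothetical name; Netty's wrappedBuffer is not used here.

```java
import java.nio.ByteBuffer;

final class WrapDemo {
    /** Returns true if the wrapped buffer and its slice share memory with the original array. */
    static boolean sharesMemory() {
        byte[] array = {1, 2, 3, 4, 5, 6};
        ByteBuffer wrapped = ByteBuffer.wrap(array);   // no copy: backed by 'array'
        array[0] = 42;                                 // visible through the buffer
        wrapped.position(2);
        ByteBuffer tail = wrapped.slice();             // view of bytes 2..5, still no copy
        tail.put(0, (byte) 99);                        // absolute put writes into array[2]
        return wrapped.get(0) == 42 && array[2] == 99 && wrapped.array() == array;
    }
}
```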
Selector bug: if the Selector's poll returns with no events, and there was no wakeup and no new message to process, empty polling occurs and CPU usage reaches 100%.
Netty's solution: it counts Selector operations within a period, incrementing a counter for each empty select. If N consecutive empty polls occur within a certain period, it assumes the epoll dead-loop bug has been triggered. In that case, the SocketChannels are unregistered from the old Selector and re-registered with a new Selector, and the original Selector is closed.

9. What makes Netty high-performance?

Heartbeat: on the server, idle (inactive) sessions are periodically cleared (Netty 5); on the client, heartbeats detect whether the session has been disconnected or restarted, and measure network latency. The IdleStateHandler class is used to detect the session's idle state.
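A minimal sketch of the reader-idle check that a heartbeat mechanism relies on; Netty's IdleStateHandler does this inside the pipeline and fires an IdleStateEvent, whereas the hypothetical IdleTracker below only shows the bookkeeping. Time is passed in explicitly (an assumption) so the logic is deterministic and testable.

```java
final class IdleTracker {
    enum State { ACTIVE, READER_IDLE }

    private final long readerIdleNanos;
    private long lastReadNanos;

    IdleTracker(long readerIdleNanos, long nowNanos) {
        this.readerIdleNanos = readerIdleNanos;
        this.lastReadNanos = nowNanos;
    }

    /** Call whenever data (including a heartbeat message) is read from the connection. */
    void onRead(long nowNanos) {
        lastReadNanos = nowNanos;
    }

    /** Call periodically, e.g. from a scheduled task; a server might close the session on READER_IDLE. */
    State check(long nowNanos) {
        return nowNanos - lastReadNanos >= readerIdleNanos ? State.READER_IDLE : State.ACTIVE;
    }
}
```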
Serial lock-free design: message processing is completed in the same thread as far as possible, without thread switching, which avoids multithreaded contention and synchronization locks. On the surface, a serial design seems to use the CPU inefficiently and lack concurrency. However, by tuning the NIO thread pool's thread parameters, multiple serialized threads can run in parallel. This locally lock-free serial thread design performs better than the one-queue, multiple-worker-threads model.
Reliability: link validity detection (link idle detection, read/write idle timeouts); memory protection (ByteBufs reused through a memory pool, decoding protection for ByteBuf); graceful shutdown (stop accepting new messages, run pre-exit processing, release resources).
Security: Netty supports SSL v2 and v3, TLS, one-way SSL authentication, two-way SSL authentication, and third-party CA authentication.
Efficient concurrent programming: extensive and correct use of volatile; extensive use of CAS and atomic classes; use of thread-safe containers; improved concurrency through read-write locks. The three principles of I/O communication performance: transport (AIO), protocol (HTTP) and threading (master-slave multithreading).
Traffic shaping: prevents downstream network elements from being overwhelmed, and service from being interrupted, when upstream and downstream network elements perform unevenly; it also prevents the communication module from receiving messages faster than the back-end service threads can process them.
TCP parameter settings: for SO_RCVBUF and SO_SNDBUF, 128 KB or 256 KB is recommended. TCP_NODELAY: the Nagle algorithm combines small packets in the buffer into larger ones, preventing large numbers of small packets from congesting the network and thus improving network efficiency. However, Nagle should be disabled (TCP_NODELAY set to true) for latency-sensitive applications.
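A JDK-only sketch applying these options to a SocketChannel via StandardSocketOptions; in Netty the same settings are usually passed as ChannelOption values on the bootstrap (for example childOption(ChannelOption.TCP_NODELAY, true)). The 128 KB value comes from the text; TcpOptionsDemo is hypothetical, and because the OS may round or cap buffer sizes, only positivity is checked.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

final class TcpOptionsDemo {
    /** Applies the options discussed above to an unconnected channel and reads them back. */
    static boolean configure() {
        try (SocketChannel ch = SocketChannel.open()) {
            ch.setOption(StandardSocketOptions.SO_RCVBUF, 128 * 1024);  // receive buffer hint
            ch.setOption(StandardSocketOptions.SO_SNDBUF, 128 * 1024);  // send buffer hint
            ch.setOption(StandardSocketOptions.TCP_NODELAY, true);      // disable Nagle for low latency
            // The OS may adjust buffer sizes, so only check that they are positive.
            return ch.getOption(StandardSocketOptions.TCP_NODELAY)
                    && ch.getOption(StandardSocketOptions.SO_RCVBUF) > 0
                    && ch.getOption(StandardSocketOptions.SO_SNDBUF) > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```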

10. How does NioEventLoopGroup work (source code)?

NioEventLoopGroup (in fact its parent MultithreadEventExecutorGroup) maintains an array EventExecutor[] children, whose default size is the number of processor cores * 2, forming a thread pool. NioEventLoopGroup overrides the newChild method used to initialize each EventExecutor, so the actual type of each element in children is NioEventLoop.
When the thread starts (via SingleThreadEventExecutor), it runs NioEventLoop's run method. It first calls hasTasks() to check whether taskQueue has elements. If it does, selectNow() is executed, which eventually calls selector.selectNow() and returns immediately. If taskQueue is empty, select(oldWakenUp) is executed.
select(oldWakenUp) uses the selectCnt variable to record how many times select has been executed, and tracks whether selectNow() has been executed. select(timeoutMillis) is called repeatedly, and selectCnt keeps growing. When selectCnt reaches the threshold (512 by default), rebuildSelector is executed to rebuild the selector, which works around the 100% CPU bug.
rebuildSelector first creates a new selector with openSelector, then cancels the selectionKeys of the old selector, and finally re-registers the old selector's channels with the new selector. After the rebuild, selectNow must be executed again to check whether any selectionKey is ready.
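The counting and rebuild steps can be sketched with plain JDK classes. This is a simplified illustration, not Netty's exact logic: the counter below only treats a select that returned no keys, without a wakeup and before its timeout, as "premature", and the 512 threshold is the default quoted above. SelectorRebuild and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.*;

final class SelectorRebuild {
    static final int REBUILD_THRESHOLD = 512;  // default threshold quoted in the text
    private int selectCnt;

    /** Call after each select(); returns true when the selector should be rebuilt. */
    boolean onSelectReturned(int readyKeys, boolean wokenUp, boolean timedOut) {
        if (readyKeys > 0 || wokenUp || timedOut) {
            selectCnt = 0;                            // a legitimate return resets the counter
            return false;
        }
        return ++selectCnt >= REBUILD_THRESHOLD;      // premature empty return: suspected epoll bug
    }

    /** Opens a new selector, moves every valid registration to it and closes the old one. */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) continue;
            // Same channel, same interest set, same attachment (handler) on the new selector.
            key.channel().register(fresh, key.interestOps(), key.attachment());
            key.cancel();                             // drop the registration on the old selector
        }
        old.close();
        return fresh;
    }

    /** Registers a Pipe source on one selector, rebuilds, and checks the registration moved. */
    static boolean demo() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source(); Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                Selector old = Selector.open();
                source.register(old, SelectionKey.OP_READ, "handler");
                Selector fresh = rebuild(old);
                try {
                    SelectionKey moved = source.keyFor(fresh);
                    return !old.isOpen() && moved != null
                            && "handler".equals(moved.attachment())
                            && moved.interestOps() == SelectionKey.OP_READ;
                } finally {
                    fresh.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```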
Then processSelectedKeys (which handles I/O tasks) is called. When selectedKeys != null, it calls processSelectedKeysOptimized, which iterates over the array selectedKeys holding the keys of ready I/O events and calls processSelectedKey for each of them; processSelectedKey handles OP_READ, OP_WRITE and OP_CONNECT events.
Finally runAllTasks (non-I/O tasks) is called. It first calls fetchFromScheduledTaskQueue, which moves tasks in scheduledTaskQueue whose delay has expired into taskQueue. It checks the elapsed time after every 64 tasks; if the preset time budget has been exceeded, it stops executing non-I/O tasks, so that too many non-I/O tasks do not delay I/O processing.
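A sketch of the time-budgeted task loop described above, assuming the "check the clock every 64 tasks" rule from the text; TaskBudget is a hypothetical name, and scheduled tasks are left out.

```java
import java.util.Queue;

final class TaskBudget {
    /** Runs queued tasks until the queue is empty or the time budget is spent; returns how many ran. */
    static int runAllTasks(Queue<Runnable> taskQueue, long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        int ran = 0;
        Runnable task;
        while ((task = taskQueue.poll()) != null) {
            task.run();
            ran++;
            // Reading the clock is not free, so only check it after every 64 tasks (0x3F = 63).
            if ((ran & 0x3F) == 0 && System.nanoTime() >= deadline) {
                break;                                   // leave remaining tasks for the next loop
            }
        }
        return ran;
    }
}
```

With a zero budget exactly 64 tasks run before the loop yields back to I/O processing; with a generous budget the whole queue drains.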
Each NioEventLoop corresponds to one thread and one Selector. A NioServerSocketChannel registers itself with the Selector of one NioEventLoop, which is responsible for polling its events.
Outbound events are request events: the initiator is the Channel, the handler is Unsafe, and they propagate from tail to head. Inbound events are notification events: the initiator is Unsafe, the handler is the Channel, and they propagate from head to tail.
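The two propagation directions follow from the pipeline being a doubly linked list with a head and a tail. The sketch below is a hypothetical MiniPipeline that only records the traversal order; it does not reproduce Netty's ChannelHandlerContext API.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniPipeline {
    /** A named handler node; the pipeline is head <-> h1 <-> ... <-> tail. */
    static final class Node {
        final String name;
        Node prev, next;
        Node(String name) { this.name = name; }
    }

    private final Node head = new Node("head");
    private final Node tail = new Node("tail");

    MiniPipeline(String... handlers) {
        head.next = tail;
        tail.prev = head;
        for (String h : handlers) {                  // addLast: insert just before tail
            Node n = new Node(h);
            n.prev = tail.prev;
            n.next = tail;
            tail.prev.next = n;
            tail.prev = n;
        }
    }

    /** Inbound events (e.g. a read notification) travel head -> tail. */
    List<String> fireInbound() {
        List<String> order = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) order.add(n.name);
        return order;
    }

    /** Outbound operations (e.g. a write request) travel tail -> head, where head hands them to Unsafe. */
    List<String> fireOutbound() {
        List<String> order = new ArrayList<>();
        for (Node n = tail; n != null; n = n.prev) order.add(n.name);
        return order;
    }
}
```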
For memory management, Netty pre-allocates large blocks of memory organized into Arenas. An Arena consists of many Chunks, and each Chunk consists of 2048 pages by default. A Chunk organizes its pages in a complete binary tree: each leaf node represents a Page, each intermediate node represents the memory region covered by its subtree, and each node records its offset within the Chunk. When a region is allocated, the flag on the corresponding intermediate node is set, indicating that all nodes below it have been allocated. PoolChunkList manages allocations of 8 KB and larger, while PoolSubpage handles allocations smaller than 8 KB by splitting a page into multiple segments.
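A minimal sketch of the buddy-style allocation over a complete binary tree described above, modelled on the memoryMap scheme used by PoolChunk in older Netty 4.x releases as we understand it. The class name BuddyChunk is hypothetical, freeing is omitted, and a tiny chunk of 8 pages (maxOrder = 3) replaces the default 2048 so the behaviour is easy to trace.

```java
/**
 * Buddy allocation over a complete binary tree stored in an array (node 1 = root,
 * children of i are 2i and 2i+1). memoryMap[i] holds the smallest depth at which a
 * free run can still be found in i's subtree; maxOrder + 1 marks "fully allocated".
 */
final class BuddyChunk {
    private final int maxOrder;      // depth of the leaves; 2^maxOrder pages in total
    private final byte[] depthMap;   // fixed depth of each node
    private final byte[] memoryMap;  // current allocation state of each node
    private final byte unusable;

    BuddyChunk(int maxOrder) {
        this.maxOrder = maxOrder;
        this.unusable = (byte) (maxOrder + 1);
        int nodes = 1 << (maxOrder + 1);
        depthMap = new byte[nodes];
        memoryMap = new byte[nodes];
        for (int d = 0, id = 1; d <= maxOrder; d++) {
            for (int i = 0; i < (1 << d); i++, id++) {
                depthMap[id] = memoryMap[id] = (byte) d;   // initially every node is fully free
            }
        }
    }

    /** Allocates a run of 'pages' pages (a power of two); returns its page offset, or -1 if none is free. */
    int allocatePages(int pages) {
        if (pages <= 0 || Integer.bitCount(pages) != 1) throw new IllegalArgumentException("pages must be 2^k");
        int d = maxOrder - Integer.numberOfTrailingZeros(pages);   // tree depth whose nodes span 'pages'
        if (d < 0) throw new IllegalArgumentException("larger than the chunk");
        if (memoryMap[1] > d) return -1;                           // no free run that large
        int id = 1;
        int initial = -(1 << d);                                   // (id & initial) != 0 once id is at depth d
        byte val = memoryMap[id];
        while (val < d || (id & initial) == 0) {
            id <<= 1;                                              // try the left child...
            val = memoryMap[id];
            if (val > d) {                                         // ...or its buddy if the left side is too full
                id ^= 1;
                val = memoryMap[id];
            }
        }
        memoryMap[id] = unusable;                                  // mark the run as allocated
        for (int i = id; i > 1; i >>>= 1) {                        // parents keep the best free depth below them
            memoryMap[i >>> 1] = (byte) Math.min(memoryMap[i], memoryMap[i ^ 1]);
        }
        int depth = depthMap[id];
        return (id ^ (1 << depth)) << (maxOrder - depth);          // index within level * run size in pages
    }
}
```

Allocating 1, 2, 4 and 1 pages from an 8-page chunk returns offsets 0, 2, 4 and 1; the chunk is then full and the next request returns -1.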
ByteBuf expands its capacity automatically (doubling up to a 4 MB threshold, then growing in 4 MB steps), so the put/write methods do not throw exceptions when space runs out, and it achieves zero-copy through the built-in composite buffer type. There is no need to call flip() to switch between reading and writing, because the read and write indexes are separate; methods can be chained; memory is reclaimed through reference counting based on AtomicIntegerFieldUpdater. PooledByteBuf uses a binary tree to implement a memory pool that centrally manages memory allocation and release, without creating a new buffer object for each use, whereas UnpooledHeapByteBuf creates a new buffer object each time.
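A toy sketch of three of these ideas together: separate reader and writer indexes (no flip()), automatic growth instead of an overflow exception, and a reference count updated through AtomicIntegerFieldUpdater. MiniByteBuf is hypothetical and uses simple doubling rather than Netty's exact growth policy.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

final class MiniByteBuf {
    private static final AtomicIntegerFieldUpdater<MiniByteBuf> REF_CNT =
            AtomicIntegerFieldUpdater.newUpdater(MiniByteBuf.class, "refCnt");

    private volatile int refCnt = 1;        // updated only through REF_CNT
    private byte[] array;
    private int readerIndex, writerIndex;   // separate indexes: no flip() needed

    MiniByteBuf(int initialCapacity) { array = new byte[initialCapacity]; }

    MiniByteBuf writeByte(int b) {
        ensureWritable(1);
        array[writerIndex++] = (byte) b;
        return this;                        // allows method chaining
    }

    byte readByte() {
        if (readerIndex >= writerIndex) throw new IndexOutOfBoundsException("no readable bytes");
        return array[readerIndex++];
    }

    int readableBytes() { return writerIndex - readerIndex; }

    int capacity() { return array.length; }

    /** Grows the backing array (doubling here) instead of throwing when it is full. */
    private void ensureWritable(int n) {
        if (writerIndex + n <= array.length) return;
        array = Arrays.copyOf(array, Math.max(array.length * 2, writerIndex + n));
    }

    MiniByteBuf retain() {
        REF_CNT.incrementAndGet(this);
        return this;
    }

    /** Returns true when the count reaches zero, i.e. the memory could go back to a pool. */
    boolean release() {
        int remaining = REF_CNT.decrementAndGet(this);
        if (remaining < 0) throw new IllegalStateException("released too many times");
        if (remaining == 0) array = null;   // drop the memory once nobody references it
        return remaining == 0;
    }
}
```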

My knowledge is limited, so if there are any mistakes, please point them out, thank you! If you have better suggestions, leave a comment and let's discuss them and improve together. Thank you for your patience in reading this blog post!

Copyright notice: this is the blogger's original article. If you repost it, please cite the source, thank you! Blog.csdn.net/baiye_xing/…