Search for the WeChat official account cainiao Feiyafei to read more articles on Spring source code analysis and Java concurrent programming.
Preface
The previous article analyzed how Netty solves the TCP sticky packet and half packet problems with codecs, but did not look at how the decoders actually decode the data. This article analyzes how these decoders work.
Netty provides several commonly used decoders that cover almost every scenario. They are listed below in order of increasing difficulty.
- FixedLengthFrameDecoder (decoder based on a fixed length)
- LineBasedFrameDecoder (decoder based on line separators)
- DelimiterBasedFrameDecoder (decoder based on custom delimiters)
- LengthFieldBasedFrameDecoder (decoder based on a length field)
The first three decoders are simple and easy to understand. The last one is more complex and harder to follow, but it covers the most scenarios. Because the first three are simple, their source code is analyzed together in this article; the last decoder will be analyzed separately in a later article.
FixedLengthFrameDecoder
As the class name suggests, this is a decoder based on a fixed length. When the decoder is created, it is given a frameLength, and it decodes one data object every time frameLength bytes have been read. For example, suppose the sender sends data four times, namely A, BC, DEFG and HI, 9 bytes in total. If the decoder's fixed length is frameLength = 3, every 3 bytes are decoded into one object, so the results are ABC, DEF and GHI.
+---+----+------+----+ +-----+-----+-----+
| A | BC | DEFG | HI | -> | ABC | DEF | GHI |
+---+----+------+----+ +-----+-----+-----+
The source code of the fixed-length decoder, with comments, is shown below. It is simple enough that it is not analyzed further; refer to the comments in the code.
import static io.netty.util.internal.ObjectUtil.checkPositive;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import java.util.List;

public class FixedLengthFrameDecoder extends ByteToMessageDecoder {
    // Number of bytes decoded into one frame
    private final int frameLength;
    public FixedLengthFrameDecoder(int frameLength) {
        checkPositive(frameLength, "frameLength");
        // Specify the number of bytes decoded each time
        this.frameLength = frameLength;
    }
    @Override
    protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
        // Decode
        Object decoded = decode(ctx, in);
        // If a data object was decoded, add it to the out list
        if (decoded != null) {
            out.add(decoded);
        }
    }
    protected Object decode(
            @SuppressWarnings("UnusedParameters") ChannelHandlerContext ctx, ByteBuf in) throws Exception {
        // If fewer bytes are readable than one frame needs, return null and wait for more data
        if (in.readableBytes() < frameLength) {
            return null;
        } else {
            // Read frameLength bytes and return them as the frame
            return in.readRetainedSlice(frameLength);
        }
    }
}
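To see the framing rule in isolation, here is a minimal Netty-free sketch in plain Java. It is an illustration under stated assumptions, not Netty's implementation: the class FixedLengthFrameSketch and its feed() method are hypothetical, a byte array stands in for the cumulation ByteBuf, and the while loop plays the role of ByteToMessageDecoder calling decode() repeatedly. It replays the A, BC, DEFG, HI example with frameLength = 3.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Netty-free sketch of fixed-length framing over a stream of chunks. */
public class FixedLengthFrameSketch {
    private final int frameLength;
    // Bytes received but not yet emitted (plays the role of the cumulation ByteBuf)
    private byte[] pending = new byte[0];

    public FixedLengthFrameSketch(int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive: " + frameLength);
        }
        this.frameLength = frameLength;
    }

    /** Appends one chunk and returns every complete frame that is now available. */
    public List<String> feed(String chunk) {
        byte[] in = chunk.getBytes(StandardCharsets.US_ASCII);
        byte[] merged = Arrays.copyOf(pending, pending.length + in.length);
        System.arraycopy(in, 0, merged, pending.length, in.length);
        List<String> frames = new ArrayList<>();
        int readerIndex = 0;
        // Same test as decode(): stop once fewer than frameLength bytes remain
        while (merged.length - readerIndex >= frameLength) {
            frames.add(new String(merged, readerIndex, frameLength, StandardCharsets.US_ASCII));
            readerIndex += frameLength;
        }
        // Keep the incomplete tail for the next chunk
        pending = Arrays.copyOfRange(merged, readerIndex, merged.length);
        return frames;
    }

    public static void main(String[] args) {
        FixedLengthFrameSketch decoder = new FixedLengthFrameSketch(3);
        List<String> all = new ArrayList<>();
        for (String chunk : new String[] {"A", "BC", "DEFG", "HI"}) {
            all.addAll(decoder.feed(chunk));
        }
        System.out.println(all); // [ABC, DEF, GHI]
    }
}
```

The leftover bytes after a chunk (A after the first one, G after the third) stay buffered until enough data arrives, which is exactly why the decoder returns null when fewer than frameLength bytes are readable.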
LineBasedFrameDecoder
LineBasedFrameDecoder is a decoder based on line separators: every time a line separator (\n or \r\n) is read, one data object is decoded. For example:
+---+-------+----------+------+ +-----+-----+-------+
| A | B\nC | DE\r\nFG | HI\n | -> | AB | CDE | FGHI |
+---+-------+----------+------+ +-----+-----+-------+
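The splitting rule itself can be sketched without Netty. In the hypothetical LineSplitSketch below, the chunks are simply concatenated up front (the real decoder receives them one read at a time), findEndOfLine() follows the rule described for the decoder (index of \n, or of the \r in \r\n), and stripDelimiter decides whether the newline stays in each frame. The maxLength and discard logic is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Netty-free sketch of newline framing without the maxLength/discard logic. */
public class LineSplitSketch {

    /** Index of '\n', or of the '\r' in "\r\n", within buf[from, to); -1 if there is none. */
    static int findEndOfLine(byte[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return (i > from && buf[i - 1] == '\r') ? i - 1 : i;
            }
        }
        return -1;
    }

    /** Decodes every complete line; an unterminated tail is simply left undecoded. */
    static List<String> decode(List<String> chunks, boolean stripDelimiter) {
        // Simplification: the chunks are concatenated instead of arriving read by read
        byte[] buf = String.join("", chunks).getBytes(StandardCharsets.US_ASCII);
        List<String> frames = new ArrayList<>();
        int readerIndex = 0;
        int eol;
        while ((eol = findEndOfLine(buf, readerIndex, buf.length)) >= 0) {
            int length = eol - readerIndex;             // data before the newline
            int delimLength = buf[eol] == '\r' ? 2 : 1; // \r\n or \n
            int frameLength = stripDelimiter ? length : length + delimLength;
            frames.add(new String(buf, readerIndex, frameLength, StandardCharsets.US_ASCII));
            readerIndex = eol + delimLength;            // move past the newline
        }
        return frames;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("A", "B\nC", "DE\r\nFG", "HI\n");
        System.out.println(decode(chunks, true)); // [AB, CDE, FGHI]
    }
}
```

With stripDelimiter = true the example yields AB, CDE and FGHI; with false, each frame keeps its own delimiter (AB\n, CDE\r\n, FGHI\n).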
From this example, LineBasedFrameDecoder looks simple, but the actual implementation is more involved than the example suggests. LineBasedFrameDecoder defines several important member variables, shown below.
// Maximum length of a decoded frame
private final int maxLength;
// When a frame exceeds maxLength: true reports the error as soon as the overflow is detected,
// false reports it only after the end of the oversized frame (its newline) has been read
private final boolean failFast;
// Whether to strip the newline (\r\n or \n) from the decoded data: true strips it, false keeps it
private final boolean stripDelimiter;
// Set to true when a frame exceeds maxLength and its data is being discarded
private boolean discarding;
// Number of bytes discarded so far for the current oversized frame
private int discardedBytes;
// How many readable bytes were already scanned for a newline in the last call
private int offset;
When decoding, the decoder first locates the newline and computes the length from the current reader index to the newline. If this length is greater than maxLength, the data is invalid: it cannot be decoded and must be discarded. For example, with maxLength = 4 and an input containing the lines AB, CDEF and GHIJBCA, only two packets are decoded, AB and CDEF; GHIJBCA is discarded because it is longer than maxLength.
The stripDelimiter property controls whether the decoded data keeps the newline \r\n or \n: true strips the newline from the decoded data. When a frame is longer than maxLength, its data must be discarded, and the decoder enters discard mode by setting discarding to true. When is the error reported: immediately, or only after the rest of the oversized frame has been read? This is controlled by the failFast property: true reports it immediately, while false reports it once the newline that ends the oversized frame has been read.
The source code below shows how the line decoder decodes data. It extends the abstract decoder class ByteToMessageDecoder mentioned in the previous article and implements its abstract method decode().
@Override
protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
    // Decode
    Object decoded = decode(ctx, in);
    // If a frame was decoded, add it to out
    if (decoded != null) {
        out.add(decoded);
    }
}
The core logic is in the overloaded decode() method. Its source code is long, so it has been trimmed; the overall skeleton is as follows.
protected Object decode(ChannelHandlerContext ctx, ByteBuf buffer) throws Exception {
    // Returns the index of \n, or of the \r in \r\n
    final int eol = findEndOfLine(buffer);
    if (!discarding) {
        // A newline was found
        if (eol >= 0) {
            // decode...
            return frame;
        } else {
            // ...
            return null;
        }
    } else {
        // A newline was found
        if (eol >= 0) {
            // ...
        } else {
            // ...
        }
        return null;
    }
}
First, findEndOfLine() locates the newline, which is essentially a scan over the readable bytes. If no newline is found, it returns a negative value; otherwise it returns a value greater than or equal to 0: the index of \n if the newline is \n, or the index of \r if the newline is \r\n.
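The offset field listed earlier relates to this search. In Netty 4.1, findEndOfLine() appears to resume scanning at readerIndex + offset and to store the number of bytes already checked when no \n is found, so bytes from earlier reads are not rescanned. The sketch below assumes that behaviour; ResumableEolScanner and its bytesExamined counter are hypothetical and only illustrate the idea on a plain byte array.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a newline scan that resumes where the previous call stopped. */
public class ResumableEolScanner {
    private int offset;   // readable bytes already scanned without finding '\n'
    int bytesExamined;    // instrumentation for the demo only

    /** Returns the index of '\n' (or of the '\r' in "\r\n") in buf[readerIndex, writerIndex), or -1. */
    int findEndOfLine(byte[] buf, int readerIndex, int writerIndex) {
        int totalLength = writerIndex - readerIndex;
        for (int i = readerIndex + offset; i < writerIndex; i++) {
            bytesExamined++;
            if (buf[i] == '\n') {
                offset = 0; // the next search starts at the new reader index
                return (i > readerIndex && buf[i - 1] == '\r') ? i - 1 : i;
            }
        }
        offset = totalLength; // nothing found: skip these bytes next time
        return -1;
    }

    public static void main(String[] args) {
        byte[] buf = "HELLO WORLD\r\n".getBytes(StandardCharsets.US_ASCII);
        ResumableEolScanner s = new ResumableEolScanner();
        // First read delivered only 8 bytes: no newline yet
        System.out.println(s.findEndOfLine(buf, 0, 8));          // -1
        // Second read completes the line; only the 5 new bytes are scanned
        System.out.println(s.findEndOfLine(buf, 0, buf.length)); // 11
        System.out.println(s.bytesExamined);                     // 13
    }
}
```

Because offset counts bytes relative to the reader index, it has to go back to 0 whenever the decoder itself skips all readable bytes, which is what the offset = 0 assignments in the decoder code do.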
The rest of the logic splits into two parts depending on whether the decoder is in discard mode. The first part runs in non-discard mode (discarding = false); the second runs in discard mode (discarding = true). Each part is further divided by whether a newline was found, so there are four cases in total. When the decode method is called for the first time, discarding = false, i.e. the decoder starts in non-discard mode.
The first case is non-discard mode with a newline found (eol >= 0); the code executed is shown below. First, the decoder computes the length of the data between the reader index and the newline, then checks whether this length exceeds maxLength. If it does, the data is invalid and must be discarded: the ByteBuf reader index is moved past the newline and fail() is called to report the error. If the length is within the limit, the data is valid and is decoded normally: stripDelimiter determines whether the newline is kept, and the decoded data is assigned to frame and returned. This is the ideal case.
// Non-discard mode and a newline was found
if (eol >= 0) {
    final ByteBuf frame;
    // Length of the data before the newline
    final int length = eol - buffer.readerIndex();
    // Length of the delimiter: 2 for \r\n, 1 for \n
    final int delimLength = buffer.getByte(eol) == '\r' ? 2 : 1;
    // If the data is longer than the maximum length, it cannot be decoded
    if (length > maxLength) {
        // Skip this line, including its newline
        buffer.readerIndex(eol + delimLength);
        // Report a TooLongFrameException
        fail(ctx, length);
        return null;
    }
    // Whether to strip the newline
    if (stripDelimiter) {
        // Read the data without the newline
        frame = buffer.readRetainedSlice(length);
        // Skip the newline
        buffer.skipBytes(delimLength);
    } else {
        // Read the data including the newline
        frame = buffer.readRetainedSlice(length + delimLength);
    }
    return frame;
}
The second case is non-discard mode with no newline found (eol < 0); the code is shown below. Without a newline, no complete frame can be decoded yet. Because of the maxLength limit, however, the decoder checks whether the readable data already exceeds the maximum length. If it does, the frame is certainly invalid, so all readable data is skipped right away and the decoder enters discard mode. The failFast member variable decides when the error is reported: true calls fail() immediately, while false defers it until the newline that ends the oversized frame is found (the third case).
else {
    // No newline found: check whether the readable data already exceeds the maximum length
    final int length = buffer.readableBytes();
    if (length > maxLength) {
        // Record the number of discarded bytes
        discardedBytes = length;
        // Move the reader index to the writer index to skip all readable data
        buffer.readerIndex(buffer.writerIndex());
        // Enter discard mode
        discarding = true;
        offset = 0;
        // With failFast, report the error right away
        if (failFast) {
            fail(ctx, "over " + discardedBytes);
        }
    }
    return null;
}
The third case is discard mode with a newline found (eol >= 0); the code is shown below. The parent class ByteToMessageDecoder calls the subclass's decode() in a loop, so if data was discarded in an earlier call, the decoder is now in discard mode. Even though a newline has been found, the data before it still belongs to the frame being discarded, so the decoder skips everything up to and including the newline; these bytes, together with those discarded in earlier calls, form the oversized frame. It then resets discardedBytes and sets discarding back to false. Since the oversized frame has now been fully skipped, the next call decodes normally. If failFast is false, this is also where the error is reported.
// Discard mode and a newline was found
if (eol >= 0) {
    // Bytes discarded in earlier calls + bytes before the newline in this buffer
    final int length = discardedBytes + eol - buffer.readerIndex();
    // Length of the delimiter
    final int delimLength = buffer.getByte(eol) == '\r' ? 2 : 1;
    // Skip the rest of the oversized frame, including its newline
    buffer.readerIndex(eol + delimLength);
    // Reset the discarded byte count
    discardedBytes = 0;
    // Leave discard mode
    discarding = false;
    // Without failFast, the error is reported now that the whole frame has been skipped
    if (!failFast) {
        fail(ctx, length);
    }
}
The fourth case is discard mode with no newline found (eol < 0); the code is shown below. There is still no newline, so nothing can be decoded, and because the decoder is in discard mode, all data read this time belongs to the oversized frame and is skipped. Note, however, that the decoder does not leave discard mode here. Why? The rest of the oversized frame may arrive in the next read. If discard mode were left now, that remainder might be shorter than maxLength and would be decoded as a normal frame, even though it is actually unusable.
else {
    // No newline found
    // Previously discarded data + all readable data this time
    discardedBytes += buffer.readableBytes();
    // Skip all readable data this time
    buffer.readerIndex(buffer.writerIndex());
    // We skip everything in the buffer and need to set the offset to 0 again.
    offset = 0;
}
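The four cases fit together as one small state machine. The sketch below is a hypothetical, Netty-free re-creation in plain Java: LineDecoderSketch, feed() and the errors list are illustrative names, a byte array replaces ByteBuf, the loop in feed() imitates ByteToMessageDecoder calling decode() until no progress is made, and errors are recorded as strings instead of being fired as TooLongFrameException. It uses maxLength = 4 and an oversized line GHIJBCA that arrives split across two reads.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, Netty-free re-creation of the four cases of the line decoder. */
public class LineDecoderSketch {
    private final int maxLength;
    private final boolean failFast;
    private final boolean stripDelimiter;
    private boolean discarding;
    private int discardedBytes;
    private byte[] buf = new byte[0]; // stands in for the cumulation ByteBuf
    private int readerIndex;
    final List<String> frames = new ArrayList<>();
    final List<String> errors = new ArrayList<>(); // TooLongFrameExceptions in Netty

    LineDecoderSketch(int maxLength, boolean failFast, boolean stripDelimiter) {
        this.maxLength = maxLength;
        this.failFast = failFast;
        this.stripDelimiter = stripDelimiter;
    }

    /** Appends one chunk, then calls decode() until a call makes no progress. */
    void feed(String chunk) {
        byte[] in = chunk.getBytes(StandardCharsets.US_ASCII);
        int unread = buf.length - readerIndex;
        byte[] merged = Arrays.copyOfRange(buf, readerIndex, buf.length + in.length);
        System.arraycopy(in, 0, merged, unread, in.length);
        buf = merged;
        readerIndex = 0;
        while (readerIndex < buf.length) {
            int before = readerIndex;
            String frame = decode();
            if (frame != null) {
                frames.add(frame);
            } else if (readerIndex == before) {
                break; // nothing consumed: wait for the next chunk
            }
        }
    }

    private String decode() {
        int eol = findEndOfLine();
        if (!discarding) {
            if (eol >= 0) { // case 1: normal mode, newline found
                int length = eol - readerIndex;
                int delimLength = buf[eol] == '\r' ? 2 : 1;
                int start = readerIndex;
                readerIndex = eol + delimLength; // the line is consumed either way
                if (length > maxLength) {
                    fail(String.valueOf(length));
                    return null;
                }
                int frameLength = stripDelimiter ? length : length + delimLength;
                return new String(buf, start, frameLength, StandardCharsets.US_ASCII);
            }
            int length = buf.length - readerIndex; // case 2: normal mode, no newline
            if (length > maxLength) {
                discardedBytes = length;
                readerIndex = buf.length; // skip everything buffered
                discarding = true;
                if (failFast) {
                    fail("over " + discardedBytes);
                }
            }
            return null;
        }
        if (eol >= 0) { // case 3: discard mode, newline found
            int length = discardedBytes + eol - readerIndex;
            readerIndex = eol + (buf[eol] == '\r' ? 2 : 1);
            discardedBytes = 0;
            discarding = false;
            if (!failFast) {
                fail(String.valueOf(length));
            }
        } else { // case 4: discard mode, no newline: keep discarding
            discardedBytes += buf.length - readerIndex;
            readerIndex = buf.length;
        }
        return null;
    }

    private int findEndOfLine() {
        for (int i = readerIndex; i < buf.length; i++) {
            if (buf[i] == '\n') {
                return (i > readerIndex && buf[i - 1] == '\r') ? i - 1 : i;
            }
        }
        return -1;
    }

    private void fail(String length) {
        errors.add("frame length (" + length + ") exceeds the allowed maximum (" + maxLength + ")");
    }

    public static void main(String[] args) {
        for (boolean failFast : new boolean[] {false, true}) {
            LineDecoderSketch d = new LineDecoderSketch(4, failFast, true);
            d.feed("AB\nCDEF\nGHIJBC"); // the third line has no newline yet
            d.feed("A\n");              // the oversized line GHIJBCA ends here
            System.out.println("failFast=" + failFast + " frames=" + d.frames + " errors=" + d.errors);
        }
    }
}
```

With failFast = false, the error is reported when the newline finally arrives, with the length 6 + 1 - 0 = 7 computed in the third case. With failFast = true, it is reported as soon as the second case sees 6 buffered bytes and no newline ("over 6"). In both runs the oversized line is dropped and only AB and CDEF are decoded.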
In both discard-mode cases nothing can be decoded, so null is returned, meaning no object was decoded.
As the analysis above shows, fail() is called when data is discarded. It has several overloads, all of which end up calling the following one.
private void fail(final ChannelHandlerContext ctx, String length) {
    ctx.fireExceptionCaught(
            new TooLongFrameException(
                    "frame length (" + length + ") exceeds the allowed maximum (" + maxLength + ')'));
}
fail() creates a TooLongFrameException, an exception often seen in real applications, and ctx.fireExceptionCaught() propagates it along the pipeline until it reaches a handler's exceptionCaught() method.
Overall, the idea behind LineBasedFrameDecoder is simple: it splits data on \r\n or \n. If a piece of data is longer than the specified maximum length, it is discarded; otherwise it is decoded successfully.
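The propagation step can be pictured with a toy pipeline. This is a hypothetical, heavily simplified model, not Netty's ChannelPipeline: each Handler either handles the exception or, by default, forwards it to the next handler, which is roughly what a handler that does not override exceptionCaught() does in Netty. TooLongFrameSketchException stands in for TooLongFrameException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of exception propagation along a pipeline (not Netty's ChannelPipeline). */
public class ExceptionPropagationSketch {

    /** Stands in for Netty's TooLongFrameException. */
    static class TooLongFrameSketchException extends RuntimeException {
        TooLongFrameSketchException(String message) {
            super(message);
        }
    }

    /** By default a handler does not handle the exception, so it is forwarded. */
    interface Handler {
        default boolean exceptionCaught(Throwable cause) {
            return false;
        }
    }

    static final List<String> log = new ArrayList<>();

    /** Offers the exception to each handler after 'from' until one handles it. */
    static void fireExceptionCaught(List<Handler> pipeline, int from, Throwable cause) {
        for (int i = from + 1; i < pipeline.size(); i++) {
            if (pipeline.get(i).exceptionCaught(cause)) {
                return;
            }
        }
        // Nobody handled it; Netty logs a warning when an exception reaches the pipeline's tail
        log.add("unhandled: " + cause.getMessage());
    }

    public static void main(String[] args) {
        Handler decoder = new Handler() { };  // fires the exception
        Handler business = new Handler() { }; // does not override: forwards it
        Handler errorHandler = new Handler() {
            @Override
            public boolean exceptionCaught(Throwable cause) {
                log.add("handled: " + cause.getMessage());
                return true;
            }
        };
        List<Handler> pipeline = List.of(decoder, business, errorHandler);
        int maxLength = 4;
        int length = 7;
        // Message built the way fail() builds it
        Throwable cause = new TooLongFrameSketchException(
                "frame length (" + length + ") exceeds the allowed maximum (" + maxLength + ')');
        fireExceptionCaught(pipeline, 0, cause);
        System.out.println(log); // [handled: frame length (7) exceeds the allowed maximum (4)]
    }
}
```

If no handler takes care of the exception, the sketch records it as unhandled; in Netty, an exception that reaches the end of the pipeline is logged as a warning, so an application usually adds a handler that overrides exceptionCaught().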
DelimiterBasedFrameDecoder
DelimiterBasedFrameDecoder is a delimiter-based decoder: it decodes according to delimiters specified by the user. If the user defines the delimiter as a semicolon (;), the data is split at each semicolon. The user can also specify several delimiters at once; whenever any one of them is encountered while reading, one frame is decoded. For example, with comma, exclamation mark, semicolon and newline as delimiters, the input AB,CDEF!AGHI;BCA\n decodes into AB, CDEF, AGHI and BCA.
In addition, if and only if exactly two delimiters, \r\n and \n, are specified, the delimiter-based decoder becomes a line-based decoder and decodes directly with LineBasedFrameDecoder.
DelimiterBasedFrameDecoder defines several important attributes whose meaning and purpose are nearly the same as the member variables of LineBasedFrameDecoder. They are listed below.
// Array of delimiters; several delimiters can be specified at once, so an array is used
private final ByteBuf[] delimiters;
// Maximum frame length
private final int maxFrameLength;
// Whether to strip the delimiter, true means strip
private final boolean stripDelimiter;
// Whether to report an over-long frame immediately, true means immediately
private final boolean failFast;
// Whether the decoder is in discard mode
private boolean discardingTooLongFrame;
// Total number of bytes discarded
private int tooLongFrameLength;
/** Set only when decoding with "\n" and "\r\n" as the delimiter. */
// If the delimiters are \r\n and \n, the line-based decoder is used directly
private final LineBasedFrameDecoder lineBasedDecoder;
Compared with LineBasedFrameDecoder, the delimiter decoder has two more member variables. The first is delimiters, an array that holds the user-defined delimiters; since users can define several delimiters, an array is used. The second is lineBasedDecoder, the line-based decoder. Only when the delimiters in the array are exactly \r\n and \n does the delimiter decoder act as a line decoder. This property is initialized in the DelimiterBasedFrameDecoder constructor, shown below.
public DelimiterBasedFrameDecoder(
        int maxFrameLength, boolean stripDelimiter, boolean failFast, ByteBuf... delimiters) {
    // omit other code...
    if (isLineBased(delimiters) && !isSubclass()) {
        lineBasedDecoder = new LineBasedFrameDecoder(maxFrameLength, stripDelimiter, failFast);
        this.delimiters = null;
    } else {
        // omit other code...
        lineBasedDecoder = null;
    }
    // omit other code...
}
In the constructor, isLineBased(delimiters) checks whether the delimiters are \r\n and \n. If they are, and the decoder is not a subclass of DelimiterBasedFrameDecoder, a line decoder is created and assigned to lineBasedDecoder; otherwise lineBasedDecoder is set to null. The source code of isLineBased() is as follows.
private static boolean isLineBased(final ByteBuf[] delimiters) {
    // Only an array of exactly two delimiters can be \r\n and \n
    if (delimiters.length != 2) {
        return false;
    }
    ByteBuf a = delimiters[0];
    ByteBuf b = delimiters[1];
    // Make sure that a is the longer one (\r\n) and b the shorter one (\n)
    if (a.capacity() < b.capacity()) {
        a = delimiters[1];
        b = delimiters[0];
    }
    return a.capacity() == 2 && b.capacity() == 1
            && a.getByte(0) == '\r' && a.getByte(1) == '\n'
            && b.getByte(0) == '\n';
}
As you can see from the source code for the isLineBased() method, true is returned if and only if the delimiters are \r\n and \n, which turns the delimiter decoder into a line-based one.
Similarly, the separator decoder overrides the abstract method decode() in the parent class.
@Override
protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
    // Call the overloaded decode() method to decode
    Object decoded = decode(ctx, in);
    // If decoding succeeded, add the result to out
    if (decoded != null) {
        out.add(decoded);
    }
}
The core logic is in the overloaded decode(ctx, in) method. Its source code is also long; it has been simplified and slightly rearranged for readability, and its skeleton is roughly as follows.
protected Object decode(ChannelHandlerContext ctx, ByteBuf buffer) throws Exception {
    // If the line-based decoder was created, the delimiters are \r\n and \n: decode with it directly
    if (lineBasedDecoder != null) {
        return lineBasedDecoder.decode(ctx, buffer);
    }
    int minFrameLength = Integer.MAX_VALUE;
    ByteBuf minDelim = null;
    // Iterate over all delimiters and find the one that occurs first
    for (ByteBuf delim : delimiters) {
        // Find the position of this delimiter and keep the smallest one
    }
    // A delimiter was found
    if (minDelim != null) {
        // In discard mode
        if (discardingTooLongFrame) {
            // ...
            return null;
        } else {
            // decode...
            return frame;
        }
    } else {
        // No delimiter was found
        // Check whether the decoder is in discard mode
        if (!discardingTooLongFrame) {
            // ...
        } else {
            // ...
        }
        return null;
    }
}
It first checks whether lineBasedDecoder is null. If it is not, the delimiters are \r\n and \n, and decoding is delegated directly to the line decoder; otherwise the logic below runs.
Unlike the line decoder analyzed earlier, the delimiter decoder can have several delimiters, so it must first determine which delimiter occurs first in the readable data and at what position. To do so, it iterates over the delimiters, finds the index of each one in the readable data, and picks the one with the smallest index, which is the delimiter that appears first.
The remaining logic is almost the same as the line decoder's. There are two cases, delimiter found and delimiter not found, and each is subdivided by whether the decoder is in discard mode, giving four cases. The only difference is the order of the checks: the line decoder first checks discard mode and then whether a newline was found, while the delimiter decoder first checks whether a delimiter was found and then discard mode. Data is decoded only when a delimiter is found and the decoder is not in discard mode; otherwise null is returned. The details mirror the line decoder analyzed above, so they are not repeated here.
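The "smallest index wins" rule can be sketched as follows. FirstDelimiterSketch is hypothetical and works on a plain byte array with a naive byte-by-byte search; it always strips the delimiter and ignores maxFrameLength and discard mode. This sketch uses a strict < comparison, so when two delimiters start at the same position, the one listed first wins, which the last two calls illustrate.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split on whichever delimiter occurs first in the remaining data. */
public class FirstDelimiterSketch {

    /** Naive search: distance of needle from 'from' in haystack, or -1 if absent. */
    static int indexOf(byte[] haystack, int from, byte[] needle) {
        for (int i = from; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i - from;
            }
        }
        return -1;
    }

    static List<String> decode(String text, String... delimiters) {
        byte[] buf = text.getBytes(StandardCharsets.US_ASCII);
        List<String> frames = new ArrayList<>();
        int readerIndex = 0;
        while (true) {
            int minFrameLength = Integer.MAX_VALUE;
            byte[] minDelim = null;
            for (String d : delimiters) {
                byte[] delim = d.getBytes(StandardCharsets.US_ASCII);
                int frameLength = indexOf(buf, readerIndex, delim);
                // Strict '<': on a tie, the delimiter listed first keeps its place
                if (frameLength >= 0 && frameLength < minFrameLength) {
                    minFrameLength = frameLength;
                    minDelim = delim;
                }
            }
            if (minDelim == null) {
                return frames; // no delimiter: the remaining bytes wait for more data
            }
            frames.add(new String(buf, readerIndex, minFrameLength, StandardCharsets.US_ASCII));
            readerIndex += minFrameLength + minDelim.length; // skip frame and delimiter
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("AB,CDEF!AGHI;BCA\n", ",", "!", ";", "\n")); // [AB, CDEF, AGHI, BCA]
        System.out.println(decode("A;;B;", ";;", ";"));                       // [A, B]
        System.out.println(decode("A;;B;", ";", ";;"));                       // [A, , B]
    }
}
```

The first call reproduces the AB, CDEF, AGHI, BCA example; the last two show that the order of the delimiter array matters when one delimiter is a prefix of another.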
Conclusion
Following the previous article, this article analyzed three commonly used decoders: the fixed-length decoder, the line decoder and the delimiter-based decoder. With the diagrams and examples in this article, they are simple to understand. They work much like the String.split() method we use in everyday development, except that they also have to handle discard mode. The next article examines the length-field-based decoder, which is more complex than the three decoders analyzed here but also the most versatile.
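One more difference from split() is worth seeing in code: String.split() works on a complete string, whereas a frame decoder receives a stream of chunks and must hold on to an unterminated tail. The hypothetical SplitVsDecoderSketch below uses ';' as its only delimiter and a StringBuilder as the buffer; it is an illustration, not one of Netty's decoders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical comparison of String.split() with a streaming ';' frame decoder. */
public class SplitVsDecoderSketch {
    // Data received but not yet terminated by ';'
    private final StringBuilder pending = new StringBuilder();

    /** Emits every complete ';'-terminated frame; an unterminated tail stays pending. */
    List<String> feed(String chunk) {
        pending.append(chunk);
        List<String> frames = new ArrayList<>();
        int idx;
        while ((idx = pending.indexOf(";")) >= 0) {
            frames.add(pending.substring(0, idx));
            pending.delete(0, idx + 1); // drop the frame and its delimiter
        }
        return frames;
    }

    public static void main(String[] args) {
        // split() on the first chunk alone treats "CD" as a finished piece
        System.out.println(Arrays.toString("AB;CD".split(";"))); // [AB, CD]
        SplitVsDecoderSketch d = new SplitVsDecoderSketch();
        System.out.println(d.feed("AB;CD")); // [AB]
        System.out.println(d.feed("E;"));    // [CDE]
    }
}
```

split() reports CD as complete, while the decoder keeps it buffered and emits CDE only once the second chunk supplies the delimiter; this buffering is what lets the decoders handle half packets.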
Recommended
- How to evolve from BIO to NIO to Netty
- Netty source code analysis series: Reactor thread model
- Netty source code analysis series: server Channel initialization
- Netty source code analysis series: server Channel registration
- Netty source code analysis series: server Channel port binding
- Netty source code analysis series: NioEventLoop creation and startup
- Netty source code analysis series: NioEventLoop execution flow
- Netty source code analysis series: new connection access
- Netty source code analysis series: TCP sticky packet and half packet problems and how Netty solves them