Sticky packets are very common in TCP communication. Below I share a sticky packet problem I ran into in real development. The scenario that triggered it was not actually TCP, but the approach applies to similar sticky packet situations.

I have not studied computer networking systematically, so here is my own understanding of why sticky packets happen, based on what I have read:

1. The sender does not delimit its data properly.
2. Data received over TCP is stored in a buffer and then read by the application. Usually data arrives in the buffer faster than the application reads it, so the head of one complete message ends up stuck to the tail of the previous one. Alternatively, because the buffer has a fixed size, a single read fills the buffer and the data at the end of that read is cut off mid-message.
In TCP communication, everything sent and received is an array of bytes (char). Because the buffer has an upper size limit, sticky packets can occur.

For example, the UTF-8 encoding of ‘梦魇’ (“nightmare”) is [230, 162, 166, 233, 173, 135]

utf8.decode([230, 162, 166, 233, 173, 135]);

This gives the string ‘梦魇’. Now let’s delete the last number

utf8.decode([230, 162, 166, 233, 173]);

The conversion now fails with an error

Unfinished UTF-8 octet sequence (at offset 5)

Therefore, if this byte string is hit by a sticky packet in TCP, then while receiving and decoding, a chunk whose tail holds only part of a character will fail to decode at that point, and the next chunk will begin with a malformed sequence as well.
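To see the failure in a chunked setting, here is a minimal sketch that splits the six bytes of ‘梦魇’ after the fourth byte, the way a full buffer might. The split point and the helper name tryDecode are assumptions made for illustration only.

```dart
import 'dart:convert';

// Hypothetical helper: strict decode that reports failure as null.
String? tryDecode(List<int> bytes) {
  try {
    return utf8.decode(bytes, allowMalformed: false);
  } on FormatException {
    return null;
  }
}

void main() {
  final bytes = [230, 162, 166, 233, 173, 135]; // '梦魇'
  final first = bytes.sublist(0, 4); // '梦' plus the first byte of '魇'
  final second = bytes.sublist(4); // the last two bytes of '魇'

  // Strict decoding rejects both chunks.
  print(tryDecode(first)); // null
  print(tryDecode(second)); // null

  // Lenient decoding does not throw, but it emits U+FFFD for the broken bytes.
  print(utf8.decode(first, allowMalformed: true).contains('\uFFFD')); // true

  // Joining the chunks before decoding restores the text.
  print(tryDecode(first + second)); // 梦魇
}
```

Neither chunk decodes on its own, which is why the incomplete tail has to be held back and joined with the next chunk.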

My trigger scenario

Since ptmx creates a pair of virtual terminal devices, our reads and writes go through the PTM device. An application running in the console writes its output to the PTS, so we can read the application’s output from the PTM.

Development experience shows the problem: if we open our terminal emulator and type find /, the command writes thousands of lines to the terminal at once, and no simple read loop can pull data from the PTM fast enough to avoid sticky packets.

Dart reads the program’s output from the PTM in a loop, which is nowhere near as fast as find produces it. As a result, one or two commands were enough to interrupt my debugging, because the UTF-8 decoder hit a malformed sequence. If you let the decoder accept malformed sequences, it stops throwing errors, but the output contains a large number of ‘�’ characters.

So the problem to solve is the same as in TCP: decoding UTF-8 characters that arrive continuously in chunks.

Solution

We split and reassemble the data according to the rules of the encoding. Concretely, we strip the incomplete sequence off the tail, store it in a cache, and splice it onto the head of the next data that comes in.

So the key challenge is how to find and remove the incomplete sequence at the tail.

Looking at the encoding rules

[Figure: UTF-8 encoding table, from Baidu Baike]

We don’t need to worry about the full encoding rules; we only need to spot the patterns.

From the figure above we get the following.

After any character is converted to a UTF-8 sequence:

In a multi-byte sequence, the number of leading 1 bits in the first byte is the number of bytes in the whole sequence, including that first byte
Every remaining byte of the sequence always starts with the bits 10
A code point in the first Unicode range (0000 to 007F) is encoded as a single byte that starts with 0
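The three rules above can be sketched as a small byte classifier. The helper names leadingOnes and describe are made up for this illustration, and bytes are assumed to be in the range 0 to 255.

```dart
// Hypothetical helper: number of leading 1 bits in a byte (0..255).
int leadingOnes(int byte) {
  int count = 0;
  while (count < 8 && (byte & (128 >> count)) != 0) {
    count++;
  }
  return count;
}

// Hypothetical helper: the role of a byte according to the three rules.
String describe(int byte) {
  final ones = leadingOnes(byte);
  if (ones == 0) return 'single-byte character';
  if (ones == 1) return 'continuation byte';
  return 'first byte of a $ones-byte sequence';
}

void main() {
  // 65 is 'A'; 230, 162, 166 are the three bytes of '梦'.
  for (final b in [65, 230, 162, 166]) {
    print('$b -> ${describe(b)}');
  }
  // 65 -> single-byte character
  // 230 -> first byte of a 3-byte sequence
  // 162 -> continuation byte
  // 166 -> continuation byte
}
```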

Deriving the check algorithm 🧐

We traverse backwards from the end of the received data

Suppose the received data is in a list called units

When the code point is between 0000 and 007F

As the table above shows, a code point in this range takes up one byte, and the first bit of that byte is 0

How to check the leading bits of a byte in Dart

Do not use int.toRadixString: it drops leading zeros, so when the first bit of the byte is not 1 the string does not show all eight bits. Instead we use the shift, bitwise AND, and bitwise OR operations the language provides.

So we check whether

units.last & 128 == 0

This performs a bitwise AND with 128, as follows:

  1 0 0 0 0 0 0 0
& x x x x x x x x

As long as the first bit of the byte is 0, the bitwise AND with 128 results in 0

In this case, the last byte of the data is a complete sequence by itself and no processing is required
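A quick sketch of why the mask is preferred over the string form. The function name startsWithZeroBit is hypothetical; the values are ‘A’ (65) and the first byte of ‘梦’ (230).

```dart
// Hypothetical helper: true when the top bit of a byte (0..255) is 0.
bool startsWithZeroBit(int byte) => (byte & 128) == 0;

void main() {
  const a = 65; // 'A', 01000001 as a full byte

  // toRadixString drops the leading zero, so only seven digits come back.
  print(a.toRadixString(2)); // 1000001
  print(a.toRadixString(2).padLeft(8, '0')); // 01000001

  // The mask looks at the top bit directly.
  print(startsWithZeroBit(a)); // true
  print(startsWithZeroBit(230)); // false
}
```

Note that in Dart the & operator binds tighter than ==, so units.last & 128 == 0 already means (units.last & 128) == 0; the parentheses here are only for readability.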

The last byte of the data starts with 11

This is a little trick of my own

If the last byte starts with the bits 11, it is the first byte of a multi-byte sequence: its leading 1 bits state how many bytes the whole sequence has. A byte with that role can never be the end of a sequence, so the rest of its sequence has not arrived yet. We remove it directly and put it in the cache, to be spliced onto the front of the next data.

else if (units.last & 192 == 192) {
    unitsCache = units.sublist(len - 1, len);
    units.removeRange(len - 1, len);

192 corresponds to 11000000

The last byte of the data starts with 10

A last byte starting with 10 means one of two things

The trailing sequence is complete
The trailing sequence is incomplete

Haha, stating the obvious 🤣

We just need to traverse backwards and count the bytes that start with 10. When we reach the byte that starts with 11, we count its leading 1 bits and check whether that count equals the number of 10 bytes plus 1.

For example, if traversing backwards finds two bytes starting with 10 and the third byte from the end has three leading 1 bits, the sequence at the end is complete. If that byte has four leading 1 bits, the data is one byte short.
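The backward scan and the cache can be sketched together as follows. This is a minimal illustration, not the code from this article: the names leadingOnes, incompleteTailStart, and ChunkDecoder are hypothetical, bytes are assumed to be in the range 0 to 255, and the "starts with 11" case is folded into the same scan as a run of zero 10 bytes.

```dart
import 'dart:convert';

// Hypothetical helper: number of leading 1 bits in a byte (0..255).
int leadingOnes(int byte) {
  int count = 0;
  while (count < 8 && (byte & (128 >> count)) != 0) {
    count++;
  }
  return count;
}

// Returns the index where an incomplete trailing sequence starts,
// or units.length when the tail is complete.
int incompleteTailStart(List<int> units) {
  int number = 0; // bytes starting with 10, counted from the end
  while (number < units.length &&
      units[units.length - 1 - number] >> 6 == 2) {
    number++;
  }
  // Only 10 bytes (or nothing at all): no first byte to compare against.
  if (number == units.length) return units.length;
  final first = units.length - 1 - number;
  // A first byte with n leading 1 bits needs n - 1 bytes starting with 10.
  return leadingOnes(units[first]) > number + 1 ? first : units.length;
}

// Keeps the incomplete tail of one chunk and prepends it to the next.
class ChunkDecoder {
  List<int> _cache = <int>[];

  String decode(List<int> chunk) {
    final units = _cache + chunk;
    final cut = incompleteTailStart(units);
    _cache = units.sublist(cut);
    return utf8.decode(units.sublist(0, cut));
  }
}

void main() {
  final decoder = ChunkDecoder();
  // '梦魇' split after its fourth byte.
  print(decoder.decode([230, 162, 166, 233])); // 梦
  print(decoder.decode([173, 135])); // 魇
}
```

The first call holds back the lone byte 233 and the second call decodes it together with the two bytes that complete it.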

That was a lot of talk, so here is the code

The complete code

import 'dart:convert';
import 'dart:ffi';

mixin CustomUtf {
  // Bytes of an unfinished UTF-8 sequence left over from the previous call.
  static List<int> unitsCache = <int>[];

  // Converts a NUL-terminated native string to a Dart String, carrying an
  // incomplete trailing UTF-8 sequence over to the next call.
  static String? cStringtoString(Pointer<Uint8> str) {
    if (str == nullptr) {
      return null;
    }
    // Find the terminating NUL, starting from index 0.
    int len = 0;
    while (str[len] != 0) {
      len++;
    }
    // Prepend whatever was cached last time to the newly read bytes.
    final List<int> units = <int>[
      ...unitsCache,
      for (int i = 0; i < len; ++i) str[i],
    ];
    unitsCache = <int>[];
    len = units.length;
    if (len == 0) {
      return '';
    }
    if (units.last & 128 == 0) {
      // The last byte starts with 0: it is a complete 1-byte character,
      // so there are no unread bytes to wait for.
    } else if (units.last & 192 == 192) {
      // The last byte starts with 11: it is the first byte of a sequence
      // whose remaining bytes have not arrived, so move it to the cache.
      unitsCache = units.sublist(len - 1, len);
      units.removeRange(len - 1, len);
    } else {
      // The last byte starts with 10: walk backwards over the bytes that
      // start with 10 until the first byte of the sequence is found.
      int number = 0; // bytes starting with 10 seen so far
      while (number < len && units[len - 1 - number] >> 6 == 2) {
        number++;
      }
      if (number < len) {
        final int cur = units[len - 1 - number];
        // Count the leading 1 bits with a mask rather than toRadixString,
        // which drops leading zeros.
        int ones = 0;
        while (ones < 8 && (cur & (128 >> ones)) != 0) {
          ones++;
        }
        if (ones > number + 1) {
          // The first byte announces more bytes than have arrived: cache
          // it together with the bytes that follow it.
          unitsCache = units.sublist(len - 1 - number, len);
          units.removeRange(len - 1 - number, len);
        }
      }
    }
    try {
      return utf8.decode(units, allowMalformed: false);
    } catch (e) {
      print('===>$units');
      print(e);
    }
    return null;
  }
}

The above code is used to convert the native Pointer<Uint8> type to String and can handle sticky packets when a large number of characters are transferred.

Synced to my personal blog

Dart Handling sticky packets in Tcp (UTF8)

Conclusion

After my exams, future posts will keep digging into building a standard terminal emulator with Flutter.

This article mainly shares the idea. Those who have heard of me may know that the code I post is "ancestral": as long as it works, I rarely touch it again (no time 🤪)
I have been busy with my projects and school matters recently, but I always keep learning in mind.
I don’t have a solid foundation in every area, so I just want to share what I learn on Juejin and help others with the same needs. If there are any mistakes, please kindly point them out rather than simply blaming me.