1. Foreword

When we store or transmit information, we usually need a data protocol that converts it into a form suitable for storage or transmission (a binary byte stream, encoded text, and so on). When the data source is an object, converting the object into encoded data is called serialization, and converting encoded data back into an object is called deserialization. The protocol itself is sometimes called a data interchange format. Data interchange formats are used throughout computing and can be regarded as one of its cornerstones.

The most common text-based protocols are XML and JSON. XML is good at description and is widely used for building web documents and Android layouts; its drawback is mediocre parsing efficiency. JSON is both human-readable and machine-friendly, with good parsing efficiency, so it is usually the first choice when selecting a data protocol.

In general, well-implemented binary protocols beat the various XML/JSON implementations in both speed and encoded size. When JSON's performance fails to meet requirements, people turn to binary data protocols. But there are a great many of them (Protobuf, Protostuff, Thrift, MessagePack, Avro…), and their ease of use is far from that of JSON.

Between performance and ease of use there is still plenty of room to explore. After consulting various materials and spending many hours, we finally arrived at a serialization scheme that is both efficient and easy to use. Its current name is Packable.

This article introduces Packable in several chapters:

  • Chapter 2 and 3: Protocol design;
  • Chapter 4: A brief introduction to the implementation;
  • Chapter 5: Usage;
  • Chapter 6: Performance Testing;
  • Chapter 7: Review summary.

The design and implementation chapters (2, 3, and 4) are fairly dense. If you are not already familiar with the principles or implementation of protobuf and similar protocols, they may be hard going; you can read the usage and performance testing chapters first and come back if you find them interesting. Readers who enjoy reading and analyzing source code can run the code and follow along in the source for a better reading experience.

2. Protobuf

After investigating various binary protocols, we chose the protobuf protocol as the basis for our scheme. Protobuf has a number of drawbacks, but it also contains good design techniques that are worth learning from.

2.1 Structure

For a serialization protocol to support both forward and backward compatibility, its basic structure is:

[key value key value ....]

C/C++ structs, Android Parcels, and the like have no keys and store values directly, but such formats are neither version-compatible nor cross-platform. A value may be a basic data type or a composite object, so the whole thing eventually forms an “object tree.”

2.2 Data Layout

JSON uses specific symbols to separate keys and values, so a parser has to find matching symbol pairs (quotation marks, brackets) to determine data boundaries. In protobuf, data boundaries are defined by type and length, so the data can be parsed in a single forward pass. And because no separators are needed, there is no need to escape specific symbols, which is one of the reasons protobuf is more efficient than XML or JSON.

The layout of a protobuf field is as follows:

<index> <type> [length] <data>

  • Index is the field number declared in the .proto file;
  • Type is not the “type” of a specific language platform but the wire “type” defined by protobuf itself, which tells the program how to encode and decode the value.

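The values are as follows:

Type  Meaning           Used for
0     Varint            int32, int64, uint32, uint64, sint32, sint64, bool, enum
1     64-bit            fixed64, sfixed64, double
2     Length-delimited  string, bytes, embedded messages, packed repeated fields
3     Start group       groups (deprecated)
4     End group         groups (deprecated)
5     32-bit            fixed32, sfixed32, float

For example, a field declared as fixed32 or float in the .proto file is encoded with type 5 (binary 101, 3 bits). The actual language-level type, int or float, is determined at compile time. JSON is similar: in {“number”:100}, number can be an int, long, float, or double, depending on how you read it.

  • Length: the length of the data, present only when the value is a string, array, or nested object. Basic types need no length because their length is known.
  • Data: the value itself.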

For example:

message Result {
    int32 count = 1;
}

message Data {
    string msg = 1;
    Result result = 2;
}

Serialized, the value

{
    "msg": "abc",
    "result": {"count": 1}
}

has the following layout:

|00001|010|00000011|'a' 'b' 'c'|00010|010|00000010|00001|000|00000001|
+-----+---+--------+-----------+-----+---+--------+-----+---+--------+
 index type length    data      index type length  index type  data
                                                  |<-------count---->|
|<------------ msg ----------->|<------------- result -------------->|

The maximum value of type is 5, which fits in 3 bits, so type can be encoded together with index. In protobuf, (index | type), length, and data (when type = 0) are all varint-encoded.
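As a cross-check of this layout, the sketch below hand-assembles the bytes for the Data message with msg = "abc" and result.count = 1. The class and method names (WireDemo, key, encodeExample) are made up for illustration and are not part of protobuf's API; the sketch also assumes field numbers and lengths small enough to fit in a single varint byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class WireDemo {
    // Key byte for small field numbers: index in the high bits, type in the low 3 bits.
    static int key(int index, int type) {
        return (index << 3) | type;
    }

    // Hand-encodes Data{msg = "abc", result = Result{count = 1}}.
    static byte[] encodeExample() {
        byte[] msg = "abc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(key(1, 2));           // msg: index 1, type 2 (length-delimited)
        out.write(msg.length);          // length = 3
        out.write(msg, 0, msg.length);  // 'a' 'b' 'c'
        out.write(key(2, 2));           // result: index 2, type 2
        out.write(2);                   // nested message is 2 bytes long
        out.write(key(1, 0));           // count: index 1, type 0 (varint)
        out.write(1);                   // value 1 as a one-byte varint
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeExample())); // prints "0A 03 61 62 63 12 02 08 01"
    }
}
```

Nine bytes carry the whole message, compared with 32 characters for the compact JSON text {"msg":"abc","result":{"count":1}}.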

2.3 Encoding

2.3.1 varint

As the name implies, a varint is a “variable-length integer”: integers are encoded with a variable number of bytes. For a 4-byte integer, the varint representation is as follows:

   0 ~ 2^07 - 1 0xxxxxxx
2^07 ~ 2^14 - 1 1xxxxxxx 0xxxxxxx
2^14 ~ 2^21 - 1 1xxxxxxx 1xxxxxxx 0xxxxxxx
2^21 ~ 2^28 - 1 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
2^28 ~ 2^35 - 1 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx

The 8-byte varint works the same way. Varint encoding saves space for small non-negative integers, for example integers in [0, 127] take a single byte, but it saves nothing for larger integers and takes even more space for negative ones (5 bytes for an int, 10 bytes for a long).
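A minimal sketch of 32-bit varint encoding and decoding follows. It assumes the byte order protobuf uses (the low-order 7-bit group is written first); the class name Varint is illustrative only.

```java
import java.io.ByteArrayOutputStream;

final class Varint {
    // Writes 7 payload bits per byte, low-order group first; the high bit of
    // each byte is 1 while more bytes follow and 0 on the last byte.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reads groups until a byte with a clear high bit is found.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(127).length); // 1
        System.out.println(encode(128).length); // 2
        System.out.println(encode(-1).length);  // 5
    }
}
```

The last line shows the cost described above: a negative int has its top bit set, so all five groups are needed.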

2.3.2 zigzag

The highest bit of a negative number is 1, so varint-encoding a negative number takes more space. Protobuf uses zigzag encoding to solve this problem. For a 32-bit integer the rules are as follows:

(n << 1) ^ (n >> 31)    // encode
(n >>> 1) ^ -(n & 1)    // decode

After zigzag encoding, all values become non-negative integers ordered by absolute value (for the same absolute value, the negative number comes just before the positive one). So for negative numbers with small absolute values, applying zigzag first and then varint yields a short encoding. For integers with large absolute values, however, zigzag does not help and can even backfire. Zigzag encoding is enabled when a field is declared as sint32 or sint64 in the .proto file.
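The two formulas can be tried directly. The following sketch wraps them in a hypothetical ZigZag class and prints the mapping for a few small values.

```java
final class ZigZag {
    // Interleaves negative and non-negative values: 0, -1, 1, -2, 2, ...
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of encode.
    static int decode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, -1, 1, -2, 2}) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

Here -1 maps to 1 and -2 maps to 3, so each fits in a one-byte varint instead of five bytes.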

2.3.3 String Encoding

Protobuf uses UTF-8 encoding for strings.

2.3.4 Byte order

When type = 1 or type = 5, values are encoded with a fixed length in little-endian order.

3. Packable protocol design

3.1 Basic coding rules

Packable follows protobuf here, and its structure is likewise:

[key value key value ....]

But the data layout is different:

<flag> <type> <index> [length] [data]
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  flag  | type  |    index    |            value           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  1bit  | 3bit  |   4~12 bit  |                            |

Compared with protobuf:

  • 1. The packable index starts at 0, whereas the protobuf index starts at 1;
  • 2. Index and type are not varint-encoded; together with flag they always occupy one or two bytes;
  • 3. The value may be absent (when type = 0).

When index ∈ [0, 15], flag = 0 and [flag | type | index] fits in one byte. When index ∈ [16, 255], flag = 1, [flag | type | 0000] forms the first byte, and index occupies the second byte. Indexes greater than 255 are currently not supported; in practice an object rarely has that many fields, and if they are needed later, the range can be extended into the low 4 bits of the first byte. The layout differs from protobuf's, but the effect is similar: one byte covers indexes up to 15 and two bytes cover indexes up to 255 (protobuf supports a wider range of indexes, but that range is usually not needed). Why not encode type and index with varint? Since the protocol is being redesigned anyway, we simply chose whatever is most convenient to implement.
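The key layout can be sketched as follows. KeyLayout and its methods are hypothetical names used only for illustration; the bit positions (flag in bit 7, type in bits 4 to 6, index in bits 0 to 3) follow the diagram above.

```java
final class KeyLayout {
    static final int BIG_INDEX_FLAG = 0x80; // flag bit: index is stored in the next byte
    static final int TYPE_SHIFT = 4;        // type occupies bits 4..6

    // Returns the one- or two-byte key for a type in [0, 7] and an index in [0, 255].
    static byte[] encodeKey(int type, int index) {
        if (type < 0 || type > 7 || index < 0 || index > 255) {
            throw new IllegalArgumentException("type or index out of range");
        }
        if (index < 16) {
            return new byte[] {(byte) ((type << TYPE_SHIFT) | index)};
        }
        return new byte[] {(byte) (BIG_INDEX_FLAG | (type << TYPE_SHIFT)), (byte) index};
    }

    static int decodeType(byte[] key) {
        return (key[0] >> TYPE_SHIFT) & 0x07;
    }

    static int decodeIndex(byte[] key) {
        return (key[0] & BIG_INDEX_FLAG) == 0 ? key[0] & 0x0F : key[1] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(encodeKey(3, 5).length);   // 1
        System.out.println(encodeKey(3, 200).length); // 2
    }
}
```

With type 3 and index 5 the key is the single byte 0x35; with index 200 it becomes 0xB0 followed by 0xC8.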

The definition and role of type also differ between packable and protobuf. Protobuf's type takes 3 bits too, which could express 8 definitions, but not all of them are used: in practice there are only varint, 32-bit, 64-bit, and length-delimited.

The packable type definitions and their uses are as follows:

Type  Name         Used for
0     TYPE_0       value 0, empty object
1     TYPE_NUM_8   boolean, byte, short, int, long
2     TYPE_NUM_16  short, int, long
3     TYPE_NUM_32  int, long, float
4     TYPE_NUM_64  long, double
5     TYPE_VAR_8   variable-length object of length [1, 255]
6     TYPE_VAR_16  variable-length object of length [256, 65535]
7     TYPE_VAR_32  variable-length object of length greater than 65535

1. An object often has many unassigned fields whose values are defaults such as 0 or the empty string. Such fields can be encoded with type = 0, leaving out length and value entirely. Compared with protobuf, this saves 1 byte over varint and length-delimited fields, 4 bytes over 32-bit fields, and 8 bytes over 64-bit fields.

2. Packable does not varint-encode integers, because type already records how many bytes the value occupies. For example, a long whose value lies in [1, 255] is encoded with type 1, and decoding reads just 1 byte. All types in [1, 4] work the same way: the number of significant bits in the value determines how many bytes are written. Integers in [128, 255] still take one byte in packable, whereas varint needs two, and so on upward. In the extreme case a long needs up to 10 bytes with varint but at most 8 with packable. Reading and writing fixed-width int/long values is also faster than varint.

3. When a field is a variable-length object (string, array, object), the length is not varint-encoded either, because type tells you how many bytes store the length.

Packable takes full advantage of the representation space of Type, saving coding space and computation time.
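To make the comparison concrete, the sketch below counts the bytes used for the value part of a long under each scheme (the key is excluded on both sides). ValueSize, varintSize and packableSize are illustrative names, not part of either library.

```java
final class ValueSize {
    // Bytes used by a varint encoding of a 64-bit value.
    static int varintSize(long v) {
        int n = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // Bytes used under the packable rule: the smallest of 0, 1, 2, 4 or 8
    // bytes that holds the value; the chosen width is recorded in type.
    static int packableSize(long v) {
        if (v == 0) return 0;
        if ((v >>> 8) == 0) return 1;
        if ((v >>> 16) == 0) return 2;
        if ((v >>> 32) == 0) return 4;
        return 8;
    }

    public static void main(String[] args) {
        for (long v : new long[] {0L, 100L, 200L, 70000L, -1L}) {
            System.out.println(v + ": varint=" + varintSize(v) + ", packable=" + packableSize(v));
        }
    }
}
```

The fixed widths are not smaller in every case: a value such as 70000 takes 3 bytes as a varint but 4 bytes here, so the gain comes from the common small values, the zero case, and the cheaper reads and writes.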

3.2 Array encoding

To simplify the notation, let

key = <flag> <type> <index>

3.2.1 Array of basic types

Data layout for arrays of basic types:

<key> [length] [v1 v2 ...]

  • Array elements are stored one after another in little-endian order;
  • Since the length of the underlying data type is fixed, the number of elements can be obtained by dividing the length by the number of bytes of the underlying data type.

For example, if it is an int/float array, size = length / 4.

3.2.2 String Arrays

<key> [length] [size] [len1 v1 len2 v2 ...]

  • Since string lengths are not fixed, size has to be encoded. Size is encoded with varint: it is a positive integer (when the array is not empty) and usually small, so varint saves space.
  • If the array has 0 elements, type = 0 and the value part is not encoded.
  • Each string is encoded as length + content; content is omitted when the string is empty or null.
  • When the string is null, len = -1.
  • The length of the whole array is determined by the type in the key, but nothing else records how many bytes a string's len occupies, so len is varint-encoded. (Strings are usually not very long, especially strings in arrays, so varint saves space.)

3.2.3 Object Array

<key> [length] [size] [len1 v1 len2 v2 ...]

An object array has the same data layout as a string array, except that len is encoded differently:

  • When the object is null, len = 0xFFFF;
  • When len <= 0x7FFF, len is encoded in two bytes;
  • When len > 0x7FFF, len is encoded in four bytes.

Why not use varint as for strings? The reason lies in the implementation: before an object is encoded, we do not know how many bytes it will occupy. With varint we would not know how much space to reserve for len, and any guess would often be wrong; once the value had been written, the bytes would then usually have to be moved so that len gets the right amount of space, which is inefficient. Two bytes are therefore reserved up front, so that an object of at most 32767 bytes never needs to be moved after it is written to the buffer. If the length exceeds 32767, the bytes have to be moved back by two positions, but encoding such a large object already takes a long time, so the move accounts for only a small share of the cost.

3.2.4 Dictionary

The data structure that stores key-value pairs is called a Dictionary in some programming languages and a Map in others; they are the same thing. It can be encoded as an array of key-value pairs:

<key> [length] [size] [k1 v1 k2 v2 ...]

Keys and values can be of various types. A basic data type is encoded directly with fixed length; a variable-length type is encoded according to the rules for variable-length arrays.

3.3 Compression coding

For values with particular characteristics, extra encoding rules can save space. Note that the following methods do not always compress; they only help when the data has the required characteristics.

3.3.1 zigzag

Zigzag encoding was introduced earlier, and Packable retains this option.

public PackEncoder putSInt(int index, int value) {
    return putInt(index, (value << 1) ^ (value >> 31));
}

This simply applies zigzag encoding before calling putInt. Use it only if the values include negative numbers with small absolute values; in general, use putInt.

3.3.2 double type

The binary representation of floating-point numbers deserves an article of its own; for reasons of length and focus we will not go into detail here and go straight to the conclusions:

  • 1. A double occupies 8 bytes;
  • 2. For numbers that are sums of a small number of powers of two 2^n, the trailing bytes are 0. n can be positive or negative; when n is negative the term is a fraction, for example 2^-1 = 0.5 and 2^-2 = 0.25;
  • 3. For integers whose absolute value is at most 2^21 (2097152), the last four bytes are 0.

Here are some examples to give an intuitive feel (sign bit, 11 exponent bits, 52 mantissa bits):

     -2.0  1 1000000-0000 0000-00000000-00000000-00000000-00000000-00000000-00000000
     -1.0  1 0111111-1111 0000-00000000-00000000-00000000-00000000-00000000-00000000
      0.0  0 0000000-0000 0000-00000000-00000000-00000000-00000000-00000000-00000000
      0.5  0 0111111-1110 0000-00000000-00000000-00000000-00000000-00000000-00000000
      1.0  0 0111111-1111 0000-00000000-00000000-00000000-00000000-00000000-00000000
      1.5  0 0111111-1111 1000-00000000-00000000-00000000-00000000-00000000-00000000
      2.0  0 1000000-0000 0000-00000000-00000000-00000000-00000000-00000000-00000000
     3.98  0 1000000-0000 1111-11010111-00001010-00111101-01110000-10100011-11010111
     31.0  0 1000000-0011 1111-00000000-00000000-00000000-00000000-00000000-00000000
     32.0  0 1000000-0100 0000-00000000-00000000-00000000-00000000-00000000-00000000
     33.0  0 1000000-0100 0000-10000000-00000000-00000000-00000000-00000000-00000000
   1999.0  0 1000000-1001 1111-00111100-00000000-00000000-00000000-00000000-00000000
   3999.0  0 1000000-1010 1111-00111110-00000000-00000000-00000000-00000000-00000000
2097151.0  0 1000001-0011 1111-11111111-11111111-00000000-00000000-00000000-00000000
2097152.0  0 1000001-0100 0000-00000000-00000000-00000000-00000000-00000000-00000000
2097153.0  0 1000001-0100 0000-00000000-00000000-10000000-00000000-00000000-00000000

The third conclusion is the most useful one: if a field is a double whose values are usually integers (for example, the price of goods, where whole-number prices dominate), there is room for compression. Packable provides a double compression option. When it is enabled, encoding proceeds as follows: 1. Reinterpret the double's 64 bits as a long; 2. Swap the low four bytes with the high four bytes; 3. Encode the result as a long (when a long is encoded and its high four bytes are 0, only the low four bytes are written). This saves 4 bytes for eligible double values.
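A minimal sketch of this compression, assuming the first step is a plain reinterpretation of the double's bits via Double.doubleToRawLongBits. The class DoubleCompress and its methods are hypothetical names, not Packable's API.

```java
final class DoubleCompress {
    // Takes the raw 64 bits and swaps the high and low 4-byte halves, so that
    // the sign, exponent and top mantissa bits end up in the low half.
    static long toSwappedBits(double d) {
        long bits = Double.doubleToRawLongBits(d);
        return (bits << 32) | (bits >>> 32);
    }

    // Undoes the swap and reinterprets the bits as a double again.
    static double fromSwappedBits(long swapped) {
        long bits = (swapped << 32) | (swapped >>> 32);
        return Double.longBitsToDouble(bits);
    }

    // Value bytes needed when the swapped bits are written as a long
    // with the width rule used for integers (0, 1, 2, 4 or 8 bytes).
    static int encodedSize(double d) {
        long v = toSwappedBits(d);
        if (v == 0) return 0;
        if ((v >>> 8) == 0) return 1;
        if ((v >>> 16) == 0) return 2;
        if ((v >>> 32) == 0) return 4;
        return 8;
    }

    public static void main(String[] args) {
        System.out.println(encodedSize(1999.0));    // 4
        System.out.println(encodedSize(3.98));      // 8
        System.out.println(encodedSize(2097153.0)); // 8
    }
}
```

1999.0 has raw bits 0x409F3C0000000000, which become 0x00000000409F3C00 after the swap and fit in four bytes; 3.98 and 2097153.0 have non-zero low halves and stay at eight bytes.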

3.3.3 bool array

Encoding each bool of a bool array in its own byte is wasteful; obviously one byte can hold eight bool values. Because the array size is not necessarily a multiple of 8, extra information is needed to recover it. One option is to record size after length, as object arrays do, but that is not the most compact: when size is large, one byte is not enough to hold it. Instead we can record remain = size % 8 and compute size from length and remain when decoding; remain is always less than 8 and fits in 3 bits.
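The idea can be sketched as follows. The exact byte layout is an assumption made for this example (one leading byte holding remain, then the packed bits with the first element in the lowest bit); Packable's real layout may differ, and BoolArrayPacking is a hypothetical name.

```java
final class BoolArrayPacking {
    // Assumed payload: [remain][packed bytes], where remain = size % 8.
    static byte[] pack(boolean[] values) {
        int n = values.length;
        if (n == 0) {
            return new byte[0]; // an empty array needs no value part
        }
        byte[] out = new byte[1 + (n + 7) / 8];
        out[0] = (byte) (n % 8);
        for (int i = 0; i < n; i++) {
            if (values[i]) {
                out[1 + (i >> 3)] |= (byte) (1 << (i & 7));
            }
        }
        return out;
    }

    // Recovers size from the payload length and remain, then reads the bits back.
    static boolean[] unpack(byte[] payload) {
        if (payload.length == 0) {
            return new boolean[0];
        }
        int remain = payload[0];
        int packed = payload.length - 1;
        int n = remain == 0 ? packed * 8 : (packed - 1) * 8 + remain;
        boolean[] values = new boolean[n];
        for (int i = 0; i < n; i++) {
            values[i] = (payload[1 + (i >> 3)] & (1 << (i & 7))) != 0;
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(pack(new boolean[10]).length); // 3 bytes instead of 10
    }
}
```

Ten booleans take 3 bytes in this form (one for remain, two for the bits), and the size is recovered as (2 - 1) * 8 + 2 = 10.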

3.3.4 Enumerating arrays

When an enumeration can take only two values (such as yes/no or available/unavailable), each value can be encoded in one bit; when the values lie in [0, 3], each value can be encoded in 2 bits; and so on. If enumeration values exceed 255, they can simply be encoded as ints; when they are at most 255, one or more values can be packed into a single byte. The data layout is similar to that of a bool array:

<key> [length] [remain] [v1 v2  ...]

3.3.5 int/long/double arrays

When an int, long, or double is a standalone field, type records how many bytes the value occupies, so the value can be compressed. Can array elements be compressed too? Yes: each value gets 2 extra bits that record how many bytes it occupies. Two bits can express four cases; the table below shows, for flag values 0 to 3, which bits of the value are stored for each type (a dash means nothing is stored):

bits     0    1          2          3
int      -    [0, 7]     [0, 15]    [0, 31]
long     -    [0, 7]     [0, 15]    [0, 63]
double   -    [48, 63]   [32, 63]   [0, 63]

Int and long are taken from the low end, because the high bits are 0 when the value is small. Double is taken from the high end: for values such as 1, 1.5, and 2, bits [0, 47] are all 0, so only the 2 high-order bytes need to be recorded. If the value is 0, only the flag bits are recorded and no value bytes are written.

The compressed array data layout is as follows:

<key> [length] [size] [bits] [v1 v2  ...]

Size is encoded with varint. The flag bits follow size, 2 bits per value. Each value that follows then occupies as many bytes as its flag indicates. This strategy does not always compress; it depends on the array. It usually works well when most elements are small. In the extreme case where all elements are 0, the [v1 v2 ...] part is empty and each element takes only 2 bits.
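The sketch below illustrates the scheme for int arrays. It encodes only the [bits] [v1 v2 ...] part and takes the element count as a separate argument when decoding; the placement of the 2-bit flags inside each byte (first element in the lowest two bits) and the class name CompactIntArray are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;

final class CompactIntArray {
    // Flag per element: 0 = value is 0 (no bytes), 1 = 1 byte, 2 = 2 bytes, 3 = 4 bytes.
    static byte[] encode(int[] values) {
        int n = values.length;
        byte[] flags = new byte[(n + 3) / 4]; // four 2-bit flags per byte
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            int v = values[i];
            int flag = v == 0 ? 0 : (v >>> 8) == 0 ? 1 : (v >>> 16) == 0 ? 2 : 3;
            int width = flag == 3 ? 4 : flag;
            for (int k = 0; k < width; k++) {
                body.write(v >>> (8 * k)); // little-endian, low byte first
            }
            flags[i >> 2] |= (byte) (flag << ((i & 3) << 1));
        }
        byte[] out = new byte[flags.length + body.size()];
        System.arraycopy(flags, 0, out, 0, flags.length);
        System.arraycopy(body.toByteArray(), 0, out, flags.length, body.size());
        return out;
    }

    static int[] decode(byte[] data, int n) {
        int[] values = new int[n];
        int pos = (n + 3) / 4; // value bytes start right after the flag bytes
        for (int i = 0; i < n; i++) {
            int flag = (data[i >> 2] >> ((i & 3) << 1)) & 3;
            int width = flag == 3 ? 4 : flag;
            int v = 0;
            for (int k = 0; k < width; k++) {
                v |= (data[pos++] & 0xFF) << (8 * k);
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] sample = {0, 5, 0, 300, 70000, 0, 0, -1};
        System.out.println(encode(sample).length); // 13 bytes instead of 32
    }
}
```

For the sample array, 2 flag bytes plus 1 + 2 + 4 + 4 value bytes give 13 bytes; an array of eight zeros would take only the 2 flag bytes.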

If you need to transfer the rows of a data table, consider assembling the data by column, which makes encoding and decoding faster; for sparse fields (0 in most rows) or fields with small values, compression is recommended.

4. Framework implementation

For reasons of space, this article only outlines the key steps; see the source code for more details.

4.1 Defining Types

Recall from the previous chapter that the packable type takes three bits, and the highest bit of the byte indicates whether the index is written in the remaining four bits or in the next byte.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  flag  | type  |    index    |            value           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  1bit  | 3bit  |   4~12 bit  |                            |

To do this, we define constants as follows:

final class TagFormat {
    private static final byte TYPE_SHIFT = 4;
    static final byte BIG_INDEX_MASK = (byte) (1 << 7);
    static final byte TYPE_MASK = 7 << TYPE_SHIFT;
    static final byte INDEX_MASK = 0xF;
    static final int LITTLE_INDEX_BOUND = 1 << TYPE_SHIFT;

    static final byte TYPE_0 = 0;
    static final byte TYPE_NUM_8 = 1 << TYPE_SHIFT;
    static final byte TYPE_NUM_16 = 2 << TYPE_SHIFT;
    static final byte TYPE_NUM_32 = 3 << TYPE_SHIFT;
    static final byte TYPE_NUM_64 = 4 << TYPE_SHIFT;
    static final byte TYPE_VAR_8 = 5 << TYPE_SHIFT;
    static final byte TYPE_VAR_16 = 6 << TYPE_SHIFT;
    static final byte TYPE_VAR_32 = 7 << TYPE_SHIFT;
}

4.2 Implement the Buffer class

public final class EncodeBuffer {
    byte[] hb;
    int position;

    public void writeInt(int v) {
        hb[position++] = (byte) v;
        hb[position++] = (byte) (v >> 8);
        hb[position++] = (byte) (v >> 16);
        hb[position++] = (byte) (v >> 24);
    }
    // ...
}

The Buffer class only needs to provide the basic write methods; buffer expansion is left to the caller. Several values often need to be written in a row, and it is cheaper for the call site to check capacity once than for every Buffer method call to check it.

4.3 Encoding

public final class PackEncoder {
    private final EncodeBuffer buffer;

    final void putIndex(int index) {
        if (index >= TagFormat.LITTLE_INDEX_BOUND) {
            buffer.writeByte(TagFormat.BIG_INDEX_MASK);
        }
        buffer.writeByte((byte) (index));
    }

    public PackEncoder putInt(int index, int value) {
        checkCapacity(6); // Check the buffer capacity
        if (value == 0) {
            putIndex(index);
        } else {
            int pos = buffer.position;
            putIndex(index);
            if ((value >> 8) == 0) {
                buffer.hb[pos] |= TagFormat.TYPE_NUM_8;
                buffer.writeByte((byte) value);
            } else if ((value >> 16) == 0) {
                buffer.hb[pos] |= TagFormat.TYPE_NUM_16;
                buffer.writeShort((short) value);
            } else {
                buffer.hb[pos] |= TagFormat.TYPE_NUM_32;
                buffer.writeInt(value);
            }
        }
        return this;
    }
}

The encoding method works as follows:

  • 1. Check the buffer capacity and expand it if it is insufficient;
  • 2. Write the index;
  • 3. Write the type. Index and type occupy different bits of the same byte, so the type is added with a bitwise OR (“|”). When value is 0, type = 0, so nothing extra needs to be written;
  • 4. Write the value. The example above writes an int, using as many bytes as the size of the value requires; for example, if value < 256, only one byte is written.

Encoding other basic types follows the same general steps.

Encoding objects is a little more complicated. The object to be serialized must implement the encode method of the Packable interface and write its fields with PackEncoder. If a field is itself an object, that object must implement Packable as well (encode is then called recursively).

public interface Packable {
    void encode(PackEncoder encoder);
}

The process of encoding an object is as follows:

    public PackEncoder putPackable(int index, Packable value) {
        if (value == null) {
            return this;
        }
        checkCapacity(6);
        int pTag = buffer.position;
        putIndex(index);
        // Set aside 4 bytes to store length
        buffer.position += 4;
        int pValue = buffer.position;
        value.encode(this);
        if (pValue == buffer.position) {
            buffer.position -= 4; // If value is empty, the reserved space is reclaimed
        } else {
            putLen(pTag, pValue);
        }
        return this;
    }

    private void putLen(int pTag, int pValue) {
        int len = buffer.position - pValue;
        if (len <= 127) {
            buffer.hb[pTag] |= TagFormat.TYPE_VAR_8;
            buffer.hb[pValue - 4] = (byte) len;
            System.arraycopy(buffer.hb, pValue, buffer.hb, pValue - 3, len);
            buffer.position -= 3;
        } else {
            buffer.hb[pTag] |= TagFormat.TYPE_VAR_32;
            buffer.writeInt(pValue - 4, len);
        }
    }

The steps are similar to those for basic types, except that the type is written last: the strategy is to encode the value first and then write its length and type. To avoid moving too many bytes, the compact step (moving bytes to reclaim the reserved space) is performed only when the value is at most 127 bytes long. TYPE_VAR_16 is not used here; it is used when encoding arrays or strings, where the number of bytes is known before anything is written to the buffer, so no space has to be reserved for the length as it is for objects.

Most frameworks first fill a container with the values and then traverse the container during encoding to write each node into the buffer. Protobuf's Java implementation, for example, writes an object by first visiting every field to compute how much space it needs, then writing the length, then the value, so every field of the object is visited twice. Packable writes immediately when a put method is called, so each field is visited only once. Small objects do require a compact step, but the overall efficiency is good because few bytes are moved and the access has good spatial locality. Most importantly, this strategy is easy to implement: computing the space taken by each field would need much more code and would be slower.

4.4 Decoding

public interface PackCreator<T> {
    T decode(PackDecoder decoder);
}

public final class PackDecoder {
    static final long NULL_FLAG = ~0;
    static final long INT_MASK = 0xffffffffL;

    private DecodeBuffer buffer;
    private long[] infoArray;
    private int maxIndex = -1;

    private void parseBuffer() {
        // ... initialization code ...
        while (buffer.hasRemaining()) {
            byte tag = buffer.readByte();
            int index = (tag & TagFormat.BIG_INDEX_MASK) == 0 ? tag & TagFormat.INDEX_MASK : buffer.readByte() & 0xff;
            if (index > maxIndex)  maxIndex = index;
            byte type = (byte) (tag & TagFormat.TYPE_MASK);
            if (type <= TagFormat.TYPE_NUM_64) {
                if (type == TagFormat.TYPE_0) {
                    infoArray[index] = 0L;
                } else if (type == TagFormat.TYPE_NUM_8) {
                    infoArray[index] = ((long) buffer.readByte()) & 0xffL;
                } else if (type == TagFormat.TYPE_NUM_16) {
                    infoArray[index] = ((long) buffer.readShort()) & 0xffffL;
                } else if (type == TagFormat.TYPE_NUM_32) {
                    infoArray[index] = ((long) buffer.readInt()) & 0xffffffffL;
                } else {
                    // Handling of TYPE_NUM_64 is more involved and is omitted here.
                }
            } else {
                int size;
                if (type == TagFormat.TYPE_VAR_8) {
                    size = buffer.readByte() & 0xff;
                } else if (type == TagFormat.TYPE_VAR_16) {
                    size = buffer.readShort() & 0xffff;
                } else {
                    size = buffer.readInt();
                }
                infoArray[index] = ((long) buffer.position << 32) | (long) size;
                buffer.position += size;
            }
        }
        // At this point infoArray holds, for each index, either the value itself
        // or the position and length of the value.
        // infoArray[i] = NULL_FLAG for every i below maxIndex that was not assigned.
    }

    long getInfo(int index) {
        if (maxIndex < 0) {
            parseBuffer();
        }
        if (index > maxIndex) {
            return NULL_FLAG;
        }
        return infoArray[index];
    }

    public int getInt(int index, int defValue) {
        long info = getInfo(index);
        return info == NULL_FLAG ? defValue : (int) info;
    }

    public <T> T getPackable(int index, PackCreator<T> creator, T defValue) {
        long info = getInfo(index);
        if (info == NULL_FLAG) {
            return defValue;
        }
        int offset = (int) (info >>> 32);
        int len = (int) (info & INT_MASK);
        PackDecoder decoder = pool.getDecoder(offset, len);
        T object = creator.decode(decoder);
        decoder.recycle();
        return object;
    }
}

Decoding is the reverse of encoding. The basic steps are:

  • 1. Read (type | index);
  • 2. Split it into type and index;
  • 3. Read the value according to type. A basic-type value is cached in infoArray[index]; for a variable-length type, the offset and length are combined into one long and stored in infoArray;
  • 4. When a basic type is requested, infoArray[index] is returned directly. When a variable-length type is requested, offset and len are extracted, the buffer is positioned accordingly, and a value of the given length is read.
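The way an offset and a length share one long in infoArray can be shown in isolation. InfoWord is a hypothetical helper written for this example; unlike the excerpt above, it masks the length before combining, which only matters if the length were negative.

```java
final class InfoWord {
    static final long INT_MASK = 0xffffffffL;

    // High 32 bits hold the offset, low 32 bits hold the length.
    static long pack(int offset, int len) {
        return ((long) offset << 32) | (len & INT_MASK);
    }

    static int offset(long info) {
        return (int) (info >>> 32);
    }

    static int length(long info) {
        return (int) (info & INT_MASK);
    }

    public static void main(String[] args) {
        long info = pack(1000, 42);
        System.out.println(offset(info) + ", " + length(info)); // 1000, 42
    }
}
```

Storing both numbers in a single long lets the decoder keep one primitive array for every field, whether the slot holds a value or a position.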

When getPackable is called and the Packable object contains nested objects, decode is called recursively, just as encode is during encoding.

5. Usage

5.1 Common Usage

To serialize or deserialize an object, implement the interfaces above and then call the encoding or decoding method. For example:

static class Data implements Packable {
    String msg;
    Item[] items;

    @Override
    public void encode(PackEncoder encoder) {
        encoder.putString(0, msg)
                .putPackableArray(1, items);
    }

    public static final PackCreator<Data> CREATOR = decoder -> {
        Data data = new Data();
        data.msg = decoder.getString(0);
        data.items = decoder.getPackableArray(1, Item.CREATOR);
        return data;
    };
}

static class Item implements Packable {
    int a;
    long b;

    Item(int a, long b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public void encode(PackEncoder encoder) {
        encoder.putInt(0, a);
        encoder.putLong(1, b);
    }

    static final PackArrayCreator<Item> CREATOR = new PackArrayCreator<Item>() {
        @Override
        public Item[] newArray(int size) {
            return new Item[size];
        }

        @Override
        public Item decode(PackDecoder decoder) {
            return new Item(
                    decoder.getInt(0),
                    decoder.getLong(1));
        }
    };
}

static void test() {
    Data data = new Data();
    // serialize
    byte[] bytes = PackEncoder.marshal(data);
    // deserialize
    Data data_2 = PackDecoder.unmarshal(bytes, Data.CREATOR);
}

  • Serialization

1. Declare that the class implements Packable; 2. Implement the encode() method and encode each field (PackEncoder provides APIs for the various types); 3. Call PackEncoder.marshal() with the object to obtain the byte array.

  • Deserialization

1. Create a static object that is an instance of PackCreator; 2. Implement the decode() method, decode each field, and assign it to the object; 3. Call PackDecoder.unmarshal() with the byte array and the PackCreator instance to obtain the object.

To deserialize an array of objects, create an instance of PackArrayCreator instead (this applies to the Java version; other versions do not need it). PackArrayCreator extends PackCreator with one additional method, newArray, which simply creates and returns an array of the corresponding type.

5.2 Direct Coding

The example above is only one way to use the library; it can be used flexibly. 1. The PackCreator does not have to be created in the class being deserialized; it can live elsewhere and have any name. 2. If only serialization is needed (the sending side), implementing Packable is enough and no PackCreator is required, and vice versa. 3. If there is no class definition, or the class is inconvenient to modify, you can also encode and decode directly:

static void test2() {
    String msg = "message";
    int a = 100;
    int b = 200;

    PackEncoder encoder = new PackEncoder();
    encoder.putString(0, msg)
                .putInt(1, a)
                .putInt(2, b);
    byte[] bytes = encoder.getBytes();

    PackDecoder decoder = PackDecoder.newInstance(bytes);
    String dMsg = decoder.getString(0);
    int dA = decoder.getInt(1);
    int dB = decoder.getInt(2);
    decoder.recycle();
}

5.3 Custom Coding

Consider the following class:

class Info  {
    public long id;
    public String name;
    public Rectangle rect;
}

Rectangle (a JDK class) has four fields:

class Rectangle {
  int x, y, width, height;
}

There are many ways to serialize Info (having Rectangle implement Packable is not one of them, since you cannot change the JDK). Packable provides one approach that is efficient at run time:

public static class Info implements Packable {
    public long id;
    public String name;
    public Rectangle rect;

    @Override
    public void encode(PackEncoder encoder) {
        encoder.putLong(0, id)
                .putString(1, name);
        // putCustom returns PackEncoder's buffer
        EncodeBuffer buf = encoder.putCustom(2, 16);     // 4 ints, 16 bytes
        buf.writeInt(rect.x);
        buf.writeInt(rect.y);
        buf.writeInt(rect.width);
        buf.writeInt(rect.height);
    }

    public static final PackCreator<Info> CREATOR = decoder -> {
        Info info = new Info();
        info.id = decoder.getLong(0);
        info.name = decoder.getString(1);
        DecodeBuffer buf = decoder.getCustom(2);
        if (buf != null) {
            info.rect = new Rectangle(
                    buf.readInt(),
                    buf.readInt(),
                    buf.readInt(),
                    buf.readInt());
        }
        return info;
    };
}

It is fairly common for a large object to nest small objects whose fields are fixed. Encoding such a small object as a custom block avoids one level of recursion and skips parsing an index for each of its fields, which can improve efficiency considerably.
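To see why a fixed-size block is cheap, note that four ints always occupy exactly 16 bytes, so they can be written and read back to back with no per-field index or length. The sketch below shows this with the JDK's java.nio.ByteBuffer instead of Packable's EncodeBuffer/DecodeBuffer. The class name, the method names and the little-endian byte order are assumptions made for the illustration, not details taken from the library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class FixedBlockDemo {
    // Four ints, 4 bytes each.
    static final int RECT_SIZE = 4 * Integer.BYTES;

    // Write the four fields back to back; no tags or lengths are stored.
    static byte[] writeRect(int x, int y, int width, int height) {
        ByteBuffer buf = ByteBuffer.allocate(RECT_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(x).putInt(y).putInt(width).putInt(height);
        return buf.array();
    }

    // Read the fields in the same order they were written.
    static int[] readRect(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int x = buf.getInt();
        int y = buf.getInt();
        int width = buf.getInt();
        int height = buf.getInt();
        return new int[] {x, y, width, height};
    }

    public static void main(String[] args) {
        byte[] bytes = writeRect(1, 2, 30, 40);
        int[] rect = readRect(bytes);
        System.out.println(bytes.length + " bytes, width = " + rect[2]);
    }
}
```

The trade-off is that reader and writer must agree on the field order and sizes in advance, which is why this suits small objects whose layout does not change.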

5.4 Type Support

The sections above cover the overall use of Packable serialization/deserialization. As for which interfaces PackEncoder/PackDecoder provide (that is, which types are supported), taking PackEncoder as an example, two points are worth noting:

  • For the base types, putSInt, putSLong and putCDouble are compressed encodings (see Section 3.3).
  • Map has too many possible key/value type combinations, so only some common ones are implemented; a putMap interface is reserved for custom implementations.

Six, performance test

In addition to Protobuf, Gson (a JSON serialization framework for the Java platform) was chosen for comparison.

In terms of space, the serialized data sizes are:

            Data size (bytes)
packable    2537191 (57%)
protobuf    2614001 (59%)
gson        4407901 (100%)

Packable and Protobuf produce similar sizes (Packable is slightly smaller), roughly 57% to 59% of the Gson size.

In terms of time, two sets of measurements were taken, one on a PC and one on a mobile phone:

  1. MacBook Pro

            Serialization time (ms)    Deserialization time (ms)
packable    9                          8
protobuf    19                         11
gson        67                         46

  2. Honor 20S

            Serialization time (ms)    Deserialization time (ms)
packable    32                         21
protobuf    81                         38
gson        190                        128

It should be noted that data characteristics, test platform and other factors will affect the results. The above test results are for reference only. You can compare them with your own business data.
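For readers who want to repeat the comparison on their own data, the following is a minimal timing sketch, not the harness used for the figures above. `CodecTimer` and `averageMillis` are hypothetical names, and the UTF-8 string encoder in `main` is only a placeholder for a real serializer call such as the marshal() call shown in Section 5. It runs a few warm-up rounds first so that JIT compilation does not dominate the measurement.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class CodecTimer {
    // Written on every call so the JIT cannot discard the encoding work.
    static volatile long sink;

    // Average wall-clock time of one full pass over the samples, in milliseconds.
    static <T> double averageMillis(List<T> samples, Function<T, byte[]> encode,
                                    int warmupRounds, int measuredRounds) {
        long total = 0;
        for (int i = 0; i < warmupRounds; i++) {
            for (T sample : samples) {
                total += encode.apply(sample).length;
            }
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRounds; i++) {
            for (T sample : samples) {
                total += encode.apply(sample).length;
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        sink = total;
        return elapsedNanos / 1_000_000.0 / measuredRounds;
    }

    public static void main(String[] args) {
        List<String> samples = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            samples.add("message-" + i);
        }
        // Placeholder encoder; substitute the serializer under test.
        double ms = averageMillis(samples, s -> s.getBytes(StandardCharsets.UTF_8), 5, 20);
        System.out.println("average pass: " + ms + " ms");
    }
}
```

Running each candidate (Packable, Protobuf, Gson) through the same function on the same samples keeps the comparison fair; for publishable numbers a dedicated benchmarking tool is the safer choice.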

Seven, summary

Binary protocols such as Packable and Protobuf generally perform better than JSON, but their output is not human-readable. One way to improve readability is to deserialize the binary content into Java objects and then convert those objects to JSON with a framework such as Gson.

Overall, Packable has the following advantages:

  • 1. Excellent performance: encoding and decoding are fast, and the encoded messages are small.
  • 2. Lightweight code: one aspect is package size. In Java, for example, the protobuf JAR is close to 2 MB, while the packable JAR is only 37 KB. The other aspect is the amount of code needed to add a new message type. For the data types defined earlier, the Java file generated by Protobuf has more than 5,000 lines, while the Packable class file has only a few hundred.
  • 3. Easy to use: using protobuf is relatively tedious. You need to write a .proto file, compile it into code for the target language platform, copy that code into the project, integrate the SDK into the project, and so on. To add a field, you modify the .proto file, compile it again, and copy the result into the project again. In contrast, Packable works on existing objects: a class that is already defined only needs to implement the relevant interface, and its existing implementation and call sites do not need to change. To add or remove fields, you simply add or remove them in the code.
  • 4. Flexible: a class can implement only the serialization interface (or only the deserialization side). Besides object serialization/deserialization, direct encoding and custom encoding are also supported.
  • 5. Broad type support: many types are supported, and object types may be null (which protobuf does not support).
  • 6. Support for a variety of compression strategies.

In terms of language support, Packable currently has implementations in Java, C++, C#, Objective-C, Go and other languages. They all follow the same protocol, so data can be exchanged between platforms written in different languages.

Project address: github.com/BillyWei001…