04. Apache Thrift transport protocol

Data is transmitted over the network as binary.

When a Java object is sent from a client to a server, the client must first convert it to binary and then write it to network IO. When the server receives data from network IO, it must convert the binary data back into an object before operating on it.

These two conversion processes are collectively called codec operations.

In Thrift, the codec operation, also known as the transport protocol, is defined by the abstract TProtocol class. Its methods fall into two main categories:

Write methods, such as writeI32 and writeString

Read methods, such as readI32 and readString

The write method specifies how to convert data to binary, and the read method specifies how to convert binary to data.

The commonly used transport protocols in Thrift are as follows:

TBinaryProtocol: transmits data in a plain binary encoding
TCompactProtocol: transmits data in an efficient, dense binary encoding
TJSONProtocol: transmits data using a JSON text encoding
TSimpleJSONProtocol: a write-only JSON protocol whose output is suitable for parsing by scripting languages
TMultiplexedProtocol: a composite protocol that lets multiple services be handled at once

These transport protocols are subclasses of TProtocol, so let’s examine the operation of these methods.

TBinaryProtocol

TBinaryProtocol is the most basic implementation, and the resulting binary data is raw (as opposed to compressed data). Here we pick a few methods to understand the conversion process.

Read and write `I32` type data

I32 data is a 32-bit integer, also known as a Java int, which can be written as follows:

public void writeI32(int i32) throws TException {
  // Convert data of type I32 to byte arrays
  inoutTemp[0] = (byte) (0xff & (i32 >> 24));
  inoutTemp[1] = (byte) (0xff & (i32 >> 16));
  inoutTemp[2] = (byte) (0xff & (i32 >> 8));
  inoutTemp[3] = (byte) (0xff & (i32));
  trans_.write(inoutTemp, 0, 4);
}

In this code, the int type is converted to a byte array of length 4 by a bit operation.

An int is read like this:

  @Override
  public int readI32() throws TException {
    byte[] buf = inoutTemp;
    int off = 0;
    // Read 4 bytes
    if (trans_.getBytesRemainingInBuffer() >= 4) {
      buf = trans_.getBuffer();
      off = trans_.getBufferPosition();
      trans_.consumeBuffer(4);
    } else {
      readAll(inoutTemp, 0, 4);
    }
    // Convert the 4 bytes to an int with bit operations
    return
      ((buf[off] & 0xff) << 24) |
      ((buf[off+1] & 0xff) << 16) |
      ((buf[off+2] & 0xff) <<  8) |
      ((buf[off+3] & 0xff));
  }

The read operation is relatively simple: four bytes are read from the data stream and then converted to an int through bitwise operations.
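To make the round trip concrete, here is a minimal standalone sketch that repeats the same shifts outside of Thrift. The class name I32Codec and its methods are hypothetical and only serve as an illustration; they are not part of the Thrift API.

```java
import java.util.Arrays;

final class I32Codec {
    // Encode an int as 4 big-endian bytes, using the same shifts as writeI32.
    static byte[] encode(int value) {
        return new byte[] {
            (byte) (0xff & (value >> 24)),
            (byte) (0xff & (value >> 16)),
            (byte) (0xff & (value >> 8)),
            (byte) (0xff & value)
        };
    }

    // Decode 4 big-endian bytes back into an int, using the same shifts as readI32.
    // The "& 0xff" mask undoes the sign extension Java applies when a byte is widened.
    static int decode(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             | (buf[off + 3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] bytes = encode(258);
        System.out.println(Arrays.toString(bytes)); // [0, 0, 1, 2]
        System.out.println(decode(bytes, 0));       // 258
        System.out.println(decode(encode(-1), 0));  // -1
    }
}
```

Without the mask, a byte such as 0xff would be widened to -1 and corrupt the higher bits of the result, which is why every byte is masked before shifting.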

Read and write `I64` data

The other basic data types work the same way, for example long:

  @Override
  public void writeI64(long i64) throws TException {
    // Convert i64 data to byte arrays
    inoutTemp[0] = (byte) (0xff & (i64 >> 56));
    inoutTemp[1] = (byte) (0xff & (i64 >> 48));
    inoutTemp[2] = (byte) (0xff & (i64 >> 40));
    inoutTemp[3] = (byte) (0xff & (i64 >> 32));
    inoutTemp[4] = (byte) (0xff & (i64 >> 24));
    inoutTemp[5] = (byte) (0xff & (i64 >> 16));
    inoutTemp[6] = (byte) (0xff & (i64 >> 8));
    inoutTemp[7] = (byte) (0xff & (i64));
    trans_.write(inoutTemp, 0, 8);
  }

The long type is 8 bytes, so the converted byte array is 8 bytes long.

It is read as follows:

  @Override
  public long readI64() throws TException {
    byte[] buf = inoutTemp;
    int off = 0;
    // Read 8 bytes
    if (trans_.getBytesRemainingInBuffer() >= 8) {
      buf = trans_.getBuffer();
      off = trans_.getBufferPosition();
      trans_.consumeBuffer(8);
    } else {
      readAll(inoutTemp, 0, 8);
    }
    // Convert the byte array to a long
    return
      ((long)(buf[off]   & 0xff) << 56) |
      ((long)(buf[off+1] & 0xff) << 48) |
      ((long)(buf[off+2] & 0xff) << 40) |
      ((long)(buf[off+3] & 0xff) << 32) |
      ((long)(buf[off+4] & 0xff) << 24) |
      ((long)(buf[off+5] & 0xff) << 16) |
      ((long)(buf[off+6] & 0xff) <<  8) |
      ((long)(buf[off+7] & 0xff));
  }

The operation of reading long data is very similar to that of an int, in that eight bytes are read on the data stream and then converted into long data by a bit operation.
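The byte order produced by these shifts is big-endian, which is also the default order of java.nio.ByteBuffer. The following sketch checks that equivalence; the class I64Codec is a hypothetical helper written for this illustration, not Thrift code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class I64Codec {
    // Manual big-endian encoding, following the same shifts as writeI64.
    static byte[] encode(long value) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (0xff & (value >> (56 - 8 * i)));
        }
        return out;
    }

    // Manual big-endian decoding, following the same idea as readI64.
    static long decode(byte[] buf) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            result = (result << 8) | (buf[i] & 0xffL);
        }
        return result;
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        byte[] manual = encode(value);
        // ByteBuffer uses big-endian order unless told otherwise.
        byte[] viaBuffer = ByteBuffer.allocate(8).putLong(value).array();
        System.out.println(Arrays.equals(manual, viaBuffer)); // true
        System.out.println(decode(manual) == value);          // true
    }
}
```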

Other integer types such as byte and short work in a similar way: a fixed number of bytes is read from the data stream, and bit operations then convert those bytes into the specified type, so I won't go into them here.

Read and write `boolean` type data

Thrift treats Boolean data as byte:

  public void writeBool(boolean b) throws TException {
    writeByte(b ? (byte)1 : (byte)0);
  }

When writing Boolean data, handle it as byte, with 0 as false and 1 as true.

Read operations are also handled as byte operations, which I won’t go into here.
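For completeness, the pair of operations can be sketched as follows. This is a simplified illustration that assumes the reader treats the byte 1 as true and any other value as false; BoolCodec is a hypothetical name, not a Thrift class.

```java
final class BoolCodec {
    // A boolean is written as a single byte: 1 for true, 0 for false.
    static byte encode(boolean b) {
        return b ? (byte) 1 : (byte) 0;
    }

    // Assumption of this sketch: only the byte value 1 is read back as true.
    static boolean decode(byte b) {
        return b == 1;
    }

    public static void main(String[] args) {
        System.out.println(encode(true));      // 1
        System.out.println(decode((byte) 0));  // false
    }
}
```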

Read and write `String` type data

The types covered so far all have a fixed length: an int takes 4 bytes, a long 8 bytes, and a byte 1 byte. How is variable-length data such as a String written and read?

String data can be written as follows:

public void writeString(String str) throws TException {
    byte[] dat = str.getBytes(StandardCharsets.UTF_8);
    // Write length data
    writeI32(dat.length);
    trans_.write(dat, 0, dat.length);
}

The write operation consists of two parts:

Write the length of the data
Write the binary data

Unlike the basic data types, a String does not have a fixed length. The length must therefore be written before the data itself, so that the reader knows how many bytes to read. The read operation is as follows:

  @Override
  public String readString() throws TException {
    // Read the length of the String first
    int size = readI32();

    // Read fixed-length byte, then convert to String
    if (trans_.getBytesRemainingInBuffer() >= size) {
      String s = new String(trans_.getBuffer(), trans_.getBufferPosition(),
          size, StandardCharsets.UTF_8);
      trans_.consumeBuffer(size);
      return s;
    }
    return readStringBody(size);
  }
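
The length-prefix idea can be tried in isolation with a ByteBuffer. The sketch below assumes a 4-byte big-endian length followed by the UTF-8 bytes, as in the listings above; LengthPrefixedString is a hypothetical helper, not part of Thrift.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedString {
    // Write a 4-byte length followed by the UTF-8 bytes of the string.
    static byte[] encode(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + data.length)
                .putInt(data.length)
                .put(data)
                .array();
    }

    // Read the length first, then exactly that many bytes.
    static String decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int size = buf.getInt();
        byte[] data = new byte[size];
        buf.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // 5 characters, but 6 bytes in UTF-8
        byte[] wire = encode(original);
        System.out.println(wire.length);                   // 10
        System.out.println(decode(wire).equals(original)); // true
    }
}
```

Note that the prefix counts bytes, not characters, which is why the length is taken from the encoded byte array rather than from the String itself.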

Read and write complex object data

Now that we’ve looked at writing and reading basic types, how do we read and write complex objects?

Note that complex objects are themselves made up of basic types. Take QryResult as an example:

public class QryResult implements ... {

  public int code;

  public String msg;

  ...
}

Its write method is QryResult.QryResultStandardScheme#write:

public void write(org.apache.thrift.protocol.TProtocol oprot, QryResult struct) 
    throws org.apache.thrift.TException {
  struct.validate();
  // Write the start identifier of the object
  oprot.writeStructBegin(STRUCT_DESC);
  // Write the attribute start identifier
  oprot.writeFieldBegin(CODE_FIELD_DESC);
  oprot.writeI32(struct.code);
  // Write the attribute end identifier
  oprot.writeFieldEnd();
  if (struct.msg != null) {
    // Write the attribute start identifier
    oprot.writeFieldBegin(MSG_FIELD_DESC);
    oprot.writeString(struct.msg);
    // Write the attribute end identifier
    oprot.writeFieldEnd();
  }
  oprot.writeFieldStop();
  // Write the end identifier of the object
  oprot.writeStructEnd();
}

The write operation shows that:

Before and after an object is written, an identifier is written to mark where the object begins and ends
Before and after each attribute is written, an identifier is written to mark where the attribute begins and ends; there may be multiple attributes
Attributes are basic types, such as byte, int, and String

CODE_FIELD_DESC and MSG_FIELD_DESC are written as follows:

  // code
  private static final TField CODE_FIELD_DESC = new TField("code", TType.I32, (short)1);
  // msg
  private static final TField MSG_FIELD_DESC = new TField("msg", TType.STRING, (short)2);

Now look at how a QryResult object is read, in the QryResult.QryResultStandardScheme#read method:

public void read(org.apache.thrift.protocol.TProtocol iprot, QryResult struct) 
    throws org.apache.thrift.TException {
  org.apache.thrift.protocol.TField schemeField;
  // Read the start flag of the object
  iprot.readStructBegin();
  while (true)
  {
    // Read the attribute start tag, which contains the attribute type
    schemeField = iprot.readFieldBegin();
    if (schemeField.type == org.apache.thrift.protocol.TType.STOP) { 
      break;
    }
    switch (schemeField.id) {
      case 1: // CODE
        // Determine the type
        if (schemeField.type == org.apache.thrift.protocol.TType.I32) {
          // Read code as i32 (i.e. Int)
          struct.code = iprot.readI32();
          struct.setCodeIsSet(true);
        } else { 
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
        }
        break;
      case 2: // MSG
        // Determine the type
        if (schemeField.type == org.apache.thrift.protocol.TType.STRING) {
          // read MSG as String
          struct.msg = iprot.readString();
          struct.setMsgIsSet(true);
        } else { 
          org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
        }
        break;
      default:
        org.apache.thrift.protocol.TProtocolUtil.skip(iprot, schemeField.type);
    }
    // Read the attribute end tag
    iprot.readFieldEnd();
  }
  // Read the end of the object
  iprot.readStructEnd();

  struct.validate();
}

The read operation proceeds as follows:

Read the object start flag
Loop over the attributes:
1. Read the attribute start flag
2. Read the attribute value. The start flag contains the attribute's type, so reading the value only requires calling the matching read method
3. Read the attribute end flag
Read the object end flag

So that’s how complex objects are read.
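The whole write and read cycle can be imitated without Thrift. The sketch below is a simplification under several assumptions: the struct begin and end markers write nothing, each field header is one type byte followed by a 2-byte field id, and the type tags 0, 8 and 11 stand for STOP, I32 and STRING. QryResultSketch is a hypothetical class, not the generated code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class QryResultSketch {
    // Type tags; assumed here to correspond to TType.STOP, TType.I32 and TType.STRING.
    static final byte STOP = 0;
    static final byte I32 = 8;
    static final byte STRING = 11;

    final int code;
    final String msg;

    QryResultSketch(int code, String msg) {
        this.code = code;
        this.msg = msg;
    }

    // For each field: header (type byte + field id), then the value. A STOP byte ends the struct.
    byte[] write() {
        byte[] msgBytes = (msg == null) ? null : msg.getBytes(StandardCharsets.UTF_8);
        int size = 3 + 4 + (msgBytes == null ? 0 : 3 + 4 + msgBytes.length) + 1;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(I32).putShort((short) 1).putInt(code);          // field 1: code
        if (msgBytes != null) {                                 // field 2: msg, only if set
            buf.put(STRING).putShort((short) 2).putInt(msgBytes.length).put(msgBytes);
        }
        buf.put(STOP);
        return buf.array();
    }

    // Loop over field headers until STOP, dispatching on field id and type.
    static QryResultSketch read(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int code = 0;
        String msg = null;
        while (true) {
            byte type = buf.get();
            if (type == STOP) {
                break;
            }
            short id = buf.getShort();
            if (id == 1 && type == I32) {
                code = buf.getInt();
            } else if (id == 2 && type == STRING) {
                byte[] bytes = new byte[buf.getInt()];
                buf.get(bytes);
                msg = new String(bytes, StandardCharsets.UTF_8);
            } else {
                // The generated code skips unknown fields; this sketch simply rejects them.
                throw new IllegalStateException("unexpected field id " + id);
            }
        }
        return new QryResultSketch(code, msg);
    }

    public static void main(String[] args) {
        byte[] wire = new QryResultSketch(200, "ok").write();
        QryResultSketch back = read(wire);
        System.out.println(wire.length + " bytes, code=" + back.code + ", msg=" + back.msg);
    }
}
```

Because every field carries its own id and type, the reader does not depend on field order, and an unset msg simply never appears on the wire.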

TCompactProtocol

Next let’s look at the TCompactProtocol, which is commented as follows:

TCompactProtocol2 is the Java implementation of the compact protocol specified in THRIFT-110. The fundamental approach to reducing the overhead of structures is a) use variable-length integers all over the place and b) make use of unused bits wherever possible. Your savings will obviously vary based on the specific makeup of your structs, but in general, the more fields, nested structures, short strings and collections, and low-value i32 and i64 fields you have, the more benefit you’ll see.


According to the comment, this protocol aims to save space by using a compact encoding, so let's see how it does this.

Read and write `i32` type data

Let’s enter the writeI32 method:

public void writeI32(int i32) throws TException {
  writeVarint32(intToZigZag(i32));
}

/** Convert an int to its ZigZag encoding */
private int intToZigZag(int n) {
  return (n << 1) ^ (n >> 31);
}

/** Write an int as a variable-length integer (varint) */
private void writeVarint32(int n) throws TException {
  int idx = 0;
  while (true) {
    if ((n & ~0x7F) == 0) {
      temp[idx++] = (byte)n;
      // writeByteDirect((byte)n);
      break;
      // return;
    } else {
      temp[idx++] = (byte)((n & 0x7F) | 0x80);
      // writeByteDirect((byte)((n & 0x7F) | 0x80));
      n >>>= 7;
    }
  }
  trans_.write(temp, 0, idx);
}

As the code shows, the write method performs two operations:

Use intToZigZag to convert the int data to its ZigZag form
Use writeVarint32 to write the resulting ZigZag value

As for what ZigZag is, I found an article online that introduces it: Small and smart digital compression algorithm: ZigZag (blog.csdn.net/zgwangbo/ar…)

Take the int type as an example. The core idea is as follows: the int value 1 has the binary form 00000000_00000000_00000000_00000001. If it is transmitted as a fixed 4-byte int, the first three bytes are all 0 and only the fourth byte carries information, so sending those three bytes is wasteful. For small integers like this, the approach is to send only the significant bits. ZigZag first maps signed values to unsigned ones so that values close to zero, whether positive or negative, have many leading zero bits (1 becomes 2, and -1 becomes 1). The variable-length write then emits only the significant 7-bit groups. In the end a single byte, 00000010, is enough to transmit the value 1, which saves 3 bytes.

Following this idea, ZigZag only applies to integer data types larger than 1 byte, such as short, int, and long.
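The two steps can be reproduced in a small standalone sketch. The encoding methods follow the shifts shown in the listing above; the decoding methods and the class name ZigZagVarint are written for this illustration and are not copied from Thrift.

```java
import java.io.ByteArrayOutputStream;

final class ZigZagVarint {
    // Map signed ints to unsigned so that small magnitudes get small codes.
    static int intToZigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse mapping.
    static int zigZagToInt(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
    static byte[] writeVarint32(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        return out.toByteArray();
    }

    // Reassemble the 7-bit groups until a byte without the continuation bit is seen.
    static int readVarint32(byte[] buf) {
        int result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    static byte[] encode(int value) {
        return writeVarint32(intToZigZag(value));
    }

    static int decode(byte[] buf) {
        return zigZagToInt(readVarint32(buf));
    }

    public static void main(String[] args) {
        System.out.println(encode(1).length);         // 1
        System.out.println(encode(-1).length);        // 1
        System.out.println(encode(300).length);       // 2
        System.out.println(writeVarint32(-1).length); // 5, without ZigZag
    }
}
```

The last line shows why the ZigZag step matters: a negative number written as a plain varint has all of its high bits set and needs the maximum of 5 bytes.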

Read and write `double` type data

ZigZag can handle integer data types larger than 1 byte. Can ZigZag handle double data types?

Let’s look at operations of type double:

public void writeDouble(double dub) throws TException {
  fixedLongToBytes(Double.doubleToLongBits(dub), temp, 0);
  trans_.write(temp, 0, 8);
}

The code still writes 8 bytes, so you can’t compress a double.
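One way to see why a variable-length encoding would not help here is to look at the bit pattern of a double. The sketch below counts how many bytes a varint would need for the raw bits of 1.0; the helper varintLength and the class DoubleBits are hypothetical and written only for this illustration.

```java
final class DoubleBits {
    // Number of bytes a varint would need for the given 64-bit value.
    static int varintLength(long n) {
        int length = 1;
        while ((n & ~0x7FL) != 0) {
            n >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        long bits = Double.doubleToLongBits(1.0);
        System.out.println(Long.toHexString(bits)); // 3ff0000000000000
        System.out.println(varintLength(bits));     // 9
        System.out.println(varintLength(1L));       // 1
    }
}
```

The exponent of a double sits in the high bits, so even a "small" value such as 1.0 has no leading zeros to drop, and a varint would take 9 bytes instead of the fixed 8.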

Read and write `String` type

Consider the String type:

public void writeString(String str) throws TException {
  byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
  // The length is written as a variable-length integer
  writeVarint32(bytes.length);
  trans_.write(bytes, 0, bytes.length);
}

In the writeString method, two pieces of data are written:

The length of the String after conversion to binary: this is an int, so the variable-length encoding can be applied to it
The String content itself: this is not an integer, so it cannot be compressed this way

As you can see, only the length, which is an int, benefits from the compact encoding; the content is written unchanged.

From the above analysis, ZigZag and variable-length encoding only compress integer data types (short, int, long) and do nothing for other types of data.
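As a rough illustration of the saving on strings, the sketch below writes a varint length followed by the UTF-8 bytes, mirroring the writeString listing above. CompactStringSketch and its decode method are hypothetical and written for this example only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class CompactStringSketch {
    // Varint length prefix, then the raw UTF-8 bytes.
    static byte[] encode(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = data.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    // Read the varint length, then that many bytes as UTF-8.
    static String decode(byte[] wire) {
        int length = 0;
        int shift = 0;
        int pos = 0;
        while (true) {
            byte b = wire[pos++];
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return new String(wire, pos, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("hi").length); // 3, versus 4 + 2 = 6 with a fixed 4-byte length
        System.out.println(decode(encode("hi"))); // hi
    }
}
```

The saving is at most 3 bytes per string, so it matters most when a message contains many short strings.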

TJSONProtocol and `TSimpleJSONProtocol`

As the names suggest, these two are the JSON serialization protocols provided by Thrift.

The TJSONProtocol is commented as follows:

JSON protocol implementation for thrift. This is a full-featured protocol supporting write and read. Please see the C++ class header for a detailed description of the protocol’s wire format.


Note that this is a fully functional protocol that supports reading and writing, as opposed to TSimpleJSONProtocol, which is annotated as follows:

JSON protocol implementation for thrift. This protocol is write-only and produces a simple output format suitable for parsing by scripting languages. It should not be confused with the full-featured TJSONProtocol.

Thrift provides two JSON serialization protocols:

TJSONProtocol: a full-featured protocol that supports both reading and writing
TSimpleJSONProtocol: a write-only protocol

In Thrift, data is mostly transmitted with the binary protocols and the JSON protocols are rarely used, so this article does not analyze the JSON read and write operations.

TMultiplexedProtocol

TMultiplexedProtocol is not a transport protocol in its own right but a wrapper around another protocol. It prefixes the service name to the function name on each call, which lets a Thrift client address one of several services hosted on the same Thrift server.

If the client uses TMultiplexedProtocol as the transport protocol, the server needs to use TMultiplexedProcessor to process requests from the multiplexing client.

`TMultiplexedProtocol` example

Here TMultiplexedProtocol is used to call two services over a single socket transport:

Client:

TTransport transport = new TSocket("localhost", SERVER_PORT);
transport.open();
TProtocol protocol = new TBinaryProtocol(transport);

// Specify serviceName and the service to process
// helloService
TMultiplexedProtocol helloService = new TMultiplexedProtocol(
        protocol, "helloService");
HelloService.Client client = new HelloService.Client(helloService);
System.out.println(client.hello("thrift world"));

// queryService
TMultiplexedProtocol queryProtocol = new TMultiplexedProtocol(
        protocol, "queryService");
QueryService.Client queryClient = new QueryService.Client(queryProtocol);
System.out.println(queryClient.query(1));

Server:

// Build the processor, specifying each serviceName and its corresponding processor
TMultiplexedProcessor processor = new TMultiplexedProcessor();
processor.registerProcessor("helloService", new HelloService.Processor<>(new HelloServiceImpl()));
processor.registerProcessor("queryService", new QueryService.Processor<>(new QueryServiceImpl()));
// Generate a TServer instance
TServer server = new TSimpleServer(
  new TServer.Args(new TServerSocket(port)).processor(processor));
System.out.println("Starting the simple server...");
server.serve();

`TMultiplexedProtocol` implementation principle

How does TMultiplexedProtocol do this? The key is the serviceName!

The TMultiplexedProtocol constructor:

public TMultiplexedProtocol(TProtocol protocol, String serviceName) {
    super(protocol);
    SERVICE_NAME = serviceName;
}

The TMultiplexedProtocol constructor requires the service name (serviceName) to be specified.

Let’s look at the data write operation:

@Override
public void writeMessageBegin(TMessage tMessage) throws TException {
  if (tMessage.type == TMessageType.CALL || tMessage.type == TMessageType.ONEWAY) {
      super.writeMessageBegin(new TMessage(
              // To write data, specify the service name
              SERVICE_NAME + SEPARATOR + tMessage.name,
              tMessage.type,
              tMessage.seqid
      ));
  } else {
      super.writeMessageBegin(tMessage);
  }
}

As the code shows, TMultiplexedProtocol differs from the other protocols in that it adds the service name when writing the message header.

When a Thrift client sends a request to the server, the request carries the name of the target service, indicating which service's method is to be invoked. When the server receives the data, it looks up the service by name and then invokes that service's method, which is how multiple services share one connection.
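The naming scheme can be illustrated without any Thrift classes. The sketch below assumes the separator is ":" and models each registered service as a plain function; MultiplexSketch, wrap and dispatch are hypothetical names and do not reproduce the real TMultiplexedProcessor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class MultiplexSketch {
    // Assumed separator between service name and method name.
    static final String SEPARATOR = ":";

    // serviceName -> handler taking (methodName, argument); a stand-in for real processors.
    private final Map<String, BiFunction<String, String, String>> services = new HashMap<>();

    void register(String serviceName, BiFunction<String, String, String> handler) {
        services.put(serviceName, handler);
    }

    // Client side: prefix the method name with the service name.
    static String wrap(String serviceName, String methodName) {
        return serviceName + SEPARATOR + methodName;
    }

    // Server side: split the message name, look up the service, and call it.
    String dispatch(String messageName, String argument) {
        int index = messageName.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("no service name in: " + messageName);
        }
        String serviceName = messageName.substring(0, index);
        String methodName = messageName.substring(index + SEPARATOR.length());
        BiFunction<String, String, String> handler = services.get(serviceName);
        if (handler == null) {
            throw new IllegalArgumentException("unknown service: " + serviceName);
        }
        return handler.apply(methodName, argument);
    }

    public static void main(String[] args) {
        MultiplexSketch server = new MultiplexSketch();
        server.register("helloService", (method, arg) -> "helloService ran " + method + "(" + arg + ")");
        server.register("queryService", (method, arg) -> "queryService ran " + method + "(" + arg + ")");
        System.out.println(server.dispatch(wrap("helloService", "hello"), "thrift world"));
        System.out.println(server.dispatch(wrap("queryService", "query"), "1"));
    }
}
```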

Conclusion

Data is transmitted over the network as a byte stream, that is, in binary. A so-called transport protocol is essentially the conversion between binary and other data types.

This article introduced the implementation of several transport protocols provided by Thrift. It focused on TBinaryProtocol, covering the conversion between binary and int, long, String, boolean, and object types.

It then introduced TCompactProtocol, which can compress integer data to reduce the amount of data transmitted and improve transmission efficiency.

This article did not go into the details of the two JSON serialization protocols provided by Thrift.

TMultiplexedProtocol is a wrapper protocol that does not implement the read and write operations itself. Its purpose is that, when a server hosts multiple Thrift services, the serviceName determines which service is called.

Given the limits of the author's knowledge, mistakes in this article are inevitable, and corrections are welcome. Writing original content takes effort: for commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

This article was first published on the WeChat official account Java technology exploration. If you like this article, you are welcome to follow the account and explore the world of technology together!
