How objects are transferred in the Java world

Socket-based object transfer

Let's start with a simple example. Building on the previous lessons, we write a small piece of socket communication code.

User

public class User {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

SocketServerProvider

import java.io.IOException;
import java.io.ObjectInputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketServerProvider {
    public static void main(String[] args) {
        // try-with-resources closes the stream and both sockets, even on failure
        try (ServerSocket serverSocket = new ServerSocket(8080);
             Socket socket = serverSocket.accept();
             ObjectInputStream objectInputStream = new ObjectInputStream(socket.getInputStream())) {
            // Block until the client sends an object, then deserialize it
            User user = (User) objectInputStream.readObject();
            System.out.println(user);
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}

SocketClientConsumer

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.net.Socket;

public class SocketClientConsumer {
    public static void main(String[] args) {
        // try-with-resources closes the stream and the socket, even on failure
        try (Socket socket = new Socket("127.0.0.1", 8080);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            User user = new User();
            // Serialize the object and send it to the server
            out.writeObject(user);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The results

Once this code is running, can the Java object be transferred normally? No. The client fails with a java.io.NotSerializableException, because the User class cannot be serialized.
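The failure can be reproduced without any sockets. The following is a minimal sketch, assuming a hypothetical PlainUser class with the same shape as User and an in-memory stream in place of the network connection; the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class NotSerializableDemo {

    // Hypothetical stand-in for User: it does NOT implement Serializable.
    static class PlainUser {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Returns true if writing the object fails with NotSerializableException.
    static boolean failsToSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToSerialize(new PlainUser()));   // class lacks Serializable
        System.out.println(failsToSerialize("a plain String"));  // String is Serializable
    }
}
```

The check happens on the writing side, so in the socket example it is the client that reports the error first.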

How do we fix the error?

Make the User class implement the Serializable interface and run the code again. The object is now transferred correctly.

import java.io.Serializable;

public class User implements Serializable {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

Understand the meaning of serialization

We found that adding Serializable to the User class solves the problem of network transport for Java objects.

The Java platform lets us create reusable Java objects in memory, but in general these objects exist only while the JVM is running; their lifetime cannot be longer than the lifetime of the JVM. In a real-world application, however, you may want to save (persist) a given object after the JVM stops and read it back later. Java object serialization helps us do this.

In simple terms

  • Serialization is the process of converting the state of an object into a form that can be stored or transmitted, that is, converting an object into a sequence of bytes.
  • Deserialization is the reverse process: restoring an object from a sequence of bytes.

Advanced knowledge of serialization

A quick overview of Java native serialization

The previous code demonstrated how to transfer an object using the serialization mechanism that the JDK provides. It works mainly through the object output stream java.io.ObjectOutputStream and the object input stream java.io.ObjectInputStream.

  • java.io.ObjectOutputStream: the object output stream. Its writeObject(Object obj) method serializes the obj argument and writes the resulting byte sequence to a target output stream.
  • java.io.ObjectInputStream: the object input stream. Its readObject() method reads a byte sequence from a source input stream, deserializes it into an object, and returns it.

Note that the class of a serialized object must implement the java.io.Serializable interface.
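The two streams can be tried out without a network connection. The following is a minimal sketch that serializes a User to a byte array and reads it back; the helper names serialize and deserialize and the in-memory streams are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class NativeSerializationDemo {

    static class User implements Serializable {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Object -> byte sequence
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Byte sequence -> object
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User user = new User();
        user.setName("Mic");
        User copy = (User) deserialize(serialize(user));
        System.out.println(copy.getName()); // the restored copy carries the same state
    }
}
```

The restored object is a new instance with the same state, not the original reference.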

The role of serialVersionUID

In IDEA, a serialVersionUID can be generated through the IDE settings. Literally, it is the serialization version number: every class that implements the Serializable interface has a static variable that holds its serialization version identifier.

Demonstration steps

  1. First, serialize the User object to a file
  2. Then modify the User class by adding a serialVersionUID field
  3. Then read the object back by deserializing the file
  4. Expected result: an error reports that the object cannot be deserialized

Conclusion

Java's serialization mechanism verifies version consistency at run time by checking the serialVersionUID of a class. During deserialization, the JVM compares the serialVersionUID carried in the byte stream with the serialVersionUID of the corresponding local class. If the two match, the versions are considered consistent and the object can be deserialized. Otherwise, deserialization fails with an InvalidClassException that reports the mismatch.

In the demonstration, the class recorded in the file stream and the modified class on the classpath are incompatible, so for safety the program throws an error and refuses to load the object. The error also shows that when no serialVersionUID is declared for a class, the serialization runtime computes one from the class using a digest algorithm, similar to a fingerprint. The computed UID changes whenever the structure of the class changes, and it is effectively unique to that version of the class. Because the class was first serialized without an explicit serialVersionUID, the UID stored in the file was a computed one, which does not match the UID of the modified class, so the two serialization versions are inconsistent. Therefore, as long as we declare the serialVersionUID ourselves and keep it unchanged, we can add a field or method after an object has been serialized without breaking later restoration: the restored object can still be used, and it also gains the new methods and fields.

Tip: there are two ways to declare serialVersionUID explicitly:

  • The first is the default value 1L, for example: private static final long serialVersionUID = 1L;
  • The second is a 64-bit hash generated from the class name, interface names, member methods, and fields

When a class that implements java.io.Serializable does not explicitly define a serialVersionUID, the Java serialization mechanism automatically generates one from the compiled class for version comparison. In this case, as long as the structure of the class (class name, method names, and so on) has not changed, the serialVersionUID stays the same no matter how many times the class is compiled. Changes such as added spaces, line breaks, or comments do not affect it.
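The UID that the serialization mechanism associates with a class can be inspected with java.io.ObjectStreamClass. The following is a minimal sketch; the two nested classes and the uidOf helper are hypothetical names used only for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionUidDemo {

    // Declares the version identifier explicitly.
    static class WithExplicitUid implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
    }

    // Declares nothing, so a value is computed from the class structure.
    static class WithDefaultUid implements Serializable {
        private String name;
    }

    // Looks up the UID the serialization mechanism will use for this class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(WithExplicitUid.class)); // the declared value
        System.out.println(uidOf(WithDefaultUid.class));  // a computed 64-bit hash
    }
}
```

Adding a field to WithDefaultUid and recompiling would change the second value, while the first stays at the declared value.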

The transient keyword

The transient keyword controls whether a variable is serialized. Adding it before a variable declaration prevents the variable from being written during serialization. After deserialization, a transient variable holds its default initial value, such as 0 for an int and null for an object reference.
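The effect is easy to observe with an in-memory round trip. The following is a minimal sketch; the Account class, its fields, and the roundTrip helper are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientDemo {

    // Hypothetical class for illustration only.
    static class Account implements Serializable {
        String name;
        transient String password; // skipped by default serialization
        transient int loginCount;  // skipped by default serialization

        Account(String name, String password, int loginCount) {
            this.name = name;
            this.password = password;
            this.loginCount = loginCount;
        }
    }

    // Serializes the account to memory and reads it back.
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("Mic", "secret", 7));
        System.out.println(copy.name);       // Mic
        System.out.println(copy.password);   // null
        System.out.println(copy.loginCount); // 0
    }
}
```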

Bypassing the transient mechanism

Although name is declared transient, it can still be serialized and deserialized correctly through two methods that we write ourselves: writeObject and readObject.
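A minimal sketch of what such a pair of methods could look like is shown below. The method signatures are the ones the serialization mechanism looks for; the surrounding class, the roundTrip helper, and the choice to write name as an extra object are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CustomSerializationDemo {

    static class User implements Serializable {
        private transient String name; // ignored by default serialization

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        // Called by ObjectOutputStream through reflection.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient fields
            out.writeObject(name);    // writes the transient field by hand
        }

        // Called by ObjectInputStream through reflection.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();          // reads the non-transient fields
            name = (String) in.readObject(); // restores the transient field
        }
    }

    // Serializes the user to memory and reads it back.
    static User roundTrip(User user) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(user);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (User) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User user = new User();
        user.setName("Mic");
        System.out.println(roundTrip(user).getName()); // the transient field survives
    }
}
```

The read order in readObject must mirror the write order in writeObject, because the extra data is stored positionally in the stream.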

How writeObject and readObject work

writeObject and readObject are two private methods. When are they called? The run results show that they do get called, yet they are neither defined in java.lang.Object nor declared in Serializable.

The only reasonable guess is that they are related to ObjectInputStream and ObjectOutputStream. Using that as the entry point, we can trace where the call is made. Source-level analysis shows that readObject is invoked through reflection.

In fact, readObject and writeObject appear in many places, for example in HashMap.

A quick summary of Java serialization

  1. Java serialization stores only the state of an object, not its methods
  2. When a parent class implements serialization, its subclasses are automatically serializable and do not need to implement the Serializable interface explicitly
  3. When an instance variable of an object references another object, serializing the object also serializes the referenced object
  4. When a field is declared transient, the default serialization mechanism ignores it
  5. If a field declared transient still needs to be serialized, add two private methods: writeObject and readObject
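Points 2 and 3 of the summary can be checked with a small object graph. The following is a minimal sketch; the Address, Person, and Employee classes and the roundTrip helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ObjectGraphDemo {

    static class Address implements Serializable {
        String city;
    }

    static class Person implements Serializable {
        String name;
        Address address; // a reference to another serializable object
    }

    // No "implements Serializable" here: it is inherited from Person.
    static class Employee extends Person {
        String company;
    }

    // Serializes the employee to memory and reads it back.
    static Employee roundTrip(Employee employee) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(employee);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Employee) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee();
        employee.name = "Mic";
        employee.company = "Example Corp";
        employee.address = new Address();
        employee.address.city = "Changsha";

        Employee copy = roundTrip(employee);
        System.out.println(copy.company);      // subclass state is restored
        System.out.println(copy.address.city); // the referenced object came along
    }
}
```

If Address did not implement Serializable, writing the Employee would fail, because the whole reachable graph is serialized.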

Common serialization techniques in distributed architectures

With a primer on Java serialization, it’s time to go back to the distributed architecture and see how serialization evolves

Understand the evolution of serialization

With the popularity of distributed and microservice architectures, communication between services has become a basic requirement. We now need to consider not only communication performance but also language diversity. For serialization, therefore, improving performance and solving the cross-language problem have become key considerations.

The serialization mechanism provided by Java itself has two problems

  1. The serialized data is large, so transmission efficiency is low
  2. Other languages cannot read the format or interoperate with it
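Both problems are visible in the bytes themselves. The following is a minimal sketch that serializes a small object natively and compares the stream with the three bytes of actual payload; the User class and the serialize helper are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class NativeSizeDemo {

    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        User(String name) { this.name = name; }
    }

    // Serializes any object with Java native serialization.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new User("Mic"));
        int payload = "Mic".getBytes(StandardCharsets.UTF_8).length;
        System.out.println("payload bytes: " + payload);
        System.out.println("stream bytes:  " + stream.length);

        // The stream embeds Java-specific metadata such as the class name.
        String text = new String(stream, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains(User.class.getName()));
    }
}
```

The stream is many times larger than the payload because it carries a class descriptor, and that descriptor only makes sense to a JVM that has the same class available.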

As a result, for a long time, object serialization based on XML encoding was the mainstream. On the one hand, it solved the problem of multi-language compatibility; on the other hand, it is easier to understand than binary serialization. The XML-based SOAP protocol and the corresponding WebService frameworks therefore became a must-have technology for every mainstream development language for many years. Later, HTTP REST interfaces based on JSON, a simple text encoding, largely replaced the complex Web Service interfaces and became the first choice for remote communication in distributed architectures. However, JSON serialization still takes up a lot of space and performs poorly, and mobile client applications need to transfer data more efficiently to improve the user experience. In this situation, language-independent and efficient binary encoding protocols became one of the most sought-after technologies. The first open source binary serialization framework to appear was MessagePack, which predates Google's Protocol Buffers.

A brief understanding of various serialization techniques

Introduction to XML serialization framework

The benefits of XML serialization are good readability and ease of debugging. However, the serialized output is relatively large and processing is inefficient, so XML suits data exchange between internal enterprise systems with modest performance requirements and low QPS. XML is also language independent, so it can be used for data exchange and protocols between heterogeneous systems. The familiar WebService, for example, serializes data in XML format. XML serialization and deserialization can be implemented in many ways, including XStream and Java's built-in XML serialization.

JSON serialization framework

JSON (JavaScript Object Notation) is a lightweight data interchange format. Compared with XML, its byte stream is smaller, and it is very readable. JSON is now the most common serialization format in enterprise applications, and there are many open source tools for it

  1. Jackson (https://github.com/FasterXML/jackson)
  2. Alibaba's open source FastJson (https://github.com/alibaba/fastjson)
  3. Google's GSON (https://github.com/google/gson)

Of these JSON serialization tools, Jackson and FastJson perform better than GSON, but Jackson and GSON are more stable than FastJson. The advantage of FastJson is that its API is very easy to use

Hessian serialization framework

Hessian is a binary serialization protocol that supports cross-language transfer. Compared with the default Java serialization mechanism, Hessian offers better performance and ease of use, and it supports many different languages. Dubbo, however, uses its own refactored version of Hessian for better performance

Avro serialization

Avro is a data serialization system designed for applications that exchange large volumes of data. Its main features are: it supports binary serialization, so large amounts of data can be processed conveniently and quickly; and it is friendly to dynamic languages, providing mechanisms that make it easy for them to process Avro data.

Kryo serialization framework

Kryo is a very mature serialization implementation that is already widely used in Hive and Storm, but it is not cross-language. Dubbo supports Kryo serialization as of version 2.6, and it performs better than the previously used Hessian2

Protobuf serialization framework

Protobuf is a data exchange format from Google that is language independent and platform independent. Google provides implementations for multiple languages, such as Java, C, Go, and Python, and each implementation includes a compiler and library files for that language. Protobuf is a pure presentation layer protocol and can be used with various transport layer protocols.

Protobuf is widely used because of its low space overhead and high performance. It is well suited to internal RPC calls that demand high performance. Because parsing is fast and the serialized data is relatively small, it also works well for object persistence. However, Protobuf is relatively troublesome to use: it has its own syntax and its own compiler, so learning it takes some investment. Another drawback is that every class structure to be transferred needs a corresponding proto file, and whenever the class is modified, the proto file must be updated and the code regenerated

Protobuf serialization principles

Protobuf serialization has the advantages of low space overhead and relatively good performance, and some of the algorithms it uses are worth studying

Basic use of Protobuf

The general steps for developing with Protobuf are

  1. Configure the development environment and install the protocol compiler
  2. Write a .proto file that defines the data structure of the objects to be serialized
  3. Based on the .proto file, use the protocol compiler to generate the corresponding serialization/deserialization utility classes
  4. Write your own serialization code on top of the generated classes

Protobuf example

Download the protobuf tool from https://github.com/google/protobuf/releases, for example protoc-3.5.1-win32.zip

Write proto files

syntax="proto2";
package com.gupaoedu.serial;
option java_package =  "com.gupaoedu.serial";
option java_outer_classname = "UserProtos";
message User {
    required string name = 1;
    required int32 age = 2;
}

Data types: string, bytes, bool, int32 (4 bytes), int64, float, double, enum (enumeration), message (user-defined type)

Field rules: required, optional, repeated. The field numbers (1, 2, 3, 4, ...) must be unique within the current scope

Maven dependency

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.7.0</version>
</dependency>

UserProtos.User  user=UserProtos.User.newBuilder().setName("Mic").setAge(18).build();
ByteString bytes=user.toByteString();
System.out.println(bytes);
UserProtos.User nUser=UserProtos.User.parseFrom(bytes);
System.out.println(nUser);

Protobuf serialization principle

We can print the serialized data and see the results

public static void main(String[] args) {
    UserProtos.User user = UserProtos.User.newBuilder()
            .setAge(300)
            .setName("Mic")
            .build();
    byte[] bytes = user.toByteArray();
    for (byte bt : bytes) {
        System.out.print(bt + " ");
    }
}

Output: 10 3 77 105 99 16 -84 2

The serialized numbers look incomprehensible at first, but the serialized data really is small. Let's walk through the underlying principle.

Protobuf uses two encoding schemes, varint and ZigZag, to keep the serialized data as small as possible

varint

Let's first look at how the number age=300 is compressed. In binary, 300 is 1 0010 1100. Varint takes 7 bits at a time, starting from the low end, and uses the high bit of each byte as a continuation flag. The low 7 bits, 010 1100, are followed by more data, so the first byte is 1010 1100. The remaining bits, 10, form the last byte, 0000 0010. The two resulting bytes print as -84 and 2. Why -84? A Java byte is signed: a negative number has its high bit set to 1 and is stored as the two's complement of its magnitude (the two's complement is the bitwise inverse plus 1).

So, to read 10101100 as a signed byte, reverse the steps

  1. [Subtract 1] 10101100 - 1 is 10101011
  2. [Invert] 01010100, which is 84. Since the high bit was 1, the number is negative, so the result is -84
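The varint steps can be written out in a few lines. The following is a minimal sketch of the encoding as described here, not the code of the Protobuf library; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

final class VarintDemo {

    // Emits 7 bits per byte, low bits first; the high bit marks "more bytes follow".
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reverses the encoding: strips the flag bit and reassembles the 7-bit groups.
    static long decodeVarint(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (byte b : encodeVarint(300)) {
            System.out.print(b + " "); // -84 2
        }
        System.out.println();
        System.out.println(decodeVarint(new byte[] {-84, 2})); // 300
    }
}
```

Values below 128 fit into a single byte, which is why small numbers cost almost nothing on the wire.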

How characters are encoded

The string Mic is converted to numbers according to the ASCII table: M = 77, i = 105, and c = 99, which gives 77, 105, 99. You may wonder why these are just the ASCII values and why they are not compressed. The reason is that varint applies only to integers, while a string is written as its raw bytes. Even for an integer, if the value fits in 7 bits, the varint encoding is a single byte identical to the value

That leaves three numbers: 10, 3, and 16. To explain them, you need to understand the storage format of Protobuf

Storage format

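Protobuf stores data in T-L-V (tag, length, value) format. The tag is computed as: field_number of the current field << 3 | wire_type

For example, the name field ("Mic") has field number 1 and a wire_type of 2, so: 1 << 3 | 2 = 10

The age field (300) has field number 2 and a wire_type of 0, so: 2 << 3 | 0 = 16

So the first number, 10, is the tag (key) of the name field, 3 is the length of the string, and 77 105 99 is its value. Then 16 is the tag of the age field, and -84 2 is its value; a varint value needs no length.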
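Putting the tag formula, the length prefix, and varint together reproduces the output shown earlier by hand. The following is a minimal sketch for this one message layout, not the generated Protobuf code; the class, constants, and method names are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ProtoWireSketch {

    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    // tag = (field_number << 3) | wire_type
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Same varint scheme as described above: 7 bits per byte, low bits first.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes: required string name = 1; required int32 age = 2;
    static byte[] encodeUser(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, tag(1, WIRE_LENGTH_DELIMITED)); // T
        writeVarint(out, nameBytes.length);              // L
        out.write(nameBytes, 0, nameBytes.length);       // V
        writeVarint(out, tag(2, WIRE_VARINT));           // T
        writeVarint(out, age);                           // V (a varint carries no length)
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encodeUser("Mic", 300)) {
            System.out.print(b + " "); // 10 3 77 105 99 16 -84 2
        }
        System.out.println();
    }
}
```

Fields are written in field-number order, which is why name comes first even though the earlier example calls setAge before setName.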

Storage of negative numbers

In a computer, a negative number looks like a very large integer, because the sign bit is the highest bit. A negative number encoded directly as a varint therefore always takes the maximum number of bytes (ten in Protobuf, because a negative int32 is sign-extended to 64 bits). So in Protobuf, fields that hold negative numbers should use sint32/sint64, which first apply ZigZag encoding (mapping signed numbers to unsigned numbers) and then varint encoding.

sint32: (n << 1) ^ (n >> 31)
sint64: (n << 1) ^ (n >> 63)

Example with n = -300 (low 12 bits shown):

n:                    1110 1101 0100
n << 1:               1101 1010 1000
n >> 31:              1111 1111 1111
(n << 1) ^ (n >> 31): 1101 1010 1000 ^ 1111 1111 1111 = 0010 0101 0111
decimal:              0010 0101 0111 = 599

Varint: starting from the right, take 7 bits at a time and fill the high bit with 1 or 0 (1 if more bytes follow, 0 otherwise) to get two bytes: 1101 0111 0000 0100, which print as -41, 4
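The worked example can be checked in code. The following is a minimal sketch of ZigZag followed by varint, using the formulas above; the class and method names are made up for illustration and this is not the Protobuf library's own code.

```java
import java.io.ByteArrayOutputStream;

final class ZigZagDemo {

    // Maps signed to unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int encodeZigZag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse mapping.
    static int decodeZigZag32(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Varint: 7 bits per byte, low bits first, high bit set while more bytes follow.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int zigzag = encodeZigZag32(-300);
        System.out.println(zigzag); // 599
        for (byte b : encodeVarint(zigzag)) {
            System.out.print(b + " "); // -41 4
        }
        System.out.println();
        // Without ZigZag, the sign-extended value needs the maximum number of bytes.
        System.out.println(encodeVarint(-300).length); // 10
    }
}
```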

Conclusion

The good performance of Protocol Buffers shows mainly in the small size of the serialized data and the fast serialization speed, which together lead to high transmission efficiency. The reasons for fast serialization are as follows:

  • A. Encoding and decoding are simple (only simple operations such as bit shifts)
  • B. Serialization is done by code from Protocol Buffers' own framework and compiler

The reasons for the small size of the serialized data (that is, good data compression):

  • A. Dedicated encoding schemes, such as varint and ZigZag
  • B. T-L-V data storage, which reduces the use of separators and keeps the data compact

Selection of serialization technology

The technical level

  1. Space overhead: the size of the serialized result affects transfer performance
  2. Time overhead: a long serialization time affects the service response time
  3. Whether the serialization protocol is cross-platform and cross-language. Today's architectures are more flexible, so this must be considered if heterogeneous systems need to communicate
  4. Scalability/compatibility: in real business development, systems must iterate quickly as requirements change, so the serialization protocol needs good scalability/compatibility. For example, adding a new business field to an existing serialized data structure should not affect existing services
  5. Popularity: the more popular a technology is, the more companies use it, so more of its pitfalls have already been found and fixed, and the technical solution is more mature
  6. Learning difficulty and ease of use

Selection suggestions

  1. The XML-based SOAP protocol can be used in scenarios that do not require high performance
  2. Hessian, Protobuf, Thrift, and Avro can be used in scenarios with high requirements for performance and compactness
  3. For front end/back end separation or standalone external API services, JSON is the better choice, because it is easy to debug and very readable
  4. Avro is designed with dynamically typed languages in mind, so it is a good fit for such scenarios

Performance comparison of serialization technologies

This page compares the performance of different serialization technologies: https://github.com/eishay/jvm-serializers/wiki