⚠️ This article is a signed, first-published article in the Nuggets community. Reprinting without authorization is prohibited.


Cause

Today I had to make a very small but very important system change: adding the serialization interface to the return type (outgoing parameter) of a core RPC interface. As you can see below, the original entity class was not serializable.


Coding, testing, and code review went through in one go, and then I received a rejection notice. The architect said that when implementing the serialization interface, I should not forget to declare a serialVersionUID. He also kindly told me that IDEA has a plug-in that can generate the UID automatically and recommended that I download and use it (IDEA serialVersionUID plug-in address). I adjusted the code as requested, then tested, compiled, and released in one go, and switched into today's nap mode (😎).


Startled awake from a dream

I suddenly dreamed that Enterprise WeChat pop-ups were flashing every millisecond, and the people bustling around me looked worried, saying things I couldn't make out…

Something wrong in production? What does that have to do with me (🤪)? It's certainly not my problem, but to be on the safe side, let me recall what I did today.

**What did I do?** Released the middle-platform system. **What changed?** Added the serialization interface to some classes and added a serialVersionUID… What could that lead to? Failed interface calls… COE…

I immediately woke up and checked Enterprise WeChat, the monitoring dashboards, and the interface availability. Only after seeing that all the data was normal did I gradually feel at ease.


What? We're not using Java serialization?

Reviewing what I knew about serialization and reading through various articles on it led me to one answer: with native Java serialization, my change would definitely break deserialization, as the following example shows.

Exception in thread "main" java.io.InvalidClassException: ser.demo.StuDemo; local class incompatible: stream classdesc serialVersionUID = 6395135316924936201, local class serialVersionUID = 1
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
	at ser.demo.App.main(App.java:27)
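The failure in that stack trace can be reproduced in a single file. The following is only a sketch under stated assumptions: the nested `StuDemo` class is a hypothetical stand-in for the article's class, and instead of compiling two versions of it, the helper `withStreamUid` (a made-up name) overwrites the serialVersionUID recorded in the serialized bytes to imitate data written by a differently versioned class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SerialVersionMismatchSketch {

    // Hypothetical stand-in for the article's StuDemo class.
    static class StuDemo implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Kerwin";
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Returns a copy of the stream whose recorded serialVersionUID is replaced,
    // imitating bytes that were written by a differently versioned class.
    static byte[] withStreamUid(byte[] data, String className, long uid) {
        byte[] patched = data.clone();
        int nameLength = className.getBytes(StandardCharsets.UTF_8).length;
        // Stream layout: magic(2) version(2) TC_OBJECT(1) TC_CLASSDESC(1)
        // nameLength(2) name(nameLength) serialVersionUID(8) ...
        ByteBuffer.wrap(patched).putLong(8 + nameLength, uid);
        return patched;
    }

    // Returns "ok" on success, otherwise the simple name of the exception thrown.
    static String tryRead(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.readObject();
            return "ok";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new StuDemo());
        System.out.println(tryRead(data));

        // Pretend the bytes came from a class version with another UID.
        byte[] stale = withStreamUid(data, StuDemo.class.getName(), 6395135316924936201L);
        System.out.println(tryRead(stale));
    }
}
```

The untouched bytes deserialize normally, while the patched copy is rejected because the UID in the stream no longer matches the one declared by the local class.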

There was only one possibility: our RPC framework doesn't use native Java serialization. I consulted the architect, and it was as I expected. He also taught me about several other serialization methods, such as MessagePack and Hessian.


Common serialization methods

Java serialization

Java serialization converts an object into a binary representation, a byte array, which can then be stored for persistence or transferred.

To make a class serializable, implement the java.io.Serializable interface. Deserialization is the reverse process: converting the byte array back into an object. Serializable is a marker interface that identifies a class as serializable; whether the serialized and local versions of a class are compatible is checked through serialVersionUID.

Serialization: java.io.ObjectOutputStream#writeObject0

Deserialization: java.io.ObjectInputStream#readObject0

Taking the test class (StuDemo) as an example, the serialization results are as follows:

// Serialization
FileOutputStream fos = new FileOutputStream("C:\\Users\\Kerwin\\Desktop\\log\\object.out");
ObjectOutputStream oos = new ObjectOutputStream(fos);
StuDemo demo = new StuDemo("Kerwin");
oos.writeObject(demo);
oos.flush();
oos.close();
   
// The result is as follows
//  srser.demo. StuDemoX??? Reach L namet Ljava/lang/String; xpt Kerwin

It’s a bunch of gibberish, but you can still see that the content of the file roughly points to a certain class, what fields there are, the corresponding values, and so on.
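To make that observation easier to check without writing a file, here is a minimal in-memory sketch. The nested `StuDemo` class and the `printable` helper are assumptions made for illustration; the helper replaces every non-printable byte with a dot so the readable parts of the stream stand out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StreamContentsSketch {

    // Hypothetical stand-in for the article's StuDemo class.
    static class StuDemo implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;

        StuDemo(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // Keeps printable ASCII and replaces every other byte with '.'.
    static String printable(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new StuDemo("Kerwin"));
        // The class name, the field name and type, and the value are all visible.
        System.out.println(printable(data));

        StuDemo copy = (StuDemo) deserialize(data);
        System.out.println(copy.getName());
    }
}
```

The class name, the field descriptor `Ljava/lang/String;`, and the value `Kerwin` all appear in clear text, which is part of why the native format is comparatively bulky.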


MessagePack serialization

MessagePack (msgpack) is an efficient binary serialization format. It lets you exchange data between languages just like JSON, but it is faster and smaller.

Faster and smaller equals better performance. How does it work?

When msgpack serializes an object, the fields are not tagged with keys; they are stored purely in field order, like an array. Each value is encoded as type + length + content.

This efficient encoding imposes some limitations, such as:

  • The server cannot add a field at an arbitrary position, because deserialization fails on clients that have not upgraded
  • You cannot use collection classes provided by third-party packages as return values
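The positional layout and the first limitation can be sketched without any library. The code below is a hand-rolled illustration, not the msgpack-java implementation: it handles only three MessagePack formats (fixarray, fixstr, positive fixint), and the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PositionalEncodingSketch {

    // Encodes the fields as one array: no keys, only type + length + content.
    // Supports at most 15 fields, strings up to 31 bytes, and ints in 0..127.
    static byte[] encode(Object... fields) {
        if (fields.length > 15) {
            throw new IllegalArgumentException("sketch supports at most 15 fields");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x90 | fields.length); // fixarray: type bits + element count
        for (Object field : fields) {
            if (field instanceof String) {
                byte[] utf8 = ((String) field).getBytes(StandardCharsets.UTF_8);
                if (utf8.length > 31) {
                    throw new IllegalArgumentException("string too long for fixstr");
                }
                out.write(0xa0 | utf8.length); // fixstr: type bits + byte length
                out.write(utf8, 0, utf8.length);
            } else if (field instanceof Integer && (Integer) field >= 0 && (Integer) field <= 127) {
                int value = (Integer) field;
                out.write(value); // positive fixint: the byte is the value
            } else {
                throw new IllegalArgumentException("unsupported field: " + field);
            }
        }
        return out.toByteArray();
    }

    // Decodes the array back into a list; values are identified by position only.
    static List<Object> decode(byte[] data) {
        int pos = 0;
        int count = data[pos++] & 0x0f;
        List<Object> fields = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int head = data[pos++] & 0xff;
            if ((head & 0xe0) == 0xa0) {
                int length = head & 0x1f;
                fields.add(new String(data, pos, length, StandardCharsets.UTF_8));
                pos += length;
            } else if ((head & 0x80) == 0) {
                fields.add(head);
            } else {
                throw new IllegalArgumentException("unsupported type byte: " + head);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // An old client expects [name, age] and reads age from index 1.
        List<Object> appended = decode(encode("Kerwin", 18, "Beijing"));
        System.out.println("field appended at the end, index 1 = " + appended.get(1));

        List<Object> inserted = decode(encode("Kerwin", "Beijing", 18));
        System.out.println("field inserted in the middle, index 1 = " + inserted.get(1));
    }
}
```

Appending a field leaves the positions the old client relies on unchanged, while inserting one in the middle puts a string where the client expects the age. That is the reason new fields have to go at the end.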

The usage is as follows:

// StuDemo must carry the @Message annotation to be serialized by MessagePack
// MessagePack serialization does not rely on Serializable
public static void main(String[] args) throws IOException {
    StuDemo demo = new StuDemo("Kerwin");
    MessagePack pack = new MessagePack();

    // serialize
    byte[] bytes = pack.write(demo);

    // deserialize
    StuDemo res = pack.read(bytes, StuDemo.class);
    System.out.println(res.getName());
}

PS: Our company's RPC framework currently uses MessagePack serialization, which is exactly why adding the serialVersionUID caused no problems. By the same token, we are bound by the limitations of the underlying serialization method, which our documentation also spells out: new fields must be appended at the end, and so on.


Hessian2 serialization

Hessian is a serialization framework that is dynamically typed, binary, compact, and portable across languages. Hessian2 builds on Hessian with greatly improved performance and compression.

Hessian stores all attributes of a complex object in a map-like structure for serialization. Therefore, if a parent class and a subclass both declare a member variable with the same name, Hessian serializes the subclass first and then the parent class. As a result, the subclass value of that member variable is overwritten by the parent's value.
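The overwrite can be pictured with a simplified model. This is not Hessian's actual code; it is a sketch that assumes a name-keyed map filled subclass first, then parent, with hypothetical `Parent` and `Child` classes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class SameNameFieldSketch {

    static class Parent {
        String name = "parent-value";
    }

    static class Child extends Parent {
        // Hides Parent.name: the object now carries two fields called "name".
        String name = "child-value";
    }

    // Collects instance fields into a map keyed by field name only,
    // walking from the subclass up to its parents.
    static Map<String, Object> flatten(Object obj) throws IllegalAccessException {
        Map<String, Object> byName = new LinkedHashMap<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                    continue;
                }
                field.setAccessible(true);
                // A later put with the same key silently replaces the earlier value.
                byName.put(field.getName(), field.get(obj));
            }
        }
        return byName;
    }

    public static void main(String[] args) throws IllegalAccessException {
        System.out.println(flatten(new Child()));
    }
}
```

Because the key is only the field name, the parent's value is the one that survives. Avoiding same-named fields across an inheritance chain sidesteps the problem.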

It has nine core design goals.

  • Serialized types must be self-describing, that is, no external schema or interface definition is required
  • Must be language neutral, including support for scripting languages
  • Must be readable or writable in a single pass
  • Must be as compact (compressed) as possible
  • Must be simple
  • Must be as fast as possible
  • Unicode strings must be supported
  • Must support 8-bit binary data
  • Encryption must be supported

The usage is as follows:

public class StuHessianDemo implements Serializable {

    private static final long serialVersionUID = -640696903073930546L;

    private String name;

    public StuHessianDemo(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

public static void main(String[] args) throws IOException {
    StuHessianDemo hessianDemo = new StuHessianDemo("Kerwin");

    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    HessianOutput hessianOutput = new HessianOutput(stream);
    hessianOutput.writeObject(hessianDemo);

    ByteArrayInputStream inputStream = new ByteArrayInputStream(stream.toByteArray());

    // Hessian deserializes the read object
    HessianInput hessianInput = new HessianInput(inputStream);
    System.out.println(((StuHessianDemo) hessianInput.readObject()).getName());
}

// Result: Kerwin


Basis of selection

Each method has its own emphasis. MessagePack aims for extreme compactness and speed. Hessian2 relies on the Serializable interface and pursues space utilization and efficiency as far as it can while preserving safety and self-description. Native Java serialization, by contrast, has long been criticized as clumsy and inelegant. So when an RPC framework chooses its underlying serialization method, it has to pick one according to its own needs.

The selection is based on the following, from highest to lowest priority:


Some reflections

Status of JSON serialization

JSON serialization is the most familiar serialization method, and it doesn't require the Serializable interface. So why isn't it the default serialization method of most RPC frameworks?

From the sections above, we know the key factors are performance, efficiency, and space overhead. JSON is a text format that stores data as key-value pairs, so serialization carries comparatively more space overhead. On top of that, deserialization has to rely on reflection, so performance is lower still.
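A rough size comparison illustrates the space overhead. This sketch builds both encodings by hand for one tiny record: the JSON string is concatenated without escaping, the binary form uses the MessagePack fixarray, fixstr, and positive fixint formats, and the class and method names are hypothetical. It assumes a name shorter than 32 bytes and an age between 0 and 127.

```java
import java.nio.charset.StandardCharsets;

class JsonVsBinarySizeSketch {

    // Text form: every record repeats the keys, quotes, and punctuation.
    static byte[] asJson(String name, int age) {
        String json = "{\"name\":\"" + name + "\",\"age\":" + age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Positional binary form: array header, string header + bytes, one-byte int.
    static byte[] asPositionalBinary(String name, int age) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[2 + utf8.length + 1];
        out[0] = (byte) 0x92;                 // array of 2 elements
        out[1] = (byte) (0xa0 | utf8.length); // string of utf8.length bytes
        System.arraycopy(utf8, 0, out, 2, utf8.length);
        out[out.length - 1] = (byte) age;     // small non-negative integer
        return out;
    }

    public static void main(String[] args) {
        System.out.println("JSON bytes:   " + asJson("Kerwin", 18).length);
        System.out.println("binary bytes: " + asPositionalBinary("Kerwin", 18).length);
    }
}
```

For this record, the JSON form takes 26 bytes and the positional binary form takes 9, and the gap widens as field names get longer.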

However, JSON itself is so readable that it has become the de facto standard for data exchange over HTTP on the web.


Why customize the serialVersionUID

There's a quote in Effective Java that says:

Regardless of which serialized form you choose, declare an explicit serial version UID in every serializable class you write.

Why did the architect remind me to declare it? Why does the book say that?

serialVersionUID stands for Serial Version UID. Every serializable class has a unique identifier, which you can specify explicitly in a static final long field. If the developer does not declare one, the system generates it automatically at runtime by applying a cryptographic hash function (SHA-1) to the structure of the class. The resulting number is affected by the class name, the names of the interfaces it implements, and most of its members, so any change, even adding a trivial public method, can alter the UID and cause exceptions.

So it’s a matter of habit and avoiding potential risks.
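The difference between a declared and a generated UID can be inspected with `java.io.ObjectStreamClass`. The two nested classes below are hypothetical examples; the value printed for the class without a declared field depends on its structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidLookup {

    // Declares its UID: the value stays fixed however the class evolves.
    static class WithExplicitUid implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // Declares nothing: the runtime derives the UID from the class structure.
    static class WithDefaultUid implements Serializable {
        String name;
    }

    // Returns the UID that serialization would use for the given class.
    static long uidOf(Class<? extends Serializable> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit:  " + uidOf(WithExplicitUid.class));
        System.out.println("generated: " + uidOf(WithDefaultUid.class));
    }
}
```

Adding a method or field to `WithDefaultUid` and running this again would typically print a different generated value, which is the kind of silent change the explicit declaration guards against.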


Conclusion

Now that we've seen how impractical native Java serialization is (to the point of ridicule) and some of the underlying reasons behind framework usage rules (such as how msgpack requires fields to be added), here are some tips for serialization:

  1. Whether or not the framework depends on Serializable, interface return types should implement the serialization interface.
  2. If you implement the serialization interface, be sure to declare your own serialVersionUID.
  3. Interface return types should not use special data types (such as third-party collections with msgpack) or overly complex structures (inheritance and so on); otherwise you will run into a lot of puzzling problems.
  4. When server and client data are inconsistent, suspect serialization first, and troubleshoot carefully according to the characteristics of the serialization method in use.

If you find this helpful:

  1. Of course, give me a thumbs up
  2. In addition, you can search and follow the public account “Yes Kerwin ah,” and go on the road of technology together ~ 😋


References

  1. MsgPack official website
  2. Effective Java