This article was adapted from: Lebyte

This article covers common Java serialization frameworks.

I. Background introduction

Serialization and deserialization are used constantly in everyday data persistence and network transmission. However, the number of serialization frameworks available today is bewildering, and it is not always clear which framework suits which scenario. This article compares the open source serialization frameworks in the industry on five aspects: generality, ease of use, scalability, performance, and support for Java data types and syntax.

The frameworks compared below are JDK Serializable, FST, Kryo, Protobuf, Thrift, Hessian, and Avro.

II. Serialization frameworks

1 JDK Serializable

JDK Serializable is Java's built-in serialization framework. To use the Java serialization mechanism, a class only needs to implement the java.io.Serializable or java.io.Externalizable interface. Implementing the interface merely marks the class as serializable and deserializable; the actual I/O is done by writing objects with ObjectOutputStream and reading them back with ObjectInputStream.
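A minimal sketch of the round trip described above. The `User` class and its fields are hypothetical stand-ins for illustration, not the article's original test entity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JdkRoundTrip {

    // Hypothetical entity: implementing Serializable only marks it as serializable.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Object -> bytes: ObjectOutputStream writes into an in-memory buffer.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Bytes -> object: ObjectInputStream rebuilds the instance.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new User("alice", 30));
        User copy = (User) deserialize(bytes);
        System.out.println(copy.name + " " + copy.age + " (" + bytes.length + " bytes)");
    }
}
```

Both directions have to go through a byte-array stream wrapped in an object stream, which is the boilerplate that the other frameworks in this article hide behind a single call.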

Generality

The Java built-in serialization framework itself does not support cross-language serialization and deserialization.

Ease of use

As Java's built-in serialization framework, it needs no external dependencies to perform serialization. However, JDK Serializable is considerably more awkward to use than the open source frameworks: the encoding and decoding code is rigid and has to go through ByteArrayOutputStream and ByteArrayInputStream to convert to and from bytes.

Scalability

The JDK uses serialVersionUID to control the version of a serializable class. If the serialVersionUID in the serialized data differs from that of the local class, deserialization throws java.io.InvalidClassException, reporting that the two serialVersionUID values are inconsistent.
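The failure mode can be reproduced in a single program. The sketch below takes a shortcut that is an assumption of this example, not something from the original article: it serializes a hypothetical `Point` class and then overwrites the serialVersionUID bytes inside the stream, which simulates data written by a different class version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SuidMismatchDemo {

    // Hypothetical class with an explicit version of 1.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 3;
    }

    // Returns true if the tampered stream is rejected with InvalidClassException.
    static boolean mismatchRejected() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new Point());
            }
            byte[] bytes = buffer.toByteArray();

            // Stream layout: 4-byte header, TC_OBJECT, TC_CLASSDESC,
            // 2-byte name length, class name, then the 8-byte serialVersionUID.
            int nameLength = Point.class.getName().getBytes(StandardCharsets.UTF_8).length;
            int suidOffset = 4 + 1 + 1 + 2 + nameLength;
            ByteBuffer view = ByteBuffer.wrap(bytes);
            if (view.getLong(suidOffset) != 1L) {
                throw new IllegalStateException("unexpected stream layout");
            }
            view.putLong(suidOffset, 2L); // pretend the writer used another class version

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                in.readObject();
                return false;
            } catch (InvalidClassException expected) {
                return true;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatch rejected: " + mismatchRejected());
    }
}
```

In real projects the mismatch usually arises when a class without an explicit serialVersionUID is modified, because the JDK then derives a new value from the class structure.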

Performance

JDK Serializable is Java's built-in serialization framework, but that does not make it naturally fast. The following test case is the test entity used throughout this article. We serialized it 10 million times and measured the total time:

We will compare these results with the other serialization frameworks later.
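The measurement loop described above might look roughly like the following sketch. The `Sample` class is a hypothetical payload (the article's original test entity is not reproduced here), and the iteration counts are kept small so the example finishes quickly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerializationTimer {

    // Hypothetical payload used only for this sketch.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "sample";
        long id = 42L;
        List<String> tags = new ArrayList<>(Arrays.asList("a", "b", "c"));
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Serializes the same object `iterations` times and returns the elapsed milliseconds.
    static long timeSerialization(Object obj, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            serialize(obj);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Sample sample = new Sample();
        timeSerialization(sample, 10_000); // warm-up run so the JIT can compile the hot path
        long millis = timeSerialization(sample, 100_000);
        System.out.println("size=" + serialize(sample).length + " bytes, total=" + millis + " ms");
    }
}
```

A hand-rolled loop like this gives only a rough figure; a harness such as JMH would be the more reliable choice for publishing numbers.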

Data type and syntax structure support

Since JDK Serializable is Java's native serialization framework, it supports Java data types and syntax in general.

One exception: WeakHashMap does not implement the Serializable interface, so it cannot be serialized.
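A quick way to check this kind of limitation is to attempt the write and see whether the JDK rejects it. The helper name `canSerialize` is made up for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.WeakHashMap;

class SerializableCheck {

    // Returns true if JDK serialization accepts the object, false if it is rejected.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap: " + canSerialize(new HashMap<String, String>()));
        System.out.println("WeakHashMap: " + canSerialize(new WeakHashMap<String, String>()));
    }
}
```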

2 FST serialization framework

FST (fast-serialization) is a Java serialization framework that is fully compatible with the JDK serialization protocol. Its serialization speed can reach 10 times that of the JDK, and the serialized result is only about 1/3 the size. The current FST version is 2.56, and Android is supported from version 2.17 onward.

Generality

FST is also a serialization framework developed for Java, so there are no cross-language features.

Ease of use

In terms of ease of use, FST is streets ahead of JDK Serializable. Its API is extremely concise, and FSTConfiguration encapsulates most of the methods.

Scalability

FST supports compatibility between new fields and older data streams through the @Version annotation. Every new field must be marked with @Version; a field without the annotation is treated as version 0.

Note:

Overall, although FST supports scalability, it is still cumbersome to use.

Performance

Using FST to serialize the above test case, the serialized size was 172 bytes, roughly 40% of the JDK serialized size of 432 bytes. Now let's look at the time cost of serialization and deserialization.

Data type and syntax structure support

FST is based on the JDK serialization framework, so the data types and syntax it supports are the same as those supported by Java.

3 Kryo serialization framework

Kryo is a fast and efficient Java binary serialization framework. It relies on the ASM library for bytecode generation, which gives it good runtime speed. Kryo's goal is to provide fast serialization, small output, and an easy-to-use API. Kryo also supports automatic deep and shallow copying, which copies directly from object to object rather than going through object -> bytes -> object.

Generality

Kryo's official website describes it as a Java binary serialization framework. I searched online and found no cross-language use of Kryo; some articles mention that cross-language use is very complicated, but I found no corresponding implementations in other languages.

Ease of use

In terms of usage, the API provided by Kryo is simple and easy to use, with Input and Output encapsulating almost every stream operation you can think of. Kryo also offers a wealth of flexible configuration options, such as custom serializers and default serializers, although these can be quite laborious to use.

Scalability

Kryo's default serializer, FieldSerializer, does not support field extension. To support it, you need to configure a different default serializer.

Performance

Testing the above test case with Kryo gives a serialized size of 172 bytes, the same as FST's unoptimized size. The time cost is as follows:

We also turned off the circular reference configuration and pre-registered the serialized classes. The serialized size then drops to 120 bytes, because each class is identified by a number instead of its fully qualified name. The time overhead is as follows:
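Why a registered number is smaller than a class name can be shown with a toy encoder. This is a conceptual sketch only: the `ClassIdSketch` class, its registry, and its header layout are invented for illustration and are not Kryo's API or wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class ClassIdSketch {

    // Hypothetical registry mapping classes to small integer IDs.
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    void register(Class<?> type, int id) {
        ids.put(type, id);
    }

    // Writes a class header: the registered ID if there is one, otherwise the full name.
    byte[] writeClassHeader(Class<?> type) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            Integer id = ids.get(type);
            if (id != null) {
                out.writeBoolean(true);       // 1 byte: "an ID follows"
                out.writeShort(id);           // 2 bytes
            } else {
                out.writeBoolean(false);      // 1 byte: "a name follows"
                out.writeUTF(type.getName()); // 2-byte length + name bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        ClassIdSketch sketch = new ClassIdSketch();
        int byName = sketch.writeClassHeader(ArrayList.class).length;
        sketch.register(ArrayList.class, 10);
        int byId = sketch.writeClassHeader(ArrayList.class).length;
        System.out.println("by name: " + byName + " bytes, by id: " + byId + " bytes");
    }
}
```

The trade-off is that writer and reader must agree on the same class-to-number mapping, which is why registration has to be done consistently on both sides.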

Data type and syntax structure support

One of Kryo’s basic requirements for serialized classes is to have a no-argument constructor because it is used to create objects during deserialization.
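The requirement is easy to understand with plain reflection. The sketch below is not Kryo code; it only imitates, under that assumption, how a reflection-based deserializer might create an empty instance before filling in its fields. Both nested classes are hypothetical.

```java
import java.lang.reflect.Constructor;

class NoArgConstructorCheck {

    // Hypothetical class that can be created without arguments.
    static class WithDefault {
        WithDefault() { }
    }

    // Hypothetical class that only offers a constructor with a parameter.
    static class WithoutDefault {
        final int value;

        WithoutDefault(int value) {
            this.value = value;
        }
    }

    // Tries to instantiate the class through its no-argument constructor.
    static boolean canInstantiate(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // private constructors are acceptable too
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false; // for example NoSuchMethodException when no such constructor exists
        }
    }

    public static void main(String[] args) {
        System.out.println("WithDefault: " + canInstantiate(WithDefault.class));
        System.out.println("WithoutDefault: " + canInstantiate(WithoutDefault.class));
    }
}
```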

4 Protocol Buffers

Protocol Buffers (Protobuf) is a language-neutral, platform-neutral, extensible serialization framework. Unlike the previous serialization frameworks, Protobuf requires a predefined schema.

Generality

Protobuf was designed from the start as a language-independent serialization framework. It currently supports Java, Python, C++, Go, and C#, and third-party packages are available for many other languages. So Protobuf is very strong in terms of generality.

Ease of use

Protobuf requires a schema description file written in its IDL. Once the description file is defined, we can use protoc to generate the serialization and deserialization code. Protobuf can therefore be used by simply writing a description file.

Scalability

Extensibility was one of Protobuf's goals from the beginning, and changes are easy to make in .proto files.

New fields: make sure new fields have default values so that they can interoperate with old code. Messages generated by the new protocol can then still be parsed by the old protocol.

Deleting a field: the name and field number (tag) of a deleted field must not be reused in later updates. To avoid mistakes, we can mark them as reserved.

Protobuf is also friendly to type changes: int32, uint32, int64, uint64, and bool are fully compatible with one another, so we can change a field from one of these types to another as needed. As the above shows, Protobuf puts a lot of effort into extensibility and supports protocol evolution very well.
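These integer and bool types are interchangeable because they share one wire encoding, the base-128 varint. The sketch below is a simplified illustration of that encoding, not Protobuf's own implementation; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class VarintSketch {

    // Wire type 0 is used for int32, int64, uint32, uint64, and bool alike.
    static final int WIRETYPE_VARINT = 0;

    // Encodes a value as a base-128 varint: 7 payload bits per byte,
    // least-significant group first, high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // A field is written as a key, (field_number << 3) | wire_type, followed by the value.
    static byte[] encodeField(int fieldNumber, long value) {
        byte[] key = encodeVarint(((long) fieldNumber << 3) | WIRETYPE_VARINT);
        byte[] payload = encodeVarint(value);
        byte[] field = new byte[key.length + payload.length];
        System.arraycopy(key, 0, field, 0, key.length);
        System.arraycopy(payload, 0, field, key.length, payload.length);
        return field;
    }

    public static void main(String[] args) {
        // The same bytes are produced whether field 1 is declared as bool (true) or int32 (1).
        System.out.println(Arrays.toString(encodeField(1, 1)));
        System.out.println(Arrays.toString(encodeField(1, 150)));
    }
}
```

Field 1 with the value 150 encodes to the three bytes 08 96 01. Because the field name never appears on the wire, only the field number does, which is also why a deleted field's number must not be reused.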

Performance

We also used the above example for the performance test. The Protobuf serialized size is 192 bytes, and the time overhead is shown below.

It can be seen that the deserialization performance of Protobuf is worse than that of FST and Kryo.

Data type and syntax structure support

Protobuf uses IDL to define the schema and therefore does not support defining Java methods.

List, Set, and Queue were tested through Protobuf's repeated fields. Any class that implements the Iterable interface can be used to fill a repeated field.

5 Thrift serialization framework

Thrift is an efficient, multi-language Remote Procedure Call (RPC) framework created by Facebook, which later open-sourced it and donated it to Apache. Thrift is an RPC framework, but because it provides RPC between multiple languages, it is also often used for serialization.

There are three steps to implement Thrift serialization: create Thrift IDL files, compile and generate Java code, and use TSerializer and TDeserializer for serialization and deserialization.

Generality

Thrift, like Protobuf, requires a description file written in IDL, which is currently an effective way to implement cross-language serialization/RPC. Thrift currently supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages, so it is very strong in terms of generality.

Ease of use

Thrift is similar to Protobuf in ease of use in that it requires three steps: writing Thrift files using IDL, compiling and generating Java code, and calling serialization and deserialization methods. Protobuf already has built-in serialization and deserialization methods in the generated classes, whereas Thrift requires a separate call to the built-in serializer for encoding and decoding.

Scalability

Thrift supports field extension. Note the following during field extension:

Performance

For the above test case, the serialized size with Thrift is 257 bytes. Here are the corresponding serialization and deserialization time costs:

Thrift is about the same as Protobuf in terms of the sum of serialization and deserialization time. Protobuf has the advantage in serialization time, while Thrift has the advantage in deserialization time.

Data type and syntax structure support

Data type support: Since Thrift uses IDL to define serialized classes, the only data types that can be supported are Thrift data types. Java data types supported by Thrift:

Thrift also does not support defining Java methods.

6 Hessian serialization framework

Hessian is a lightweight Remote Procedure Call (RPC) framework developed by Caucho. It uses HTTP for transport and the Hessian protocol for binary serialization. Because it supports a cross-language, efficient binary serialization protocol, Hessian is often used as a serialization framework. The Hessian serialization protocol comes in two versions, Hessian 1.0 and Hessian 2.0. Hessian 2.0 optimizes the serialization process (the details of the optimization remain to be examined) and performs significantly better than Hessian 1.0. Using Hessian serialization is very simple: HessianInput and HessianOutput are all that is needed to serialize and deserialize an object.

Generality

Like Protobuf and Thrift, Hessian supports cross-language RPC communication. One of Hessian's main advantages over other cross-language RPC frameworks is that, instead of defining data and services in IDL, it defines services in a self-describing way. Hessian has been implemented in Java, Flash/Flex, Python, C++, .NET/C#, D, Erlang, PHP, Ruby, and Objective-C.

Ease of use

Compared to Protobuf and Thrift, Hessian is easier to use because it does not require IDL to define data and services; the classes being serialized only need to implement the Serializable interface.

Scalability

A class serialized with Hessian needs to implement the Serializable interface, but it is not affected by serialVersionUID and can easily support field extension.

Performance

Serializing the above test case with the Hessian 1.0 protocol produces a result of 277 bytes. With the Hessian 2.0 protocol, the result is 178 bytes.

The time cost of serialization and deserialization is as follows:

It can be seen that Hessian 1.0 and Hessian 2.0 differ considerably in both serialized size and serialization and deserialization time.

Data type and syntax structure support

Hessian works with self-describing Java classes, so it supports Java native data types, collection classes, custom classes, and enumerations (with the exception of SynchronousQueue), as well as Java syntax structures.

7 Avro serialization framework

Avro is a data serialization framework. It is a subproject of Apache Hadoop, developed by Doug Cutting during the development of Hadoop. Avro was designed to support data-intensive applications, which makes it well suited to large-scale data exchange and storage, whether remote or local.

Generality

Avro defines data structures through Schema and currently supports Java, C, C++, C#, Python, PHP, and Ruby, so Avro has great versatility among these languages.

Ease of use

Avro does not need to generate code for dynamic languages, but for static languages like Java you still need to use avro-tools.jar to compile the schema and generate Java code. In my experience, writing an Avro schema is more complex than writing one for Thrift or Protobuf.

Scalability

Performance

The serialized size using Avro-generated code is 111 bytes. Here is the time cost of serialization with Avro:

Data type and syntax structure support

An Avro schema has to be written with the data types that Avro supports, so the supported Java data types are limited to those that Avro supports. Avro supports primitive types (null, boolean, int, long, float, double, bytes, string) and complex types (record, enum, array, map, union, and fixed).

Avro either generates code automatically or uses schemas directly, and does not support defining Java methods in serialized classes.

III. Summary

1 Generality

Comparing the serialization frameworks in terms of generality, Protobuf comes out best, supporting many mainstream programming languages.

2 Ease of use

The following is a comparison of the serialization frameworks in terms of the ease of use of their APIs. Apart from JDK Serializable, all of the serialization frameworks provide an easy-to-use API.

3 Scalability

Here is a comparison of the extensibility of each serialization framework. Protobuf’s extensibility is the most convenient and natural. Other serialization frameworks require some configuration, annotations, and so on.

4 Performance

Serialization size comparison

Comparing the serialized data sizes of the serialization frameworks, Kryo with pre-registration and Avro both produce very compact output. So, if serialized size matters, you can choose Kryo or Avro.

Serialization time cost comparison

Kryo and FST, both with pre-registration, provide excellent performance. FST with pre-registration has the best serialization time, while Kryo with pre-registration takes roughly the same time for serialization and deserialization. So, if serialization time is a major consideration, choose Kryo or FST, both of which perform well.

5 Data type and syntax structure support

Comparison of the Java data types supported by the serialization frameworks:

Note: Collection type tests basically cover all corresponding implementation classes.

The following is a summary of the data types and syntax supported by the above serialization framework based on the tests.

Protobuf and Thrift define classes in IDL files and then use their respective compilers to generate Java code. Since IDL provides no syntax for defining static inner classes, non-static inner classes, and so on, these features cannot be tested for them.
