1. Introduction

Gson is a JSON library for the Java platform that handles serialization and deserialization of JSON.

In practice, converting large JSON strings into instances performs poorly.

This article examines, from the source code, why Gson deserialization is slow, and proposes a general solution.

2. TL;DR

There are three reasons why Gson deserialization is slow (in decreasing order of impact):

  1. The default TypeAdapter reflects over all fields of the class and its superclasses and builds a Map from them;
  2. The default TypeAdapter uses reflection to set field values;
  3. The default TypeAdapter uses reflection to create the instance.

The solution is to write a TypeAdapter yourself.

Writing a TypeAdapter by hand is cumbersome, however, so consider one of two ways (or something better) to generate the code:

  1. Generate the code with an annotation processor (Kapt).

    Advantages: Easy to adopt

    Disadvantages: Some Kotlin language features (such as the nullability of generic arguments) are not available during Kapt; it intrudes on compile time.

  2. Statically analyze the code syntax tree and generate code.

    Advantages: Full support for Kotlin language features; zero intrusion into project code and compile time.

    Disadvantages: A separate implementation is needed for each language (different languages have different syntax trees).

3. Basic usage

To do a good job, one must first sharpen one's tools. So first, you need to understand how to use Gson.

Given a class Foo, call the gson.fromJson method to generate an instance of Foo from a JSON string.
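As a minimal sketch of this call: the original definition of Foo is not shown here, so the four fields below (id, name, score, active) and the helper name parseFoo are assumptions made purely for illustration. All fields have default values, which matches what is said about Foo later on.

```kotlin
import com.google.gson.Gson

// Hypothetical class: the field names and types are assumptions for this example.
data class Foo(
    val id: Int = 0,
    val name: String = "",
    val score: Double = 0.0,
    val active: Boolean = false,
)

// Parse a JSON string into a Foo using a default Gson instance.
fun parseFoo(json: String): Foo = Gson().fromJson(json, Foo::class.java)

fun main() {
    val foo = parseFoo("""{"id":1,"name":"gson","score":9.5,"active":true}""")
    println(foo)
}
```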

Gson also supports generics:
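A generic target type is usually described with a TypeToken, because the type arguments are erased from a plain Class object. The sketch below assumes a JSON array of integers; parseIntList is a hypothetical helper name.

```kotlin
import com.google.gson.Gson
import com.google.gson.reflect.TypeToken

// The anonymous TypeToken subclass captures List<Int> so Gson can see the element type.
fun parseIntList(json: String): List<Int> {
    val type = object : TypeToken<List<Int>>() {}.type
    return Gson().fromJson(json, type)
}

fun main() {
    println(parseIntList("[1, 2, 3]"))
}
```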

4. Source code interpretation

Before analyzing the Gson source code, we can make a general guess about the Gson deserialization process and the serialization process.

Gson deserialization might look like this:

  1. Create an instance of the type via reflection;
  2. Read the JSON string as key-value pairs and set each value via reflection;
  3. Return this instance.

This is only a rough guess. We still need to read the source code to verify or overturn it.

4.1. Prior knowledge

Before reading the Gson source, it helps to know a few concepts, so that the source code forms a clearer, more complete picture as we read it.

4.1.1. TypeAdapter

TypeAdapter is the generic abstract class behind fromJson and toJson within Gson. fromJson calls its read method to convert a JSON string into an instance.

The return type T of read is the type that fromJson returns.

The JsonReader parameter of read streams the JSON as key-value pairs.
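To make the streaming behaviour concrete, here is a minimal sketch that walks a flat JSON object with JsonReader. The helper name readPairs is hypothetical, and the sketch assumes every value is a string or a number.

```kotlin
import com.google.gson.stream.JsonReader
import java.io.StringReader

// Read a flat JSON object one key-value pair at a time.
// Assumption: every value is a string or a number; other value types would throw.
fun readPairs(json: String): Map<String, String> {
    val result = LinkedHashMap<String, String>()
    JsonReader(StringReader(json)).use { reader ->
        reader.beginObject()
        while (reader.hasNext()) {
            val key = reader.nextName()      // consume the key
            result[key] = reader.nextString() // consume the value
        }
        reader.endObject()
    }
    return result
}

fun main() {
    println(readPairs("""{"a":"x","b":"y"}"""))
}
```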

Gson ships with many built-in TypeAdapters, which it uses by default for fromJson and toJson.

4.1.2. TypeAdapterFactory

TypeAdapterFactory is the factory for TypeAdapter: it creates a TypeAdapter for a specified type.

In a Gson instance, the TypeAdapterFactory objects are stored in a List called factories.

4.2. Parsing Json

TypeAdapterFactory and TypeAdapter are important parts of fromJson. Let's start from the entry point, which is the usual gson.fromJson call used to parse JSON.

Tracing that call into the source code leads to the fromJson overload that does the actual work.

fromJson can be roughly divided into two steps:

  1. Obtain the TypeAdapter for the given TypeToken;
  2. Call the TypeAdapter's read method to generate an instance and return it.
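The two steps can be reproduced with Gson's public API. The following is a sketch, not Gson's internal code; parseInTwoSteps is a hypothetical helper name.

```kotlin
import com.google.gson.Gson
import com.google.gson.reflect.TypeToken
import com.google.gson.stream.JsonReader
import java.io.StringReader

fun <T> parseInTwoSteps(gson: Gson, json: String, type: Class<T>): T {
    // Step 1: obtain the TypeAdapter for the TypeToken.
    val adapter = gson.getAdapter(TypeToken.get(type))
    // Step 2: let the adapter read the instance from a JsonReader.
    return adapter.read(JsonReader(StringReader(json)))
}

fun main() {
    val numbers = parseInTwoSteps(Gson(), "[1, 2, 3]", IntArray::class.java)
    println(numbers.toList())
}
```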

First let’s look at how getAdapter gets the TypeAdapter.

4.2.1. Parse Json - Obtain TypeAdapter

The logic of getAdapter is roughly divided into two parts:

  1. Try to fetch the adapter from the cache typeTokenCache;
  2. Search linearly through factories for the given typeToken.

It first tries the cache. If the adapter is not cached, getAdapter iterates over every element of factories: if a TypeAdapterFactory can create a TypeAdapter for that type (typeToken), it returns a non-null TypeAdapter; otherwise it returns null and the search moves on to the next factory.
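The lookup described above can be modelled in a few lines. This is a simplified sketch, not Gson's source: Adapter, AdapterFactory and AdapterRegistry are hypothetical stand-ins for TypeAdapter, TypeAdapterFactory and the Gson instance, and the key is a plain Class rather than a TypeToken.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-ins for TypeAdapter and TypeAdapterFactory.
fun interface Adapter<T> { fun read(json: String): T }
fun interface AdapterFactory { fun create(type: Class<*>): Adapter<*>? }

class AdapterRegistry(private val factories: List<AdapterFactory>) {
    private val cache = ConcurrentHashMap<Class<*>, Adapter<*>>()

    fun getAdapter(type: Class<*>): Adapter<*> {
        // Part 1: try the cache first.
        cache[type]?.let { return it }
        // Part 2: linear search; the first factory that returns non-null wins.
        for (factory in factories) {
            val candidate = factory.create(type) ?: continue
            cache[type] = candidate
            return candidate
        }
        throw IllegalArgumentException("No factory can handle $type")
    }
}

fun main() {
    val registry = AdapterRegistry(
        listOf(
            AdapterFactory { null }, // a factory that supports nothing
            AdapterFactory { type -> if (type == String::class.java) Adapter<String> { it } else null },
        )
    )
    println(registry.getAdapter(String::class.java).read("hello"))
}
```

Because the search is linear and ordered, a catch-all factory placed last only gets its turn when no earlier factory claims the type.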

Because we are using a default Gson instance, no TypeAdapter has been registered for the type Foo, so the search ends at ReflectiveTypeAdapterFactory (the last element of factories), which is then used to create the TypeAdapter.

The create method of ReflectiveTypeAdapterFactory returns an instance of its inner class ReflectiveTypeAdapterFactory.Adapter, which is a TypeAdapter.

The second argument to the Adapter constructor is the result of calling getBoundFields, which is where the time goes.

In that method, type is the type including its generic arguments, and raw is the type without them.

For example:

Declared type      type               raw
Int                Int                Int
List<Int>          List<Int>          List
List<List<Int>>    List<List<Int>>    List
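The same distinction is visible on Gson's TypeToken, which exposes both views of a type. A small sketch (listOfIntToken is a hypothetical helper name):

```kotlin
import com.google.gson.reflect.TypeToken

// Build a token for List<Int> from its raw class and its type argument.
fun listOfIntToken(): TypeToken<*> =
    TypeToken.getParameterized(List::class.java, Int::class.javaObjectType)

fun main() {
    val token = listOfIntToken()
    println(token.type)    // the full type, including the generic argument
    println(token.rawType) // the raw class, without generic arguments
}
```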

After reading the code, it is not hard to see that getBoundFields does two things:

  1. It iterates over all fields of the current class (raw) and its superclasses;
  2. For each field, it creates a BoundField.

A BoundField wraps a field and provides the ability to read and write it.

This method is slow the first time it is called for a type because of the amount of reflection involved.
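The traversal can be approximated with plain Java reflection. The sketch below is an illustration of the idea rather than Gson's implementation: collectFields, Base and Child are hypothetical names, and unlike Gson it silently keeps the first field when a name appears twice in the hierarchy.

```kotlin
import java.lang.reflect.Field
import java.lang.reflect.Modifier

// Walk the class and its superclasses and index every serializable field by name.
fun collectFields(type: Class<*>): Map<String, Field> {
    val result = LinkedHashMap<String, Field>()
    var current: Class<*>? = type
    while (current != null && current != Any::class.java) {
        for (field in current.declaredFields) {
            val mods = field.modifiers
            if (field.isSynthetic || Modifier.isStatic(mods) || Modifier.isTransient(mods)) continue
            field.isAccessible = true
            result.putIfAbsent(field.name, field)
        }
        current = current.superclass
    }
    return result
}

// Hypothetical sample hierarchy.
open class Base { var id: Int = 0 }
class Child : Base() { var name: String = "" }

fun main() {
    println(collectFields(Child::class.java).keys)
}
```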

Next, look at the read method of ReflectiveTypeAdapterFactory.Adapter.

The read method has roughly two steps:

  1. Create an instance;
  2. Iterate over the key-value pairs of the JSON string and set each value.

4.2.2. Parse Json - Create instance

Let’s look at how Gson constructs an instance of Foo.

The constructor object generates instances of the corresponding type. It is passed in when ReflectiveTypeAdapterFactory.Adapter is instantiated.

In ReflectiveTypeAdapterFactory, this constructor object is created by ConstructorConstructor.get.

Let's see what ConstructorConstructor.get does internally:

Case 1:

Case 2:

Case 3:

Case 4:

Case 5:

Gson internally instantiates an object in four general ways, which together cover all cases:

  1. Use a pre-registered InstanceCreator to construct the instance, corresponding to case 1 and case 2;
  2. Use the no-argument constructor of the class, corresponding to case 3;
  3. Construct an instance of a collection type, corresponding to case 4;
  4. If none of the above is possible, fall back to Unsafe and allocate the instance directly, corresponding to case 5.

In this example, the Foo instance is created via case 3: because all fields of Foo have default values, Foo has a no-argument constructor once it is compiled for the JVM.
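The difference between the no-argument constructor and the Unsafe fallback can be sketched with plain reflection. This roughly mirrors cases 3 and 5 only and is not Gson's code; newInstance, WithDefaults and NeedsArg are hypothetical names, and the sketch assumes a JVM that still exposes sun.misc.Unsafe.

```kotlin
// Hypothetical sample classes.
class WithDefaults(var count: Int = 7)                      // gets a no-argument constructor
class NeedsArg(val name: String) { var constructed = true } // has no no-argument constructor

fun <T : Any> newInstance(type: Class<T>): T {
    // Case 3: prefer the no-argument constructor if the class has one.
    val noArg = try {
        type.getDeclaredConstructor()
    } catch (e: NoSuchMethodException) {
        null
    }
    if (noArg != null) {
        noArg.isAccessible = true
        return noArg.newInstance()
    }
    // Case 5: allocate without running any constructor (field initializers are skipped).
    val unsafeClass = Class.forName("sun.misc.Unsafe")
    val unsafe = unsafeClass.getDeclaredField("theUnsafe").apply { isAccessible = true }.get(null)
    val allocate = unsafeClass.getMethod("allocateInstance", Class::class.java)
    return type.cast(allocate.invoke(unsafe, type))
}

fun main() {
    println(newInstance(WithDefaults::class.java).count)   // default value applied by the constructor
    println(newInstance(NeedsArg::class.java).constructed) // initializer never ran
}
```

The second call shows why the Unsafe path is a last resort: the object exists, but none of its initialization logic has run.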

4.2.3. Parse Json - Set field values

Back to the read method of ReflectiveTypeAdapterFactory.Adapter.

Gson uses JsonReader to read each key-value pair from the JSON and calls boundField.read to set the value that was read.

In other words, Gson simply calls JsonReader's next* methods to read the key-value pairs of the JSON string, and then sets each value on the corresponding field through its BoundField.

At this point, the process of setting values is complete.
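Putting instance creation and value setting together, the reflective read can be approximated as follows. This is a deliberately simplified sketch and not Gson's implementation: reflectiveRead is a hypothetical name, the four-field Foo is an assumed example class, only four field types are handled, and the fields are looked up on the class itself without its superclasses.

```kotlin
import com.google.gson.stream.JsonReader
import java.io.StringReader

// Hypothetical class: the field names and types are assumptions for this example.
data class Foo(
    val id: Int = 0,
    val name: String = "",
    val score: Double = 0.0,
    val active: Boolean = false,
)

fun <T : Any> reflectiveRead(json: String, type: Class<T>): T {
    // Create the instance through the no-argument constructor.
    val instance = type.getDeclaredConstructor().newInstance()
    val fields = type.declaredFields.associateBy { it.name }
    JsonReader(StringReader(json)).use { reader ->
        reader.beginObject()
        while (reader.hasNext()) {
            val field = fields[reader.nextName()]
            if (field == null) {
                reader.skipValue() // unknown key: ignore its value
                continue
            }
            field.isAccessible = true
            // Set the value reflectively, choosing the reader call by field type.
            when (field.type) {
                Int::class.javaPrimitiveType -> field.setInt(instance, reader.nextInt())
                Double::class.javaPrimitiveType -> field.setDouble(instance, reader.nextDouble())
                Boolean::class.javaPrimitiveType -> field.setBoolean(instance, reader.nextBoolean())
                String::class.java -> field.set(instance, reader.nextString())
                else -> reader.skipValue()
            }
        }
        reader.endObject()
    }
    return instance
}

fun main() {
    println(reflectiveRead("""{"id":1,"name":"gson","score":9.5,"active":true}""", Foo::class.java))
}
```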

4.2.4. Summary

Gson's logic for parsing JSON is not too complicated. It breaks down into three major steps:

  1. Get the TypeAdapter;
  2. Create the instance via reflection;
  3. Set the values via reflection.

When retrieving the TypeAdapter, no TypeAdapter for Foo has been registered on the Gson instance, so the lookup ends up with ReflectiveTypeAdapterFactory.Adapter. Creating ReflectiveTypeAdapterFactory.Adapter requires reflecting over all fields of the class and its superclasses.

When creating the instance, no InstanceCreator has been registered on the Gson instance, so Gson reflectively calls Foo's no-argument constructor (Foo has default values for all fields, so it has a no-argument constructor once compiled for the JVM).

When setting values, each value is written into its field by calling the read method of the corresponding BoundField, which was created in advance.

Finally, Gson returns this instance.

5. Where the time goes

As the analysis above shows, Gson with its default configuration has the following performance bottlenecks when parsing JSON strings (in order of impact, from largest to smallest):

  1. The default TypeAdapter reflects over all fields of a class and its superclasses and builds a Map from them;
  2. The default TypeAdapter uses reflection to set field values;
  3. The default TypeAdapter uses reflection to create instances.

The flame graph confirms this conclusion.

6. Solutions

6.1. Handwritten TypeAdapter

Writing a TypeAdapter is repetitive manual work. It has two main parts: toJson (the write method) and fromJson (the read method). Only the fromJson (deserialization) part is discussed here, using Foo from above as the example.

A handwritten read method for fromJson is divided into three steps:

  1. Define temporary variables first;
  2. Repeatedly read key-value pairs from the JSON string;
  3. Assemble all temporary variables into an instance.
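A sketch of such an adapter for a four-field Foo follows. The field names and types are assumptions, since the original class definition is not shown; write is left unimplemented because only the read side is discussed, and no error handling is included.

```kotlin
import com.google.gson.GsonBuilder
import com.google.gson.TypeAdapter
import com.google.gson.stream.JsonReader
import com.google.gson.stream.JsonWriter

// Hypothetical class: the field names and types are assumptions for this example.
data class Foo(
    val id: Int = 0,
    val name: String = "",
    val score: Double = 0.0,
    val active: Boolean = false,
)

class FooTypeAdapter : TypeAdapter<Foo>() {
    override fun write(out: JsonWriter, value: Foo) {
        throw UnsupportedOperationException("toJson is out of scope for this sketch")
    }

    override fun read(reader: JsonReader): Foo {
        // Step 1: temporary variables, initialised to the defaults.
        var id = 0
        var name = ""
        var score = 0.0
        var active = false
        // Step 2: read key-value pairs until the object ends.
        reader.beginObject()
        while (reader.hasNext()) {
            when (reader.nextName()) {
                "id" -> id = reader.nextInt()
                "name" -> name = reader.nextString()
                "score" -> score = reader.nextDouble()
                "active" -> active = reader.nextBoolean()
                else -> reader.skipValue()
            }
        }
        reader.endObject()
        // Step 3: assemble the instance without any reflection.
        return Foo(id, name, score, active)
    }
}

fun main() {
    val gson = GsonBuilder().registerTypeAdapter(Foo::class.java, FooTypeAdapter()).create()
    println(gson.fromJson("""{"id":1,"name":"gson","score":9.5,"active":true}""", Foo::class.java))
}
```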

As you can see, it takes a lot of code just to read four fields and build an instance, and that code does not even handle error scenarios, so writing it by hand is very cumbersome.

6.2. Generate code using Kapt

The handwritten code above clearly follows a pattern, so code generation can be used to produce the TypeAdapter.

Generating the read method takes no more than three steps:

  1. Generate temporary variables to hold the values read;
  2. Generate a while-when expression that repeatedly calls JsonReader's next* methods to read each key-value pair and store it in these temporary variables;
  3. Assemble all temporary variables into the return type and return it.

Note that the second step calls for an interface that abstracts over the different field types, with one implementation per type that knows how to read data of that type.
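The three generation steps and the per-type abstraction can be illustrated with a tiny string-based generator. This is only a sketch of the idea and does not use the Kapt or annotation processing APIs; FieldCodec, IntCodec, StringCodec, FieldSpec and generateReadMethod are all hypothetical names.

```kotlin
// Hypothetical abstraction: one implementation per supported field type.
interface FieldCodec {
    val kotlinType: String     // type of the temporary variable
    val defaultLiteral: String // initial value of the temporary variable
    val readCall: String       // JsonReader call that reads this type
}

object IntCodec : FieldCodec {
    override val kotlinType = "Int"
    override val defaultLiteral = "0"
    override val readCall = "reader.nextInt()"
}

object StringCodec : FieldCodec {
    override val kotlinType = "String"
    override val defaultLiteral = "\"\""
    override val readCall = "reader.nextString()"
}

data class FieldSpec(val name: String, val codec: FieldCodec)

fun generateReadMethod(className: String, fields: List<FieldSpec>): String = buildString {
    val args = fields.joinToString(", ") { it.name }
    appendLine("override fun read(reader: JsonReader): $className {")
    // Step 1: temporary variables.
    for (f in fields) appendLine("    var ${f.name}: ${f.codec.kotlinType} = ${f.codec.defaultLiteral}")
    // Step 2: the while-when loop over key-value pairs.
    appendLine("    reader.beginObject()")
    appendLine("    while (reader.hasNext()) {")
    appendLine("        when (reader.nextName()) {")
    for (f in fields) appendLine("            \"${f.name}\" -> ${f.name} = ${f.codec.readCall}")
    appendLine("            else -> reader.skipValue()")
    appendLine("        }")
    appendLine("    }")
    appendLine("    reader.endObject()")
    // Step 3: assemble and return the instance.
    appendLine("    return $className($args)")
    append("}")
}

fun main() {
    println(generateReadMethod("Foo", listOf(FieldSpec("id", IntCodec), FieldSpec("name", StringCodec))))
}
```

Supporting a new field type then only means adding another FieldCodec implementation; the generator itself stays unchanged.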

But Kapt has its limitations:

  1. By the Kapt phase the code has already been converted into Java form, so many Kotlin features are lost;
  2. It intrudes on the compile phase: the more classes it has to process, the longer compilation takes.

The first limitation can be worked around by reading Kotlin metadata or by using KSP directly.

The second limitation is inherent: Kapt is a compile-time tool, so it is bound to add to compile time.

6.3. Write IDEA Plugin

An IDEA plugin directly addresses the second limitation of Kapt, compile-time intrusion. In addition, IDEA provides the PSI (Program Structure Interface), which wraps the Abstract Syntax Tree (AST). With PSI, you can statically analyze code as well as generate code.

But an IDEA plugin has a limitation of its own:

Because it works on the syntax tree of the code, and different languages have different syntax trees, each language must be supported separately.

7. Summary

Gson's default fromJson logic is slow because it relies heavily on reflection.

So we can replace it with reflection-free logic, in the form of a custom TypeAdapter, to speed things up.

There are several ways to produce a TypeAdapter. Each has its advantages and disadvantages, so choose according to your needs.