preface
Security vulnerabilities in JSON serialization frameworks have long been a running joke among programmers, especially over the past couple of years with FastJSON, which has been the target of focused research and a steady stream of disclosed vulnerabilities. A vulnerability by itself is not a big deal, but the security team keeps emailing every online application that depends on it to push an upgrade, which is exhausting. I'm sure many of you have considered replacing FastJSON with another serialization framework. Recently one of our projects replaced FastJSON with Gson, and it caused a problem in production. I'm sharing the experience here so that others don't step into the same pit. As the saying goes: of ten thousand rules, safety comes first; upgrade without care, and production ends in tears.
Problem description
The logic in production was very simple: serialize an object to JSON with FastJSON and send the string out in an HTTP request. It had been working fine, but after FastJSON was replaced with Gson it caused an OOM in production. Analysis of the memory dump showed that a payload of more than 400 MB was being sent; because the HTTP tool did not check the payload size, it tried to transmit it anyway, and the online service became unavailable.
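A cheap safeguard, independent of which serialization framework is used, is to check the payload size before handing it to the HTTP client. The sketch below is only illustrative; the 10 MB limit and the class and method names are assumptions, not part of the actual system.
import java.nio.charset.StandardCharsets;

public class PayloadGuard {

    // Assumed limit; tune it to what the downstream service can actually accept.
    private static final int MAX_PAYLOAD_BYTES = 10 * 1024 * 1024;

    /** Rejects an oversized JSON string before it ever reaches the HTTP client. */
    public static String checkSize(String json) {
        int size = json.getBytes(StandardCharsets.UTF_8).length;
        if (size > MAX_PAYLOAD_BYTES) {
            throw new IllegalStateException("Refusing to send payload of " + size + " bytes");
        }
        return json;
    }
}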
Problem analysis
Why did FastJSON serialization never cause a problem while Gson exposed one immediately? Analysis of the memory dump showed that the values of many fields were repeated. Combined with the characteristics of our business data, the cause was pinpointed at once: Gson's serious weakness when serializing duplicate objects.
Let's go straight to a simple example to illustrate the problem. To simulate the shape of the production data, we add the same object reference to a List several times.
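The Foo and Bar classes are not shown in the original snippet. A minimal version consistent with the output below might look like this: the field name a and its "aaaaa" value are taken from the printed JSON, everything else is an assumption; Serializable is added so the Java serialization test further down also works, and the real test object contained many more fields, which is why the sizes reported later are much larger.
import java.io.Serializable;
import java.util.List;

// Foo.java
public class Foo implements Serializable {
    // Matches the {"a":"aaaaa"} entries in the output below.
    private String a = "aaaaa";
    public String getA() { return a; }
    public void setA(String a) { this.a = a; }
}

// Bar.java
public class Bar implements Serializable {
    // Holds repeated references to the same Foo instance.
    private List<Foo> foos;
    public List<Foo> getFoos() { return foos; }
    public void setFoos(List<Foo> foos) { this.foos = foos; }
}
With these classes in place, the test snippet below serializes the same Bar with both Gson and FastJSON: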
Foo foo = new Foo();
Bar bar = new Bar();
List<Foo> foos = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    foos.add(foo);
}
bar.setFoos(foos);

Gson gson = new Gson();
String gsonStr = gson.toJson(bar);
System.out.println(gsonStr);

String fastjsonStr = JSON.toJSONString(bar);
System.out.println(fastjsonStr);
Observe the printed output:
Gson:
{"foos": [{"a":"aaaaa"}, {"a":"aaaaa"}, {"a":"aaaaa"}}]Copy the code
Fastjson:
{"foos": [{"a":"aaaaa"}, {"$ref":"$.foos[0]"}, {"$ref":"$.foos[0]"}}]Copy the code
You can see that Gson serializes every duplicate object in full, while FastJSON writes out only the first occurrence and marks every subsequent one with a $ref reference tag.
When a single object is large and it is repeated many times, these two serialization strategies produce qualitatively different results. Let's compare them in exactly that kind of scenario.
Compression ratio test
- Serialized object: contains a large number of properties, to simulate the online business data.
- Repetitions: 200. That is, the List contains 200 entries pointing to the same object reference, to simulate the complex object structure in production and amplify the difference.
- Serialization methods: Gson, FastJSON, Java, Hessian2. Java and Hessian2 are added as controls so we can see how each serialization framework performs in this particular scenario.
- Primary observation: the size of the serialized bytes, since that determines the size of the network transmission. Secondary observation: whether the entries in the List are still the same object after deserialization.
import com.alibaba.fastjson.JSON;
import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import com.google.gson.Gson;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Foo foo = new Foo();
        Bar bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);

        // gson
        Gson gson = new Gson();
        String gsonStr = gson.toJson(bar);
        System.out.println(gsonStr.length());
        Bar gsonBar = gson.fromJson(gsonStr, Bar.class);
        System.out.println(gsonBar.getFoos().get(0) == gsonBar.getFoos().get(1));

        // fastjson
        String fastjsonStr = JSON.toJSONString(bar);
        System.out.println(fastjsonStr.length());
        Bar fastjsonBar = JSON.parseObject(fastjsonStr, Bar.class);
        System.out.println(fastjsonBar.getFoos().get(0) == fastjsonBar.getFoos().get(1));

        // java
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        System.out.println(byteArrayOutputStream.toByteArray().length);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
        System.out.println(javaBar.getFoos().get(0) == javaBar.getFoos().get(1));

        // hessian2
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        System.out.println(hessian2Baos.toByteArray().length);
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
        System.out.println(hessian2Bar.getFoos().get(0) == hessian2Bar.getFoos().get(1));
    }
}
Output results:
gson:
62810
false
fastjson:
4503
true
Java:
1540
true
Hessian2:
686
true
Conclusion analysis: since a single serialized object is large, representing repeats as references greatly reduces the output size. Gson does not apply this optimization, so its output balloons. Even the perennially underrated Java serialization does much better here, and Hessian2 is better still, producing output nearly two orders of magnitude smaller than Gson's. Also, after deserialization Gson cannot restore the entries to the single shared reference they originally were, which every other serialization framework in the test can.
Throughput test
In addition to the size of the serialized data, the throughput of each serialization method is also worth watching. A JMH microbenchmark lets us measure the throughput of each one accurately.
import com.alibaba.fastjson.JSON;
import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import com.google.gson.Gson;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

@BenchmarkMode({Mode.Throughput})
@State(Scope.Benchmark)
public class MicroBenchmark {

    private Bar bar;
    private Gson gson = new Gson();

    @Setup
    public void prepare() {
        Foo foo = new Foo();
        bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);
    }

    @Benchmark
    public void gson() {
        String gsonStr = gson.toJson(bar);
        gson.fromJson(gsonStr, Bar.class);
    }

    @Benchmark
    public void fastjson() {
        String fastjsonStr = JSON.toJSONString(bar);
        JSON.parseObject(fastjsonStr, Bar.class);
    }

    @Benchmark
    public void java() throws Exception {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
    }

    @Benchmark
    public void hessian2() throws Exception {
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MicroBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}
Throughput report:
Benchmark                 Mode  Cnt        Score         Error  Units
MicroBenchmark.fastjson  thrpt   25  6724809.416 ± 1542197.448  ops/s
MicroBenchmark.gson      thrpt   25  1508825.440 ±  194148.657  ops/s
MicroBenchmark.hessian2  thrpt   25   758643.567 ±  239754.709  ops/s
MicroBenchmark.java      thrpt   25   734624.615 ±   66892.728  ops/s
Not surprisingly, FastJSON comes out on top. In this test the text-based serializers also outperform the binary ones, reaching millions of operations per second versus hundreds of thousands, up to an order of magnitude apart.
Overall test conclusion
- A string serialized by FastJSON with $ref reference tags can still be deserialized by Gson, but I did not find any configuration that makes Gson emit references when serializing.
- FastJSON, Hessian, and Java serialization all support resolving circular references; Gson does not.
- FastJSON can turn off circular-reference and duplicate-reference detection with DisableCircularReferenceDetect (see the sketch after this list).
- Objects that shared the same reference before Gson serialization are no longer the same object after deserialization, which can inflate the number of objects in memory. FastJSON, Java, Hessian2 and the other serialization methods do not have this problem, because they record reference tags.
- For the author's test case, Hessian2 has a very strong serialization compression ratio and suits scenarios where large payloads are serialized for network transmission.
- For the author's test case, FastJSON has very high throughput, living up to the "fast" in its name, and suits high-throughput scenarios.
- Choosing a serialization method also means weighing support for circular references, shared-reference optimization, and combined scenarios such as enums, collections, arrays, subclasses, polymorphism, inner classes, and generics, as well as concerns such as readability and compatibility after adding or removing fields. All things considered, I recommend the Hessian2 and FastJSON serialization methods.
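To illustrate the DisableCircularReferenceDetect point above, the snippet below reuses the Bar/Foo objects from the earlier test. It is only a sketch, but SerializerFeature.DisableCircularReferenceDetect is a real FastJSON serializer feature.
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import java.util.ArrayList;
import java.util.List;

public class RefDetectDemo {
    public static void main(String[] args) {
        Foo foo = new Foo();
        Bar bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);

        // Default behaviour: repeats of the same reference are emitted as {"$ref":"$.foos[0]"}.
        System.out.println(JSON.toJSONString(bar));

        // Detection disabled: every repeat is written out in full, much like Gson's output.
        // Beware: on genuinely circular object graphs this can recurse until a StackOverflowError.
        System.out.println(JSON.toJSONString(bar, SerializerFeature.DisableCircularReferenceDetect));
    }
}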
conclusion
We all know that FastJSON does some hacky things in order to be fast, which is also part of why it has had so many vulnerabilities, but I think coding is all about trade-offs; if a perfect framework existed, its competitors would not. I have not done deep research into every serialization framework, and while you might say Jackson is better, I can only say that whatever solves the problems of your scenario is the right framework for you.
Finally, be careful when replacing a serialization framework: understand the characteristics of the replacement, because the new framework may not cover everything the original one solved.