incentive

Recent tests show that the production environment of a project can make the interface report errors through a certain operation process. However, some interfaces are available and some are unavailable.

screening

The following investigation process truthfully records the thoughts and thoughts of the investigation at that time, and may take many detours. But in the spirit of bold assumptions, careful verification of the attitude, step by step close to the root of the problem. While the problem has been resolved, the source of the problem remains unclear. This article is intended to record the process and thinking of the investigation.

Since this is a production environment, forgive me for using thick mosaics to erase sensitive information.

Is it string out of bounds?

Don’t panic if you encounter a problem, first go to the environment to see what happens in the log:

java.lang.StringIndexOutOfBoundsException: String index out of range: -2089467602
	at java.lang.String.getChars(String.java:824)
	at com.alibaba.fastjson.serializer.SerializeWriter.writeFieldValueStringWithDoubleQuoteCheck(SerializeWriter.java:1906)
	at com.alibaba.fastjson.serializer.ASMSerializer_18_UserLoginDTO.write(Unknown Source)
	at com.alibaba.fastjson.serializer.ASMSerializer_10_UserCacheDTO.write(Unknown Source)
	at com.alibaba.fastjson.serializer.JSONSerializer.write(JSONSerializer.java:285)
	at com.alibaba.fastjson.JSON.toJSONString(JSON.java:745)
	at com.alibaba.fastjson.JSON.toJSONString(JSON.java:683)
	at com.alibaba.fastjson.JSON.toJSONString(JSON.java:648)

Copy the code

Well, this is a serialized Json string error.

But a closer look, found that things are not so simple, why is the boundary is a negative number -2089467602?

Look at the Java. Lang. String. GetChars (824). The String Java: (note: the environment for JDK8) :

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin); / / 824 rows
        }
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }
Copy the code

The fault line by 824 throw new StringIndexOutOfBoundsException (srcEnd – srcBegin) throw, srcEnd – srcBegin = – 2089467602. Great reduced energies, either srcEnd are negative values; Either srcBegin was a very small negative number, srcEnd was a very big positive number. This depends on what value is passed in when the method is called.

When calling this method at the previous level, Com. Alibaba. Fastjson. Serializer. SerializeWriter. WriteFieldValueStringWithDoubleQuoteCheck (SerializeWriter. Java: 1906) and all the items (Note: Fastjson version is 1.2.67, different versions vary) :

    public void writeFieldValueStringWithDoubleQuoteCheck(char seperator, String name, String value) {
        int nameLen = name.length();
        int valueLen;

        int newcount = count;

        if (value == null) {
            valueLen = 4;
            newcount += nameLen + 8;
        } else {
            valueLen = value.length();
            newcount += nameLen + valueLen + 6;
        }
        // omit extraneous code
        value.getChars(0, valueLen, buf, valueStart); / / 1906 rows
        // omit extraneous code
    }
Copy the code

I was a little surprised to see Text-.getchars, srcBegin=0, srcEnd=valueLen=value string length. This is… How does this figure out the negative number ???? It’s not scientific! I want to see what the object in this text looks like! But… But this is a production environment, not as convenient as local debugging.

At this time, let us invite our good brother: Arthas!

I’m sorry, not this Arthas.

Is this:

Alibaba open source Java diagnostic tools, address: github.com/alibaba/art… . It can: look at the running state, look at JVM parameters, look at method call chains, look at observing method inputs and return results, and so on. See their documentation for details: arthas.aliyun.com/doc/.

The following screening process will continue to use this tool to assist in the verification of guesses. I won’t go into too much detail about commands, as there are official documentation examples and online tutorials.

Download arthas and copy it to run in production.

Jar package: java-jar arthas-boot.jar

Then select the process and press Enter, arthas completes the enhancement. At this point you can use arthas’s functionality.

Arthas principle: www.jianshu.com/p/4e34d0ab4…

Back to just screening process, now we are going to see writeFieldValueStringWithDoubleQuoteCheck (char seperator, String name, String value) into the reference value of what looks like.

Using the command watch com. Alibaba. Fastjson. Serialize. SerializeWriter writeStringWithSingleQuote “{params, throwExp}” view method into the reference:

You can see that the @object [] line starts with the order of input arguments from top to bottom. The classes are Character, String, and UserImVO.

And so on… Why can a String entry pass another object?

At this point I began to wonder if there was a fastjson bug that caused the generated ASMSerializer_18_UserLoginDTO class to pass the wrong object into the method.

To verify the conjecture, see what the object of ASmSerializer_18_userLogindto.write is:

This…… The id in the write object is of type UserImVO, which is very different from what we expect from String.

Is it FastJson?

	at com.alibaba.fastjson.serializer.SerializeWriter.writeFieldValueStringWithDoubleQuoteCheck(SerializeWriter.java:1906)
	at com.alibaba.fastjson.serializer.ASMSerializer_18_UserLoginDTO.write(Unknown Source)
	at com.alibaba.fastjson.serializer.ASMSerializer_10_UserCacheDTO.write(Unknown Source)
	at com.alibaba.fastjson.serializer.JSONSerializer.write(JSONSerializer.java:285)

Copy the code

At this point, think back to the beginning of the problem. Focus on the middle two classes generated by Fastjson. Com. Alibaba. Fastjson. Serializer. ASMSerializer_18_UserLoginDTO and com. Alibaba. Fastjson. Serializer. ASMSerializer_10_UserCache DTO

Episodic: A quick note on how FastJSON (1.2.67 as an example) generates JSON strings from objects.

  1. We call json.tojsonString (Object)
  2. Generates JSONSerializer and initializes SerializeConfig
  3. SerializeConfig holds ASMSerializerFactory and creates SerializeBeanInfo. The former generates Serializer for the object, and the latter iterates through the object to obtain the object information
  4. ASMSerializerFactory createJavaBeanSerializer according to SerializeBeanInfo, create a ASMSerializer_ serial number _ short class name of the class, use the ASM generated binary bytecodes.
  5. Generate bytecode used after class loader ASMClassLoader. DefineClassPublic loading the class to the JVM
  6. Use this class to output a JSON string

So how do we see the runtime class information in production? Yes, arthas again.

Using the command dump com. Alibaba. Fastjson. Serializer. ASMSerializer_18_UserLoginDTO generates a class to hard disk. I’m just going to save both classes.

Open theASMSerializer_18_UserLoginDTOAnd find the code for printing the ID field:

var6 = "id";  / / the name of the field
var13 = var10.getId(); // var10 is UserLoginDTO
if (var13 == null) { // Don't go this way
    if (var9.isEnabled(132)) {
        var9.write(var12);
        var9.writeFieldNameDirect(var6);
        var9.writeNull(0.128);
        var12 = ', '; }}else {
    // Not null, go here. Here var12 is the string comma,var6 is name, and var13 is value
    var9.writeFieldValueStringWithDoubleQuoteCheck(var12, var6, var13);
    var12 = ', ';
}
Copy the code

This code is excerpted from the code for the ID field generated by asmSerializER_18_userLogindto.class. As you can see, Fastjson actually gets the ID object in userLoginDTO without doing anything else. Obviously, the id in this object is no longer a string before fastjson is passed in.

I then followed the business code (which I won’t show here) to the source of UserLoginDTO – from Redis.

This project uses JetCache to encapsulate operations on redis data.

Here’s jetcache again (github.com/alibaba/jet…) : JetCache is a Java cache abstraction which provides consistent use for various Caching solutions.

In the same way, use Watch to check the object returned by jetCache, and find that the class with the id is UserImVO. It seems that the data was not correct when it was retrieved from Redis.

I looked at the jetCache configuration and found that the serialization deserializer was configured as Kryo.

Kryo (github.com/EsotericSof…). : Kryo is a fast and efficient Binary Object Graph Serialization framework for Java.

Is it Kryo?

Take a look at the value returned by JetCache when deserializing kryo:

Here we are looking at jetCache retrieving Byte [] from Redis and deserializing byte[] into the desired object using the wrapped Kryo deserialization class. It is clear from the figure that the data obtained from Redis is already abnormal when deserialized into objects.

Kryo’s pot?

I retrieved the data from the logs that generated the JSON string correctly and ran the same version of Kryo serialization and deserialization and found that the problem could not be replicated.

Then sorted out the above library and version information that I suspected.

library The interface that reported an error An error-free interface
fastjson 1.2.67 1.2.70
jetcache Server. The M2 Server. The M2
kryo 4.0.2 4.0.2
Business code That’s a three-month difference That’s a three-month difference
  1. For Fastjson, I looked at changelog for 68, 69, and 70 and found no fixes. But related methods likewriteFieldValueStringWithDoubleQuoteCheckIt’s a big change.
  2. Jetcache and Kryo are identical versions, and the same objects didn’t have a problem when I serialized them locally.

So in the end it was the business code?

Because both the error-reporting interface and the non-reporting interface call this code, the non-reporting interface is updated three months later than the error-reporting interface. I looked at the code update history and saw that UserLoginDTO had added several fields. Does adding a few fields affect the deserialization of Kryo?

conclusion

This issue was resolved when the business code was subsequently updated. The solution is a bit crude, but the question remains unclear. My guess remains with Kryo.

reference

  1. Arthas: document arthas.aliyun.com/doc/en/