Preface

Usually, when we use Java serialization and deserialization, we simply have the class implement the Serializable interface and leave the rest to the JDK. Today we'll look at how Java serialization is implemented, and then at how a few common collection classes handle their own serialization.

The analysis process

A few questions to ponder

  1. Why is implementing the Serializable interface all that is required to make an object serializable?
  2. When we serialize a class, why is it recommended to declare a static final member variable named serialVersionUID?
  3. How does serialization skip fields marked with the transient keyword, and why are static variables not serialized?

Let's keep these questions in mind and look for the answers in the source code.

Serializable

The source of the Serializable interface is very simple: it is an empty interface with no methods and no member variables. Its comments, however, are very detailed and clearly describe how Serializable is used and what it does, so they are worth reading.

/**
 * Serializability of a class is enabled by the class implementing the
 * java.io.Serializable interface. Classes that do not implement this
 * interface will not have any of their state serialized or
 * deserialized. All subtypes of a serializable class are themselves
 * serializable. The serialization interface has no methods or fields
 * and serves only to identify the semantics of being serializable.
 */

A class becomes serializable by implementing the java.io.Serializable interface. Classes that do not implement it will not have any of their state serialized or deserialized, and every subtype of a serializable class is itself serializable. The interface has no methods or fields; it is only a marker that identifies a class as serializable.
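The marker behaviour described above can be sketched in a few lines. The classes PlainPoint and MarkedPoint and the helper canSerialize are hypothetical names invented for this illustration; only the java.io types are real JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class that does NOT implement the marker interface.
class PlainPoint {
    int x = 1;
}

// Hypothetical class that implements the marker interface.
class MarkedPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 1;
}

class MarkerInterfaceDemo {
    /** Returns true if ObjectOutputStream accepts the object. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class lacks the Serializable marker
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new MarkedPoint())); // true
        System.out.println(canSerialize(new PlainPoint()));  // false
    }
}
```

The two classes have identical fields; the only difference is the marker, which is all the JDK checks for.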

/**
 * Classes that require special handling during the serialization and
 * deserialization process must implement special methods with these exact
 * signatures:
 *
 * private void writeObject(java.io.ObjectOutputStream out)
 *     throws IOException
 * private void readObject(java.io.ObjectInputStream in)
 *     throws IOException, ClassNotFoundException;
 * private void readObjectNoData()
 *     throws ObjectStreamException;
 */

If a class needs special handling during serialization, it can implement the methods writeObject(), readObject(), and readObjectNoData(); the same comments also describe writeReplace() and readResolve(). In brief:

  • writeObject() is responsible for writing the state of the object for its particular class so that the corresponding readObject() method can restore it.
  • readObject() is responsible for reading the class's fields from the stream and restoring them.
  • writeReplace() lets an object nominate another object to be written to the stream in its place. This is one way to cope with, for example, a superclass that does not support serialization when you do not want its state to fall back to default values.
  • readResolve() is typically used in the singleton pattern: after an object is read from the stream, it can be replaced with another object.
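To make the hooks above concrete, here is a minimal sketch under stated assumptions. Reading, Registry, and HooksDemo are hypothetical classes made up for this example, and the "tenths of a degree" encoding is just an arbitrary custom format; the method signatures are the ones quoted from the Serializable comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that stores one field in a custom format.
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sensor;
    transient double celsius; // skipped by default serialization

    Reading(String sensor, double celsius) {
        this.sensor = sensor;
        this.celsius = celsius;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                     // writes the non-transient field
        out.writeInt((int) Math.round(celsius * 10)); // custom encoding: tenths of a degree
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                       // restores the non-transient field
        celsius = in.readInt() / 10.0;                // must mirror writeObject
    }
}

// Hypothetical singleton that stays unique across deserialization.
class Registry implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Registry INSTANCE = new Registry();

    private Registry() { }

    private Object readResolve() {
        return INSTANCE; // discard the freshly deserialized copy
    }
}

class HooksDemo {
    /** Serializes the object to memory and reads it back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reading copy = (Reading) roundTrip(new Reading("roof", 21.5));
        System.out.println(copy.sensor + " " + copy.celsius);                  // roof 21.5
        System.out.println(roundTrip(Registry.INSTANCE) == Registry.INSTANCE); // true
    }
}
```

Without readResolve(), the round trip would produce a second Registry instance, which is exactly what a singleton is supposed to prevent.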

ObjectOutputStream

    // Serializing an object usually starts in this method
    public final void writeObject(Object obj) throws IOException {
        ...
        try {
            // The concrete write logic lives in writeObject0
            writeObject0(obj, false);
        } catch (IOException ex) {
            ...
            throw ex;
        }
    }

    private void writeObject0(Object obj, boolean unshared) throws IOException {
        ... // omitted
        Object orig = obj;
        Class<?> cl = obj.getClass();
        ObjectStreamClass desc;
        for (;;) {
            // REMIND: skip this check for strings/arrays?
            Class<?> repCl;
            // Get the ObjectStreamClass, which is very important.
            // Its constructor collects the fields of the class and
            // eventually calls getDefaultSerialFields, which uses the
            // modifier flags to filter out transient and static fields
            // (explains question 3).
            desc = ObjectStreamClass.lookup(cl, true);
            if (!desc.hasWriteReplaceMethod() ||
                (obj = desc.invokeWriteReplace(obj)) == null ||
                (repCl = obj.getClass()) == cl)
            {
                break;
            }
            cl = repCl;
        }

        // The main write logic is as follows.
        // String, array, and Enum each have their own write path.
        if (obj instanceof String) {
            writeString((String) obj, unshared);
        } else if (cl.isArray()) {
            writeArray(obj, desc, unshared);
        } else if (obj instanceof Enum) {
            writeEnum((Enum<?>) obj, desc, unshared);
        // The key point: instanceof checks whether the object is Serializable.
        // This is why serializing an ordinary user-defined class that does not
        // implement Serializable throws an exception (explains question 1).
        } else if (obj instanceof Serializable) {
            writeOrdinaryObject(obj, desc, unshared);
        } else {
            if (extendedDebugInfo) {
                throw new NotSerializableException(
                    cl.getName() + "\n" + debugInfoStack.toString());
            } else {
                throw new NotSerializableException(cl.getName());
            }
        }
        ...
    }

    private void writeOrdinaryObject(Object obj,
                                     ObjectStreamClass desc,
                                     boolean unshared)
        throws IOException
    {
        ...
        try {
            desc.checkSerialize();

            // Write the marker byte 0x73 (TC_OBJECT), the magic number that begins an ordinary object
            bout.writeByte(TC_OBJECT);
            // Write the corresponding class descriptor, see source code below
            writeClassDesc(desc, false);

            handles.assign(unshared ? null : obj);
            if (desc.isExternalizable() && !desc.isProxy()) {
                writeExternalData((Externalizable) obj);
            } else {
                writeSerialData(obj, desc);
            }
        } finally {
            if (extendedDebugInfo) {
                debugInfoStack.pop();
            }
        }
    }

    private void writeClassDesc(ObjectStreamClass desc, boolean unshared)
        throws IOException
    {
        // handle
        int handle;
        // null descriptor
        if (desc == null) {
            writeNull();
        // Back-reference to a class descriptor:
        // if a handle already exists in the stream, write it directly to improve serialization efficiency
        } else if (!unshared && (handle = handles.lookup(desc)) != -1) {
            writeHandle(handle);
        // Dynamic proxy class descriptor
        } else if (desc.isProxy()) {
            writeProxyDesc(desc, unshared);
        // Plain class descriptor
        } else {
            // This ends up calling desc.writeNonProxy(this), shown below
            writeNonProxyDesc(desc, unshared);
        }
    }

    // The next two methods belong to ObjectStreamClass
    void writeNonProxy(ObjectOutputStream out) throws IOException {
        out.writeUTF(name);
        // Write the serialVersionUID
        out.writeLong(getSerialVersionUID());
        ...
    }

    public long getSerialVersionUID() {
        // If serialVersionUID is not defined, the serialization mechanism
        // calls a function that computes a hash from the class's fields and so on.
        // This is why it is recommended to always define serialVersionUID:
        // the computed hash changes whenever the class changes, so if you add
        // a field, previously serialized data can no longer be deserialized
        // and Java throws an InvalidClassException
        // (explains question 2)
        if (suid == null) {
            suid = AccessController.doPrivileged(
                new PrivilegedAction<Long>() {
                    public Long run() {
                        return computeDefaultSUID(cl);
                    }
                }
            );
        }
        // serialVersionUID is defined (or has just been computed)
        return suid.longValue();
    }

    // At this point, I want to insert a little personal insight into serialized binaries, see below
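Two of the points made in the comments above, the filtering of transient and static fields and the role of serialVersionUID, can be observed from the outside through the public ObjectStreamClass API. The sketch below is only an illustration: Contact, NoUidContact, and DescriptorDemo are hypothetical names, and the computed value printed for NoUidContact depends on the class's shape, so no particular number is assumed.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class with one static and one transient field.
class Contact implements Serializable {
    private static final long serialVersionUID = 42L;
    static int created;           // static: class state, not instance state
    String name;
    transient String sessionToken;
    int age;
}

// Hypothetical class that does not declare serialVersionUID.
class NoUidContact implements Serializable {
    String name;
}

class DescriptorDemo {
    /** Names of the fields that default serialization would write. */
    static List<String> serialFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (ObjectStreamField field : ObjectStreamClass.lookup(type).getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    /** The declared serialVersionUID, or the computed default if none is declared. */
    static long suid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Only "age" and "name" appear; created and sessionToken are filtered out.
        System.out.println(serialFieldNames(Contact.class));
        System.out.println(suid(Contact.class));      // 42
        // Computed from the class structure; changes if the class changes.
        System.out.println(suid(NoUidContact.class));
    }
}
```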

A bit of reading of serialized binaries

Suppose we want to serialize a List of PhoneItem objects, where PhoneItem looks like this:

class PhoneItem implements Serializable {
    String phoneNumber;
}

The code to construct the List is omitted. Assuming we serialize a List of size 5, the binary looks something like this.

7372 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 
7371 xxxx xxxx 

According to the source code, the magic number 0x73 at the beginning marks an ordinary object (TC_OBJECT), 0x72 marks a new class descriptor (TC_CLASSDESC), and 0x71 marks a reference (TC_REFERENCE), here a back-reference to a class descriptor that has already been written. As you can see, when a binary stream is parsed, each leading marker byte tells the reader what kind of Java object comes next. During serialization, if the same object or class descriptor is already present in the stream, later writes emit only its handle as a reference, which improves serialization efficiency.
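The marker bytes can be checked directly. The sketch below writes PhoneItem objects straight to a stream rather than through a List, so the offsets are easy to reason about; the constructor, the serialVersionUID, and the StreamTagsDemo helper are additions made for this illustration and are not part of the original PhoneItem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class PhoneItem implements Serializable {
    private static final long serialVersionUID = 1L;
    String phoneNumber;

    PhoneItem(String phoneNumber) {
        this.phoneNumber = phoneNumber;
    }
}

class StreamTagsDemo {
    /** Writes the given objects, in order, to one stream and returns its bytes. */
    static byte[] serializeAll(Object... objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objects) {
                out.writeObject(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        PhoneItem first = new PhoneItem("555-0100");
        PhoneItem second = new PhoneItem("555-0101");

        byte[] one = serializeAll(first);
        byte[] two = serializeAll(first, second);
        byte[] same = serializeAll(first, first);

        // Bytes 0-3 are the stream header (ac ed 00 05); the first object starts at offset 4.
        System.out.printf("first object:  %02x %02x%n", one[4], one[5]);
        // The second PhoneItem starts right where the first one ended.
        System.out.printf("second object: %02x %02x%n", two[one.length], two[one.length + 1]);
        // Writing the very same instance again emits only a tag plus a 4-byte handle.
        System.out.printf("same instance: %02x (%d bytes)%n",
                same[one.length], same.length - one.length);
    }
}
```

The first object starts with 73 72 (object, new class descriptor), a second object of the same class starts with 73 71 (object, reference to the existing descriptor), and a repeated instance is written as 71 alone.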

    // writeSerialData() calls the class's own writeObject() if it defines one;
    // otherwise it falls back to defaultWriteFields()
    private void defaultWriteFields(Object obj, ObjectStreamClass desc)
        throws IOException
    {
        ...
        int primDataSize = desc.getPrimDataSize();
        if (primVals == null || primVals.length < primDataSize) {
            primVals = new byte[primDataSize];
        }
        desc.getPrimFieldValues(obj, primVals);
        bout.write(primVals, 0, primDataSize, false);

        ObjectStreamField[] fields = desc.getFields(false);
        Object[] objVals = new Object[desc.getNumObjFields()];
        int numPrimFields = fields.length - objVals.length;
        desc.getObjFieldValues(obj, objVals);
        for (int i = 0; i < objVals.length; i++) {
            ...
            try {
                // Object-typed fields are written recursively through writeObject0
                writeObject0(objVals[i],
                             fields[numPrimFields + i].isUnshared());
            } finally {
                if (extendedDebugInfo) {
                    debugInfoStack.pop();
                }
            }
        }
    }

Since the deserialization process is similar to the serialization process, it will not be described here.
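Although the read path is not walked through here, its visible effect is easy to check with a round trip. The following is a minimal sketch; Session and RoundTripDemo are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with one field of each kind.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static String environment = "prod"; // static: class state, never written
    String user;                        // ordinary field: written
    transient String token;             // transient: skipped

    Session(String user, String token) {
        this.user = user;
        this.token = token;
    }
}

class RoundTripDemo {
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Session("alice", "s3cret"));
        Session.environment = "test";       // change the static after writing
        Session copy = (Session) deserialize(data);

        System.out.println(copy.user);           // alice
        System.out.println(copy.token);          // null
        System.out.println(Session.environment); // test
    }
}
```

The static field keeps whatever value the running JVM currently holds, because it was never part of the stream in the first place.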

Common serialization problems with collection classes

HashMap

Java requires the deserialized object to be equivalent to the object before it was serialized. In a HashMap, however, the bucket that holds each entry is computed from the key's hash code, and that value may be different after deserialization (for example, when deserialization happens in a different JVM). So HashMap implements its own serialization logic to avoid this inconsistency.

To do this, it declares the fields that need custom handling as transient and implements writeObject to process them specially:

    private void writeObject(java.io.ObjectOutputStream s)
        throws IOException {
        int buckets = capacity();
        // Write out the threshold, loadfactor, and any hidden stuff
        s.defaultWriteObject();
        // Write hash bucket capacity
        s.writeInt(buckets);
        // Write the size of k-v
        s.writeInt(size);
        // iterate over writing non-null k-v
        internalWriteEntries(s);
    }
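The effect of writing key-value pairs instead of the bucket table can be seen with a key whose hash code is identity-based. This is a sketch under stated assumptions: IdentityKey and HashMapRoundTripDemo are hypothetical names, and the key deliberately does not override equals or hashCode, so the deserialized key will normally have a different hash code than the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical key that inherits the identity-based hashCode from Object.
class IdentityKey implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;

    IdentityKey(String label) {
        this.label = label;
    }
}

class HashMapRoundTripDemo {
    /** Serializes the object to memory and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IdentityKey originalKey = new IdentityKey("k");
        HashMap<IdentityKey, String> original = new HashMap<>();
        original.put(originalKey, "v");

        HashMap<IdentityKey, String> restored = roundTrip(original);
        IdentityKey restoredKey = restored.keySet().iterator().next();

        // The two keys are different objects and usually hash differently.
        System.out.println(originalKey.hashCode() == restoredKey.hashCode());
        // The lookup still works because the entries were re-inserted on read.
        System.out.println(restored.get(restoredKey)); // v
    }
}
```

Had the bucket array been written as-is, the restored entry would sit in the bucket chosen by the old hash code, and a lookup with the restored key could miss it.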

ArrayList

Because the backing array of an ArrayList is almost always larger than the number of elements it actually holds, ArrayList implements its own writeObject and readObject so that the unused array slots are not serialized.

    private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        ...
        s.defaultWriteObject();

        // Write the current size of the ArrayList
        s.writeInt(size);

        // Write the elements in the same order
        for (int i = 0; i < size; i++) {
            s.writeObject(elementData[i]);
        }
        ...
    }
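A quick way to confirm that spare capacity is not written is to compare the serialized size of two lists with the same elements but very different capacities. This is only a sketch; ArrayListCapacityDemo and serializedSize are hypothetical names, and the comparison is on length only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class ArrayListCapacityDemo {
    /** Number of bytes the object occupies once serialized. */
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        ArrayList<String> roomy = new ArrayList<>(10_000); // large backing array
        roomy.add("a");
        roomy.add("b");

        ArrayList<String> snug = new ArrayList<>();        // default capacity
        snug.add("a");
        snug.add("b");

        // Same elements, same serialized length, regardless of capacity.
        System.out.println(serializedSize(roomy) == serializedSize(snug)); // true
    }
}
```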