Java developers work with strings constantly, and String is probably the most heavily used class in the language. A string encapsulates a sequence of characters, which must be stored in either a char array or a byte array. Starting with Java 9, the JDK developers optimized how much space strings occupy.

From char to byte

Implementations of the String class in libraries prior to JDK 9 used a char array to hold the string's characters. A char occupies 16 bits, or two bytes.

private final char value[];

With this layout, storing the character A takes the two bytes 0x00 0x41, and the first byte is wasted. A Chinese character, by contrast, needs both bytes, so nothing is wasted. In other words, space is wasted for characters that fit in ISO-8859-1, but not for characters outside that encoding.
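The wasted high byte is easy to observe from ordinary Java code. The following is a minimal sketch, not JDK code: the class HighByteSketch and the method utf16Hex are names invented for this illustration, and big-endian UTF-16 is assumed only so that the bytes print in the same order as written above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the JDK.
final class HighByteSketch {

    /** Returns the big-endian UTF-16 bytes of s as space-separated hex pairs. */
    static String utf16Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex("A"));      // 00 41 : the high byte is zero
        System.out.println(utf16Hex("\u4e2d")); // 4e 2d : both bytes carry data
    }
}
```

The second sample, U+4E2D, is a Chinese character chosen here as an example; any character above 0xFF would show the same effect.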

From JDK 9 onward, the String class stores its characters in a byte array instead. Each element occupies 8 bits, that is, 1 byte.

private final byte[] value;

A String can be converted to and from many encodings, but internally it stores its characters in one of two: LATIN1 or UTF16. The name LATIN1 may be unfamiliar, but it is simply ISO-8859-1, a single-byte encoding. UTF16 uses one or two 16-bit code units per character.

Saving space

The characters that benefit from compression are those in the ISO-8859-1 encoding, such as English letters, digits, and other common symbols. Take the string “what” as an example. Before Java 9 it was stored in an array as 00 77 00 68 00 61 00 74: each character needed 16 bits, and the high byte of every character was 0, so half of the space was wasted.

From Java 9 on, the same string is stored far more compactly, in only four bytes: 77 68 61 74.

However, for a string such as “ha” followed by a character outside ISO-8859-1, nothing is saved: if any character in the string falls outside ISO-8859-1, every character in it, including the Latin-1 ones, is stored in 16 bits, just as before Java 9.
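The rule described above can be sketched in a few lines of Java. This is only a model of the storage cost under the stated rule, not the JDK implementation; CompactLayoutSketch and its methods are hypothetical names, and the second sample string uses U+54C8 as an assumed example of a character outside ISO-8859-1.

```java
// Hypothetical model of the two layouts; not part of the JDK.
final class CompactLayoutSketch {

    /** True if every char is in 0x00..0xFF, i.e. representable in ISO-8859-1. */
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Bytes needed by a char[] backing array: always two per char. */
    static int legacyBytes(String s) {
        return s.length() * 2;
    }

    /** Bytes needed by the compact layout: one per char only if all chars fit in Latin-1. */
    static int compactBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String[] samples = {"what", "ha\u54c8"};
        for (String s : samples) {
            System.out.println(s + ": legacy=" + legacyBytes(s) + " compact=" + compactBytes(s));
        }
    }
}
```

For “what” the model reports 8 bytes for the old layout and 4 for the compact one; for the mixed string both layouts need 6 bytes, because a single non-Latin-1 character forces the whole string into 16-bit storage.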

In Java 9, String uses the compact layout described above by default: as the following code shows, COMPACT_STRINGS is set to true. To turn the compact layout off, pass the VM option -XX:-CompactStrings.

static final boolean COMPACT_STRINGS;
static { COMPACT_STRINGS = true; }

String length

Because a String may now be stored as either UTF-16 or Latin-1, it needs an internal field, coder, to record which encoding is in use. LATIN1 has the value 0 and UTF16 has the value 1.

private final byte coder;
static final byte LATIN1 = 0;
static final byte UTF16 = 1;

The length of the string also depends on the encoding, and it is computed with a right shift. For Latin-1, the array length is shifted right by 0 bits, so the array length is the string length. For UTF-16, it is shifted right by 1 bit, so the string length is half the array length.
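To see how the coder and the shift work together, here is a minimal model of the idea. It is a sketch under assumptions, not the JDK's String: CoderSketch, byteCount, and the constructor logic are invented for this illustration, and big-endian UTF-16 is used for simplicity, whereas the byte order inside the real class is an implementation detail.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of a coder-tagged byte array; not part of the JDK.
final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    CoderSketch(String s) {
        // Use one byte per char only if every char fits in 0x00..0xFF.
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            coder = LATIN1;
            value = s.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per char
        } else {
            coder = UTF16;
            value = s.getBytes(StandardCharsets.UTF_16BE);   // 2 bytes per char, no BOM
        }
    }

    /** Number of chars: shift by 0 for LATIN1, by 1 (divide by 2) for UTF16. */
    int length() {
        return value.length >> coder;
    }

    /** Size of the backing array in bytes. */
    int byteCount() {
        return value.length;
    }

    byte coder() {
        return coder;
    }
}
```

With this model, new CoderSketch("what") has a 4-byte array and length() 4, while a three-character string containing one Chinese character has a 6-byte array and length() 3. Using the coder value itself as the shift amount avoids a branch in length().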

In the JDK source, this is a single line:

public int length() { return value.length >> coder(); }

Summary

String objects are used heavily in Java, and it is easy to create them in large numbers without thinking about the cost, so optimizing their space usage matters. Starting with Java 9, the JDK reduces the amount of heap space that strings occupy, which also eases GC pressure. Note, however, that this optimization saves nothing for strings that contain Chinese characters, because those strings are still stored as UTF-16.