Preface

Is there a limit on the length of a String in Java? Yes. The class file format that the compiler targets imposes one, and some readers have run into this question in job interviews.

I was asked this in an interview, and I had also hit the limit earlier in real development: a fixed file was transcoded to Base64 and stored as a String so that it could be decoded at run time when needed, and the problem appeared once the file grew large. So what exactly does the specification limit? Let’s take a look.

String

To reason about String length, we first need to know how String stores its characters. Internally, String keeps them in an array: a char[] in Java 8 and earlier (a byte[] since Java 9, although the length is still reported as an int).

So if a String is backed by an array, is there a limit on the length of an array? Yes, but which limit applies depends on the situation. Let’s start with array lengths and with the method in String that returns its length.

We can give an array an explicit length when we create it. If we do not, the length is taken from the number of elements in the initializer.

int[] arr1 = new int[10];
int[] arr2 = {1, 2, 3, 4, 5}; // the length of this array is 5

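Array lengths and indices are ints, and String’s length() method returns an int, so the theoretical maximum length is 2^31 - 1 = 2147483647 characters. At two bytes per char, that is roughly 4 GB of character data.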
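The two ways of fixing an array’s length, and the int ceiling, can be checked with a small sketch. The class and method names here are made up for illustration and are not from the original listing.

```java
final class ArrayLengthDemo {
    // Length given explicitly when the array is created.
    static int explicitLength() {
        int[] arr1 = new int[10];
        return arr1.length;
    }

    // Length inferred from the number of elements in the initializer.
    static int inferredLength() {
        int[] arr2 = {1, 2, 3, 4, 5};
        return arr2.length;
    }

    // Largest value a signed 32-bit int can hold, computed from the bit width.
    static long maxIntFromBits() {
        return (1L << 31) - 1;
    }

    public static void main(String[] args) {
        System.out.println(explicitLength());
        System.out.println(inferredLength());
        // String.length() also returns an int, so it shares the same ceiling.
        System.out.println(maxIntFromBits() == Integer.MAX_VALUE);
    }
}
```

In practice the JVM heap usually runs out long before an array gets anywhere near that ceiling.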

With that in mind, let’s try to verify this with code.

The compiler rejects it with “constant string too long”. But we can supposedly store about 2.1 billion characters, so why does a literal of 100,000 characters fail?

The limit comes from the class file format. When a String is written as a literal, the compiler stores it in the constant pool of the class file at compile time, and the format restricts how large a single string entry in the constant pool can be.
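For contrast, a string of the same size built at run time never has to fit into a single constant pool entry. A minimal sketch, with a made-up class name and helper:

```java
final class RuntimeStringDemo {
    // Builds a string of the requested length at run time. Only the loop
    // lives in the class file, not the 100,000 characters themselves.
    static String repeatChar(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String big = repeatChar('a', 100_000);
        System.out.println(big.length());
    }
}
```

This should compile and run without complaint, because the compile-time limit only concerns text that appears as a constant in the source.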

In the constant pool, every cp_info entry has the same general format: it starts with a single-byte “tag” that identifies the kind of cp_info entry, and the contents of the info[] array that follows depend on that tag.

We can see that the tag for a String is CONSTANT_String. Let’s see how the corresponding CONSTANT_String_info structure is defined.

The u2 string_index defined here must be a valid index into the constant pool, and the entry at that index must be a CONSTANT_Utf8_info structure. The one thing we need to note is that CONSTANT_Utf8_info stores its length in a u2 field.

In the class file, u2 denotes an unsigned 2-byte quantity. One byte is 8 bits, so two bytes are 16 bits, and the largest value two bytes can hold is 2^16 - 1 = 65535. The specification excerpts below explain u1 and u2:
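The effect of a 2-byte length field can be observed without touching the compiler. `DataOutputStream.writeUTF` in the standard library also writes a 2-byte length followed by modified UTF-8 bytes, and throws `UTFDataFormatException` when the encoded form needs more than 65535 bytes. The sketch below uses it as a stand-in for the class file rule. It is an illustration of the arithmetic, not what javac itself runs, and the class and helper names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

final class U2LengthDemo {
    // Largest value that fits in an unsigned 16-bit (u2) field.
    static final int U2_MAX = (1 << 16) - 1;

    // True if the modified UTF-8 encoding of s fits behind a 2-byte length
    // prefix, i.e. DataOutputStream.writeUTF accepts it.
    static boolean fitsInU2Length(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper (hypothetical) that repeats one character count times.
    static String repeatChar(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(U2_MAX);
        System.out.println(fitsInU2Length(repeatChar('a', 65535)));
        System.out.println(fitsInU2Length(repeatChar('a', 65536)));
        // '\u4e2d' needs 3 bytes in modified UTF-8, so far fewer characters fit.
        System.out.println(fitsInU2Length(repeatChar('\u4e2d', 21846)));
    }
}
```

Note that the limit is counted in encoded bytes, not characters, so text with non-ASCII characters reaches it sooner.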

# Excerpts from the Java Virtual Machine Specification

##1. Data types used in the class file

The specification defines its own set of data types to represent the contents of a class file: u1, u2, and u4, which represent unsigned 1-, 2-, and 4-byte quantities, respectively.

Each class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading two, four, and eight consecutive 8-bit bytes, respectively.
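As a rough illustration of how multi-byte quantities are assembled from 8-bit bytes, here is a sketch that combines bytes high byte first, which is the order the class file format uses. The class and method names are made up. The sample value 0xCAFEBABE is the magic number at the start of every class file, a detail taken from the specification rather than from this article.

```java
final class ClassFileUnitsDemo {
    // Combines two 8-bit bytes, high byte first, into an unsigned 16-bit value.
    static int u2(int high, int low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    // Combines four 8-bit bytes, high byte first, into an unsigned 32-bit value.
    // A long is used because Java's int is signed.
    static long u4(int b0, int b1, int b2, int b3) {
        return ((long) (b0 & 0xFF) << 24)
                | ((b1 & 0xFF) << 16)
                | ((b2 & 0xFF) << 8)
                | (b3 & 0xFF);
    }

    public static void main(String[] args) {
        // Both bytes set to 0xFF gives the largest u2 value.
        System.out.println(u2(0xFF, 0xFF));
        // The first four bytes of a class file.
        System.out.println(Long.toHexString(u4(0xCA, 0xFE, 0xBA, 0xBE)));
    }
}
```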

##2. The valid range of an exception handler

The values of start_pc and end_pc indicate the range in the code[] array over which the exception handler is active.

The value of start_pc must be a valid index of the opcode of an instruction in the current code[] array. The value of end_pc must either be a valid index of the opcode of an instruction in the current code[] array, or be equal to code_length, the length of the current code[] array. The value of start_pc must be smaller than the value of end_pc.

The exception handler is active while the program counter is in the half-open range [start_pc, end_pc). That is, if x is a program counter value covered by the handler, then start_pc ≤ x < end_pc.
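The half-open range can be written as a one-line check. This is only a sketch of the rule as stated, with made-up names, not code from any JVM.

```java
final class HandlerRangeDemo {
    // True when pc lies in the half-open range [startPc, endPc):
    // startPc is included, endPc is excluded.
    static boolean covers(int startPc, int endPc, int pc) {
        return startPc <= pc && pc < endPc;
    }

    public static void main(String[] args) {
        System.out.println(covers(0, 10, 0));   // start is inside the range
        System.out.println(covers(0, 10, 9));   // last covered index
        System.out.println(covers(0, 10, 10));  // end itself is outside
    }
}
```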

The fact that end_pc itself is excluded from the handler’s range is a historical mistake in the design of the Java virtual machine: if the code for a method is exactly 65535 bytes long and ends with an instruction that is 1 byte long, that instruction cannot be protected by an exception handler.

However, a compiler writer can work around this bug by limiting the maximum length of the code[] array for any method, instance initializer, or class initializer to 65534 bytes.

Note: these are the points I consider most important. First, a u2 length field gives a range of [0, 65535]. Second, the 65534 figure quoted above applies to the code[] array of a method rather than to strings, but javac also stops one short of the u2 maximum for string literals: it rejects any literal whose length reaches 65535, so the effective range for a literal is [0, 65534]. This limit applies at compile time only; a string built by concatenation at run time can go beyond it.

Let’s do a small experiment: build a string of length 65534, use it as a literal, and see whether it compiles.

First, a for loop builds a string of 65534 characters. After printing it to the console, we confirm the count of 65534 with an online character-counting tool from Baidu.
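The original listing for this step is not reproduced here, so the following is a sketch of what such a loop might look like. The class name, the fill character 'a', and the variable name `s` in the generated declaration are all placeholders.

```java
final class LiteralExperiment {
    // Builds a string of the given length with a plain for loop.
    static String build(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    // Wraps the text in a declaration that can be pasted into a source file
    // to try it as a literal. The variable name "s" is a placeholder.
    static String asLiteralDeclaration(String text) {
        return "String s = \"" + text + "\";";
    }

    public static void main(String[] args) {
        String text = build(65534);
        System.out.println(text.length());
        System.out.println(asLiteralDeclaration(text));
    }
}
```

Printing the declaration directly saves the copy-and-count step, since the length is checked in code before the text is pasted anywhere.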

Then we copy the characters and assign them to a String as a literal. With the characters selected, the lower right corner of the editor shows 65534. We compile and run it, and it succeeds.

# To sum up:

## Is there a length limit on strings? What is it?

The contents of a String are stored in an array. Array lengths and indices are ints, and String’s length() method returns an int. Looking at the Integer class in the Java source, the maximum int value is 2^31 - 1, so a String can hold at most 2^31 - 1 = 2147483647 characters (indices 0 to 2^31 - 2), which is about 4 GB of char data.

But the class file format and the constant pool structures in the Java Virtual Machine Specification show that the length of a string constant is stored in a u2 field, an unsigned 2-byte quantity whose maximum value is 2^16 - 1 = 65535.

The format allows 65535, but javac rejects a literal whose length reaches 65535, so the longest literal that compiles is 65534 characters. Anything longer causes an error at compile time, while strings produced by concatenation or assignment at run time are limited only by the maximum int value.