String is an important data type in Java. After the primitive types, String is the most widely used type. However, many details about String are easy to overlook.

This article discusses one of them: is there a length limit on strings in Java?

The question has to be answered separately for two phases, compile time and run time, because each phase imposes a different limit.

Compile time

First, consider a string written directly in code as a literal, for example String s = "";. Is there a limit on the number of characters the literal can contain?

A natural first guess comes from the String API. The constructor public String(char value[], int offset, int count) takes its length as an int, and a char[] can hold at most Integer.MAX_VALUE, that is, 2147483647, characters (checked on JDK 1.8.0_73).
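The int bound is visible directly in the signature. A minimal sketch (the class StringCtorDemo and the helper slice are hypothetical names used only for illustration) that calls this constructor and prints the largest value an int, and therefore count, can take:

```java
public class StringCtorDemo {

    // Builds a String from part of a char array via String(char[] value, int offset, int count).
    static String slice(char[] value, int offset, int count) {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] value = {'j', 'a', 'v', 'a', '8'};
        System.out.println(slice(value, 0, 4));   // java
        // count (like any array length) is an int, so it can never exceed this value:
        System.out.println(Integer.MAX_VALUE);    // 2147483647
    }
}
```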

In practice, however, the literal in String s = ""; can contain at most 65534 characters. If it contains more, the compiler reports an error.

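public class StringLenghDemo {
    public static void main(String[] args) {
        // s holds a literal of 65534 'a' characters (elided here): compiles
        String s = "a... a";
        System.out.println(s.length());
        // s1 holds a literal of 65535 'a' characters (elided here): does not compile
        String s1 = "a... a";
        System.out.println(s1.length());
    }
}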

When the literal assigned to s1 contains 65,535 a characters, compilation fails:

javac StringLenghDemo.java
StringLenghDemo.java:11: error: constant string too long
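The experiment can be reproduced without pasting a 65,535-character literal into an editor by generating the source in memory and handing it to the compiler through the standard javax.tools API. This is a sketch under stated assumptions: it needs a JDK (not a JRE) at run time, uses String.repeat and List.of (JDK 11 or later), and the class names LiteralLimitDemo, InMemorySource, and Big are hypothetical. The exact boundary it reports depends on the javac version in use; the 65534/65535 split above was observed with JDK 8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LiteralLimitDemo {

    // A source file held in memory, so no giant .java file is needed on disk.
    static final class InMemorySource extends SimpleJavaFileObject {
        private final String code;

        InMemorySource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles a class whose method holds a literal of n 'a' characters; returns javac's verdict.
    static boolean compiles(int n) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system compiler: run this on a JDK, not a JRE");
        }
        String source = "public class Big { static int len() { String s = \""
                + "a".repeat(n) + "\"; return s.length(); } }";
        Path outDir;
        try {
            outDir = Files.createTempDirectory("literal-limit"); // keep .class output out of the cwd
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        boolean ok = javac.getTask(null, null, diagnostics,
                List.of("-d", outDir.toString()), null,
                List.of(new InMemorySource("Big", source))).call();
        // Print whatever javac reported, e.g. "constant string too long".
        diagnostics.getDiagnostics().forEach(d -> System.out.println("  " + d.getMessage(null)));
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("65534 chars compiles: " + compiles(65534));
        System.out.println("65535 chars compiles: " + compiles(65535));
    }
}
```

The temporary output directories are not cleaned up; that keeps the sketch short.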

If the theoretical limit is 2147483647 characters, why does a literal of 65535 characters fail to compile?

When a string is defined with a literal, the string is stored in the class file's constant pool. The 65534 limit above therefore comes from how the constant pool stores literals, not from the String class itself.
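Reference equality (==) makes this storage visible at run time. When a class is loaded, the literal entries of its constant pool are resolved into interned String objects, so two identical literals share one instance, while new String(...) always creates a separate object. The runtime string pool is a different structure from the class-file constant pool; the sketch below (class and method names are hypothetical) only shows that literals are pooled, not how the class file is laid out.

```java
public class LiteralPoolDemo {

    // Two identical literals resolve to the same interned String instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) always creates a fresh object, separate from the pooled literal.
    static boolean newStringSharesInstance() {
        return new String("hello") == "hello";
    }

    // intern() returns the pooled instance, the same one the literal refers to.
    static boolean internSharesInstance() {
        return new String("hello").intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());   // true
        System.out.println(newStringSharesInstance()); // false
        System.out.println(internSharesInstance());    // true
    }
}
```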

Each entry in the constant pool has its own type. Strings, encoded in a variant of UTF-8, are stored in the constant pool as entries of type CONSTANT_Utf8.

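CONSTANT_Utf8_info is the constant-pool structure that stores a constant string. Almost all literal text in the constant pool is held in CONSTANT_Utf8_info entries; a string literal, for example, is a CONSTANT_String_info entry that points to a CONSTANT_Utf8_info holding its characters. CONSTANT_Utf8_info is defined as follows: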

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

Since this article is not about CONSTANT_Utf8_info itself, we will not go into detail here. All we need is this: a string literal is stored in the class file in a CONSTANT_Utf8_info, and that structure's u2 length field records how many bytes of data the entry holds.
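DataOutputStream.writeUTF writes a u2 byte count followed by modified UTF-8 bytes, which is the same layout a CONSTANT_Utf8_info uses after its one-byte tag, so it can stand in for a class-file writer in a short sketch. This is an illustration only: javac uses its own writer, and the class Utf8InfoDemo and its encode helper are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class Utf8InfoDemo {

    static final int CONSTANT_UTF8_TAG = 1; // tag value of CONSTANT_Utf8 in the class file format

    // Serializes s in the CONSTANT_Utf8_info layout: u1 tag, u2 length, u1 bytes[length].
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(CONSTANT_UTF8_TAG);
            out.writeUTF(s); // writes the u2 byte count (big-endian), then the modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] entry = encode("abc");
        System.out.println(Arrays.toString(entry));                // [1, 0, 3, 97, 98, 99]
        int length = ((entry[1] & 0xFF) << 8) | (entry[2] & 0xFF); // read the u2 length back
        System.out.println("u2 length = " + length);               // u2 length = 3
    }
}
```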

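u2 is an unsigned 16-bit integer, so the largest length it can express is 2^16 - 1 = 65535 bytes. Java class files store strings in a modified UTF-8 format, in which the null character (\u0000) takes two bytes instead of one and every non-ASCII character takes two or three bytes. The 65534-character ceiling seen above is one character stricter than the format requires; it appears to come from javac itself, whose check (Gen.checkStringConstant in JDK 8) rejects a constant string unless its length() is strictly less than 65535. So 65535 ASCII characters are refused even though their 65535 bytes would fit.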
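Because the u2 field counts bytes, the relevant size of a literal is its modified UTF-8 length, not its character count. A minimal sketch of that byte count, following the encoding rules documented for DataOutput.writeUTF (the class ModifiedUtf8Length and method byteLength are hypothetical names):

```java
public class ModifiedUtf8Length {

    // Bytes needed to store s in modified UTF-8 (the class-file and DataOutput.writeUTF format).
    static int byteLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\u0001' && c <= '\u007F') {
                bytes += 1; // ASCII, except NUL
            } else if (c <= '\u07FF') {
                bytes += 2; // NUL (U+0000) and U+0080..U+07FF
            } else {
                bytes += 3; // U+0800..U+FFFF, including each half of a surrogate pair
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(0xFFFF);                     // 65535: largest u2 value
        System.out.println(byteLength("a"));            // 1
        System.out.println(byteLength("\u0000"));       // 2
        System.out.println(byteLength("\u4E2D"));       // 3 (a CJK ideograph, U+4E2D)
        System.out.println(byteLength("\uD83D\uDE00")); // 6 (one emoji stored as two surrogates)
    }
}
```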

The class file format chapter of the Java Virtual Machine Specification states this limit explicitly:

The length of field and method names, field and method descriptors, and other constant string values is limited to 65535 characters by the 16-bit unsigned length item of the CONSTANT_Utf8_info structure (§4.4.7). Note that the limit is on the number of bytes in the encoding and not on the number of encoded characters. UTF-8 encodes some characters using two or three bytes. Thus, strings incorporating multibyte characters are further constrained.

In short, every string stored in the class file's constant pool, string literals included, is limited to 65535 bytes of encoded data.
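To see that the limit is on bytes rather than characters, the sketch below uses DataOutputStream.writeUTF, which documents the same 65535-byte cap and throws UTFDataFormatException when an encoded string exceeds it. It is a stand-in for the class-file writer, not javac's own check; it assumes JDK 11 or later (String.repeat, OutputStream.nullOutputStream), and the class and method names are hypothetical.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class ConstantLimitDemo {

    // True if s fits in one u2-length modified UTF-8 record (at most 65535 encoded bytes).
    static boolean fitsInConstantPool(String s) {
        try (DataOutputStream out = new DataOutputStream(OutputStream.nullOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String cjk = "\u4E2D"; // one character, three bytes in modified UTF-8
        System.out.println(fitsInConstantPool("a".repeat(65535))); // true: 65535 bytes
        System.out.println(fitsInConstantPool(cjk.repeat(21845))); // true: 65535 bytes
        System.out.println(fitsInConstantPool(cjk.repeat(21846))); // false: 65538 bytes
    }
}
```

With three-byte characters, only 21845 of them fit, roughly a third of the ASCII limit. This checks only the class-file byte limit; javac may apply its own character-count check on top, as the 65,535-a experiment earlier shows.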

Run time

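The limit above applies only to strings defined as literals, such as String s = "";. A string created while the program runs (for example, read from a file or assembled with a StringBuilder) never passes through the constant pool, so it is not subject to that limit.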
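A quick check that run-time strings are not bound by the 65535-byte constant-pool limit: the sketch below (class and method names are hypothetical) builds a 100,000-character string with a StringBuilder, something no single literal could hold.

```java
public class RuntimeStringDemo {

    // Builds a string at run time; it never appears as a literal in the class file's constant pool.
    static String build(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(100_000);
        System.out.println(s.length()); // 100000
    }
}
```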

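At run time, the limit is set by the int that holds a String's length: at most Integer.MAX_VALUE characters. Before JDK 9, a String kept its characters in a char[] at two bytes per char, so a String of that length needs roughly 4 GB of memory. If a String would grow beyond this range at run time, an error is thrown, typically an OutOfMemoryError.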

int is a 32-bit signed type, so the largest positive value it can hold is 2^31 - 1, and each char is a 16-bit UTF-16 code unit:

2^31 - 1 = 2147483647 characters
2147483647 characters × 16 bits = 34359738352 bits
34359738352 bits / 8 = 4294967294 bytes
4294967294 bytes / 1024 = 4194303.998046875 KB
4194303.998046875 KB / 1024 = 4095.9999980926513671875 MB
4095.9999980926513671875 MB / 1024 = 3.99999999813735485076904296875 GB
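The same arithmetic can be checked in Java with long math, so that Integer.MAX_VALUE * 2 does not overflow. This sketch (class and method names are hypothetical) assumes the pre-JDK 9 layout of two bytes per char and ignores the array header and other object overhead.

```java
public class MaxStringMemory {

    // Bytes needed for the char[] of a maximal String before JDK 9 (2 bytes per char).
    static long maxCharBytes() {
        return (long) Integer.MAX_VALUE * Character.BYTES;
    }

    public static void main(String[] args) {
        long bytes = maxCharBytes();
        System.out.println(bytes);                           // 4294967294
        System.out.println(bytes / (1024.0 * 1024 * 1024));  // just under 4.0 (GB)
    }
}
```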

That is nearly 4 GB for the characters alone.