Like and follow so you don't lose track of this series. Your support means a lot to me!
🔥 Hi, I'm Chouchou. This article is included in GitHub · Android-Notebook. You are welcome to grow together with Chouchou Peng. (Contact information is on GitHub.)
Preface
- In every programming language, strings are an unavoidable topic and a common source of interview questions;
- In this article, I summarize the key points about Java strings together with related interview questions. If it helps you, please like and follow. It really matters to me.
Comparing C strings and Java strings
- 1. Memory representation
A C string is essentially a char[] terminated by \0. For a string literal, the compiler appends the \0 to the end of the array automatically. C does not record the encoding of the characters stored in the char[]; how the bytes are interpreted depends entirely on the context of the program.
A Java String is an object that wraps an array of UTF-16 BE-encoded characters (a byte array starting with Java 9). A byte stream in any other character encoding is converted to UTF-16 BE as it enters a String.
- 2. Char character length
In C, the char type is 1 byte and comes in signed and unsigned variants; whether plain char is signed is implementation-defined.
In Java, the char type is 2 bytes and is always unsigned.
Language | Type | Storage space (bytes) | Minimum value | Maximum value |
---|---|---|---|---|
Java | char | 2 | 0 | 65535 |
C | char (where the implementation treats it as signed char) | 1 | -128 | 127 |
C | signed char | 1 | -128 | 127 |
C | unsigned char | 1 | 0 | 255 |
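As a small illustration of the Java column of the table, the sketch below reads the size and range of char from the standard Character constants and shows a UTF-8 byte stream being converted as it enters a String. The class name CharWidthDemo and its helper methods are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: CharWidthDemo and its helpers are hypothetical names.
class CharWidthDemo {
    // Number of bytes occupied by one Java char.
    static int charBytes() {
        return Character.BYTES;
    }

    // Decodes UTF-8 bytes into a String and returns its length in UTF-16 code units.
    static int utf16Units(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        System.out.println(charBytes());               // 2
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535

        // U+4E2D takes 3 bytes in UTF-8 but a single char once inside a String.
        byte[] utf8 = "\u4e2d".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);      // 3
        System.out.println(utf16Units(utf8)); // 1
    }
}
```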
Related in-depth article: "NDK | C review notes"
Why is a char array changed to a byte array inside Java 9 String?
The memory representation of a Java String is essentially a character array based on UTF-16 BE encoding. UTF-16 is a variable-length encoding that uses two or four bytes per character, so even Latin letters, which need only one byte in ASCII, occupy two bytes in a String.
To optimize storage, starting with Java 9, String checks whether its content consists only of characters that fit in Latin-1. If so, it stores one byte per character (Latin-1); otherwise it uses UTF-16.
String.java (since Java 9)

```java
private final byte coder;

static final boolean COMPACT_STRINGS;

static {
    COMPACT_STRINGS = true;
}

byte coder() {
    return COMPACT_STRINGS ? coder : UTF16;
}

@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;
```
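To make the Latin-1 versus UTF-16 decision concrete, here is a minimal sketch of the idea, not the JDK's actual code. The class CompactStringSketch and its methods chooseCoder and storageBytes are hypothetical; only the two constant values are taken from the excerpt above.

```java
// Illustrative sketch only: mimics the idea behind compact strings,
// it is not how java.lang.String is implemented internally.
class CompactStringSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical helper: a string can use the one-byte coder
    // only if every char fits in the Latin-1 range (0x00..0xFF).
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes the character data would need under the chosen coder.
    static int storageBytes(String s) {
        return chooseCoder(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(storageBytes("hello"));   // 5
        System.out.println(storageBytes("h\u4e2d")); // 4
    }
}
```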
In brief, the encoding implementations differ as follows:
- UTF-32: fixed-length encoding, 4 bytes per character;
- UTF-16: variable-length encoding, 2 or 4 bytes per character;
- UTF-8: variable-length encoding, 1 to 4 bytes per character.
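The byte counts for the three encodings can be checked with a short sketch. The class EncodingSizeDemo is a made-up name; the UTF-32 size is computed as four bytes per code point rather than through a charset lookup, which is an assumption made to stay within StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: EncodingSizeDemo is a hypothetical helper class.
class EncodingSizeDemo {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE is used so that no byte order mark is added to the count.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed length: 4 bytes for every Unicode code point.
    static int utf32Bytes(String s) {
        return s.codePointCount(0, s.length()) * 4;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // Latin letter
            "\u4e2d",       // CJK character U+4E2D
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        for (String s : samples) {
            System.out.println(utf8Bytes(s) + " / " + utf16Bytes(s) + " / " + utf32Bytes(s));
        }
        // Prints: 1 / 2 / 4, then 3 / 2 / 4, then 4 / 4 / 4
    }
}
```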
Related in-depth article: "Principles of Computer Organization | What is the relationship between Unicode and UTF-8?"
Why is String an immutable class
Mp.weixin.qq.com/s?__biz=MzI…
StringBuilder and StringBuffer
- Operational efficiency
String is immutable, so every modifying operation creates a new object, whereas StringBuilder and StringBuffer are mutable and modify their internal buffer in place. In addition, StringBuffer's methods are declared synchronized to ensure thread safety, which adds the cost of acquiring and releasing a lock on each call. Therefore, the order of operational efficiency is: StringBuilder > StringBuffer > String.
- Thread safety
String is immutable and therefore thread-safe; StringBuffer is thread-safe because its methods are synchronized; StringBuilder is not thread-safe.
Type | Operational efficiency | Thread safety |
---|---|---|
String | low | Safe (immutable) |
StringBuffer | medium | Safe (synchronized) |
StringBuilder | high | Not safe |
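The comparison can be observed with a small sketch. The class MutabilityDemo and its methods are hypothetical names for this example. Only the StringBuffer case is run from several threads, because the result of doing the same with StringBuilder is not deterministic.

```java
// Illustrative sketch only: MutabilityDemo is a hypothetical class.
class MutabilityDemo {
    // String: concat returns a new object and leaves the original untouched.
    static boolean concatCreatesNewObject() {
        String a = "ab";
        String b = a.concat("c");
        return a != b && a.equals("ab") && b.equals("abc");
    }

    // StringBuilder: append mutates the builder and returns the same instance.
    static boolean appendReusesBuilder() {
        StringBuilder sb = new StringBuilder("ab");
        StringBuilder same = sb.append("c");
        return sb == same && sb.toString().equals("abc");
    }

    // StringBuffer: synchronized appends from 4 threads never lose a character.
    static int concurrentAppend() {
        StringBuffer buffer = new StringBuffer();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    buffer.append('x');
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject()); // true
        System.out.println(appendReusesBuilder());    // true
        System.out.println(concurrentAppend());       // 4000
    }
}
```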
Why is String designed to be immutable?
- How do I make String immutable?
The "Minimize mutability" item in Effective Java lists the rules for immutable classes:
  - 1. Do not provide any method that modifies the object's state;
  - 2. Ensure that the class cannot be extended (declare it final, or make its constructors private);
  - 3. Declare all fields final;
  - 4. Declare all fields private;
  - 5. Ensure exclusive access to any mutable components.
- Why design immutable classes?
  - 1. Unlike mutable classes, whose state can change unpredictably, immutable classes are stable and reliable, which makes them suitable as hash table keys;
  - 2. Immutable objects are inherently thread-safe and require no synchronization;
  - 3. Risk: immutable classes can be expensive, because every distinct value requires a new object. A mutable companion class can improve performance; StringBuilder and StringBuffer can be understood as companion classes of String.
Tip: reflection can break the immutability of strings.
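The five rules can be sketched on a small class of our own. Route is a hypothetical example type, not something from the JDK or from Effective Java; the defensive copy in the constructor is one possible way to satisfy rule 5.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: Route is a hypothetical immutable class.
final class Route {                       // rule 2: the class cannot be extended
    private final String name;            // rules 3 and 4: private final fields
    private final List<String> stops;

    Route(String name, List<String> stops) {
        this.name = name;
        // rule 5: copy the mutable argument so the caller cannot change it later,
        // and expose it only through an unmodifiable view
        this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
    }

    String name() {
        return name;
    }

    List<String> stops() {
        return stops;
    }

    // rule 1: no mutators; a "change" produces a new object instead
    Route withStop(String stop) {
        List<String> copy = new ArrayList<>(stops);
        copy.add(stop);
        return new Route(name, copy);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("A");
        Route route = new Route("r1", source);
        source.add("B");                                  // does not affect route
        System.out.println(route.stops());                // [A]
        System.out.println(route.withStop("B").stops());  // [A, B]
    }
}
```

String follows the same pattern: methods such as concat or replace return a new String rather than changing the receiver.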
Implementation principle of String +
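The String + operator is compiler syntactic sugar: after compilation, the + operator is replaced with calls to StringBuilder#append(…). This is what javac emits when targeting Java 8 and earlier; since Java 9 it emits an invokedynamic call to StringConcatFactory instead. For example: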
Source:

```java
String string = null;
for (String str : strings) {
    string += str;
}
return string;
```

Equivalent code after compilation:

```java
String string = null;
for (String str : strings) {
    StringBuilder builder = new StringBuilder();
    builder.append(string);
    builder.append(str);
    string = builder.toString();
}
```

Bytecode:

```
 0 aconst_null
 1 astore_1
 2 aload_0
 3 astore_2
 4 aload_2
 5 arraylength
 6 istore_3
 7 iconst_0
 8 istore 4
10 iload 4
12 iload_3
13 if_icmpge 48 (+35)
16 aload_2
17 iload 4
19 aaload
20 astore 5
22 new #7 <java/lang/StringBuilder>
25 dup
26 invokespecial #8 <java/lang/StringBuilder.<init>>
29 aload_1
30 invokevirtual #9 <java/lang/StringBuilder.append>
33 aload 5
35 invokevirtual #9 <java/lang/StringBuilder.append>
38 invokevirtual #10 <java/lang/StringBuilder.toString>
41 astore_1
42 iinc 4 by 1
45 goto 10 (-35)
48 aload_1
49 areturn
```
As you can see, using string + directly inside a loop creates a new StringBuilder and a new String on every iteration, which performs poorly. Instead, create one StringBuilder outside the loop and append to it inside the loop.
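A minimal sketch of the recommended form follows. JoinDemo and join are hypothetical names, and unlike the example above, which starts from null, this version starts from an empty builder, so the result does not begin with the text "null".

```java
// Illustrative sketch only: JoinDemo is a hypothetical class.
class JoinDemo {
    // One StringBuilder is created outside the loop and reused for every element.
    static String join(String[] strings) {
        StringBuilder builder = new StringBuilder();
        for (String str : strings) {
            builder.append(str);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"})); // abc
    }
}
```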
Memory allocation for String
- "ABC" versus new String("ABC")
  - "ABC" => The virtual machine first checks whether "ABC" already exists in the string constant pool. If it does, that object is returned; otherwise an "ABC" object is created in the string constant pool and returned. Multiple declarations of the same literal therefore share one object;
  - new String("ABC") => At compile time, javac adds "ABC" to the class file constant pool. At class loading time, the class file constant pool is loaded into the runtime constant pool. When the new bytecode instruction executes, the virtual machine creates a new object on the heap that is initialized from the "ABC" object in the constant pool.
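The difference between the two forms shows up when references are compared with ==. The sketch below uses a hypothetical class LiteralVsNewDemo; the three results follow from the language rules for string literals and from the contract of String#intern().

```java
// Illustrative sketch only: LiteralVsNewDemo is a hypothetical class.
class LiteralVsNewDemo {
    // Two identical literals refer to the same pooled object.
    static boolean literalsShareObject() {
        String a = "ABC";
        String b = "ABC";
        return a == b;
    }

    // new String(...) always creates a separate heap object with equal content.
    static boolean newCreatesDistinctObject() {
        String a = "ABC";
        String c = new String("ABC");
        return a != c && a.equals(c);
    }

    // intern() maps the heap object back to the pooled instance.
    static boolean internReturnsPooledObject() {
        String a = "ABC";
        String c = new String("ABC");
        return c.intern() == a;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());       // true
        System.out.println(newCreatesDistinctObject());  // true
        System.out.println(internReturnsPooledObject()); // true
    }
}
```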
String#intern() implementation principle
If the string constant pool already contains a string equal to this String object, the string from the pool is returned. Otherwise, the string held by this String object is copied into the constant pool, and the string in the pool is returned.
Starting with JDK 1.7, String#intern() no longer copies the string into the constant pool; instead it stores a reference to the original String in the pool and returns that reference.
Example:

```java
String s = new String("1");
s.intern();
String s2 = "1";
System.out.println(s == s2);

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4);
```

Output:
JDK 1.6 and earlier: false false
JDK 1.7 and later: false true
Why does String#hashCode() use 31 as a factor?
```java
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
```
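- 31 can be optimized by the compiler
31 * i can be rewritten as (i << 5) - i, so the multiplication can be replaced by a shift and a subtraction.
Tip: shifts and subtraction are generally cheaper than multiplication.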
- 31 is a prime number
A prime number is a number that is divisible only by 1 and itself. Using a prime as the multiplication factor reduces the probability that different hash values map to the same index in the later modulo step, that is, it reduces the probability of hash collisions.
- 31 is a moderate prime number
If the prime is too small, the hash values tend to cluster in a narrow range, which increases the probability of hash collisions. If the prime is too large, the hash value easily exceeds the range of int (overflow) and loses part of its information, which makes the collision probability unstable.
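The points above can be checked with a short sketch. Hash31Demo and its methods are hypothetical names; polyHash recomputes the same polynomial that String#hashCode() is documented to use, and the "Aa" and "BB" pair shows that a good factor lowers the collision rate without eliminating collisions.

```java
// Illustrative sketch only: Hash31Demo is a hypothetical class.
class Hash31Demo {
    // Shift-and-subtract form of multiplying by 31.
    static int times31(int i) {
        return (i << 5) - i;
    }

    // Recomputes the polynomial hash h = 31 * h + c over all chars.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(times31(7) == 31 * 7);                    // true
        System.out.println(polyHash("hello") == "hello".hashCode()); // true
        // Two different strings with the same hash value:
        System.out.println(polyHash("Aa")); // 2112
        System.out.println(polyHash("BB")); // 2112
    }
}
```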
Writing is not easy. Your "triple support" is Chouchou's biggest motivation. See you next time!