Preface

Calling the intern method directly from Java code is generally not recommended, but the same idea can be used to implement a similar pool of your own.

About the intern method

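This method returns the canonical object for a string. The JVM maintains a special string constant pool to hold these canonical objects, and the pool is structured as a hash table. When intern is called, the JVM looks the string up in the pool: if a string with the same value is already there, that object is returned; if not, the string is added to the pool as the canonical object and returned.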

The lookup uses the string value as the key, so the same canonical object is returned for the same string value. For example, Java code can hold many distinct String objects that all have the value key1, but calling intern on any of them returns the same object.
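As a rough illustration of the lookup-or-create behaviour described above, here is a minimal Java-level sketch of such a pool. The class name CanonicalPool and its canonical method are hypothetical, and a ConcurrentHashMap stands in for the JVM's own table; this is not how HotSpot implements it.

```java
import java.util.concurrent.ConcurrentHashMap;

public class CanonicalPool {
    // Key and value are the same object: the first string seen with a given value.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical object for s, adding s to the pool if its value is new.
    public static String canonical(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String a = new String("key1");
        String b = new String("key1");
        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(canonical(a) == canonical(b)); // true: one canonical object
    }
}
```

Unlike String.intern, a pool like this lives in ordinary Java heap memory and can be sized, cleared, or dropped by the application.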

What does it do

So what is the intern method good for? In Java code, as long as two string values are equal, intern is guaranteed to return the same object for both. For example,

String st = new String("hello world");
String st2 = new String("hello world");
System.out.println(st.intern() == st2.intern());

See that? We can use == to compare two strings by value. In Java, == normally only tells you whether two references point to the same object, but once both strings have been interned they can be compared this way, which is much faster than equals. You might object that intern has already done an equals comparison during its lookup, so the time is spent anyway. That is true, but if the strings are going to be compared many times afterwards, the cost is paid once and every later comparison can use ==.
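The "intern once, compare many times" pattern can be sketched as follows. The class InternCompare and its helper methods are hypothetical names for illustration; the only library call involved is String.intern.

```java
public class InternCompare {
    // Interns every element once, up front.
    static String[] internAll(String[] raw) {
        String[] out = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i].intern();
        }
        return out;
    }

    // Counts tokens that are the same object as the key.
    // Only meaningful when every token and the key have been interned.
    static int countSame(String[] internedTokens, String internedKey) {
        int count = 0;
        for (String token : internedTokens) {
            if (token == internedKey) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] raw = { new String("GET"), new String("POST"), new String("GET") };
        String[] tokens = internAll(raw);
        String key = new String("GET").intern();
        System.out.println(countSame(tokens, key)); // 2
    }
}
```

The == shortcut is only safe if every string on both sides of the comparison has gone through intern; a single non-interned string silently compares as unequal.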

Interning can also save memory in some scenarios. Suppose a program holds a large number of string objects with many repeated values, say 100,000 objects of which about 90,000 repeat a value already held by another object. After interning, only about 10,000 string objects remain, because all objects with the same value share one canonical object.
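A small experiment along these lines, assuming 100,000 strings drawn from 10,000 distinct values. The class InternMemory and its methods are hypothetical; the sketch counts distinct objects by identity rather than measuring heap usage.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternMemory {
    // Builds 'total' freshly allocated strings drawn from 'distinctValues' different values.
    static String[] build(int total, int distinctValues, boolean intern) {
        String[] out = new String[total];
        for (int i = 0; i < total; i++) {
            String s = new String("value-" + (i % distinctValues));
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    // Counts distinct objects by identity, not by value.
    static int distinctObjects(String[] strings) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, strings);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(build(100_000, 10_000, false))); // 100000
        System.out.println(distinctObjects(build(100_000, 10_000, true)));  // 10000
    }
}
```

The temporary duplicates are still allocated before intern is called; the saving comes from the garbage collector being able to reclaim them once only the canonical objects are referenced.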

Adding strings to the runtime constant pool

There are two ways to add string objects to the runtime constant pool at the Java layer:

  • Declare the string directly with double quotes in the program; it is added to the constant pool when the code executes. For example, the class below is compiled to bytecode, and the literal is added to the constant pool at run time by the corresponding instruction.
public class Test{
    public static void main(String[] args){
        String s = "hello";
    }
}
  • The other is the intern method of the String class, which checks whether the current string already exists in the constant pool and adds it if it does not. For example,
String s = new String("hello");
s.intern();
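The two routes into the pool can be compared side by side. In this sketch the class LiteralVsIntern and its join helper are hypothetical; it relies on the language rule that a string concatenated at run time is a newly created object rather than the pooled literal.

```java
public class LiteralVsIntern {
    // Concatenates at run time; the result is a fresh heap object, not the pooled literal.
    static String join(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String literal = "hello";          // route 1: a double-quoted literal
        String built = join("hel", "lo");  // same value, but a different object
        System.out.println(literal == built);          // false
        System.out.println(literal == built.intern()); // true: route 2 finds the pooled object
    }
}
```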

Let's look at another example, run on JDK 9.

public class Test {
	public static void main(String[] args) {
		String s = new String("hello");
		String ss = new String("hello");
		System.out.println(ss == s);
		String sss = s.intern();
		System.out.println(sss == s);
		String ssss = ss.intern();
		System.out.println(ssss == sss);

		System.out.println("= = = = = = = = =");

		String s2 = "hello2";
		String ss2 = new String("hello2");
		System.out.println(ss2 == s2);
		String sss2 = s2.intern();
		System.out.println(sss2 == s2);
		String ssss2 = ss2.intern();
		System.out.println(ssss2 == sss2);
	}
}

Output:
false
false
true
= = = = = = = = =
false
true
true

Constant pool implementation

The Java side is simple: it only declares intern as a native method.

public native String intern();

The corresponding native function calls JVM_InternString. That function first uses JNIHandles::resolve_non_null to turn the JNI handle into a JVM-internal object, then calls StringTable::intern to do the actual interning. Finally, JNIHandles::make_local converts the result back into a Java-layer object and returns it.

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END

StringTable is the constant pool the JVM uses to hold these string constants at run time. It is structured as a hash table.

The main logic of the StringTable::intern overload below is to compute how many UTF-16 code units the UTF-8 encoded string needs, allocate a new array of that length, convert the string into it, and finally call another intern overload.

oop StringTable::intern(const char* utf8_string, TRAPS) {
  if (utf8_string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length = UTF8::unicode_length(utf8_string);
  jchar* chars = NEW_RESOURCE_ARRAY(jchar, length);
  UTF8::convert_to_unicode(utf8_string, chars, length);
  Handle string;
  oop result = intern(string, chars, length, CHECK_NULL);
  return result;
}
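The "measure first, then convert" pattern above can be sketched in Java. This is an illustration under a stated assumption: it handles well-formed standard UTF-8, whereas the JVM's internal strings use the modified UTF-8 form, so it is not a port of UTF8::unicode_length. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ToUtf16 {
    // Number of UTF-16 code units needed for well-formed standard UTF-8 input.
    static int utf16Length(byte[] utf8) {
        int length = 0;
        for (byte b : utf8) {
            int u = b & 0xFF;
            if ((u & 0xC0) == 0x80) {
                continue;                  // continuation byte: part of the previous character
            }
            length += (u >= 0xF0) ? 2 : 1; // a 4-byte sequence needs a surrogate pair
        }
        return length;
    }

    // Allocates an array of exactly the required size, then converts into it.
    static char[] toChars(byte[] utf8) {
        char[] chars = new char[utf16Length(utf8)];
        new String(utf8, StandardCharsets.UTF_8).getChars(0, chars.length, chars, 0);
        return chars;
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);       // 6 bytes
        System.out.println(utf16Length(utf8)); // 5 code units
    }
}
```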

The logic of that function goes like this,

  1. Compute the hash value with java_lang_String::hash_code.
  2. Call lookup_shared with the hash value to see whether a string object with this value already exists in the shared hash table; if so, return the found object directly. lookup_shared indirectly calls the lookup function, which we will look at later.
  3. Check whether the alternate hash algorithm is in use, and if so, recompute the hash value.
  4. Compute the bucket index from the hash value with hash_to_index.
  5. Search that bucket for the string object with lookup_in_main_table, and return it if found.
  6. If the string was found in neither table, it has to be added: take the lock with MutexLocker, then call basic_add to do the insertion, which is examined later.
  7. Return the string object.
oop StringTable::intern(Handle string_or_null, jchar* name,
                        int len, TRAPS) {
  // Steps 1-2: hash the characters and try the shared table first.
  unsigned int hashValue = java_lang_String::hash_code(name, len);
  oop found_string = lookup_shared(name, len, hashValue);
  if (found_string != NULL) {
    return found_string;
  }
  // Step 3: switch to the alternate hash if it is enabled.
  if (use_alternate_hashcode()) {
    hashValue = alt_hash_string(name, len);
  }
  // Steps 4-5: locate the bucket and search the main table.
  int index = the_table()->hash_to_index(hashValue);
  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);

  if (found_string != NULL) {
    if (found_string != string_or_null()) {
      ensure_string_alive(found_string);
    }
    return found_string;
  }
  // Not found: reuse the caller's String if there is one, otherwise create it.
  Handle string;
  if (!string_or_null.is_null()) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }
  // Step 6: add under the StringTable lock.
  oop added_or_found;
  {
    MutexLocker ml(StringTable_lock, THREAD);
    added_or_found = the_table()->basic_add(index, string, name, len,
                                  hashValue, CHECK_NULL);
  }

  if (added_or_found != string()) {
    ensure_string_alive(added_or_found);
  }
  return added_or_found;
}
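The overall shape of this flow (look up without the lock, then lock, check again, and add) can be sketched in Java. This is a simplified illustration, not a port of the HotSpot code: StringPoolSketch and all of its members are hypothetical names, a synchronized block stands in for MutexLocker, and the shared table and alternate hash are left out.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class StringPoolSketch {
    // Immutable chain node, so readers can walk a bucket without locking.
    private static final class Node {
        final int hash;
        final String value;
        final Node next;
        Node(int hash, String value, Node next) {
            this.hash = hash;
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReferenceArray<Node> buckets;
    private final Object lock = new Object();

    public StringPoolSketch(int bucketCount) {
        buckets = new AtomicReferenceArray<>(bucketCount);
    }

    private int hashToIndex(int hash) {
        return Integer.remainderUnsigned(hash, buckets.length());
    }

    private String lookup(int index, String s, int hash) {
        for (Node n = buckets.get(index); n != null; n = n.next) {
            if (n.hash == hash && n.value.equals(s)) {
                return n.value;
            }
        }
        return null;
    }

    public String intern(String s) {
        int hash = s.hashCode();
        int index = hashToIndex(hash);
        String found = lookup(index, s, hash);   // fast path, no lock
        if (found != null) {
            return found;
        }
        synchronized (lock) {
            found = lookup(index, s, hash);      // re-check: another thread may have added it
            if (found != null) {
                return found;
            }
            buckets.set(index, new Node(hash, s, buckets.get(index)));
            return s;
        }
    }

    public static void main(String[] args) {
        StringPoolSketch pool = new StringPoolSketch(60013);
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(pool.intern(a) == a);              // true: a became the canonical object
        System.out.println(pool.intern(b) == pool.intern(a)); // true
    }
}
```

The second lookup inside the lock plays the same role as the lookup_in_main_table call at the start of basic_add: two threads can both miss on the fast path, and only the first one to take the lock should insert.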

The constant pool is a hash table, so what is the default number of buckets? Looking at the definition below, the default is 60013 on 64-bit systems and 1009 on 32-bit systems.

const int defaultStringTableSize = NOT_LP64(1009) LP64_ONLY(60013);

The lookup logic of the shared hash table is as follows,

  1. Take the hash value modulo the number of buckets to get the index.
  2. Fetch the bucket information by index.
  3. Get the offset of the bucket.
  4. Get the type of the bucket.
  5. Get the entry pointer.
  6. If the bucket is of type VALUE_ONLY_BUCKET_TYPE, decode the object at the offset directly. Each entry in this kind of bucket is a single 4-byte offset, i.e. u4 offset;.
  7. For a regular bucket, iterate over its entries until one with the given hash value is found, then decode the object at that entry's offset. Each entry here takes 8 bytes with the structure u4 hash; union {u4 offset; narrowOop str; }, that is, a hash value followed by an offset or a string object pointer.
  8. The two bucket types can be illustrated as follows: the first and third buckets are regular buckets that point to runs of [hash + offset] entries, while the second bucket is of type VALUE_ONLY_BUCKET_TYPE and points directly to an offset.
buckets[0, 4, 5, ....]
        |  |  |
        |  |  +---+
        |  |      |
        |  +----+ |
        v       v v
entries[H,O,H,O,O,H,O,H,O.....]
template <class T, class N>
inline T CompactHashtable<T,N>::lookup(const N* name, unsigned int hash, int len) {
  if (_entry_count > 0) {
    int index = hash % _bucket_count;
    u4 bucket_info = _buckets[index];
    u4 bucket_offset = BUCKET_OFFSET(bucket_info);
    int bucket_type = BUCKET_TYPE(bucket_info);
    u4* entry = _entries + bucket_offset;

    if (bucket_type == VALUE_ONLY_BUCKET_TYPE) {
      T res = decode_entry(this, entry[0], name, len);
      if (res != NULL) {
        return res;
      }
    } else {
      u4* entry_max = _entries + BUCKET_OFFSET(_buckets[index + 1]);
      while (entry < entry_max) {
        unsigned int h = (unsigned int)(entry[0]);
        if (h == hash) {
          T res = decode_entry(this, entry[1], name, len);
          if (res != NULL) {
            return res;
          }
        }
        entry += 2;
      }
    }
  }
  return NULL;
}
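To make the two-array layout more concrete, here is a Java sketch of a compact table with value-only and regular buckets. Everything in it is an assumption made for illustration: the class name, the choice of 30 bits for the offset with the type stored above it, and the use of an index into a String array as a stand-in for the offset that decode_entry resolves.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactTableSketch {
    static final int REGULAR = 0, VALUE_ONLY = 1;                // bucket types
    static final int TYPE_SHIFT = 30, OFFSET_MASK = 0x3FFFFFFF;  // assumed bit layout

    final int[] buckets;   // bucketCount + 1 slots; the last one marks the end of entries
    final int[] entries;   // [hash, value] pairs, or a lone value in a VALUE_ONLY bucket
    final String[] values; // stand-in for the objects that offsets decode to

    CompactTableSketch(String[] values, int bucketCount) {
        this.values = values;
        List<List<Integer>> chains = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            chains.add(new ArrayList<>());
        }
        for (int v = 0; v < values.length; v++) {
            chains.get(Integer.remainderUnsigned(values[v].hashCode(), bucketCount)).add(v);
        }
        buckets = new int[bucketCount + 1];
        List<Integer> flat = new ArrayList<>();
        for (int b = 0; b < bucketCount; b++) {
            List<Integer> chain = chains.get(b);
            int type = chain.size() == 1 ? VALUE_ONLY : REGULAR;
            buckets[b] = (type << TYPE_SHIFT) | flat.size();
            for (int v : chain) {
                if (type == REGULAR) {
                    flat.add(values[v].hashCode()); // regular entries carry the hash first
                }
                flat.add(v);
            }
        }
        buckets[bucketCount] = flat.size();
        entries = flat.stream().mapToInt(Integer::intValue).toArray();
    }

    String lookup(String name) {
        int hash = name.hashCode();
        int index = Integer.remainderUnsigned(hash, buckets.length - 1);
        int info = buckets[index];
        int offset = info & OFFSET_MASK;
        if ((info >>> TYPE_SHIFT) == VALUE_ONLY) {
            String candidate = values[entries[offset]];
            return candidate.equals(name) ? candidate : null;
        }
        int end = buckets[index + 1] & OFFSET_MASK;
        for (int e = offset; e < end; e += 2) {
            if (entries[e] == hash && values[entries[e + 1]].equals(name)) {
                return values[entries[e + 1]];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] values = { "alpha", "beta", "gamma", "delta", "epsilon" };
        CompactTableSketch table = new CompactTableSketch(values, 3);
        System.out.println(table.lookup("beta"));    // beta
        System.out.println(table.lookup("missing")); // null
    }
}
```

Storing no hash for single-entry buckets is what makes the layout compact: a bucket with one entry costs 4 bytes in the entries array instead of 8.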

The logic for adding to the hash table is as follows,

  1. Check whether the alternate hash algorithm is in use. If so, recompute the hash value and the corresponding index.
  2. Use lookup_in_main_table to check whether the string value already exists in the hash table, and return it if it does.
  3. Create an entry containing the hash value and a pointer to the string object.
  4. Add the entry to the hash table with add_entry.
  5. Return the string object.
oop StringTable::basic_add(int index_arg, Handle string, jchar* name,
                           int len, unsigned int hashValue_arg, TRAPS) {

  NoSafepointVerifier nsv;
  unsigned int hashValue;
  int index;
  if (use_alternate_hashcode()) {
    hashValue = alt_hash_string(name, len);
    index = hash_to_index(hashValue);
  } else {
    hashValue = hashValue_arg;
    index = index_arg;
  }
  oop test = lookup_in_main_table(index, name, len, hashValue);
  if (test != NULL) {
    return test;
  }
  HashtableEntry<oop, mtSymbol>* entry = new_entry(hashValue, string());
  add_entry(index, entry);
  return string();
}

-XX:StringTableSize

The JVM's default number of buckets is 60013 on 64-bit systems and 1009 on 32-bit ones. To change it, set -XX:StringTableSize.

-XX:+PrintStringTableStatistics

To see statistics about the constant pool, set -XX:+PrintStringTableStatistics; the JVM then prints the relevant information when it exits. For instance,

SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :     20067 =    481608 bytes, avg  24.000
Number of literals      :     20067 =    838520 bytes, avg  41.786
Total footprint         :           =   1480216 bytes
Average bucket size     :     1.003
Variance of bucket size :     0.994
Std. dev. of bucket size:     0.997
Maximum bucket size     :         8
StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :   1003077 =  24073848 bytes, avg  24.000
Number of literals      :   1003077 =  48272808 bytes, avg  48.125
Total footprint         :           =  72826760 bytes
Average bucket size     :    16.714
Variance of bucket size :     9.683
Std. dev. of bucket size:     3.112
Maximum bucket size     :        30
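The "Average bucket size" figures in the output above are consistent with simply dividing the number of entries by the number of buckets, which suggests a quick way to estimate the effect of a different -XX:StringTableSize. The sketch below only does that arithmetic; the class name BucketLoad and the larger table size of 1,000,003 are hypothetical choices for illustration.

```java
import java.util.Locale;

public class BucketLoad {
    // Average chain length a lookup has to walk: entries divided by buckets.
    static double averageBucketSize(long entries, long buckets) {
        return (double) entries / buckets;
    }

    static String format(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(averageBucketSize(20067, 20011)));       // 1.003
        System.out.println(format(averageBucketSize(1003077, 60013)));     // 16.714
        // Same number of entries in a hypothetical table of 1,000,003 buckets.
        System.out.println(format(averageBucketSize(1003077, 1000003)));   // 1.003
    }
}
```

With about a million interned strings in the default 60013 buckets, each lookup walks a chain of roughly 17 entries on average, so a larger table trades some memory for shorter chains.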
