The other day I had a long conversation with a young lady candidate

Preface

Just yesterday, I (small size son) and my colleague Brother Dan interviewed a young lady who was applying for a job. Looking back over the whole interview, her performance was commendable, so I took a break from my busy schedule to write it up and share it with you.

It went like this. Yesterday, on a sunny afternoon, Brother Dan and I were handling the company’s daily work as usual when HR told us to interview a Java candidate. Brother Dan and I walked into the interview room carrying our little notebooks, the tools of our trade. Ah, a pretty young lady.

My old face turned red. No, no, in front of a girl I have to stay reserved. Brother Dan and I smiled politely at her and said, “Sorry to keep you waiting.” Then I motioned for her to sit down and said: “Let’s start. I see from your resume that you have done JVM tuning, so today let’s talk about the JVM…”

Main text

What is the JVM?

The JVM is simply an imaginary computer that simulates various computer functions on an actual computer.

The JVM hides platform-specific details. Java programs are compiled only into object code that runs on the Java virtual machine, called bytecode, so they can run unmodified on many platforms. When the JVM executes bytecode, it ultimately translates that bytecode (by interpreting it or by JIT-compiling it) into machine instructions for the specific platform.

The runtime data areas of the JVM

The Java virtual machine consists of a set of bytecode instructions, a set of registers, a stack, a garbage-collected heap, and a method area for storing class data.

What are their functions?

PC register

The PC register stores the address of the JVM instruction that the current thread will execute next, and each thread has its own. If the thread is currently executing a native method, the value of the PC register is undefined.

The JVM stack

The JVM stack is thread-private. Each thread gets its own JVM stack when it is created. The stack holds stack frames, which store the current thread’s local variables of primitive types (boolean, char, byte, short, int, long, float, double), partial results, and return values. For an object of a non-primitive type, the JVM stack holds only a reference that points to the object in the heap.
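To make the per-thread stack concrete, here is a small sketch. The class and method names are illustrative and are not from the original. Each recursive call pushes a new stack frame that holds its parameters, until the thread’s JVM stack runs out of room and a StackOverflowError is thrown. The exact depth depends on the JVM, the -Xss setting, and JIT behavior, so don’t expect any particular number.

```java
class StackDepthDemo {
    private static int depth;

    // Every call pushes a new frame onto the current thread's JVM stack;
    // the frame holds the primitive parameters a and b.
    private static void recurse(long a, long b) {
        depth++;
        recurse(a + 1, b + 1);
    }

    // Recurses until the JVM stack overflows and reports how many frames fit.
    static int measureDepth() {
        depth = 0;
        try {
            recurse(0L, 0L);
        } catch (StackOverflowError e) {
            // The current thread's stack has no room for another frame.
        }
        return depth;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main thread depth: " + measureDepth());
        // A new thread gets its own, separate JVM stack.
        Thread t = new Thread(() -> System.out.println("worker thread depth: " + measureDepth()));
        t.start();
        t.join();
    }
}
```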

Heap

The heap is the area where the JVM stores object instances and arrays. You can think of the memory for every object created with new in Java as being allocated there. The memory held by objects in the heap is reclaimed by the GC once those objects are no longer needed.

Method Area

(1) In the Sun JDK (before JDK 8), this area corresponds to the Permanent Generation (PermGen).

(2) The method area stores information about each loaded class: its name and modifiers, its static variables, the constants declared final in it, its field information, and its method information. When a program calls methods such as getName or isInterface on a Class object, that data comes from the method area. The method area is shared by all threads and is garbage-collected under certain conditions. When it needs more memory than it is allowed, an OutOfMemoryError is thrown.
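From Java code, you can see this class metadata by calling reflection methods on a Class object. The sketch below is only illustrative, and the helper name `describe` is hypothetical. It reads the name, the interface flag, and the modifiers that the JVM keeps for each loaded class.

```java
import java.lang.reflect.Modifier;

class ClassMetadataDemo {
    // Reads metadata the JVM keeps for a loaded class.
    static String describe(Class<?> c) {
        return c.getName()
                + " interface=" + c.isInterface()
                + " modifiers=" + Modifier.toString(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(describe(java.util.List.class));
        System.out.println(describe(String.class));
    }
}
```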

Runtime Constant Pool

It stores the constants a class uses, along with its symbolic references to methods and fields. Its space is allocated from the method area.
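String literals are one place in everyday code where the constant pool shows up. In this sketch (the class name is hypothetical), both literals resolve to the same String instance. In contrast, new String(...) always creates a separate heap object, and intern() returns the pooled instance. Since JDK 7, HotSpot keeps the interned strings themselves in the heap. The example therefore illustrates how constants are resolved, not exactly where the bytes live.

```java
class ConstantPoolDemo {
    // Returns {literal == literal, literal == new String, literal == interned copy}.
    static boolean[] compare() {
        String a = "jvm";             // resolved through the class's runtime constant pool
        String b = "jvm";             // same constant, so the same String instance
        String c = new String("jvm"); // `new` always allocates a distinct heap object
        return new boolean[] {a == b, a == c, a == c.intern()};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare())); // [true, false, true]
    }
}
```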

Native Method Stacks

The JVM supports execution of native methods with a native method stack, an area used to store the state of each native method call.

Describe the virtual machine’s class loading mechanism

The virtual machine loads the data that describes a class from the Class file into memory. It then verifies, transforms and resolves, and initializes that data, finally producing Java types that the virtual machine can use directly.
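Initialization, the last step, happens lazily: the JVM runs it only on a class’s first active use. The sketch below uses hypothetical class names and records when the static initializer of LazyConfig runs. Taking its class literal, creating an array of it, or reading a compile-time constant does not initialize it. Reading an ordinary static field does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClassInitDemo {
    // Names of classes whose static initializer has run so far.
    static final Set<String> INITIALIZED = new HashSet<>();

    // Reports whether LazyConfig is initialized after each step.
    // Call it once per JVM: initialization happens only the first time.
    static boolean[] trace() {
        boolean[] states = new boolean[4];
        Class<?> c = LazyConfig.class;        // class literal: no initialization
        states[0] = INITIALIZED.contains("LazyConfig");
        LazyConfig[] arr = new LazyConfig[3]; // array of the type: no initialization
        states[1] = INITIALIZED.contains("LazyConfig");
        int k = LazyConfig.CONSTANT;          // compile-time constant, inlined by javac: no initialization
        states[2] = INITIALIZED.contains("LazyConfig");
        int v = LazyConfig.value;             // non-constant static field: triggers initialization
        states[3] = INITIALIZED.contains("LazyConfig");
        return states;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace())); // [false, false, false, true]
    }
}

class LazyConfig {
    static final int CONSTANT = 7;
    static int value = 42;

    static {
        ClassInitDemo.INITIALIZED.add("LazyConfig");
    }
}
```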

Object creation process

Class loading check

When the virtual machine encounters a new instruction, it first checks whether the instruction’s argument can be located as a symbolic reference to a class in the constant pool. It also checks whether the class represented by that symbolic reference has been loaded, resolved, and initialized. If not, the corresponding class loading process must be performed first.

Allocate memory

After the class loading check passes, the virtual machine allocates memory for the new object. The amount of memory an object needs is fully determined once its class is loaded, so allocating space for an object amounts to carving a block of that size out of the Java heap.

Initialize to zero values

After memory allocation is complete, the virtual machine initializes the allocated memory (excluding the object header) to zero values. This step ensures that an object’s instance fields can be used in Java code without being assigned initial values: the program sees the zero value for each field’s type. If a TLAB is used, this zeroing can be done earlier, when the TLAB itself is allocated.

Set the object header

Next, the virtual machine sets up the object as needed. This includes which class the object is an instance of, how to find the class’s metadata, the object’s hash code, and the object’s GC generational age. All of this is stored in the object header. The object header may be set up differently depending on the VM’s current state, for example whether biased locking is enabled.

Execute the <init> method

Once all that is done, a new object has been created from the virtual machine’s point of view. From the Java program’s point of view, however, object creation has only just begun: the <init> method (the constructor) has not run yet, and all fields still hold zero values. So, in general, the new instruction is followed by a call to <init>, which initializes the object the way the programmer intended. Only then is a usable object fully created.
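You can observe both of the last steps from Java code. In this sketch (the class names are hypothetical), the superclass constructor calls an overridden method before the subclass’s field initializers in <init> have run. That method therefore sees the zero values the VM wrote during allocation (0 and null) rather than 42 and "jvm". Calling overridable methods from a constructor is generally bad practice. It is done here only to expose the intermediate state.

```java
class ZeroBase {
    ZeroBase() {
        snapshot(); // runs before ZeroDerived's field initializers
    }

    void snapshot() {}
}

class ZeroDerived extends ZeroBase {
    int answer = 42;     // assigned in ZeroDerived's <init>, after super() returns
    String name = "jvm";
    int seenAnswer;      // no initializer, so the value written by snapshot() is kept
    String seenName;

    @Override
    void snapshot() {
        seenAnswer = answer; // still the zero value 0
        seenName = name;     // still the zero value null
    }

    public static void main(String[] args) {
        ZeroDerived d = new ZeroDerived();
        System.out.println(d.seenAnswer + " " + d.seenName + " -> " + d.answer + " " + d.name);
        // prints "0 null -> 42 jvm"
    }
}
```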

How does the virtual machine ensure thread-safety when creating objects?

Thread safety is one of the most important issues in object creation, because objects are created very frequently in real applications. For example, memory may be in the middle of being allocated for object A, with the allocation pointer not yet updated, when object B allocates memory using the old pointer. The VM must ensure thread safety, and it generally uses two approaches:

CAS + retry on failure: CAS (Compare-And-Swap) is an implementation of optimistic locking. Optimistic locking performs an operation without taking a lock, on the assumption that there is no conflict. If the operation fails because of a conflict, it retries until it succeeds. The VM uses CAS with retry on failure to make the pointer update atomic.
TLAB (Thread Local Allocation Buffer): the VM reserves a block of memory in the Eden area for each thread in advance. When the JVM allocates memory for objects in a thread, it allocates from that thread’s TLAB first. When the object is larger than the memory left in the TLAB, or the TLAB is exhausted, the VM falls back to the CAS approach above. The -XX:+/-UseTLAB parameter controls whether the VM uses TLABs.
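Here is a plain-Java sketch of the two strategies. It is a toy model, not HotSpot code. SharedEden, Tlab, and TlabDemo are hypothetical names, “addresses” are just long offsets, and a failed allocation returns -1 where a real VM would start a GC. Bump-pointer allocation in the shared space uses AtomicLong.compareAndSet in a retry loop. Each thread’s Tlab hands out memory from its private range without synchronization and goes back to the CAS path only when the object does not fit.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shared Eden space: `top` is the bump pointer shared by all threads.
class SharedEden {
    private final long capacity;
    private final AtomicLong top = new AtomicLong(0);

    SharedEden(long capacity) { this.capacity = capacity; }

    // Bump-pointer allocation with CAS + retry on failure.
    // Returns the start offset of the new block, or -1 if Eden is exhausted.
    long casAllocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > capacity) return -1;                 // a real VM would start a GC here
            if (top.compareAndSet(oldTop, newTop)) return oldTop;
            // Another thread moved `top` first: loop and retry with the fresh value.
        }
    }
}

// Hypothetical thread-local allocation buffer carved out of SharedEden.
class Tlab {
    private final SharedEden eden;
    private final long tlabSize;
    private long cur, end; // [cur, end) is this thread's private free range

    Tlab(SharedEden eden, long tlabSize) { this.eden = eden; this.tlabSize = tlabSize; }

    long allocate(long size) {
        if (cur + size <= end) {       // fast path: private memory, no synchronization
            long addr = cur;
            cur += size;
            return addr;
        }
        if (size > tlabSize) return eden.casAllocate(size); // too big for any TLAB: CAS directly
        long start = eden.casAllocate(tlabSize);             // TLAB used up: grab a new one via CAS
        if (start < 0) return -1;
        cur = start + size;
        end = start + tlabSize;
        return start;
    }
}

class TlabDemo {
    // Each thread allocates `perThread` objects of `size` bytes through its own Tlab.
    static long[] runConcurrent(int threads, int perThread, long size) {
        SharedEden eden = new SharedEden(1L << 20);
        long[] addrs = new long[threads * perThread];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                Tlab tlab = new Tlab(eden, 1024);
                for (int i = 0; i < perThread; i++) addrs[base + i] = tlab.allocate(size);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return addrs;
    }

    public static void main(String[] args) {
        long[] addrs = runConcurrent(4, 1000, 16);
        long distinct = Arrays.stream(addrs).distinct().count();
        System.out.println(addrs.length + " allocations, " + distinct + " distinct addresses");
    }
}
```

With four threads allocating concurrently, every address should be distinct. The CAS loop is what keeps two threads from claiming the same TLAB.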

You mentioned garbage collection earlier. What is garbage?

In short, objects that are no longer alive are garbage.

What algorithms can determine that an object is no longer alive?

There are two algorithms to determine whether an object is alive or not:

Reference counting algorithm: add a reference counter to each object. Whenever the object is referenced, the counter is incremented by 1. Whenever a reference becomes invalid, the counter is decremented by 1. An object whose counter is 0 is dead and recyclable. However, this approach has difficulty with two objects that reference each other in a cycle.
Reachability analysis algorithm: starting from a set of objects called “GC Roots”, the collector searches downward, and the paths it follows are called reference chains. When no reference chain connects an object to the GC Roots (that is, the object is unreachable from the GC Roots), the object is proven dead and recyclable. Objects that can serve as GC Roots in Java include objects referenced from the virtual machine stack, objects referenced by native methods in the native method stack, objects referenced by static fields in the method area, and objects referenced by constants in the method area.

Because reference counting has difficulty with circular references between objects, mainstream JVMs use reachability analysis to determine whether an object is alive.
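A simple cycle shows the difference between the two algorithms. In this sketch (the names are hypothetical, and “objects” are plain graph nodes), B and C point at each other, but nothing reachable from the GC Roots points at them. Their reference counts stay at 1, so reference counting never frees them. A breadth-first reachability search from the roots correctly finds them dead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LivenessDemo {
    // Hypothetical heap object: outgoing references plus a reference counter.
    static final class HeapObj {
        final String name;
        final List<HeapObj> refs = new ArrayList<>();
        int refCount;

        HeapObj(String name) { this.name = name; }

        void pointTo(HeapObj target) {
            refs.add(target);
            target.refCount++; // reference counting: +1 for every new reference
        }
    }

    // Reference counting: an object is dead only when nothing points at it.
    static boolean deadByRefCount(HeapObj o) {
        return o.refCount == 0;
    }

    // Reachability analysis: breadth-first search from the GC Roots.
    static Set<HeapObj> reachable(List<HeapObj> gcRoots) {
        Set<HeapObj> visited = new HashSet<>();
        Deque<HeapObj> queue = new ArrayDeque<>(gcRoots);
        while (!queue.isEmpty()) {
            HeapObj o = queue.poll();
            if (visited.add(o)) queue.addAll(o.refs);
        }
        return visited;
    }

    public static void main(String[] args) {
        HeapObj root = new HeapObj("root"), a = new HeapObj("A");
        HeapObj b = new HeapObj("B"), c = new HeapObj("C");
        root.pointTo(a);
        b.pointTo(c);
        c.pointTo(b); // a cycle that no GC Root can reach
        Set<HeapObj> live = reachable(Collections.singletonList(root));
        System.out.println("B dead by reference counting? " + deadByRefCount(b)); // false
        System.out.println("B reachable from GC Roots? " + live.contains(b));     // false
    }
}
```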

Give examples of situations in which objects become garbage

An object becomes garbage in the following cases:

For an object that is not a thread: it becomes garbage when no active thread can access it.
For a thread object: the condition above must hold, and the thread must either not have been started or have already stopped.

For example:

(1) Changing the reference to an object, such as setting it to null or pointing it at another object.

   Object x = new Object(); // object1
   Object y = new Object(); // object2
   x = y;                   // object1 becomes garbage
   x = y = null;            // object2 becomes garbage



(2) Going out of scope

   if (i == 0) {
      Object x = new Object(); // object1
   } // after the closing brace, object1 can no longer be referenced and becomes garbage

(3) Nested object references

   class A {
      A a;
   }

   A x = new A(); // allocate one object
   x.a = new A(); // allocate a second object, reachable only through the first
   x = null;      // both objects become garbage

(4) Garbage in threads

   class A extends Thread {
      @Override
      public void run() {
         // ....
      }
   }

   // in main
   A x = new A(); // object1
   x.start();
   x = null;      // object1 becomes garbage only after the thread finishes running


What are some familiar JVM garbage collection algorithms?

Mark-sweep algorithm

This is the most basic algorithm, and it has two phases. First it marks the objects that need to be reclaimed, then, once marking is complete, it reclaims all the marked objects in one pass.

It has two drawbacks. The first is efficiency: both marking and sweeping are inefficient. The second is space: after sweeping, memory is left with a large number of discontinuous fragments, similar to disk fragmentation on a computer. If fragmentation becomes severe, a large object that needs to be allocated may not find enough contiguous memory, and another garbage collection has to be triggered early.
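The fragmentation problem is easy to reproduce in a toy model. In this sketch (the names are hypothetical, and the heap is an array of cells), eight one-cell objects sit side by side and every other one is reachable. Mark-sweep frees half of the heap, but it frees it in place, so the largest contiguous free block is still only one cell.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkSweepDemo {
    // Hypothetical object occupying heap cells [start, start + size).
    static final class Obj {
        final int start, size;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(int start, int size) { this.start = start; this.size = size; }
    }

    // Mark phase: flag everything reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!o.marked) {
                o.marked = true;
                for (Obj r : o.refs) stack.push(r);
            }
        }
    }

    // Sweep phase: drop unmarked objects where they lie; survivors never move.
    static boolean[] sweep(List<Obj> heap, int heapSize) {
        heap.removeIf(o -> !o.marked);
        boolean[] used = new boolean[heapSize];
        for (Obj o : heap) {
            Arrays.fill(used, o.start, o.start + o.size, true);
            o.marked = false; // reset the mark bit for the next cycle
        }
        return used;
    }

    // Longest run of free cells = the largest object that can still be allocated.
    static int largestFreeBlock(boolean[] used) {
        int best = 0, run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // 8 one-cell objects, every other one reachable: returns {free cells, largest free block}.
    static int[] demo() {
        List<Obj> heap = new ArrayList<>(), roots = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Obj o = new Obj(i, 1);
            heap.add(o);
            if (i % 2 == 0) roots.add(o);
        }
        mark(roots);
        boolean[] used = sweep(heap, 8);
        int free = 0;
        for (boolean u : used) if (!u) free++;
        return new int[] {free, largestFreeBlock(used)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [4, 1]
    }
}
```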

Copying algorithm

The “copying” algorithm was developed to solve the efficiency problem. It divides the available memory into two equal halves and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared all at once. This solves the memory fragmentation problem, but at the cost of halving the usable memory.
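This toy sketch follows Cheney’s breadth-first copying scheme, which is one common way to implement the copying algorithm. The names are hypothetical and addresses are plain ints. Live objects are copied contiguously into the empty half, each with a forwarding pointer so that it is copied only once. Anything never reached, like B here, is simply left behind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyingDemo {
    // Hypothetical object in a semispace heap.
    static final class Obj {
        final String name;
        final int size;
        int addr;
        final List<Obj> refs = new ArrayList<>();
        Obj forwarded; // to-space copy, set once this object has been evacuated

        Obj(String name, int addr, int size) { this.name = name; this.addr = addr; this.size = size; }
    }

    private final List<Obj> toSpace = new ArrayList<>();
    private int free = 0; // bump pointer in to-space

    // Copy one object (at most once) to the next free address in to-space.
    private Obj copy(Obj o) {
        if (o.forwarded != null) return o.forwarded;
        Obj c = new Obj(o.name, free, o.size);
        free += o.size;
        c.refs.addAll(o.refs); // still points into from-space until scanned
        o.forwarded = c;
        toSpace.add(c);
        return c;
    }

    // Evacuate the roots, then scan to-space breadth-first, copying each
    // referenced object and redirecting the reference to the copy.
    List<Obj> collect(List<Obj> roots) {
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) newRoots.add(copy(r));
        for (int scan = 0; scan < toSpace.size(); scan++) {
            List<Obj> refs = toSpace.get(scan).refs;
            for (int i = 0; i < refs.size(); i++) refs.set(i, copy(refs.get(i)));
        }
        return newRoots; // from-space can now be cleared in one go
    }

    // A (size 2) -> C (size 1); B (size 3) sits between them in from-space and is garbage.
    static String demo() {
        Obj a = new Obj("A", 0, 2), b = new Obj("B", 2, 3), c = new Obj("C", 5, 1);
        a.refs.add(c);
        CopyingDemo gc = new CopyingDemo();
        gc.collect(Collections.singletonList(a));
        StringBuilder sb = new StringBuilder();
        for (Obj o : gc.toSpace) sb.append(o.name).append('@').append(o.addr).append(' ');
        return sb.toString().trim() + " used=" + gc.free;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // A@0 C@2 used=3
    }
}
```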

Mark-compact algorithm

When the object survival rate is high, the copying algorithm has to copy many objects, and its efficiency drops. Hence the mark-compact algorithm. Its marking phase is the same as in mark-sweep, but instead of sweeping dead objects in place, it moves all surviving objects toward one end and then clears the memory beyond that end boundary.
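The sliding step can be sketched as follows. The names are hypothetical, and the model is simplified. The heap holds eight one-cell objects, and every other one is reachable. Survivors slide toward address 0 in their original order, so the freed space ends up as one contiguous block past the boundary. A real collector also has to rewrite every pointer to the moved objects. Here references are Java references, so that pass is only described in a comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkCompactDemo {
    // Hypothetical object occupying cells [addr, addr + size).
    static final class Obj {
        int addr, newAddr;
        final int size;
        boolean marked;
        final List<Obj> refs = new ArrayList<>();

        Obj(int addr, int size) { this.addr = addr; this.size = size; }
    }

    // Mark phase: the same as in mark-sweep.
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!o.marked) {
                o.marked = true;
                for (Obj r : o.refs) stack.push(r);
            }
        }
    }

    // Compact phase: slide live objects toward address 0, keeping their order.
    // `heap` must be sorted by address. Returns the end boundary of live data.
    static int compact(List<Obj> heap) {
        int free = 0;
        for (Obj o : heap) {            // pass 1: compute each survivor's new address
            if (o.marked) {
                o.newAddr = free;
                free += o.size;
            }
        }
        // pass 2 (omitted): a real collector rewrites every pointer to its target's newAddr.
        heap.removeIf(o -> !o.marked);  // pass 3: move survivors and discard the dead
        for (Obj o : heap) {
            o.addr = o.newAddr;
            o.marked = false;
        }
        return free; // everything from here to the end of the heap is one free block
    }

    // 8 one-cell objects, every other one reachable: returns {boundary, largest free block}.
    static int[] demo() {
        List<Obj> heap = new ArrayList<>(), roots = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Obj o = new Obj(i, 1);
            heap.add(o);
            if (i % 2 == 0) roots.add(o);
        }
        mark(roots);
        int boundary = compact(heap);
        return new int[] {boundary, 8 - boundary};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [4, 4]
    }
}
```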

Generational collection algorithm

The GC in current commercial virtual machines uses generational collection. It introduces no new idea. It simply divides the heap into the young generation and the old generation according to how long objects live. The method area was called the permanent generation. Starting with JDK 8, the permanent generation was removed and replaced by metaspace: the permanent generation lived in memory managed by the JVM, whereas metaspace uses native memory directly.

This allows different collection algorithms to be used according to the characteristics of each generation.

Objects in the young generation “live and die overnight”: at each GC most of them die and only a few survive, so the young generation uses the copying algorithm. It is divided into an Eden space and two Survivor spaces (Survivor from and Survivor to), with a default size ratio of 8:1:1.
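Here is a worked example of the 8:1:1 split. It is a simplified calculation, because real HotSpot sizes are aligned and some collectors adjust them adaptively. Take a 10 MB young generation (-Xmn10m) and the default -XX:SurvivorRatio=8, which means Eden is 8 times the size of one Survivor space. Each Survivor then gets 10 / (8 + 2) = 1 MB, and Eden gets 8 MB. One Survivor is always kept empty as the copy target, so only 9 MB (90%) of the young generation holds objects at any time. The class and method names below are hypothetical.

```java
class YoungGenSizing {
    // Returns {eden, survivorFrom, survivorTo} in MB for a young generation of youngMb MB,
    // where survivorRatio = Eden size / one Survivor size.
    static long[] split(long youngMb, int survivorRatio) {
        long survivor = youngMb / (survivorRatio + 2);
        long eden = youngMb - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    // Usable space: Eden plus the one Survivor that is not reserved as the copy target.
    static long usable(long youngMb, int survivorRatio) {
        long[] s = split(youngMb, survivorRatio);
        return s[0] + s[1];
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(split(10, 8)) + " usable=" + usable(10, 8));
        // prints "[8, 1, 1] usable=9"
    }
}
```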

Objects in the old generation have a high survival rate, and there is no extra space to act as an allocation guarantee, so the old generation uses mark-sweep or mark-compact.

New objects are allocated in Eden. When Eden is full, a Minor GC is performed: the surviving objects in Eden and Survivor from are copied into Survivor to, and then Eden and Survivor from are cleared. The two Survivor spaces then swap roles, so the old Survivor from becomes the new Survivor to and the old Survivor to becomes the new Survivor from. During copying, if Survivor to cannot hold all the surviving objects, the overflow is moved into the old generation under the old generation’s allocation guarantee. If the old generation cannot hold them either, a Full GC (old generation GC) is performed.

Large objects go directly into the old generation: the JVM parameter -XX:PretenureSizeThreshold makes objects larger than the configured value be allocated directly in the old generation. The purpose is to avoid large amounts of memory copying between Eden and the Survivor spaces.

Long-lived objects go into the old generation: the JVM keeps an age counter for each object. If an object born in Eden survives its first Minor GC and fits in Survivor, it is moved to Survivor with an age of 1. Each further Minor GC it survives increases its age by 1. Once it reaches a certain age (15 by default, configurable with -XX:MaxTenuringThreshold), it is promoted to the old generation. However, the JVM does not always require an object to reach the maximum age before promotion. If the total size of all objects of the same age (say age X) in the Survivor space exceeds half of the Survivor space, all objects of age X or older go directly into the old generation without waiting for the maximum age.
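HotSpot’s actual logic, based on its age table, differs slightly from the “same age” wording above. It adds up the sizes of surviving objects from age 1 upward and picks the first age at which the running total exceeds -XX:TargetSurvivorRatio (default 50) percent of a Survivor space. The result is then capped by -XX:MaxTenuringThreshold. The sketch below follows that cumulative version in simplified form, and the class and method names are hypothetical.

```java
class TenuringThreshold {
    // ageBytes[age] = total size of surviving objects of that age (index 0 unused).
    // Objects whose age is >= the returned threshold are promoted at the next Minor GC.
    static int compute(long[] ageBytes, long survivorCapacity,
                       int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        int age = 1;
        while (age < ageBytes.length) {
            total += ageBytes[age];      // cumulative size of ages 1..age
            if (total > desired) break;  // this age would overflow the target share
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }

    public static void main(String[] args) {
        long[] ageBytes = new long[16]; // index = object age (0..15)
        ageBytes[1] = 200;
        ageBytes[2] = 200;
        ageBytes[3] = 200;
        // Survivor space of 1000 bytes with a 50% target => 500 bytes.
        System.out.println(compute(ageBytes, 1000, 50, 15)); // 3
    }
}
```

In the example, ages 1 and 2 together (400 bytes) still fit within the 500-byte target, but adding age 3 pushes the total to 600. Objects of age 3 and older are therefore promoted early.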

Briefly introduce a garbage collector you are familiar with

The CMS collector aims for the shortest possible collection pause time. Short pauses mean a good user experience.

It is based on the “mark-sweep” algorithm and collects concurrently with low pauses. Its operation is fairly complex and is divided into 4 steps:

1) Initial mark: marks only the objects directly reachable from GC Roots. It is fast, but requires “Stop The World”.

2) Concurrent mark: traces the reference chains from those objects. The garbage collector runs at the same time as the user threads.

3) Remark: corrects the marks of objects whose marking changed because the user threads kept running during concurrent marking. It takes longer than the initial mark but much less time than the concurrent mark, and it requires “Stop The World”.

4) Concurrent sweep: clears the objects marked as recyclable and runs concurrently with the user threads.

The most time-consuming phases, concurrent mark and concurrent sweep, run alongside the user threads. As a result, CMS memory reclamation as a whole runs concurrently with the user threads.

But the CMS collector has three disadvantages:

1) Very sensitive to CPU resources

Concurrent collection does not pause user threads, but it consumes CPU resources, so it slows the application down and reduces overall throughput.

By default, CMS starts (number of CPUs + 3) / 4 collection threads. With 4 or more CPUs, the collection threads take at least 25% of the CPU resources, which can noticeably affect user programs. With fewer than 4 CPUs their share is larger (50% with 2 CPUs), and the impact may become unacceptable.

2) Floating garbage: during the concurrent sweep phase the user threads are still running, so new garbage may be produced. That garbage cannot be cleaned up in the current GC and has to wait for the next one.

Because of this, concurrent sweeping needs to reserve some memory, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. If the memory reserved by CMS is not enough for the program, a “Concurrent Mode Failure” occurs. The JVM then falls back to a backup plan: it temporarily uses the Serial Old collector, which results in a Full GC.

3) Heavy memory fragmentation: CMS is based on the “mark-sweep” algorithm and does not compact after sweeping, so it leaves many discontinuous memory fragments. When a large object needs to be allocated, there may not be enough contiguous memory, and another Full GC has to be triggered early.
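The arithmetic behind the first disadvantage fits in a tiny sketch. The class and method names are hypothetical, and the formula is the one stated above, applied with integer division. Because (n + 3) / 4 rounds n / 4 up, the share never drops below 25%. With 2 CPUs, a single collection thread already takes half the machine.

```java
class CmsThreads {
    // Default number of CMS concurrent collection threads, per the formula in the text.
    static int threads(int cpus) {
        return (cpus + 3) / 4; // integer division rounds n / 4 up
    }

    // Percentage of CPU capacity those threads can occupy while running.
    static double cpuShare(int cpus) {
        return 100.0 * threads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 3, 4, 5, 8, 16}) {
            System.out.printf("%2d CPUs -> %d GC thread(s), %.0f%% of CPU%n",
                    cpus, threads(cpus), cpuShare(cpus));
        }
    }
}
```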

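What other problem can concurrent marking cause besides floating garbage?

It can also cause “object disappearance”.

In tri-color marking, white objects have not been visited yet, gray objects have been visited but not all of their references have been scanned, and blue objects (usually called black) have been fully scanned. For example, consider the following case:

Objects 7 and 10 are both part of the original reference chain (root node ->5->6->7->8->11->10). While marking is in progress, the user thread changes the references, and the chain becomes (root node ->5->6->7->10).

Since blue objects are not rescanned, both object 10 and object 11 are reclaimed when the scan ends. Reclaiming object 11 is correct, because nothing references it anymore. Object 10, however, is still reachable through object 7, so it has “disappeared”.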

How can the “object disappearance” problem be solved?

“Object disappearance” means that an object that should be blue is wrongly left white. It occurs if and only if both of the following conditions hold:

Condition 1: The mutator (the user thread that modifies references) inserts one or more new references from a blue object to a white object.

Condition 2: The mutator deletes all direct or indirect references from gray objects to that white object.

Looking back at the example that made an object “disappear”:

The reference from blue object 7 to white object 10 is newly created, which corresponds to condition 1.

The reference from gray object 8 to white object 11 is removed, which cuts the only path from a gray object to object 10. This corresponds to condition 2.

Because both conditions must hold together, breaking either one of them is enough to solve the object disappearance problem during concurrent marking.

This leads to two solutions: incremental update and snapshot at the beginning (SATB, also called the original snapshot).

What is incremental update?

Incremental update breaks the first condition, in which the mutator inserts new references from a blue object to a white object. Whenever a blue object inserts a new reference to a white object, the newly inserted reference is recorded. After the concurrent scan finishes, the blue objects in these recorded references are used as roots and scanned again.

Object 9 is then rescanned and turned blue, so it is not reclaimed and does not disappear.

What is the original snapshot (SATB)?

The original snapshot breaks the second condition, in which the mutator deletes all direct or indirect references from gray objects to a white object. Whenever a gray object is about to delete a reference to a white object, the reference being deleted is recorded. After the concurrent scan finishes, the gray objects in these recorded references are used as roots and scanned again.

In short: whether or not a reference has been deleted, the search proceeds over the snapshot of the object graph taken at the moment the scan began.
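The two fixes can be compared in a small tri-color marking simulation. Everything here is hypothetical: the class, enum, and method names, and the fact that marking is driven step by step by hand so that the mutator can change references mid-mark. The scenario reproduces both conditions. Blue A gains a reference to white C, and the only path to C from gray B is deleted. With no barrier, C stays white and would be reclaimed even though it is still reachable. The incremental-update barrier remembers the blue source A and rescans it. The SATB branch does not rescan from the gray object, because that object’s reference is already gone. Instead it records the overwritten target C and marks it, which is what G1’s pre-write barrier does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TriColorDemo {
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    private final Barrier barrier;
    private final Set<Obj> blue = new HashSet<>();          // fully scanned
    private final Deque<Obj> gray = new ArrayDeque<>();     // visited, references not yet scanned
    private final List<Obj> remembered = new ArrayList<>(); // recorded by the write barrier

    TriColorDemo(Barrier barrier) { this.barrier = barrier; }

    boolean isMarked(Obj o) { return blue.contains(o) || gray.contains(o); }

    // Scan one gray object: shade its white children gray, then turn it blue.
    void step() {
        Obj o = gray.poll();
        for (Obj child : o.refs) if (!isMarked(child)) gray.add(child);
        blue.add(o);
    }

    // Mutator inserts a reference; incremental update records a blue source.
    void addRef(Obj from, Obj to) {
        from.refs.add(to);
        if (barrier == Barrier.INCREMENTAL_UPDATE && blue.contains(from) && !isMarked(to)) remembered.add(from);
    }

    // Mutator deletes a reference; SATB records the old, still-white target.
    void removeRef(Obj from, Obj to) {
        if (barrier == Barrier.SATB && !isMarked(to)) remembered.add(to);
        from.refs.remove(to);
    }

    // Finish concurrent marking, then process what the barrier remembered.
    void finish() {
        while (!gray.isEmpty()) step();
        for (Obj o : remembered) {
            if (barrier == Barrier.INCREMENTAL_UPDATE) {
                blue.remove(o); // re-gray the blue source so it is rescanned
                gray.add(o);
            } else if (!isMarked(o)) {
                gray.add(o);    // SATB: treat the old target as live
            }
        }
        remembered.clear();
        while (!gray.isEmpty()) step();
    }

    // root -> A -> B -> C; mid-mark, A gains a reference to C and B loses its reference to C.
    static boolean cSurvives(Barrier barrier) {
        Obj root = new Obj("root"), a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        root.refs.add(a);
        a.refs.add(b);
        b.refs.add(c);
        TriColorDemo gc = new TriColorDemo(barrier);
        gc.gray.add(root);
        gc.step();          // root blue, A gray
        gc.step();          // A blue, B gray, C still white
        gc.addRef(a, c);    // condition 1: blue -> white reference inserted
        gc.removeRef(b, c); // condition 2: gray -> white reference deleted
        gc.finish();
        return gc.isMarked(c); // C is reachable via root -> A -> C, so it must be marked
    }

    public static void main(String[] args) {
        for (Barrier mode : Barrier.values()) {
            System.out.println(mode + ": C marked = " + cSurvives(mode));
        }
    }
}
```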

Final words

Wow! I can’t believe you read all the way to the end! All right, that’s all for today. All the best!

I’ve been busy lately, but I squeezed in time to write another JVM article. Thank you so much, lovely readers, for reading this far. If you think this article is good, please give it a like 👍, a follow ❤️, and a share 👥. Yes, this girl is just that vain! Hee hee ~

If there are any mistakes in this article, please point them out in the comments. Thank you very much!

Finally, Brother Dan asked the young lady, “What makes you so good?”

The young lady smiled and took out her phone: “Because I’ve been following [small size son]~”