Preface

Memory optimization, whether on Android or in plain Java, requires a solid understanding of how memory is laid out at run time. I have been working on memory optimization recently, so this is a good moment to review the basics.

References

Android Memory Optimization

Android performance optimization built-in optimization

Android memory management mechanism

Java Virtual machine stack – stack frames, operands, and local variables

Do you really know Escape Analysis?

Java runtime data area – Method area

Contents

The memory area in which Java runs

1. How the JVM loads classes

1.1 Java class loader

Under the parent delegation mechanism, a class loader first delegates a loading request to its parent class loader, which does the same in turn, up to the top of the hierarchy. If the parent can complete the request, it returns the class. Only when the parent cannot load the class does the child class loader try to load it itself.

Bootstrap class loader (also called the root class loader)

Implemented in C++, it loads the libraries the virtual machine recognizes into memory, such as the core libraries under JAVA_HOME/lib or the jar files specified by the -Xbootclasspath option. Because the bootstrap class loader is part of the virtual machine’s native implementation, developers cannot obtain a reference to it directly.

Extension class loader

Implemented in Java by Sun’s ExtClassLoader (sun.misc.Launcher$ExtClassLoader). It loads the class libraries in JAVA_HOME/lib/ext, or in the locations given by the system property -Djava.ext.dirs. Developers can use the extension class loader directly. Its getParent() returns null, because its parent, the bootstrap class loader, has no Java object.

System class loader

Implemented by Sun’s AppClassLoader (sun.misc.Launcher$AppClassLoader). It loads classes from the user classpath (the directories given by java -classpath or -Djava.class.path), that is, the path of the current class and of the third-party class libraries it references. A program can obtain it through the static method ClassLoader.getSystemClassLoader(). Unless specified otherwise, it is the parent class loader of user-defined class loaders. Its own parent is the extension class loader.
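
The loader hierarchy described above can be inspected from code. The following is a minimal sketch; the class name LoaderChain and the helper chain() are made up for illustration, and the loader class names printed depend on the JDK version (on JDK 9 and later the extension loader is replaced by a platform loader, so the names differ from the Sun classes mentioned above).

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walk from a loader up through its parents. A null parent stands for
    // the bootstrap class loader, which has no Java object.
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        chain(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes such as String are loaded by the bootstrap loader,
        // so asking for their loader yields null.
        System.out.println(String.class.getClassLoader());
    }
}
```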

1.2 The process of class loading

Loading

  • Get the binary stream that defines a class by its fully qualified name

  • Transform the static storage structure represented by this byte stream into the runtime data structure of the method area

  • Generate a java.lang.Class object in memory that represents the class and acts as the access point for the class’s data in the method area

Verification (the loading and linking phases are interleaved)

  • Ensure that the byte stream of the Class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine

  • File format validation (byte stream compliance with Class file format specification)

    • Whether it starts with the magic number 0xCAFEBABE

    • Whether the major and minor versions are within the range the current VM can process

    • Whether the constant pool contains constants of unsupported types (checked through the tag flag)

    • Whether any index value that points to a constant points to a nonexistent constant or to a constant of the wrong type

    • Whether any CONSTANT_Utf8_info constant contains data that is not valid UTF-8

    • Whether any part of the Class file, or the file itself, has had information deleted or appended

  • Metadata validation (validation of bytecode description information to ensure that there is no metadata that does not conform to Java language specifications)

    • Whether this class has a parent class

    • Whether the parent of this class is a class that may not be inherited from

    • If the class is not abstract, whether it implements all the methods required by its parent or interfaces

    • Whether a field or method in the class conflicts with its parent class

  • Bytecode validation (the main purpose is to determine, through data flow and control flow analysis, that program semantics are legitimate and logical; this stage mainly verifies and analyzes method bodies)

    • Ensure that at every point the data types on the operand stack match what the instruction sequence expects

    • Ensure that jump instructions do not jump to bytecode instructions outside the method body

    • Ensure that type conversions in the method body are valid

    • JDK 1.6 optimization: a StackMapTable attribute was added to the attribute table of the method’s Code attribute, so the verifier can check the recorded types instead of deriving them. Since JDK 1.7, a failed type check can no longer fall back to type derivation.

  • Symbolic reference verification

    • Whether a class can be found for the fully qualified name given as a string in a symbolic reference

    • Whether the specified class contains a field or method that matches the simple name and descriptor

    • Whether the class, field, or method in a symbolic reference is accessible to the current class
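
The first two file format checks are easy to reproduce by hand. The sketch below reads the first eight bytes of a class file and returns the magic number and the minor and major versions. The class name ClassHeader and the helper readHeader() are invented for this example, and it only mimics the start of what a real verifier does.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassHeader {
    // Returns {magic, minorVersion, majorVersion} for the named class,
    // read from its .class resource on the class path or in the JDK image.
    static int[] readHeader(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // 4 bytes, expected 0xCAFEBABE
            int minor = in.readUnsignedShort();  // 2 bytes
            int major = in.readUnsignedShort();  // 2 bytes
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java.lang.String");
        System.out.println("magic: " + Integer.toHexString(header[0]).toUpperCase());
        System.out.println("version: " + header[2] + "." + header[1]);
    }
}
```

A virtual machine rejects the file if the magic number is wrong or if the major version is newer than it supports.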

Preparation (memory is formally allocated for class variables, that is, static fields, and they are set to their initial zero values, for example a static int i becomes 0; the memory used by these variables is allocated in the method area)

Resolution (the process by which the virtual machine replaces symbolic references in the constant pool with direct references)

Initialization

  • The “<clinit>()” method is generated by the compiler, which automatically collects the assignments to all class variables in the class and merges them with the statements in static blocks.

  • The initialization phase is the process of executing the class constructor method “<clinit>()”.
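
The difference between the preparation value and the initialization value can be observed directly. In this small sketch (InitOrder and peek() are names chosen for the example), the static initializers run in textual order inside <clinit>(), so the first field is computed while the second still holds the zero value it received during preparation.

```java
class InitOrder {
    // Runs first: 'value' has only been prepared at this point, so peek() sees 0.
    static int early = peek();
    // Runs second: the assignment collected into <clinit>() sets the real value.
    static int value = 123;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(early + " " + value); // prints "0 123"
    }
}
```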

2. Java runtime memory area

2.1 Method area

Used to store information about classes loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler. The constant pool is also in the method area.

Type information includes the class’s fully qualified name and its type modifiers.

Method information includes method names, return types, parameters and their types, method modifiers, method bytecode, and exception tables (the start and end positions of each exception handler, the offset of the handler code for the program counter, and so on).

The constant pool contains various literals and symbolic references to types, fields, and methods, and can be thought of as a table.

The runtime constant pool is also part of the method area. After a class is loaded into the virtual machine, a corresponding runtime constant pool is created. The JVM maintains a constant pool for each loaded type (class or interface), and the data in the constant pool is accessed by index, just like an array. Note that the JVM throws an OutOfMemoryError if building a runtime constant pool requires more memory than the method area can provide.

In JDK 1.6 and earlier, HotSpot implemented the method area as the permanent generation, and static variables were stored there.

In JDK 1.7 the permanent generation still existed but was being phased out; the string constant pool and static variables were moved out of it and stored in the heap.

In JDK 1.8 and later there is no permanent generation. Type information, fields, methods, and constants are stored in metaspace, which lives in native memory, while the string constant pool and static variables are still stored in the heap.

Why was the permanent generation replaced by metaspace?

First of all, metaspace is an area of native memory that is separate from the heap. By default, the most it can allocate is whatever memory the system has available, so an OutOfMemoryError in the method area becomes much less likely. The permanent generation was also hard to tune, so replacing it reduces the probability of fatal errors.

The StringTable was moved to the heap for a related reason. While it lived in the permanent generation it was reclaimed only during a Full GC, and a Full GC is usually triggered only when the old generation or the permanent generation runs short of space, so strings were reclaimed infrequently. That low collection efficiency could leave the method area short of memory.
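
The string constant pool can be seen at work with a few reference comparisons. This is a small sketch (StringPoolDemo and compare() are names made up here): a literal comes from the pool, new String(...) always creates a separate object, and intern() returns the pooled instance.

```java
class StringPoolDemo {
    // Returns {literal == built, literal == built.intern(), literal.equals(built)}.
    static boolean[] compare() {
        String literal = "memory";           // taken from the string constant pool
        String built = new String("memory"); // a distinct object on the heap
        return new boolean[] {
            literal == built,          // false: two different objects
            literal == built.intern(), // true: intern() returns the pooled string
            literal.equals(built)      // true: same characters
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "false true true"
    }
}
```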

2.2 Heap

Almost all object instances and arrays are allocated on the heap (the exception is objects that escape analysis shows do not escape).

Commonly used size options

-Xmx Maximum heap size

-Xms Minimum (initial) heap size

-Xmn Size of the young generation

-XX:NewSize Minimum size of the young generation

-XX:MaxNewSize Maximum size of the young generation
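
To check what the options above actually produced, the heap limits can be read back at run time through java.lang.Runtime. A minimal sketch follows; the class name HeapSizes and the helper toMiB() are invented for the example.

```java
class HeapSizes {
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit; totalMemory() is the heap
        // currently reserved from the OS, which starts near -Xms.
        System.out.println("max heap:       " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("committed heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free in heap:   " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as java -Xms64m -Xmx256m HeapSizes and then again with different values shows how the reported numbers follow the flags.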

Escape analysis

The usual definition is:

An object escapes when it is created inside a method and is then referenced by variables outside the method body as well as inside it. The consequence is that the object cannot be collected by the GC when the method returns, because something else still refers to it. In a normal method call, objects created in the method body can be reclaimed once the method has finished. An object that cannot be reclaimed at that point, because it is still referenced from outside, is said to escape.

In the code below the Demo object never escapes allocate(), so the virtual machine is free to avoid allocating it on the heap and can keep its data on the stack instead.

    public class Test {
        public static void main(String[] args) {
            for (int i = 0; i < 10000000; i++) {
                allocate();
            }
        }

        static void allocate() {
            // demo is never stored or returned, so it does not escape this method
            Demo demo = new Demo(2021, 2021.0f);
        }

        static class Demo {
            int a;
            float b;

            public Demo(int a, float b) {
                this.a = a;
                this.b = b;
            }
        }
    }
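
For contrast, here is a sketch of the ways an object can and cannot escape. EscapeDemo, Point, and the three methods are hypothetical names for illustration. Whether the JIT really removes the allocation in noEscape() depends on the VM and its settings, so the comments describe what escape analysis is allowed to conclude, not a guaranteed outcome.

```java
class EscapeDemo {
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static Point leaked; // a static field is visible outside any single call

    // The Point never leaves this method: it does not escape.
    static int noEscape() {
        Point p = new Point(20, 21);
        return p.x + p.y;
    }

    // The Point is stored in a static field: it escapes globally.
    static int fieldEscape() {
        Point p = new Point(20, 21);
        leaked = p;
        return p.x + p.y;
    }

    // The Point is handed back to the caller: it escapes through the return value.
    static Point returnEscape() {
        return new Point(20, 21);
    }

    public static void main(String[] args) {
        System.out.println(noEscape() + " " + fieldEscape() + " " + returnEscape().x);
    }
}
```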

2.3 Java Virtual Machine Stack

Used to implement method calls: each method call corresponds to a stack frame on the stack.

Only the method area and the heap are shared between threads; everything else is private to each thread. Calling a method pushes a stack frame and returning from it pops the frame.

The storage for a stack frame is allocated on the Java virtual machine stack of the thread that created it. Each stack frame has its own local variable table, its own operand stack, and a reference into the runtime constant pool of the class that the current method belongs to. That reference is held to support dynamic linking during method invocation.

Local variable table

  • The VM locates local variables by index. The index starts at 0 and runs up to the maximum capacity of the table. A local variable slot can hold a value of type boolean, byte, char, short, int, or float, or an object reference. A variable is released automatically once it goes out of scope.

Operand stack

  • It is a last-in, first-out (LIFO) stack. As with the local variable table, the maximum depth of the operand stack is written at compile time into the max_stack item of the method’s Code attribute, and the operand stack never grows deeper than that value.

Dynamic linking

  • In a class file, a method that calls other methods needs the symbolic references to those methods converted into direct references to their memory addresses. The symbolic references live in the runtime constant pool in the method area.
  • Some of these symbolic references are converted to direct references during class loading or the first time they are used, which is called static resolution. The rest are converted to direct references at each run, which is called dynamic linking.

Method return

  • Once a method has started executing, there are two ways for it to exit: normal completion or abnormal completion.
  • Normal completion can be understood as passing the return value to the method’s caller.
  • Abnormal completion means an exception was raised during execution and was not handled inside the method body, causing the method to exit.
  • Whether the exception is thrown by the Java virtual machine itself or produced by an athrow instruction in the code, the method exits if no matching exception handler is found in its exception table.

For a walkthrough of pushing and popping stack frames, see the article on the Java virtual machine class file structure.
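
Because every call pushes one frame, unbounded recursion eventually fills the thread’s stack. The sketch below counts how many frames fit before the VM throws StackOverflowError. StackDepth and measure() are names chosen for the example, and the number it reports varies with the stack size (for instance the -Xss option), the frame size, and the VM.

```java
class StackDepth {
    static int depth;

    // Each call pushes one more stack frame; there is no base case on purpose.
    static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the stack overflowed.
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread's stack had no room for another frame. The frames
            // pushed so far have been popped by the time we get here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measure());
    }
}
```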

2.4 Native method stack

The native method stack is similar to the virtual machine stack, except that it serves native method calls. In the HotSpot virtual machine implementation, the virtual machine stack and the native method stack are merged into one.

When a thread created by the JVM calls a native method, the JVM does not create a stack frame for it on the virtual machine stack; it simply links dynamically and calls the native method directly.

2.5 Program counter

It plays the role of a PC register and occupies very little memory. It can be understood as the line number indicator of the bytecode being executed by the current thread. Each thread has its own counter and the counters do not affect one another, which ensures that a thread that was interrupted can resume at the instruction where it stopped.

It records the address of the bytecode instruction each thread is executing; branches, loops, jumps, exception handling, and thread resumption all depend on it.

Because Java is a multithreaded language, when the number of running threads exceeds the number of CPU cores, the threads compete for CPU time through time slicing. If a thread uses up its time slice, or loses the CPU early for some other reason, it needs its own program counter to record the instruction it was executing when it stopped.

The program counter is also the only memory area in the JVM that never causes an OOM (OutOfMemoryError).

3. Direct memory

Direct memory is not part of the virtual machine’s runtime data areas. Its upper bound is whatever memory the machine has left: if 10 GB remain, direct memory can grow up to 10 GB. Because total capacity is fixed, however, heavy direct memory consumption reduces what is available for the JVM’s maximum heap and other areas.

Direct memory is not managed as part of the JVM heap, but if you exceed the configured limit (for example the one set with -XX:MaxDirectMemorySize) you still get an OOM.
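
In Java code, direct memory is most commonly obtained through java.nio.ByteBuffer. A minimal sketch follows (DirectMemoryDemo and allocate() are names invented here) that allocates one buffer outside the heap and one inside it.

```java
import java.nio.ByteBuffer;

class DirectMemoryDemo {
    // Allocates a buffer whose storage lives outside the Java heap.
    static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocate(1024);
        ByteBuffer onHeap = ByteBuffer.allocate(1024); // backed by a byte[] on the heap
        System.out.println("direct buffer is direct: " + direct.isDirect());
        System.out.println("heap buffer is direct:   " + onHeap.isDirect());
    }
}
```

Allocating direct buffers in a loop while holding on to them eventually fails with an OutOfMemoryError once the direct memory limit is reached, even if the heap itself still has room.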

Heap memory

1. Memory overflow

There are four common types of memory overflow: stack overflow, heap overflow, method area overflow, and native direct memory overflow.

For such overflow problems, start by checking the code and the JVM parameters. To protect the program it is generally worth setting up memory monitoring that raises an alarm when usage exceeds a specified threshold, and then analyzing and optimizing based on the logs.

2. Virtual machine optimization technology

2.1 Compilation optimization technology

  • Method inlining

Copy the body of the called method into the code that calls it, as in the following:

    public class Test {
        public static void main(String[] args) {
            int c = allocate(1, 2); // after inlining: int c = 1 + 2;
        }

        static int allocate(int a, int b) {
            return a + b;
        }
    }

Since each method call pushes and pops a stack frame, the compiler can avoid that cost by copying the body of a method it has identified directly into the caller, here main, as with int c = 1 + 2 in the comment.

2.2 Stack optimization technology

  • Data sharing between stack frames

When method A calls method B, the arguments sit on A’s operand stack and also form the start of B’s local variable table. The two frames overlap in that region, so A and B share it to pass parameters.

For example, while the main method executes, it calls the get method and passes 10.

    public class Test {
        public static void main(String[] args) {
            Test test = new Test();
            test.get(10);
        }

        public int get(int x) {
            int z = x + 1;
            return z;
        }
    }

3. Object creation process

Let’s start with a common question: what happens when a Person object is instantiated.

Person person = new Person();

  • Class loading

  • When the JVM encounters a new bytecode instruction, it checks whether the operand can be resolved to a symbolic reference to a class in the constant pool and whether that class has already been loaded; if not, the class must first be loaded into memory. Loading includes the verification phase, which makes sure the Class file is correct: magic number, major and minor versions, symbolic references, and so on.

  • Then comes the preparation phase, which allocates memory. There are two allocation strategies: pointer bumping (move a pointer forward by the size of the object; used when free memory is contiguous) and the free list (when memory is not contiguous, maintain a list of free blocks and carve the object out of a block that is large enough). To keep allocation safe under concurrency there are also two approaches: CAS with retry, and thread-local allocation buffers. Default values are assigned to class variables, constants, and static final values.

  • Then comes the resolution phase, the process of replacing symbolic references in the constant pool with direct references.

  • Finally there is the initialization phase, in which the object itself is initialized.
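
Pointer bumping combined with CAS retry can be modelled in a few lines. The sketch below is a toy model, not how a real VM is written: BumpAllocator is a made-up class that hands out offsets inside an imaginary contiguous region and uses compareAndSet so that two threads can never receive the same block.

```java
import java.util.concurrent.atomic.AtomicLong;

class BumpAllocator {
    private final long limit;                        // size of the contiguous region
    private final AtomicLong top = new AtomicLong(); // next free offset

    BumpAllocator(long capacity) {
        this.limit = capacity;
    }

    // Returns the start offset of the new block, or -1 if the region is full.
    long allocate(long size) {
        while (true) {
            long start = top.get();
            long end = start + size;
            if (end > limit) {
                return -1; // not enough room; a real VM would trigger a GC here
            }
            // Move the pointer forward only if no other thread moved it first.
            if (top.compareAndSet(start, end)) {
                return start;
            }
            // CAS failed: another thread allocated in between, so retry.
        }
    }

    public static void main(String[] args) {
        BumpAllocator region = new BumpAllocator(32);
        System.out.println(region.allocate(16)); // 0
        System.out.println(region.allocate(16)); // 16
        System.out.println(region.allocate(1));  // -1, the region is full
    }
}
```

A thread-local allocation buffer avoids even the CAS: each thread reserves a private chunk up front and bumps a pointer inside it without synchronization.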

4. Object memory layout

An object in memory consists of three parts: the object header, the instance data, and alignment padding.

The object header consists of two parts. The first part is the mark word, which stores the object’s own runtime data, such as its hash code, GC generational age, lock status flags, the locks held by threads, the biased thread ID, and the bias timestamp. This data is 32 bits long on a 32-bit VM and 64 bits long on a 64-bit VM (with compressed pointers disabled). The second part is the class pointer, a pointer to the object’s class metadata, which the virtual machine uses to determine which class the object is an instance of. In the case of an array object, the header must also contain a block that records the length of the array.

The instance data is the valid information the object actually stores: the contents of the fields of every type defined in the program code, whether inherited from a parent class or defined in the subclass.

Alignment padding does not necessarily exist. It serves only as a placeholder and is added when the instance data does not end on an alignment boundary.

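5. Locating and accessing objects

For example, after the Person object above has been created, how is it located? The reference is on the stack and the object is on the heap, so how does one reach the other?

There are two approaches: handles and direct pointers. With handles, the reference points to an entry in a handle pool, and the entry holds the address of the instance data. If the object moves and its instance data address changes, only the handle is updated and the reference itself stays the same. With direct pointers, the reference holds the object’s address, which saves one level of indirection. Because objects are accessed very frequently, the overhead of the extra lookup adds up, so direct pointers are usually used.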

6. Determining whether an object is alive

Reference counting

Each new reference to an object increments its counter, and each reference that becomes invalid decrements it by 1. A count of 0 means the object can no longer be referenced. If two objects reference each other, however, their counts never reach 0 and they cannot be reclaimed.

Reachability analysis

An object is unreachable if no chain of references connects it to a GC root.

Objects that can serve as GC roots include static variables, variables on thread stacks, the constant pool, JNI pointers, and so on. The typical Handler memory leak is caused by a thread holding a reference to an Activity.
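
The advantage of reachability analysis over reference counting shows up with cycles. The sketch below models the object graph as a map from object names to the names they reference and marks everything reachable from a root. Reachability, reachable(), and the node names are all invented for the example; a real collector walks actual object references instead of strings.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Reachability {
    // Marks every node that can be reached from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (marked.add(node)) { // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(node, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    static Set<String> demo() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("a"));
        refs.put("a", Arrays.asList("b"));
        refs.put("x", Arrays.asList("y")); // x and y reference each other,
        refs.put("y", Arrays.asList("x")); // but nothing reachable points at them
        return reachable(refs, Arrays.asList("root"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // contains root, a and b, but neither x nor y
    }
}
```

Under reference counting, x and y would each keep a count of 1 forever. Here they are simply never marked, so they can be collected.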

7. The four kinds of references

  • Strong references have the longest life cycle; an object that is strongly referenced is never reclaimed.
  • Soft references have a shorter life cycle; the object is reclaimed only when memory is about to run out. When the object is reclaimed, the soft reference is placed in the ReferenceQueue we registered. For example, in image loading, images that do not currently need to be shown on screen can be held through soft references. Common uses are image caches and web page caches.
  • Weak references have a shorter life cycle than soft references; once the collector finds an object that is only weakly referenced, it reclaims the object regardless of whether memory is sufficient. When the object is reclaimed, the weak reference is placed in the ReferenceQueue we registered. Cached data that is not very important can also be held through weak references.
  • Phantom references have the shortest life cycle; an object that holds only phantom references is treated as if it had no references at all and can be collected at any time. They are used together with a ReferenceQueue to receive a notification when the object is reclaimed by the collector.
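
The reference types live in java.lang.ref. The following sketch sticks to behavior that does not depend on when the GC runs: while a strong reference still exists, soft and weak references return the object, a phantom reference always returns null from get(), and nothing has been placed in the queue yet. ReferenceKinds and check() are names made up for the example.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // Returns {queue still empty, soft sees object, weak sees object, phantom.get() is null}.
    static boolean[] check() {
        Object strong = new Object(); // the strong reference keeps the object alive
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(strong, queue);
        WeakReference<Object> weak = new WeakReference<>(strong, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return new boolean[] {
            queue.poll() == null,  // nothing reclaimed, so nothing enqueued
            soft.get() == strong,  // still reachable through the soft reference
            weak.get() == strong,  // still reachable through the weak reference
            phantom.get() == null  // a phantom reference never hands out its referent
        };
    }

    public static void main(String[] args) {
        for (boolean b : check()) {
            System.out.println(b); // four times "true"
        }
    }
}
```

What happens after the strong reference is dropped is up to the collector: a weak referent may disappear at the next GC, while a soft referent is normally kept until memory runs low.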

8. Object allocation strategy

  • The young generation takes 1/3 of the heap space and the old generation 2/3.

  • Objects are allocated in the Eden area first. The ratio of Eden to the From and To areas is 8:1:1. Why this ratio? Because with it, up to 90% of the young generation is usable at any time, and most objects are short-lived and created frequently, so the copying algorithm suits the young generation. One thing to be aware of: if the young generation does not have enough memory for a large object, the space allocation guarantee sends it straight to the old generation.

  • After each Minor GC, that is, a young generation collection, the surviving objects in Eden move to the From area and their age is incremented by 1. Eden then fills up with new objects and a GC is triggered again. This time the survivors in Eden and the survivors in From both move to the To area, and their age is incremented again. At the next GC the survivors in To move back to From, and so on, until an object reaches the specified GC age and enters the old generation. If any of the Eden, From, or To areas is full, a GC is triggered.

  • By default, objects that survive 15 collections move from the young generation to the old generation.

  • A Full GC collects the young generation, the old generation, and the method area.
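
The ratios above translate into concrete sizes with simple arithmetic. This sketch applies the 1/3 : 2/3 split and the 8:1:1 split to a heap size in megabytes. GenerationSizes and split() are invented for the example, and a real VM may round or resize the areas adaptively.

```java
class GenerationSizes {
    // Returns {young, old, eden, from, to} in MB for the given heap size.
    static long[] split(long heapMb) {
        long young = heapMb / 3;          // young generation: 1/3 of the heap
        long old = heapMb - young;        // old generation: the remaining 2/3
        long survivor = young / 10;       // From and To: 1/10 of young each
        long eden = young - 2 * survivor; // Eden: the remaining 8/10
        return new long[] {young, old, eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] s = split(300);
        System.out.println("young=" + s[0] + " old=" + s[1]
                + " eden=" + s[2] + " from=" + s[3] + " to=" + s[4]);
        // Eden plus one survivor area is usable at any time: 90 of 100 MB, i.e. 90%.
        System.out.println("usable young = " + (s[2] + s[3]) + " MB");
    }
}
```

For a 300 MB heap this gives a 100 MB young generation (80 MB Eden, 10 MB From, 10 MB To) and a 200 MB old generation.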

9. Garbage collector

9.1. Garbage collection algorithm

  • Copying algorithm

    • Divide the available memory into two equal halves and use only one of them at a time. When that half is used up, copy the surviving objects to the other half and clear the used half in one go. The price is that usable memory is cut in half, which is too wasteful unless most of the memory gets reclaimed. (Suitable for the young generation, where few objects survive.)
  • Mark-sweep algorithm

    • 100% memory utilization and no copying of objects, but it may leave a large number of discontiguous memory fragments.
  • Mark-compact algorithm

    • Moves live objects toward one end of memory. Suitable for the old generation: 100% utilization and no memory fragmentation, but objects must be moved, so efficiency is average.
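
The compaction step of mark-compact can be pictured on a toy heap. In the sketch below an array stands in for memory, null marks a slot freed by the sweep, and live entries slide toward the front so the free space ends up contiguous. CompactDemo and its methods are names made up here. A real collector must also update every reference to a moved object, which this model leaves out.

```java
import java.util.Arrays;

class CompactDemo {
    // Slides live (non-null) entries to the front, keeping their order.
    // Returns the number of live entries, which is also the first free index.
    static int compact(String[] heap) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                String live = heap[i];
                heap[i] = null;      // the old slot becomes free
                heap[free++] = live; // the object moves toward the start
            }
        }
        return free;
    }

    static String[] demo() {
        String[] heap = {"a", null, "b", null, "c"};
        compact(heap);
        return heap;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [a, b, c, null, null]
    }
}
```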

9.2. Common garbage collectors

Single-threaded, multi-threaded (parallel), and concurrent collectors are used in combination.

As for choosing between single-threaded and multi-threaded collection: while the memory footprint is small a single-threaded collector is enough, and as memory grows larger and larger it pays to switch from single-threaded to multi-threaded.

Parallel Scavenge and Parallel Old use the same algorithms as the Serial collectors. They are parallel garbage collectors, so they collect more efficiently.

When cleaning the house, you usually ask everyone to stop making a mess first and only then start cleaning, otherwise the cleaning would never be finished. The garbage collector does the same thing.

Whether single-threaded or multi-threaded, all user threads are paused during GC.

But sometimes you have guests over and cannot ask them to stop, so you have to clean up quietly around them. In the same vein, the concurrent garbage collector was introduced, where the garbage collector threads and the user threads can work at the same time. CMS is an example, but CMS is only suitable for the old generation, whereas G1 handles both the young and the old generation.

9.3. CMS garbage collector

CMS (Concurrent Mark Sweep) uses a mark-sweep algorithm and currently applies only to the old generation.

  • Initial mark: pauses all user threads, for a short time

  • Concurrent mark: runs at the same time as the user threads

  • Remark: pauses all user threads, for a short time

  • Concurrent sweep: garbage generated during this phase is not processed; it is floating garbage that waits for the next GC.

9.4. G1 garbage collector

The biggest difference from CMS is that after the final mark, G1 screens the candidates once more before reclaiming, mainly to meet its pause-time goal and reduce stutter.

The young generation uses the copying algorithm and the old generation uses the mark-compact algorithm.

The memory layout is also different: the heap is divided into equal-sized regions, which reduces the amount of object copying. Each region may be, say, 80% or 60% full. A large object that exceeds the size of one region is allocated in contiguous H (humongous) regions.

3. Android memory allocation

In Android there is no swap space for memory. Instead, it uses paging and memory mapping (mmap) to manage memory. The heap is anonymous shared memory and is managed at the C layer.

The role of the Zygote process

  • Starting a VM

  • Register JNI functions

  • Call into the Java layer through JNI

  • Preload resources, common classes, and theme-related resources

  • Fork SystemServer (single-threaded fork)

  • Enter the Socket Loop to wait for messages

Zygote and its children share preloaded resources. How is this shared?

  • For the most part, Android shares dynamic RAM between processes by explicitly allocating shared memory regions (for example ashmem or gralloc). For instance, a window surface uses memory shared between the app and the screen compositor, and cursor buffers use memory shared between a content provider and its clients.

Difference between Dalvik and ART

  • A register is a CPU component: a high-speed storage unit with limited capacity that can temporarily hold instructions, data, and addresses. In the JVM, by contrast, every operation moves data between the local variable table and the operand stack.

  • In the Dalvik VM, each thread has its own PC and call stack, and the activation records of method calls are stored on the call stack as frames. Compared to the JVM, a Dalvik program needs fewer instructions and fewer data moves, because Dalvik combines the operand stack and the local variable table into virtual registers.

  • Dalvik executes dex bytecode through an interpreter. Starting from Android 2.2 it also supports JIT compilation, which selects hot code for compilation or optimization while the program runs.

    • ART is a developer option introduced in Android 4.4 and is the default from Android 5.0 on. The ART virtual machine executes native machine code. Replacing the Dalvik virtual machine with ART as the Android runtime does not require developers to compile their applications directly to machine code; an APK is still a file containing dex bytecode.

    • Where does the native machine code that ART executes come from? With Dalvik, an optimization step at installation time turns dex bytecode into an odex file. With ART, the dex code is translated instead, also when the application is installed. ART introduces ahead-of-time (AOT) compilation and uses the dex2oat tool on the device to compile the application, turning the bytecode in the dex file into native machine code.

    • Android N introduced hybrid compilation: AOT, interpretation, and JIT used together.

      • Installation is faster because no AOT compilation is done at install time. While the app runs, code is interpreted, frequently executed methods are JIT-compiled, and the JIT-compiled methods are recorded in a profile file.

      • When the device is idle and charging, a compilation daemon runs and AOT-compiles the commonly used code listed in the profile file, so that it can be used directly on the next run.

Dalvik and ART memory allocation model

  • (Diagram quoted from someone else’s blog, with a brief description.)

Memory management when switching Android applications

  • Android does not swap memory when users switch applications. It places application processes that do not contain foreground components in an LRU cache. For example, when a user starts an application, the system creates a process for it. When the user leaves the application, the process is not destroyed immediately but is kept in the system cache. If the user later switches back, the process can be restored in full right away, which makes switching between applications fast.

  • A cached process still occupies a certain amount of memory, which can affect the overall performance of the system. Therefore, when the system starts to run low on memory, it decides which processes to kill based on LRU order, application priority, memory usage, and other factors.
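
The LRU part of that policy can be sketched with LinkedHashMap in access order. This is only an illustration of least-recently-used eviction: LruProcessCache, the app names, and the memory numbers are made up, and the real Android decision also weighs process priority and memory usage, which this model ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps a hypothetical app name to the memory (in MB) its cached process holds.
class LruProcessCache extends LinkedHashMap<String, Integer> {
    private final int maxEntries;

    LruProcessCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.maxEntries = maxEntries;
    }

    // Called after each insertion: drop the least recently used entry when over the limit.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > maxEntries;
    }

    static LruProcessCache demo() {
        LruProcessCache cache = new LruProcessCache(2);
        cache.put("mail", 80);
        cache.put("maps", 120);
        cache.get("mail");        // the user switches back to mail
        cache.put("camera", 150); // over the limit: maps is now the eldest and is evicted
        return cache;
    }

    public static void main(String[] args) {
        System.out.println(demo().keySet()); // [mail, camera]
    }
}
```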

Android garbage collector

In the Dalvik era, garbage collection was mainly mark-sweep, similar to the algorithm used by CMS. A generational CMS was later built on top of it, with a GC map added alongside JIT support. In this era the Android system did not defragment the free areas of the heap. Before a new allocation it simply checked whether the free space at the end of the heap was sufficient, and if not, it triggered a GC to free up more space.

After Dalvik was retired, ART implemented the mark-compact, copying, and CMS algorithms. CMS was the default, and techniques were introduced to deal with fragmentation. Android N introduced the Concurrent Copying (CC) collector, and CC later gained a generational mode that also supports incremental collection.