
1. Dalvik

Dalvik is a virtual machine designed by Google for the Android platform. It runs Java applications that have been converted into the .dex (Dalvik Executable) format, a compact format designed specifically for Dalvik and suited to systems with limited memory and processor speed. Dalvik is optimized so that multiple virtual machine instances can run simultaneously in limited memory, and each Dalvik application executes as a separate Linux process. Running in separate processes prevents all programs from being shut down if one virtual machine crashes.

Dalvik (commonly abbreviated DVM) was the runtime before Android L (Android 5.0); starting with 5.0 it was removed and replaced with the long-rumored ART (Android Runtime).

In the overall Android architecture, ART sits in the runtime layer (the yellow box in the usual Android software-stack diagram).

Even though Dalvik has been removed, learning about it is not useless; those of us working in this field should still have at least a basic understanding of it.

1.1 Differences between Dalvik and JVM

  • 1. Dalvik is register-based, while the JVM is stack-based.

  • 2. For larger programs, a register-based virtual machine spends less time compiling.

  • 3. In JVM bytecode, local variables live in the local variable table and are pushed onto the operand stack to be operated on, although the JVM can also work on values in the variable table without explicitly using the stack.

  • 4. In Dalvik bytecode, local variables are assigned to any of 65,536 available virtual registers, and Dalvik instructions operate on these registers directly instead of on stack elements (see the sketch below).
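
To make the difference concrete, here is a hedged sketch: a trivial Java method together with comments showing roughly how it looks as stack-based JVM bytecode versus register-based Dalvik bytecode. The instruction listings are illustrative of typical javac/dx output, not a dump from any specific build.

```java
public class Adder {
    // Source method used for the comparison below.
    static int add(int a, int b) {
        return a + b;
    }

    /*
     * Typical JVM (stack-based) bytecode for add():
     *   iload_0   // push local variable 0 (a) onto the operand stack
     *   iload_1   // push local variable 1 (b)
     *   iadd      // pop two values, push their sum
     *   ireturn   // pop the sum and return it
     *
     * Typical Dalvik (register-based) bytecode for the same method:
     *   add-int v0, v2, v3   // v2 and v3 hold the arguments; the result goes into v0
     *   return v0            // return the value in register v0
     */
}
```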

1.2 How does Dalvik run Java

  • JVM bytecode is stored in .class files, one file per class.

  • The JVM loads the bytecode for each class at runtime. In contrast, a Dalvik program contains only a single .dex file, which holds all the classes in the program.

  • After the Java compiler produces the JVM bytecode, Dalvik’s dx (now D8) compiler takes all of the .class files, recompiles them into Dalvik bytecode, and writes the result into a single .dex file (the individual .class files are not kept). This process involves translating, restructuring, and interpreting the basic elements of the program (constant pool, class definitions, data section).

  • The constant pool describes all constants, including references, method names, numeric constants, and so on. The class definitions contain basic information such as access flags and class names. The data section contains the code of the functions the VM executes, information about classes and functions (such as the number of registers the DVM needs, the local variable table, and the operand stack size), and instance variables.

1.3 dex file

A .class file is generated from a Java source file, one per class, while Android merges all the class files and optimizes them to produce a final classes.dex file. Dex files remove redundant information found across class files (such as repeated string constants) and have a more compact structure, so parsing a dex file requires less I/O and class lookup is faster.

In fact, the dex file is further optimized into an odex (optimized dex) file during app installation, which will be covered later when discussing the installation process.

Note: This optimization process comes with some side effects, the most classic being the Android 65535 problem.

65535

65535 is the upper limit on the number of methods, fields, and classes a single dex file can reference. If the number of methods, fields, or classes in a build exceeds 65535, compilation fails. Android provides MultiDex to solve this problem. Many online articles say the 65535 problem arises because, when a dex file is parsed into the DexFile data structure, the number of methods is stored in a short; that statement is actually wrong!
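
As a hedged sketch of the fix (assuming the androidx.multidex library, a minSdkVersion below 21, and multiDexEnabled true in build.gradle; the class name is made up), installing the secondary dex files from a custom Application class might look like this:

```java
import android.content.Context;
import androidx.multidex.MultiDex;

// Hypothetical Application subclass; devices running API 21+ (ART) load secondary dex files natively.
public class MyApplication extends android.app.Application {
    @Override
    protected void attachBaseContext(Context base) {
        super.attachBaseContext(base);
        // Patches the class loader so classes in classes2.dex, classes3.dex, ... are reachable.
        MultiDex.install(this);
    }
}
```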

See: Android 65535 problem resolved

2. Android Runtime (ART)

Android Runtime (ART) is the default Runtime for devices running Android 5.0 (API level 21) and later. This runtime provides a number of features to improve the performance and smoothness of the Android platform and applications.

ART is the managed runtime used by applications and some system services on Android. ART and its predecessor Dalvik were originally built specifically for the Android project. As the runtime, ART executes the Dalvik Executable (dex) format and follows the Dex bytecode specification.

ART and Dalvik are compatible runtimes running Dex bytecode, so applications developed for Dalvik can also run in ART. However, some of the techniques used by Dalvik are not applicable to ART.

2.1 ART features

2.1.1 Ahead-of-time (AOT) compilation

ART introduces ahead-of-time (AOT) compilation to improve application performance. ART also enforces stricter install-time verification than Dalvik.

At install time, ART compiles the application using the on-device dex2oat tool. This utility accepts dex files as input and generates a compiled application executable for the target device. The tool should be able to compile all valid dex files without difficulty. However, some post-processing tools produce invalid files that Dalvik tolerates but that ART cannot compile.

2.1.2 Optimization of garbage collection

Garbage collection (GC) can be very resource-intensive, which can hurt application performance, resulting in choppy rendering, poor UI responsiveness, and other problems. ART improves garbage collection in several ways:

  • A mostly concurrent design with a single GC pause;
  • Concurrent copying to reduce background memory usage and fragmentation;
  • GC pause times that are independent of heap size;
  • A collector with lower total GC time in the special case of cleaning up recently allocated, short-lived objects;
  • Improved garbage collection ergonomics, making concurrent garbage collection more timely, so that GC_FOR_ALLOC events are extremely rare in typical use cases.

2.1.3 Optimization of development and debugging

ART provides a number of features to optimize application development and debugging.

2.1.3.1 Support for a sampling profiler

Historically, developers have used the Traceview tool (designed to trace application execution) as a profiler. While Traceview provides useful information, its per-method-call overhead skews results under Dalvik, and using the tool noticeably affects runtime performance.

ART adds support for a dedicated sampling profiler that does not have these limitations, giving a more accurate picture of application execution without significant slowdown. The KitKat release also added sampling support to Dalvik’s Traceview.

Traceview: Traceview is deprecated. If you are using Android Studio 3.2 or later, use the CPU Profiler instead to do the following: inspect .trace files captured by instrumenting your app with the Debug class, record new method traces, save .trace files, and inspect the real-time CPU usage of your app’s processes.
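
For reference, the Debug class mentioned above can generate such .trace files programmatically. A minimal sketch (the trace name and the traced code are placeholders; the exact output directory varies by API level):

```java
import android.os.Debug;

public final class TraceDemo {
    static void doTracedWork() {
        // Starts writing a method trace to a file named "startup.trace" under the
        // app's package-specific external directory (the name here is arbitrary).
        Debug.startMethodTracing("startup");
        try {
            expensiveOperation(); // placeholder for the code you want to profile
        } finally {
            // Stops tracing and flushes the .trace file so it can be opened in the profiler.
            Debug.stopMethodTracing();
        }
    }

    private static void expensiveOperation() { /* ... */ }
}
```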

2.1.3.2 Support for more debugging features

ART supports a number of new debugging options, particularly for monitor- and garbage-collection-related functionality. For example, you can:

  • See which locks are held in stack traces, then jump to the thread that holds a lock.
  • Ask how many live instances there are of a given class, ask to see the instances, and see which references are keeping an object alive.
  • Filter events (such as breakpoints) for a specific instance.
  • See the value a method returns when it exits (using “method-exit” events).
  • Set field watchpoints to pause program execution when a specific field is accessed and/or modified.

2.1.3.3 Optimized diagnostic details in exception and crash reports

When a runtime exception occurs, ART gives you as much context and detail as possible. ART provides more detailed information for java.lang.ClassCastException, java.lang.ClassNotFoundException, and java.lang.NullPointerException. (Later versions of Dalvik provided more detail for java.lang.ArrayIndexOutOfBoundsException and java.lang.ArrayStoreException, including the size of the array and the out-of-bounds offset; ART provides this information as well.)

ART also provides more useful context information in application native code crash reports by incorporating Java and native stack information.

2.2 ART feature improvements in Android 8.0

Android Runtime (ART) has been greatly improved in Android 8.0. The following list summarizes the enhancements available to device manufacturers in ART.

2.2.1 Concurrent compacting garbage collector

As Google announced at Google I/O, Android 8.0 brings a new concurrent compacting garbage collector (GC) to ART. This collector compacts the heap every time GC runs and while the application is running, with only one brief pause to process thread roots. The collector has the following advantages:

  • GC always compacts the heap: heap sizes are 32% smaller on average than in Android 7.0.
  • Compaction enables thread-local bump-pointer object allocation: allocations are 70% faster than in Android 7.0.
  • Pause times in the H2 benchmark are 85% shorter than with the Android 7.0 GC.
  • Pause times no longer scale with heap size, so applications can use large heaps without worrying about jank.
  • GC implementation detail – read barriers:
    • A read barrier is a small amount of work performed each time an object field is read.
    • Read barriers are optimized in the compiler, but may slow down some use cases.

2.2.2 Loop optimizations

In Android 8.0, ART implements a variety of loop optimizations, as follows:

  • Bounds-check elimination (see the sketch below)
    • Static: ranges are proven to be within bounds at compile time
    • Dynamic: runtime tests ensure the loop stays within bounds (otherwise the optimization is not applied)
  • Induction variable elimination
    • Remove dead induction variables
    • Replace an induction variable that is used only after the loop with a closed-form expression
  • Dead code elimination inside the loop body, and removal of loops that become entirely dead
  • Strength reduction
  • Loop transformations: reversal, interchange, splitting, unrolling, unimodular transformations, etc.
  • SIMDization (also known as vectorization)

The loop optimizer resides in its own optimization pass in the ART compiler. Most loop optimizations are similar to optimizations and simplifications performed elsewhere. Optimizations that rewrite the CFG in a more elaborate way than usual are challenging, because most CFG utilities (see nodes.h) focus on building a CFG, not on rewriting one.
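
As a hedged illustration of the first bullet (static bounds-check elimination), a loop shaped like the following is the classic candidate, since the compiler can prove every index stays within the array bounds. This is an example of eligible source code, not a dump of what any particular ART version generates.

```java
public final class LoopDemo {
    // The induction variable i provably satisfies 0 <= i < data.length for every access,
    // so the loop optimizer can drop the per-element bounds check; on CPUs with SIMD
    // support the accumulation may also be vectorized.
    static int sum(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }
}
```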

2.2.3 Class hierarchy analysis

In Android 8.0, ART uses Class Hierarchy Analysis (CHA), a compiler optimization that devirtualizes virtual calls into direct calls based on the results of analyzing the class hierarchy. Virtual calls are expensive because they are implemented around a vtable lookup and involve several dependent loads. Virtual calls also cannot be inlined.

Here’s a summary of the enhancements:

  • Dynamic single-implementation status updates – At the end of class linking, once the vtable has been populated, ART performs an entry-by-entry comparison with the superclass’s vtable.
  • Compiler optimization – The compiler takes advantage of a method’s single-implementation information. If method A.foo has the single-implementation flag set, the compiler devirtualizes the virtual call into a direct call and then tries to inline that direct call.
  • Compiled code invalidation – Also at the end of class linking, when single-implementation information is updated, if method A.foo previously had a single implementation but that status is now invalidated, all compiled code that relies on the assumption that A.foo has a single implementation must be invalidated.
  • Deoptimization – For invalidated compiled code that is live on the stack, deoptimization is initiated to force it back into interpreter mode to guarantee correctness. A new deoptimization mechanism that combines synchronous and asynchronous deoptimization is used.
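
A hedged sketch of the kind of call site CHA targets (all class names are made up): as long as Impl is the only loaded implementation of Base.work(), the call has a single implementation and may be turned into a direct, potentially inlined call; loading a second subclass later invalidates that assumption and triggers the invalidation/deoptimization path described above.

```java
// Illustrative only: a virtual call that CHA can devirtualize while exactly
// one implementation of Base.work() is loaded.
abstract class Base {
    abstract int work();
}

class Impl extends Base {
    @Override
    int work() {
        return 1;
    }
}

final class Caller {
    static int run(Base base) {
        // Virtual call site: with a single implementation, ART may compile this as a
        // direct call to Impl.work() and inline it; if another subclass is loaded
        // later, the compiled code is invalidated or deoptimized.
        return base.work();
    }
}
```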

2.2.4 Inline caches in OAT files

ART now uses inline caches and optimizes call sites for which enough data is available. The inline cache feature records additional runtime information in profiles and uses it to add dynamic optimizations to ahead-of-time compilation.

2.2.5 Dexlayout

Dexlayout is a library introduced in Android 8.0 for analyzing dex files and reordering them according to a profile. Dexlayout uses runtime profile information to reorder the sections of a dex file during idle-maintenance compilation on the device. By grouping together the parts of a dex file that are often accessed together, programs get better memory access patterns from improved locality, saving RAM and shortening startup time.

Since profile information is only available after an application has run, the system integrates dexlayout into dex2oat’s on-device compilation during idle maintenance.

2.2.6 Dex cache removal

In Android 7.0 and earlier, the DexCache object owned four large arrays, proportional in size to the number of certain elements in its DexFile, namely:

  • String (one reference per DexFile::StringId),
  • Type (one reference per DexFile::TypeId),
  • Method (a native pointer for each DexFile::MethodId),
  • Field (a native pointer for each DexFile::FieldId).

These arrays were used to quickly retrieve objects that had previously been resolved. In Android 8.0, all of these arrays except the method array were removed.

2.2.7 Interpreter performance

In Android 7.0, interpreter performance was significantly improved with the introduction of Mterp, an interpreter whose core fetch/decode/interpret loop is written in assembly language. Mterp is modeled on the fast Dalvik interpreter and supports arm, arm64, x86, x86_64, mips, and mips64. For computational code, ART’s Mterp is roughly comparable to Dalvik’s fast interpreter. In some cases, however, it can be significantly, even dramatically, slower:

  • Invoke performance.
  • String manipulation and other methods heavily used in Dalvik that are recognized as intrinsics.
  • Higher stack memory usage.

Android 8.0 addresses these issues.

2.2.8 More inlining

Starting with Android 6.0, ART could inline any call within the same dex file, but only leaf methods from other dex files. There were two reasons for this limitation:

  • Inlining from another dex file requires using that dex file’s dex cache, unlike same-dex-file inlining, which can simply reuse the caller’s dex cache. The dex cache is needed in compiled code for a number of instructions, such as static calls, string loads, and class loads.
  • The stack maps only encode a method index within the current dex file.

To address these limitations, Android 8.0 has made the following improvements:

  • Remove dex cache access from compiled code (see also the “Dex cache removal” section above).
  • Extend the stack map encoding.

2.2.9 Improvements in synchronization

The ART team tuned the MonitorEnter/MonitorExit code paths and reduced reliance on traditional memory barriers on ARMv8, replacing them with newer (acquire/release) instructions wherever possible.

2.2.10 Faster native methods

Using the @FastNative and @CriticalNative annotations makes native calls through the Java Native Interface (JNI) run faster. These built-in ART runtime optimizations speed up JNI transitions and replace the now-deprecated !bang JNI notation. The annotations have no effect on non-native methods and are only available to platform Java language code on the bootclasspath (no Play Store updates).

The @FastNative annotation supports non-static methods. Use this annotation if a method accesses a jobject as a parameter or return value.

The @CriticalNative annotation provides an even faster way to run native methods, with the following restrictions:

  • Methods must be static – no objects as parameters, return values, or an implicit this.
  • Only primitive types are passed to the native method.
  • The native method does not use the JNIEnv and jclass parameters in its function definition.
  • The method must be registered with RegisterNatives instead of relying on dynamic JNI linking.
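
A minimal sketch of how the two annotations are declared (platform/bootclasspath code only; the annotations live in dalvik.annotation.optimization and are not available to ordinary apps; the method names are made up):

```java
import dalvik.annotation.optimization.CriticalNative;
import dalvik.annotation.optimization.FastNative;

public final class NativeOps {
    // @FastNative: the method may still take and return objects (jobject on the native side),
    // and non-static methods are allowed.
    @FastNative
    public static native String fastEcho(String input);

    // @CriticalNative: static, primitives only, no JNIEnv/jclass parameters on the native side,
    // and it must be registered via RegisterNatives rather than dynamic JNI lookup.
    @CriticalNative
    public static native int addExact(int a, int b);
}
```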

The @FastNative and @CriticalNative annotations disable garbage collection while a native method is executing. Do not use them with long-running methods, including methods that are usually fast but generally unbounded.

Pausing garbage collection may cause deadlock. Do not acquire locks during a fast native call if the locks have not been released locally (that is, before returning to managed code). This does not apply to regular JNI calls, because ART considers the executing native code to be suspended.

@FastNative can improve native method performance by up to 3x, and @CriticalNative by up to 5x.

More details: Official: Android Runtime (ART) and Dalvik

3. Memory management

The Android Runtime (ART) and the Dalvik virtual machine use paging and memory mapping to manage memory. This means that any memory an app modifies, whether by allocating new objects or touching memory-mapped pages, remains resident in RAM and cannot be paged out. The only way to release memory from an app is to release the object references the app holds, making that memory available to the garbage collector. There is one exception: any file mapped in without modification, such as code, can be paged out of RAM if the system wants to use that memory elsewhere.

3.1 ART GC Overview

ART has several different GC plans that consist of running different garbage collectors. Starting with Android 8 (Oreo), the default plan is Concurrent Copying (CC). The other GC plan is Concurrent Mark Sweep (CMS).

Some of the main features of the concurrent copying GC are:

  • CC enables the use of a bump-pointer allocator called RegionTLAB. This allocator assigns a thread-local allocation buffer (TLAB) to each application thread, so a thread can allocate objects out of its TLAB simply by bumping the “top” pointer, without any synchronization.
  • CC performs heap defragmentation by copying objects concurrently, without pausing application threads. This is achieved with the help of a read barrier that intercepts reference reads from the heap, without requiring any intervention from the application developer.
  • The GC has only one small pause, which is constant in time with respect to the heap size.
  • In Android 10 and later, CC is extended to a generational GC. It collects short-lived objects, which often become unreachable fairly quickly, with little effort. This helps improve GC throughput and considerably delays the need to perform a full-heap GC.

The other GC plan that ART still supports is CMS. This GC plan also supports compaction, but not concurrently. Compaction is avoided until the application goes into the background, at which point the application threads are paused to perform compaction. Compaction also becomes necessary when an object allocation fails due to fragmentation. In this case the application may become unresponsive for some time.

Because CMS rarely compacts, free memory may not be contiguous. CMS uses a free-list-based allocator called RosAlloc, which has a higher allocation cost than RegionTLAB. Finally, due to internal fragmentation, the Java heap’s memory usage may be higher for CMS than for CC.

CMS: Java Garbage Collection (GC)

3.2 Garbage Collection

A managed memory environment such as the ART or Dalvik virtual machine keeps track of each memory allocation. Once it determines that a piece of memory is no longer being used by the program, it frees that memory back to the heap without any intervention from the programmer. This mechanism for reclaiming unused memory within a managed memory environment is known as garbage collection. Garbage collection has two goals:

  • Find data objects in the program that cannot be accessed in the future;
  • Reclaim the resources used by those objects.

Android’s memory heap is generational, meaning it tracks different buckets of allocations based on the expected lifetime and size of the objects being allocated.

See: Java Garbage Collection (GC)

3.3 Shared Memory

To fit everything it needs in RAM, Android tries to share RAM pages across processes. It can do so in the following ways:

  • Each application process forks from an existing process named Zygote. See: Source code Interpretation – How does the application start
  • Most static data is memory-mapped into a process. This technique allows data to be shared between processes and also allows it to be paged out when needed. Examples of static data include Dalvik code, application resources, and native libraries in lib.
  • Android shares the same dynamic RAM across processes using explicitly allocated shared memory regions (via ashmem or gralloc). For example, window surfaces use memory shared between the application and the screen compositor, and cursor buffers use memory shared between the content provider and the client. (A small app-level sketch of explicitly shared memory follows this list.)
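
For illustration, applications can also create explicitly shared regions themselves; here is a hedged sketch using android.os.SharedMemory (API 27 and later, backed by the same ashmem/memfd mechanism; the buffer name and contents are arbitrary):

```java
import android.os.SharedMemory;
import android.system.ErrnoException;
import java.nio.ByteBuffer;

public final class SharedBufferDemo {
    // Creates a 4 KB shared memory region, writes to it, and returns it;
    // SharedMemory is Parcelable, so the region can be sent to another process over Binder.
    public static SharedMemory createAndFill() throws ErrnoException {
        SharedMemory region = SharedMemory.create("demo-buffer", 4096); // name + size in bytes
        ByteBuffer buffer = region.mapReadWrite();                      // map into this process
        buffer.putInt(0, 42);                                           // write something
        SharedMemory.unmap(buffer);                                     // unmap this local mapping
        return region;                                                  // the region itself stays valid
    }
}
```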

Due to the widespread use of shared memory, care needs to be taken when determining the amount of memory an application uses.

3.4 Allocating and Reclaiming Application Memory

The Dalvik heap is limited to a single virtual memory range for each application process. This defines the logical heap size, which can grow as needed but cannot exceed the upper limit defined by the system for each application.

The logical size of the heap is not the same as the amount of physical memory the heap uses. When inspecting an application’s heap, Android computes a value called the Proportional Set Size (PSS), which accounts for dirty and clean pages that are shared with other processes, but only in an amount proportional to how many applications share that RAM. This (PSS) total is what the system considers to be your physical memory footprint.
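
To see the PSS number the system attributes to your own process, a minimal sketch using android.os.Debug (values reported by Debug.MemoryInfo are in kilobytes):

```java
import android.os.Debug;

public final class PssProbe {
    // Returns the Proportional Set Size (PSS) of the current process, in kilobytes.
    static int currentPssKb() {
        Debug.MemoryInfo info = new Debug.MemoryInfo();
        Debug.getMemoryInfo(info);   // fills in the PSS / private-dirty / shared-dirty counters
        return info.getTotalPss();   // dalvik + native + other PSS, in kB
    }
}
```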

The Dalvik heap does not compact the logical size of the heap, meaning that Android does not defragment the heap to close up space. Android can shrink the logical heap size only when there is unused space at the end of the heap. However, the system can still reduce the physical memory used by the heap. After garbage collection, Dalvik walks the heap and finds unused pages, then returns those pages to the kernel using madvise. So paired allocations and deallocations of large chunks should result in reclaiming all (or nearly all) of the physical memory used. Reclaiming memory from small allocations, however, is much less efficient, because a page used for a small allocation may still be shared with other data that has not yet been freed.

3.5 Limiting the Application Memory

To maintain a multitasking environment, Android sets a hard cap on the heap size of each application. The exact upper limit of heap size for different devices depends on the total available RAM size of the device. If your application tries to allocate more memory after reaching the heap capacity limit, you may receive an OutOfMemoryError.

In some cases, you can query the system for the exact amount of heap space available by calling getMemoryClass().
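
A minimal sketch of that query via ActivityManager (a Context is assumed to be available; the helper class name is made up):

```java
import android.app.ActivityManager;
import android.content.Context;

public final class HeapLimits {
    // Returns the per-app heap limit the device grants a normal app, in megabytes.
    static int heapLimitMb(Context context) {
        ActivityManager am = (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
        // getLargeMemoryClass() would report the larger limit granted with android:largeHeap="true".
        return am.getMemoryClass();
    }
}
```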

3.6 Switching Applications

Android keeps non-foreground applications in a cache when the user switches between applications. A non-foreground application is one that is not visible to the user and is not running a foreground service (such as music playback).

For example, when a user starts an application for the first time, the system creates a process for the application. However, the process does not exit when the user leaves the application. The system keeps the process in the cache. If the user returns to the application later, the process is reused, speeding up application switching.

If your application has a cached process and retains resources it does not currently need, it affects overall system performance even while the user is not using it. When system resources such as memory run low, the system kills processes in the cache. It also considers terminating the processes that hold the most memory in order to free up RAM.
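
To cooperate with this behavior and make a cached process cheaper to keep around, you can drop expendable resources when the system signals memory pressure. A hedged sketch using the standard ComponentCallbacks2.onTrimMemory() callback (the cache-clearing helper is a placeholder):

```java
import android.app.Application;
import android.content.ComponentCallbacks2;

public class CacheAwareApplication extends Application {
    @Override
    public void onTrimMemory(int level) {
        super.onTrimMemory(level);
        // TRIM_MEMORY_UI_HIDDEN and the higher BACKGROUND/MODERATE/COMPLETE levels indicate the app
        // is no longer visible or its cached process is at increasing risk of being killed.
        if (level >= ComponentCallbacks2.TRIM_MEMORY_UI_HIDDEN) {
            releaseInMemoryCaches(); // placeholder: drop bitmaps, pools, etc. that can be rebuilt later
        }
    }

    private void releaseInMemoryCaches() { /* ... */ }
}
```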