I. Background

After the latest version was released, many users (mostly on Huawei devices) reported crashes. The logs showed pthread_create (1040KB stack) failed: XXX.

java.lang.OutOfMemoryError
pthread_create (1040KB stack) failed: Out of memory
    at java.lang.Thread.nativeCreate(Native Method)
    at java.lang.Thread.start(Thread.java:743)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:941)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1009)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
    at java.lang.Thread.run(Thread.java:774)
java.lang.OutOfMemoryError
pthread_create (1040KB stack) failed: Try again
    at java.lang.Thread.nativeCreate(Native Method)
    at java.lang.Thread.start(Thread.java:733)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:975)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1043)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1185)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:764)

II. Problem Analysis

2.1 Preliminary Inference

Android memory management strategy

OOM does not necessarily mean insufficient RAM; it is tied to Android's memory management strategy.

As we know, memory addresses are divided into virtual and physical. Memory allocated by malloc or new comes from the virtual address space, which is much larger than the physical one. So what happens when many processes run at once and physical memory runs short?

Linux allocates memory optimistically, giving each process as much virtual memory as it asks for, and relies on the swap mechanism to keep physical memory from running out: the least recently used pages are written out to external storage while appearing to still reside in RAM.

Although Android is based on Linux, it has its own memory strategy: there is no swap.

Android's per-process allocation policy gives each process a memory limit determined by the device configuration. The goal is to keep more processes resident in RAM: a process brought back to the foreground avoids re-reading its data from external storage, more apps resume quickly, and no rogue app can grab all the memory. On top of this, Android uses its own Low Memory Killer strategy to manage the processes in RAM: when RAM truly runs low, it kills lower-priority processes to free physical memory.

Therefore, when this OOM is triggered, it means the virtual address space used by the process has exceeded the allocated threshold, not that the device has run out of RAM.

How much memory does Android allocate per app? It varies by device. For example, on my test device the system allocates at most 192 MB; with largeHeap set, an app can request up to 512 MB.
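
For reference, a minimal Kotlin sketch (the function name logHeapLimits is mine, not from the original article) that logs these limits on the current device via ActivityManager and Runtime:

import android.app.ActivityManager
import android.content.Context
import android.util.Log

fun logHeapLimits(context: Context) {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    Log.d("MemLimit", "memoryClass=${am.memoryClass} MB")           // default per-app heap limit
    Log.d("MemLimit", "largeMemoryClass=${am.largeMemoryClass} MB") // limit with android:largeHeap="true"
    Log.d("MemLimit", "maxMemory=${Runtime.getRuntime().maxMemory() / 1024 / 1024} MB")
}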

2.2 Code Analysis

So how does the system throw this error? The relevant source is in runtime/thread.cc:

void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon)

There are two key steps in thread creation:

  • First, create the thread-private JNIEnv structure (the JNI execution environment through which C-layer code calls Java-layer code)
  • Second, call the POSIX C library function pthread_create to actually create the thread

Either step can throw an OOM, so the basic conclusion is that thread creation caused the OOM.

Looking at the principle of thread creation: creating a thread essentially calls mmap to allocate stack memory (virtual memory) and maps it into the process's user virtual address space. An OOM during thread creation therefore means the process's virtual address space is exhausted.

So when does the virtual memory address space run out?

Direction 1: Too Many fds

In Linux everything is a file: the network is a file, opening a file is a file, and creating a TCP connection is also a file; all of them occupy an fd. fds are a resource, and resources have limits; each process has a maximum number of open files.

The fd count increases when you:

  • Create a socket connection
  • Open a file
  • Create a HandlerThread
  • Create an NIO Channel (one fd each for reading and writing)

To inspect fd usage:

  • Run ls -l /proc/<pid>/fd/ to see which files a process has opened
  • Run cat /proc/<pid>/limits to see the process's limits, such as Max Open Files
  • Run lsof -p <pid> | wc -l to count the process's total fds

As shown above, Max Open Files is the maximum number of files a process may hold open; each file the process opens produces a file descriptor fd (recorded under /proc/<pid>/fd).
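
For a quick check from inside the app, a small sketch (the helper name currentFdCount is mine) that counts this process's open fds by listing /proc/self/fd:

import java.io.File

// Counts entries under /proc/self/fd; returns -1 if the listing fails.
fun currentFdCount(): Int = File("/proc/self/fd").listFiles()?.size ?: -1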

Verification is simple: trigger a large number of network connections or file opens; each opened socket or file adds an fd.

private Runnable increaseFDRunnable = new Runnable() {
    @Override
    public void run() {
        try {
            for (int i = 0; i < 1000; i++) {
                new BufferedReader(new FileReader("/proc/" + Process.myPid() + "/status"));
            }
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException e) {
            //
        } catch (FileNotFoundException e) {
            //
        }
    }
};
Direction 2: Too Many Threads

You can check VmPeak/VmSize in /proc/<pid>/status for the peak and current virtual address space used.

There are two possible causes: 1. the process's stack memory exceeds the VM's maximum; 2. the number of threads reaches the system's limit.
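
To make the second cause concrete, here is a deliberately broken sketch (run it only in a throwaway test app) that keeps creating sleeping threads until pthread_create fails with exactly this OutOfMemoryError:

fun explodeThreads() {
    var i = 0
    while (true) {
        // Each thread parks forever, so the count only grows
        // until the system limit or address space is exhausted.
        Thread({ Thread.sleep(Long.MAX_VALUE) }, "leak-${i++}").start()
    }
}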

Troubleshooting tools

  • The Profiler's CPU view shows the list of all current threads

  • If using the CPU profiler to monitor CPU usage and thread activity is too heavyweight, or you need classified statistics, use adb shell ps -T -p <pid>; you can also pipe through grep XXX to filter and wc -l to count threads.

  • Dump process memory directly to check the memory status:

    adb shell dumpsys meminfo [packagename]
  • You can also view summary data such as threads:

    adb shell
    cat /proc/19468/status

In Linux, the thread limit is described in /proc/sys/kernel/threads-max; run cat /proc/sys/kernel/threads-max to check it. Huawei is very strict about thread limits: on its 7.0+ phones the maximum thread count has been lowered to 500.
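
If you want the count from inside the process rather than over adb, a small sketch (the helper name threadCount is mine) that reads the Threads field from /proc/self/status:

import java.io.File

fun threadCount(): Int = File("/proc/self/status").readLines()
    .first { it.startsWith("Threads:") }
    .removePrefix("Threads:").trim().toInt()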

So where in the code is the thread explosion? We used watch to print the current thread count every second, then drove the page interactions and observed which thread names kept increasing.

watch -n 1 -d 'adb shell ps -T | grep XXX | wc -l'

By the time we entered the live broadcast room, the total thread count easily exceeded 290, a large number of RxCachedThreadScheduler threads (that is, threads of Rx's Schedulers.io() scheduler) had been created, and the IO thread count jumped to 46. After staying in the live room for a while, the thread count only grew; expired threads were never cleaned up.

2.3 Verifying the Inference and Locating the Cause

We wrote a demo to verify, using Kotlin coroutines and the RxJava IO scheduler to simulate a dense, concurrent IO environment:

for (i in 0..100) {
    GlobalScope.launch(Dispatchers.IO) {
        delay(100)
        Log.e("IOExecute", "Coroutine - Current thread: " + Thread.currentThread().name)
    }
}

for (i in 0..100) {
    ThreadExecutor.IO.execute {
        Thread.sleep(100)
        Log.e("IOExecute", "RxJava IO - Current thread: " + Thread.currentThread().name)
    }
}

Strangely, the IO threads are not reused. We know the Rx IO scheduler is essentially a wrapped thread pool, and we are familiar with the standard thread pool flow, shown below:

Is the work queue full? Is the thread count unbounded? Is something wrong with the saturation policy?
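
For contrast, here is the standard JDK pool flow those questions refer to, as a hedged Kotlin sketch with illustrative sizes (all numbers are mine): submissions fill the core threads first, then the work queue, then extra threads up to maximumPoolSize, and only then does the saturation policy fire. As we will see, IoScheduler does not follow this flow.

import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

val pool = ThreadPoolExecutor(
    2,                                 // corePoolSize: filled first
    4,                                 // maximumPoolSize: grown only when the queue is full
    60L, TimeUnit.SECONDS,             // keepAliveTime for threads beyond the core
    ArrayBlockingQueue<Runnable>(128), // bounded work queue, tried second
    ThreadPoolExecutor.AbortPolicy()   // saturation policy: reject once queue and pool are full
)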

Suspicions

  1. On first entering the live room, dense IO with no reuse drives thread growth

  2. IO threads that have been idle longer than keepAliveTime are not destroyed

Into the source code

So what went wrong? In the spirit of digging deep, let's read the Schedulers.io() source to locate the cause. Before diving in, we first pose questions and hypotheses, so that we don't get lost in the source:

Questions

  • How is the work queue managed, and what is its capacity?
  • What is the thread pool's policy? When is a new thread created? When is it destroyed?

Let's first look at the RxJava threading model to clarify the relationship between the classes: Scheduler is RxJava's thread-task scheduler, and Worker is the concrete executor of thread tasks. Different Scheduler subclasses have different Worker implementations, because a Scheduler ultimately hands scheduling over to its Workers.

Schedulers.io() uses a static inner class to create a singleton IoScheduler object, which inherits from Scheduler:

@NonNull
static final Scheduler IO;

@NonNull
public static Scheduler io() {
    // 1. Directly returns the Scheduler object named IO
    return RxJavaPlugins.onIoScheduler(IO);
}

static {
    // omit extraneous code

    // 2. IO is instantiated in a static block, where an IOTask() is created
    IO = RxJavaPlugins.initIoScheduler(new IOTask());
}

static final class IOTask implements Callable<Scheduler> {
    @Override
    public Scheduler call() throws Exception {
        // 3. IOTask returns the IoHolder object
        return IoHolder.DEFAULT;
    }
}

static final class IoHolder {
    // 4. IoHolder creates the IoScheduler object
    static final Scheduler DEFAULT = new IoScheduler();
}

IoScheduler's parent class Scheduler creates a Worker in its scheduleDirect() and schedulePeriodicallyDirect() methods, then calls the Worker's schedule() and schedulePeriodically() to execute the task.

public abstract class Scheduler {

    // Retrieves or creates a new Scheduler.Worker that represents serial execution of actions.
    // When the work is complete, unsubscribe via Scheduler.Worker#dispose().
    @NonNull
    public abstract Worker createWorker();

    @NonNull
    public Disposable scheduleDirect(@NonNull Runnable run, long delay, @NonNull TimeUnit unit) {
        final Worker w = createWorker();

        final Runnable decoratedRun = RxJavaPlugins.onSchedule(run);

        DisposeTask task = new DisposeTask(decoratedRun, w);

        w.schedule(task, delay, unit);

        return task;
    }

    @NonNull
    public Disposable schedulePeriodicallyDirect(@NonNull Runnable run, long initialDelay, long period, @NonNull TimeUnit unit) {
        final Worker w = createWorker();
        // omit extraneous code
        Disposable d = w.schedulePeriodically(periodicTask, initialDelay, period, unit);
        // omit extraneous code
    }
}

As mentioned above, different Scheduler subclasses have different Worker implementations. Let's look at IoScheduler's Worker:

final AtomicReference<CachedWorkerPool> pool;

public Worker createWorker() {
    // Create a new EventLoopWorker, passing in the CachedWorkerPool (the Worker cache pool)
    return new EventLoopWorker(pool.get());
}

static final class EventLoopWorker extends Scheduler.Worker {
    private final CompositeDisposable tasks;
    private final CachedWorkerPool pool;
    private final ThreadWorker threadWorker;

    final AtomicBoolean once = new AtomicBoolean();

    // constructor
    EventLoopWorker(CachedWorkerPool pool) {
        this.pool = pool;
        this.tasks = new CompositeDisposable();
        // Take a Worker from the cache pool
        this.threadWorker = pool.get();
    }

    @NonNull
    @Override
    public Disposable schedule(@NonNull Runnable action, long delayTime, @NonNull TimeUnit unit) {
        // omit extraneous code

        // The Runnable is handed to threadWorker for execution
        return threadWorker.scheduleActual(action, delayTime, unit, tasks);
    }
}

Next is the Worker cache pool operation:

// CachedWorkerPool.get()
static final class CachedWorkerPool implements Runnable {
    ThreadWorker get() {
        if (allWorkers.isDisposed()) {
            return SHUTDOWN_THREAD_WORKER;
        }
        while (!expiringWorkerQueue.isEmpty()) {
            // If the cache pool is not empty, take a ThreadWorker from it
            ThreadWorker threadWorker = expiringWorkerQueue.poll();
            if (threadWorker != null) {
                return threadWorker;
            }
        }
        // If the cache pool is empty, create one and return it
        ThreadWorker w = new ThreadWorker(threadFactory);
        allWorkers.add(w);
        return w;
    }
}

So what does ThreadWorker do? Go to the parent class NewThreadWorker:

// NewThreadWorker constructor
public class NewThreadWorker extends Scheduler.Worker implements Disposable {
    private final ScheduledExecutorService executor;
    volatile boolean disposed;

    public NewThreadWorker(ThreadFactory threadFactory) {
        // Creates the ScheduledExecutorService this worker uses as its thread pool
        executor = SchedulerPoolFactory.create(threadFactory);
    }
}
// SchedulerPoolFactory.create
public final class SchedulerPoolFactory {
    /**
     * Creates a ScheduledExecutorService with the given factory.
     * @param factory the thread factory
     * @return the ScheduledExecutorService
     */
    public static ScheduledExecutorService create(ThreadFactory factory) {
        // The thread is created here!
        final ScheduledExecutorService exec = Executors.newScheduledThreadPool(1, factory);
        if (PURGE_ENABLED && exec instanceof ScheduledThreadPoolExecutor) {
            ScheduledThreadPoolExecutor e = (ScheduledThreadPoolExecutor) exec;
            POOLS.put(e, exec);
        }
        return exec;
    }
}

So IoScheduler uses CachedWorkerPool as its thread pool, which maintains a queue recording all available threads. When a new task arrives, the pool first checks whether the queue has an available thread, reuses it if so, and creates a new one otherwise.

To understand why threads surge instead of being reused, we need to see when used, now-idle threads are recycled back into the queue.

// CachedWorkerPool.release()
void release(ThreadWorker threadWorker) {
    // Refresh the worker's expiration time before putting it back into the pool
    threadWorker.setExpirationTime(now() + keepAliveTime);
    expiringWorkerQueue.offer(threadWorker);
}

This is called from only one place:

@Override
public void dispose() {
    if (once.compareAndSet(false, true)) {
        tasks.dispose();
        pool.release(threadWorker);
    }
}

This can be understood simply: the Worker tracks its state internally, and when the task on the thread finishes, dispose() is called to cancel the subscription and release the thread back to the pool.

And when is a thread destroyed? The CachedWorkerPool constructor sets up a scheduled cleanup task:

static final class CachedWorkerPool implements Runnable {
    CachedWorkerPool(long keepAliveTime, TimeUnit unit, ThreadFactory threadFactory) {
        // ...
        // Create a thread that runs every 60 seconds by default to evict expired threads
        evictor = Executors.newScheduledThreadPool(1, EVICTOR_THREAD_FACTORY);
        // Schedule the periodic eviction task
        task = evictor.scheduleWithFixedDelay(this, this.keepAliveTime, this.keepAliveTime, TimeUnit.NANOSECONDS);
        // ...
    }

    @Override
    public void run() {
        evictExpiredWorkers();
    }
}
// CachedWorkerPool.evictExpiredWorkers()
void evictExpiredWorkers() {
    if (!expiringWorkerQueue.isEmpty()) {
        long currentTimestamp = now();

        for (ThreadWorker threadWorker : expiringWorkerQueue) {
            if (threadWorker.getExpirationTime() <= currentTimestamp) {
                if (expiringWorkerQueue.remove(threadWorker)) {
                    allWorkers.remove(threadWorker);
                }
            } else {
                // The queue is sorted by expiration time, so once we find an
                // unexpired Worker we can stop evicting
                break;
            }
        }
    }
}

The compute scheduler is different: it holds a fixed array of threads and assigns tasks to them by index; surplus tasks queue up, so each task on a given thread must wait for the tasks ahead of it.

The thread pool inside the IO scheduler, by contrast, is self-growing and unbounded, with a 60-second keep-alive. In other words, if dense IO scheduling outpaces reuse within 60 seconds, the scheduler imposes no thread limit and just keeps opening new threads.

This explains the thread jump of Suspicion 1 when entering the live room: there is no task queue. When a task arrives, an idle Worker is reused if one exists; otherwise a new one is created immediately.

What about Suspicion 2? Why are threads idle longer than 60 seconds not reclaimed?

We speculate:

  • Is the cleanup thread working properly?

  • Could there be a subscription leak, an Observable that doesn't terminate in time and keeps holding its thread?

Reading the source alone can't simulate the real production environment. So: how do we observe dynamically without modifying the source?

Three approaches:

  1. Dynamic hooking
  2. Static instrumentation
  3. Non-suspending breakpoints that print logs

Observation points

  1. Task entry: how often are tasks submitted, and by whom?
  2. Dispatch logic: which thread is a given task assigned to?
  3. Reuse logic: new threads are created when none are free; how is reuse going? Why are so many new threads created?
  4. Release logic: compare unsubscriptions against subscriptions; is there a subscription leak?
  5. Expiry-cleanup logic: is the cleanup thread working properly? What is each thread doing, and why is nothing destroyed while we stay in the live room?

OK, I won't post the raw observation logs, since they make for poor reading; instead I drew a diagram summarizing the whole flow:

See what the problem is?

The reason nothing gets cleaned up is that no thread ever expires. Yes, none of those 46 IO threads had expired. IoScheduler uses a ConcurrentLinkedQueue to hold finished Workers, ordered by insertion (that is, release) order, so the Worker closest to expiring is handed new tasks first, which refreshes its expiration time.

Let's do the math: with one poll every 2s, even a single polling coroutine in the live room (in reality there are several) executes 30 tasks within 60s, each refreshing some Worker's expiration time; n polling tasks can refresh the expiration time of 60/2 × n Workers.
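
Put together, the pattern looks roughly like the hedged sketch below (fetchLiveRoomState is a hypothetical stand-in for the real polling call, not from the article): every tick schedules fresh work on Schedulers.io(), each piece of work borrows the Worker closest to expiry, and releasing it resets its 60s expiration.

import io.reactivex.Observable
import io.reactivex.schedulers.Schedulers
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for the real polling API call
fun fetchLiveRoomState(): Observable<String> = Observable.just("state")

val poll = Observable.interval(2, TimeUnit.SECONDS)
    .flatMap { fetchLiveRoomState().subscribeOn(Schedulers.io()) }
    .subscribe()
// Without poll.dispose() tied to the page lifecycle, the idle IO threads
// this touches keep having their expiration refreshed and are never evicted.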

Sure enough: in front of the source code, there are no secrets.

Conclusions

  1. Characteristics of the live-room business scenario: on entering the room, a large number of tasks must run in parallel within a short time, and multiple polling loops exist.

  2. RxJava's IO scheduling strategy is not suited to dense concurrent IO plus polling: there is no task queuing, threads are added without an upper bound, and the Workers closest to expiring are reused first.

  3. There was also unreasonable Rx usage in the business code (we had hooked the entry point earlier, so we could see directly where IO scheduling was used): for example, the timer, the clock, and the jsBridge all used IO scheduling, with nested scheduling (redundantly creating new Worker tasks), subscriptions not cancelled with the lifecycle, and so on.

III. Solution

Finding the root of a problem is half of solving it. There are three basic directions:

  1. Optimize improper scheduler creation and release

  2. Converge threads: don't use IO scheduling for work that isn't actually blocking (see the sketch after this list)

    In fact, IO work doesn't need many threads; IO multiplexing or coroutines are more reasonable.

  3. Reduce concurrent IO and load in chunks
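
As one hedged convergence option (names and sizes are illustrative, not the fix shipped in the article), IO work can be routed through a single bounded, named pool wrapped as an Rx Scheduler via Schedulers.from:

import io.reactivex.Scheduler
import io.reactivex.schedulers.Schedulers
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.ThreadFactory
import java.util.concurrent.ThreadPoolExecutor
import java.util.concurrent.TimeUnit

val boundedIo: Scheduler = Schedulers.from(
    ThreadPoolExecutor(
        8, 8, 30L, TimeUnit.SECONDS,
        LinkedBlockingQueue<Runnable>(256),         // bounded: tasks queue instead of spawning threads
        ThreadFactory { r -> Thread(r, "app-io") }, // named threads for easier triage
        ThreadPoolExecutor.CallerRunsPolicy()       // backpressure when saturated
    ).apply { allowCoreThreadTimeOut(true) }        // reclaim idle core threads
)

Anything that previously used subscribeOn(Schedulers.io()) can then use subscribeOn(boundedIo), trading unbounded thread growth for queuing and caller-runs backpressure.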

IV. Reflections

4.1 How to Quickly Classify and Locate Threads? How to Get Native Threads?

The approach used above has several drawbacks:

  1. Because we captured stacks via breakpoint logs, we can't get information about tasks that were not created through the instrumented methods.
  2. We can't force fellow developers and third-party SDKs to give every thread a custom name, so we can't classify threads quickly; a name like Thread-1 makes it hard to locate which class initiated the call.
  3. We can only count the native threads attached to the Java layer, not threads created directly in the native layer and never attached, such as the native threads of the Flutter engine.

So what can we do?

1. Modify bytecode with ASM

The idea is simple: to create a thread, you must go through one of the following:

  • Thread and its subclasses
  • ThreadPoolExecutor and its subclasses, Executors, and ThreadFactory implementation classes
  • AsyncTask
  • Timer and its subclasses

Booster, the DiDi team's open-source library, uses ASM to modify bytecode: at compile time it replaces all thread-creation instructions with custom method calls and prefixes the thread name with the caller's class name, so the source of every thread can be traced.
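
The renaming idea can be sketched without Booster, assuming nothing about its internals: wrap a ThreadFactory so every thread it creates carries the creator's tag (NamedThreadFactory is my illustrative name, not Booster's API):

import java.util.concurrent.Executors
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicInteger

class NamedThreadFactory(
    private val tag: String, // e.g. the creating class's name
    private val delegate: ThreadFactory = Executors.defaultThreadFactory()
) : ThreadFactory {
    private val seq = AtomicInteger(0)
    override fun newThread(r: Runnable): Thread =
        delegate.newThread(r).apply { name = "$tag-${seq.getAndIncrement()}" }
}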

Beyond thread renaming, method calls on Executors can be replaced with the corresponding optimized methods of ShadowExecutors to achieve global, forcible convergence.

  • Booster's optimized thread pools also allow otherwise-permanent core threads to time out, so idle threads can be reclaimed.

  • The Goodong team's thread-monitoring tool takes the same approach.

Note: if you use Booster, test thoroughly and be prepared to downgrade. For example, ShadowExecutors.newOptimizedFixedThreadPool uses a LinkedBlockingQueue without specifying a capacity, which defaults to Integer.MAX_VALUE. With an unbounded LinkedBlockingQueue as the work queue, long-running tasks let a large number of new tasks pile up in the queue, driving CPU and memory up and eventually causing OOM.

2. NativeHook

ASM bytecode modification only reaches threads created from the Java layer; for threads created directly in the native layer we need a native hook. Hey, didn't we see the C++ thread-creation code earlier? The basic idea is to find the functions around pthread_create and intercept them.

Step 1: Look for Hook points

This requires some understanding of the thread startup process; see the article on the Android thread creation process.

java_lang_Thread.cc:Thread_nativeCreate

static void Thread_nativeCreate(JNIEnv* env, jclass, jobject java_thread, jlong stack_size, jboolean daemon) {
  Thread::CreateNativeThread(env, java_thread, stack_size, daemon == JNI_TRUE);
}

The CreateNativeThread function in thread.cc:

void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
  ...
  pthread_create_result = pthread_create(&new_pthread, &attr, Thread::CreateCallback, child_thread);
  ...
}
Step 2: Find the .so to hook

Which library are the Thread_nativeCreate, CreateNativeThread, and pthread_create functions compiled into?

Very simple: look at the build script Android.bp.

art_cc_library {
    name: "libart",
    defaults: ["libart_defaults"],
}

cc_defaults {
    name: "libart_defaults",
    defaults: ["art_defaults"],
    host_supported: true,
    srcs: [
        "thread.cc",
    ],
}

You can see they live in libart.so.

Step 3: Find the symbol of the Hook function

C++ function names undergo name mangling, so we need to inspect the exported symbols:

readelf -a libart.so

The pthread_create function, for its part, is in libc.so, and since C symbols are not mangled, no demangling is needed:

001048a0  0007fc16 R_ARM_JUMP_SLOT   00000000   pthread_create@LIBC
Step 4: Implementation

For performance, we hook only the specified .so:

hook_plt_method("libart.so", "pthread_create", (hook_func) &pthread_create_hook);

If you want to monitor pthread_create in other .so libraries, add them yourself. One approach in Facebook's Profilo is to hook every .so that has been loaded so far.

For pthread_create's arguments, just look at pthread.h:

int pthread_create(pthread_t* __pthread_ptr, pthread_attr_t const* __attr, void* (*__start_routine)(void*), void*);

Fetching the stack uses the old trick of reflectively calling back into the Java layer:

jstring java_stack = static_cast<jstring>(jniEnv->CallStaticObjectMethod(kJavaClass, kMethodGetStack));

Profilo: Facebook’s performance analytics tool

Epic: this library already supports intercepting the run method of the Thread class and all its subclasses. Going further, you can combine it with Systrace and similar tools to generate an execution flow chart of the whole process.

Note: for in-app hooks, don't rely on reflection and dynamic proxies as before; look at Lancet and Epic, and you really can do whatever you want.

4.2 Identifying the Pain Points

The pain is that we can't preserve the scene; we have to guess, and we have to reproduce.

Both jank and crashes require "on-the-spot information". A bug depends on many factors: the user's system version, CPU load, network environment, application data, thread count and utilization, and the stacks of all threads at crash time, not just the stack of the thread that crashed…

Such scenes are hard to reproduce locally, which makes the problems hard to solve. So how do we monitor online and retain enough on-the-spot information to help us troubleshoot?

Here we can either develop our own crash-collection system or integrate an existing solution:

  • get.fabric.io
  • koom

4.3 The Nature of Asynchrony: What Do Coroutines, NIO, Fibers, and Loom Solve?

Back to basics: why do we need multiple threads? Is multithreading really necessary?

Because sequential code is blocking: each line of code holds the thread while it executes, which means time-consuming operations cannot run on the main thread. Hence multithreading.

So the goal is non-blocking, and the means is asynchrony.

However, the many asynchronous libraries that followed exist fundamentally because the current thread implementation is inadequate, not because asynchronous code is better. We shouldn't take asynchrony for granted as normal. It is really a Java design problem that has made us suffer to this day: callback hell, inconvenient debugging and analysis…

For a long time, Java threads have mapped one-to-one onto operating system threads, which directly limits the Java platform's concurrency: a blocked task means a blocked thread, thread state switches carry overhead, and blocked threads waste system resources… From the Quasar project and the coroutine feature of Alibaba's JDK to Kotlin coroutines and OpenJDK's Project Loom, the Java community has increasingly realized that the current Java threading model can no longer meet the industry's needs for highly concurrent applications.

There are many solutions; one school works at the language layer:

Its representative is the coroutine. Although coroutine implementations differ across languages, the essence is the same, a task-encapsulation idea: schedule tasks instead of threads, reduce thread blocking, and execute as many tasks as possible with as few threads as possible.
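
A tiny Kotlin illustration of "schedule tasks, not threads": ten thousand suspending tasks share the small Dispatchers.Default pool, because delay() suspends the task instead of blocking a thread.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val jobs = List(10_000) {
        launch(Dispatchers.Default) {
            delay(1_000) // suspends the task, freeing the thread for other tasks
        }
    }
    jobs.joinAll()
    println("10,000 tasks finished on a small, fixed pool of threads")
}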

Take Kotlin coroutines: because Kotlin runs on the JVM, there is no support from the underlying runtime, and Kotlin as a language supports coroutines at the language level rather than as a framework. So the core of Kotlin-JVM coroutines lives in the compiler: assorted callback techniques produce code that merely looks synchronous but is still asynchronous underneath, and blocking APIs remain unsolved. For instance, synchronized blocks and thread suspension inside native methods still block the thread.

For Java concurrency to improve on a larger scale, the improvement must come from the bottom up. This is the starting point of the other school, the JVM layer, where Project Loom began.

The Loom project proposal

Examples are Project Loom and AJDK (Alibaba JDK): taking inspiration from Erlang and Go, they start at the JVM level and make everything that used to block a thread instead block a "fiber", a lightweight or virtual thread. The advantage is a more complete solution: no reliance on async/await syntactic sugar, and support at the JVM and library level benefits every language in the JVM ecosystem.

But the drawbacks, or rather the difficulties, are just as obvious: compatibility with existing code. The transformation means many native methods must change too, and it will probably not land until a preview around JDK 20. Given that Android currently supports only Java 8, emmm… let's just use Kotlin-JVM coroutines for now.

It will be interesting to see how the Kotlin coroutine team responds, and how Kotlin coroutines would need to adjust if the JVM team completes the project; in the best case, Kotlin coroutines would simply map onto fibers. The community discussion of asynchrony, and the rethinking of Java and JVM design, are genuinely interesting; dig in if you're curious.

References:

  • The incredible OOM
  • Android Java process stack OOM
  • 【Android】OOM problem analysis
  • Booster thread pool optimization
  • A thread-caused OOM investigation: notes on thread usage
  • Probe: an online OOM fault-locating component for Android

I'm FeelsChaotic, a programmer who writes code, cuts video, and draws, committed to the pursuit of elegant code, good architecture design, and T-shaped growth.

Feel free to follow FeelsChaotic on Jianshu and Juejin. If my articles help you even a little, please give a ❤️! Your encouragement is my biggest motivation to write!

Most importantly, please share your suggestions or opinions, and correct me if anything is wrong!