I. Background

An OOM problem occurred in a project under development. The reports on the Crash platform showed that the error occurred on many pages, but it was always the same error shown below. The app's own stack was not visible; only stack frames pointing to thread creation inside a thread pool could be seen:

java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
    at java.lang.Thread.nativeCreate(Native Method)
    at java.lang.Thread.start(Thread.java:883)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:975)
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1043)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1185)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:919)

The reported information shows that the thread count at the time of the crash was 1456, far beyond our normal usage. The stack trace tells us that the OOM was caused by the app creating too many threads, so the investigation should focus on the places where the app uses thread pools. Because the error is reported on different pages, it is not specific to one page but a global problem. We therefore need to examine the overall thread state and find out where threads keep being created. With that idea in mind, we started analyzing the thread state.

II. Investigation method

2.1 Android CPU Profiler

Android CPU Profiler can be used to check the state of the threads. It shows the number and names of the threads as well as their states, which helps with further troubleshooting, as shown in the figure:

But there are two problems with this tool:

  • Android Studio (AS) has to connect to the current process; once connected, some operations in the app become sluggish and are not as smooth as in normal use;
  • It does show the names and states of all threads, but it cannot count them by category. For example, if I want to know how many threads have names prefixed with OkHttp Dispatch, that is hard to work out;

Because of these limitations, we can also use the ps command in adb shell to observe the current thread state.

2.2 The ps command

View the current process by package name

adb shell ps | grep xxx

Get the pid or name of the current process, then use it to list all of its current threads

adb shell ps -T | grep 6661



This lists all the current threads; you can use wc -l to count them.
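Since neither tool counts threads by category, a small in-process helper can fill that gap in a debug build. The following is a minimal sketch, not the method used in this investigation: the class ThreadCensus and its methods are hypothetical names, it only sees Java threads visible to Thread.getAllStackTraces() (not purely native threads), and the normalization rule (every run of digits becomes N) is an assumption that happens to fit names such as pool-3-thread-1.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups thread names so that numbered threads share one bucket. */
final class ThreadCensus {

    /** Replaces every run of digits with "N", e.g. "pool-3-thread-1" becomes "pool-N-thread-N". */
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "N");
    }

    /** Counts the given thread names per normalized name, sorted alphabetically. */
    static Map<String, Integer> countByCategory(Collection<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            counts.merge(normalize(name), 1, Integer::sum);
        }
        return counts;
    }

    /** Names of all live Java threads in the current process. */
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one line per category, in the form "<category> = <count>".
        countByCategory(liveThreadNames())
                .forEach((category, count) -> System.out.println(category + " = " + count));
    }
}
```

Calling countByCategory(liveThreadNames()) periodically and logging the result gives a per-category view that complements the total from wc -l.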

2.3 Starting the Troubleshooting

We do not know exactly which part of the app causes the number of threads to grow, so we need a way to print the current number of threads every second and then pinpoint the problem by interacting with the pages. The watch command does exactly this, as shown below:

watch -n 1 -d 'adb shell ps -T | grep u0_a589 | wc -l'

With the output shown in the figure above, we can watch the thread count while operating the app, and we observed that the number of threads kept increasing.

After using the app for a long time along the reproduction path, we found that with frequent operation the number of threads could reach 1232, which is very large and close to the numbers reported on the crash platform. After carefully inspecting all the thread names in the output, we found many threads prefixed with OkHttp Connect and pool-. We know that the default thread name prefix in a thread pool is pool-, as shown below:

DefaultThreadFactory() {
    SecurityManager s = System.getSecurityManager();
    group = (s != null) ? s.getThreadGroup() :
                          Thread.currentThread().getThreadGroup();
    namePrefix = "pool-" +
                  poolNumber.getAndIncrement() +
                 "-thread-";
}

Now we know where the problem is: somewhere, thread pools are being created over and over instead of being reused, so the number of threads keeps increasing and eventually causes the OOM.
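The pattern is easy to reproduce with the JDK alone. The sketch below is an illustration, not code from the project: PoolPerTaskDemo is a hypothetical class that creates one single-thread pool per task, which is the mistake described above. Because each pool gets its own default thread factory, every worker is named pool-<n>-thread-1 with a different <n>, which matches the many pool- prefixed threads seen in the ps output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: one new single-thread pool per task, none of them reused. */
final class PoolPerTaskDemo {

    /** Runs {@code tasks} trivial tasks, each on its own new pool, and returns the worker names. */
    static List<String> workerNamesWithPoolPerTask(int tasks) {
        List<ExecutorService> pools = new ArrayList<>();
        List<String> names = new ArrayList<>();
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        try {
            for (int i = 0; i < tasks; i++) {
                // The mistake under discussion: a new pool instead of a shared one.
                ExecutorService pool = Executors.newSingleThreadExecutor();
                pools.add(pool);
                names.add(pool.submit(whoAmI).get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Shut down only so that this demo can exit; leaking code never does this.
            for (ExecutorService pool : pools) {
                pool.shutdown();
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Every printed name has the form pool-<n>-thread-1, each with a different <n>.
        System.out.println(workerNamesWithPoolPerTask(5));
    }
}
```

Without the shutdown calls, each of those worker threads would stay alive as an idle core thread, so the thread count would grow by one per task.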

We now know that the problem is caused by thread pool creation, but our project uses thread pools and so do many third-party SDKs, so how do we find the culprit?

We use the epic library to hook thread creation. This lets us monitor every thread that is created and print the stack trace for inspection, as shown below:

private void hookThread() {
    // Hook every Thread constructor so that each new thread is logged
    DexposedBridge.hookAllConstructors(Thread.class, new XC_MethodHook() {
        @Override
        protected void afterHookedMethod(MethodHookParam param) throws Throwable {
            super.afterHookedMethod(param);
            Thread thread = (Thread) param.thisObject;
            Class<?> clazz = thread.getClass();
            if (clazz != Thread.class) {
                Log.d(ThreadMethodHook.TAG, "found class extend Thread:" + clazz);
                DexposedBridge.findAndHookMethod(clazz, "run", new ThreadMethodHook());
            }
            Log.d(ThreadMethodHook.TAG, "Thread: " + thread.getName() + " class:" + thread.getClass() + " is created.");
            // Print the stack of the code that created the thread
            Log.d(ThreadMethodHook.TAG, "Thread:" + thread.getName() + "stack:" + Log.getStackTraceString(new Throwable()));
        }
    });
}
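For thread pools that we create ourselves, a similar trace can be obtained without hooking. The sketch below does not use epic and cannot see threads created by third-party SDKs; it only shows the same idea (capture a stack at creation time) with a plain ThreadFactory. TracingThreadFactory and the prefix "report" are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: names each thread and remembers the stack that asked for it. */
final class TracingThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);
    private final List<Throwable> creationStacks = new CopyOnWriteArrayList<>();

    TracingThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        // The Throwable is never thrown; it only captures the current call stack.
        creationStacks.add(new Throwable("created " + thread.getName()));
        return thread;
    }

    /** Stacks captured so far, one per created thread. */
    List<Throwable> creationStacks() {
        return creationStacks;
    }

    public static void main(String[] args) {
        TracingThreadFactory factory = new TracingThreadFactory("report");
        ExecutorService pool = Executors.newFixedThreadPool(2, factory);
        pool.execute(() -> { });
        pool.shutdown();
        // Shows which call path caused the pool to start a worker thread.
        factory.creationStacks().forEach(Throwable::printStackTrace);
    }
}
```

The named prefix also makes the threads of this pool easy to pick out in the ps output.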

2.4 Problem Confirmation

Combining the methods above and reproducing the page path shown in the crash information, we found the problem. It was finally located in an SDK that misused OkHttp and RxJava's Schedulers.newThread(). Let's take a look at the code:

private static OkHttpClient newClient(Context context) {
    Dispatcher dispatcher = new Dispatcher(Executors.newSingleThreadScheduledExecutor());
    ...
    return new OkHttpClient.Builder()
            .dispatcher(dispatcher)
            ...
            .build();
}

This method creates a new OkHttpClient object, and with it a new dispatcher thread pool, every time it is called.
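The usual remedy is to build the client once and share it. To keep the example runnable without the OkHttp dependency, the sketch below uses a stand-in class, FakeHttpClient, which is not OkHttp's API; SharedClient is likewise a hypothetical name. It uses the initialization-on-demand holder idiom, so the client and its pool are created on first use and reused afterwards.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for an expensive client that owns a thread pool; not OkHttp's API. */
final class FakeHttpClient {
    static final AtomicInteger INSTANCES = new AtomicInteger();
    final ExecutorService dispatcherPool = Executors.newSingleThreadExecutor();

    FakeHttpClient() {
        INSTANCES.incrementAndGet();
    }
}

/** Hypothetical holder: the client is built once, on first use, and then shared. */
final class SharedClient {
    private SharedClient() { }

    private static final class Holder {
        // Initialized by the class loader exactly once, when get() is first called.
        static final FakeHttpClient INSTANCE = new FakeHttpClient();
    }

    static FakeHttpClient get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            int requestId = i;
            // All requests run on the single pool owned by the shared client.
            get().dispatcherPool.execute(() -> System.out.println(
                    "request " + requestId + " on " + Thread.currentThread().getName()));
        }
        get().dispatcherPool.shutdown();
        System.out.println("clients created: " + FakeHttpClient.INSTANCES.get());
    }
}
```

With the real library, the same idea would mean keeping one OkHttpClient instance; in OkHttp 3, per-call variations can be derived with newBuilder(), which shares the dispatcher and connection pool of the original client.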

Observable. Create (new Observable.OnSubscribe<Throwable>() {@override public void call(Subscriber<? super Throwable> subscriber) { subscriber.onNext(t); } }) .subscribeOn(Schedulers.newThread()) .subscribe(new Subscriber<Throwable>() { @Override public voidonCompleted() {... } @Override public void onError(Throwable e) { ... } @Override public void onNext(Throwable o) { ... }});Copy the code

This RxJava code uses Schedulers.newThread(), which means that a new thread is created each time to execute the task.
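To make the cost of a thread per task concrete, the sketch below compares it with a small shared pool using only the JDK. It does not use RxJava: NEW_THREAD_PER_TASK is a hypothetical stand-in that mimics the behaviour described above, and the task count, pool size, and 50 ms sleep are arbitrary illustration values. The method measures how many tasks ran at the same time at most, which is also the number of threads needed at the peak.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: peak number of concurrently running tasks for a given executor. */
final class PeakThreadsDemo {

    /** Mimics thread-per-task scheduling: every task gets a brand-new thread. */
    static final Executor NEW_THREAD_PER_TASK = task -> new Thread(task).start();

    /** Submits {@code tasks} short tasks and returns how many ran at the same time at most. */
    static int peakConcurrency(Executor executor, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // pretend to do some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        System.out.println("new thread per task: " + peakConcurrency(NEW_THREAD_PER_TASK, 20));
        System.out.println("shared pool of 2:    " + peakConcurrency(shared, 20));
        shared.shutdown();
    }
}
```

With the shared pool the peak can never exceed the pool size, whereas with a thread per task it grows with the number of tasks in flight.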

Finally, to sum up the cause of the problem: we have a common request parameter, and if its value is null, the app calls an SDK method to obtain the value. After a back-end upgrade changed the interface response, the SDK failed to parse it and took the Observable path above to report a log. As a result the parameter stayed empty, so every new request called the SDK again, and the SDK re-created an OkHttpClient each time. The dispatcher in that OkHttpClient uses a thread pool with one core thread, so a large number of requests causes a large number of thread pools to be created. In addition, the Observable uses Schedulers.newThread(), which creates a new thread each time. Together, these two factors lead to the OOM.

2.5 Unexpected Discovery

Our project integrates React Native (RN). While hooking with epic, we found a hidden danger in RN, explained here. It lies in the ReconnectingWebSocket class, which is mainly used when debugging RN locally: it sets up a WebSocket for communication and reconnects whenever the connection fails.

        at com.taobao.android.dexposed.DexposedBridge.handleHookedArtMethod(DexposedBridge.java:273)
        at me.weishu.epic.art.entry.Entry.onHookObject(Entry.java:69)
        at me.weishu.epic.art.entry.Entry.referenceBridge(Entry.java:186)
        at com.squareup.okhttp.internal.Util$1.newThread(Util.java:225)
        at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:631)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1388)
        at com.squareup.okhttp.Dispatcher.enqueue(Dispatcher.java:110)
        at com.squareup.okhttp.Call.enqueue(Call.java:114)
        at com.squareup.okhttp.OkHttpClient$1.callEnqueue(OkHttpClient.java:98)
        at com.squareup.okhttp.ws.WebSocketCall.enqueue(WebSocketCall.java:109)
        at com.facebook.react.packagerconnection.ReconnectingWebSocket.connect(ReconnectingWebSocket.java:80)
        at com.facebook.react.packagerconnection.ReconnectingWebSocket.delayedReconnect(ReconnectingWebSocket.java:86)
        at com.facebook.react.packagerconnection.ReconnectingWebSocket.access$000(ReconnectingWebSocket.java:35)
        at com.facebook.react.packagerconnection.ReconnectingWebSocket$1.run(ReconnectingWebSocket.java:104)

A thread is created every 2 seconds.

@Override
public synchronized void onFailure(IOException t, Response response) {
    if (mWebSocket != null) {
        abort("Websocket exception", t);
    }
    if (!mClosed) {
        if (mConnectionCallback != null) {
            mConnectionCallback.onDisconnected();
        }
        // 1. Reconnect
        reconnect();
    }
}

private void reconnect() {
    ...
    mHandler.postDelayed(
        new Runnable() {
            @Override
            public void run() {
                // 2. Reconnect after the delay
                delayedReconnect();
            }
        }, RECONNECT_DELAY_MS);
}

private synchronized void delayedReconnect() {
    // check that we haven't been closed in the meantime
    if (!mClosed) {
        // 3. Connect
        connect();
    }
}

public void connect() {
    if (mClosed) {
        throw new IllegalStateException("Can't connect closed client");
    }
    // 4. Create OkHttpClient
    OkHttpClient httpClient = new OkHttpClient();
    httpClient.setConnectTimeout(10, TimeUnit.SECONDS);
    httpClient.setWriteTimeout(10, TimeUnit.SECONDS);
    httpClient.setReadTimeout(0, TimeUnit.MINUTES); // Disable timeouts for read

    Request request = new Request.Builder().url(mUrl).build();
    WebSocketCall call = WebSocketCall.create(httpClient, request);
    call.enqueue(this);
}

In this implementation, every connection attempt creates a new OkHttpClient, and a failed connection is retried after a 2 s delay. Non-core threads in a thread pool are only destroyed after being idle for 60 seconds. So as long as ReconnectingWebSocket keeps failing to connect, it keeps creating new OkHttpClient instances and thread pools, and new threads are created faster than the idle ones are reclaimed. Resources are requested over and over again, which is wasteful.



All of these threads are created by ReconnectingWebSocket in this way, and none of them are reused.

III. Solutions

Now that all the problems are clear, we can solve them one by one:

  • The use of Schedulers.newThread() for log reporting is removed: the code only reports a log, so no new thread is needed for it;
  • The misuse of OkHttpClient and thread pools in the SDK has been reported to the SDK team. Outside the SDK, in our own code, we added a new limit to the request policy used when the common parameter is empty: requests are restricted along two dimensions, number of requests and time interval, so that they are not sent too frequently;
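The second item, limiting by both request count and time interval, can be sketched as follows. This is not the project's actual policy: RetryGate is a hypothetical class, and the limits passed in (3 attempts, 1000 ms) are illustration values that do not come from the original fix. The clock is passed in as a parameter so the logic is easy to test.

```java
/** Hypothetical limiter: caps both the number of attempts and their spacing in time. */
final class RetryGate {
    private final int maxAttempts;
    private final long minIntervalMs;
    private int attempts;
    private long lastAttemptMs;

    RetryGate(int maxAttempts, long minIntervalMs) {
        this.maxAttempts = maxAttempts;
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true and records the attempt if a request may be sent at {@code nowMs}. */
    synchronized boolean tryAcquire(long nowMs) {
        if (attempts >= maxAttempts) {
            return false; // attempt budget used up
        }
        if (attempts > 0 && nowMs - lastAttemptMs < minIntervalMs) {
            return false; // too soon after the previous attempt
        }
        attempts++;
        lastAttemptMs = nowMs;
        return true;
    }

    /** Call once the value has been obtained, so that later failures get a fresh budget. */
    synchronized void reset() {
        attempts = 0;
    }

    public static void main(String[] args) {
        RetryGate gate = new RetryGate(3, 1_000);
        long now = System.currentTimeMillis();
        System.out.println(gate.tryAcquire(now));      // first attempt is allowed
        System.out.println(gate.tryAcquire(now + 10)); // rejected: inside the interval
    }
}
```

The caller would check tryAcquire before asking the SDK for the missing parameter and skip the call when it returns false.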

After the fixes, we repeated the previous steps with the ps command to see the effect, as shown below:

You can see that the number of threads has stabilized and no longer keeps growing.

IV. Summary

The above covers the OOM troubleshooting process and its solution. Some points to note when troubleshooting threads:

  • If the project uses RxJava, you can use its own hook mechanism, registerSchedulersHook, to customize thread creation for the CPU-intensive and IO-intensive schedulers, which makes later tracking and statistics easier
  • The thread pools used in the project should be managed in one place; creating thread pools everywhere wastes resources and makes problems hard to troubleshoot
  • OkHttpClient requests should reuse the client; do not waste resources by re-creating it every time
  • For thread pools or OkHttpClient instances inside an SDK, it is better to expose them so that the business side creates and manages them, similar to the way RxJava does it
  • If the core threads of a thread pool are not heavily loaded, allowCoreThreadTimeOut can be used to let them terminate when idle; otherwise they stay alive and keep occupying resources
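The points about centralizing pool creation and allowCoreThreadTimeOut can be combined in one small factory. The sketch below is only one possible shape: AppExecutors, the pool name "app-io", and the timing values are assumptions for illustration. Note that allowCoreThreadTimeOut(true) requires a keep-alive time greater than zero.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical central factory: named threads, and idle core threads are allowed to exit. */
final class AppExecutors {
    private AppExecutors() { }

    /** Creates a fixed-size pool whose threads are named "<name>-<n>"; keepAliveMs must be > 0. */
    static ThreadPoolExecutor newPool(String name, int threads, long keepAliveMs) {
        AtomicInteger counter = new AtomicInteger(1);
        ThreadFactory factory = task -> new Thread(task, name + "-" + counter.getAndIncrement());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, keepAliveMs, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), factory);
        // Without this call, core threads stay alive even when the pool is idle.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Polls until the pool has no threads left or the timeout passes; returns the final size. */
    static int poolSizeAfterIdle(ThreadPoolExecutor pool, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (pool.getPoolSize() > 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return pool.getPoolSize();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor io = newPool("app-io", 2, 200);
        io.execute(() -> System.out.println("running on " + Thread.currentThread().getName()));
        System.out.println("threads after idling: " + poolSizeAfterIdle(io, 5_000));
        io.shutdown();
    }
}
```

Giving each pool a distinct name prefix also makes the ps-based statistics from section 2.2 straightforward, because every thread can be attributed to its pool.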