The Final Battle
I have written about pthread OOM twice before. In the second half of last year I also finished the last part of that work, borrowing heavily from existing solutions, so today I will walk through it briefly.
We looked at real pthread OOM crashes from production and found that in some cases the device had only 200-300 Java threads, which is not close to the limit, yet the process still crashed. We therefore suspected that threads created in native code were causing the crash.
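One quick way to check that suspicion is to compare the Java-side count with the kernel's view of the process. The sketch below counts every thread of the current process, Java or native, by listing /proc/self/task on Linux or Android. The helper name count_process_threads is hypothetical and is not part of Matrix.

```cpp
#include <cctype>
#include <dirent.h>

// Counts all threads of the current process, including ones created in
// native code, by listing /proc/self/task (one numeric directory per tid).
// Returns -1 if the directory cannot be opened.
int count_process_threads() {
    DIR *dir = opendir("/proc/self/task");
    if (dir == nullptr) {
        return -1;
    }
    int count = 0;
    while (dirent *entry = readdir(dir)) {
        // Skip "." and ".."; every remaining entry is named after a tid.
        if (std::isdigit(static_cast<unsigned char>(entry->d_name[0]))) {
            ++count;
        }
    }
    closedir(dir);
    return count;
}
```

If this number is far above the Java thread count, the difference is made up of threads that the Java thread list does not show.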
With that idea in mind, we started searching online for material on native-level thread hooking and for the approaches other large companies use.
Infinity Stones
This time let's do what Thanos did and collect the Infinity Stones. Maybe only a few of them, but enough to try snapping our fingers and making the problem disappear.
In the middle of last year, I noticed that Matrix had been updated to version 2.0 and that its hooks had moved into a module called matrix-hooks. The name looked interesting, so I clicked in and found the pthread hook capability.
xhook
I have introduced xhook briefly before, so I will not expand on it much here. Its author, Cai Kelun, has since moved to ByteDance and released bhook. The man is a legend.
For the underlying principle, see the article "ByteDance open-source Android PLT hook solution: bhook".
A few small insights
As far as I understand it, this all comes down to the ELF (Executable and Linkable Format) file. When an .so is loaded, its calls to external functions are resolved through address tables described by the ELF file. We can use this mechanism to redirect a function call in the .so to our own intermediate function, which is how the hook works.
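To make the idea concrete, here is a toy model of that redirection in plain C++. It is not real ELF patching: a function-pointer slot stands in for the address-table entry, and "hooking" means overwriting the slot while keeping the old value. All names here (g_open_slot, real_open, proxy_open, install_hook, caller) are hypothetical.

```cpp
#include <string>
#include <vector>

using OpenFn = int (*)(const std::string &path);

// Stand-in for the real library function the caller wants to reach.
int real_open(const std::string &path) {
    return static_cast<int>(path.size());
}

// Stand-in for the address-table slot: callers never call real_open
// directly, they always jump through this pointer.
OpenFn g_open_slot = real_open;

// Saved original, so the proxy can still forward the call.
OpenFn g_orig_open = nullptr;

// Paths observed by the proxy.
std::vector<std::string> g_seen_paths;

// The intermediate function: do the extra work, then call the original.
int proxy_open(const std::string &path) {
    g_seen_paths.push_back(path);
    return g_orig_open(path);
}

// "Hooking": remember the current slot value and overwrite the slot.
// Installing twice is ignored, otherwise the proxy would call itself.
void install_hook() {
    if (g_orig_open != nullptr) {
        return;
    }
    g_orig_open = g_open_slot;
    g_open_slot = proxy_open;
}

// Code in the "hooked library": unchanged, it still calls through the slot.
int caller(const std::string &path) {
    return g_open_slot(path);
}
```

A real PLT hook such as xhook performs the same swap on table entries inside the loaded .so, which also means locating the entry and changing memory protection. None of that is modeled here.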
pthread hook
Before writing my own solution I considered using xhook. The idea is similar to the IOCanary feature covered earlier, which uses xhook to replace the IO open/write/close functions for monitoring. This time the goal is to find the points where threads are created and capture the corresponding stack. After hooking a few other related functions, we have the capability we want.
That is exactly what matrix-hooks does.
Source code address
void InstallHooks(bool enable_debug) {
    LOGI(LOG_TAG,
         "[+] Calling InstallHooks, sThreadTraceEnabled: %d, sThreadStackShinkEnabled: %d",
         sThreadTraceEnabled, sThreadStackShrinkEnabled);
    if (!sThreadTraceEnabled && !sThreadStackShrinkEnabled) {
        LOGD(LOG_TAG, "[*] InstallHooks was ignored.");
        return;
    }

    FETCH_ORIGIN_FUNC(pthread_create)
    FETCH_ORIGIN_FUNC(pthread_setname_np)

    if (sThreadTraceEnabled) {
        thread_trace::thread_trace_init();
    }

    matrix::PauseLoadSo();
    {
        int ret = xhook_export_symtable_hook("libc.so", "pthread_create",
                                             (void *) HANDLER_FUNC_NAME(pthread_create), nullptr);
        LOGD(LOG_TAG, "export table hook sym: pthread_create, ret: %d", ret);

        ret = xhook_export_symtable_hook("libc.so", "pthread_setname_np",
                                         (void *) HANDLER_FUNC_NAME(pthread_setname_np), nullptr);
        LOGD(LOG_TAG, "export table hook sym: pthread_setname_np, ret: %d", ret);

        xhook_register(".*/.*\\.so$", "pthread_create",
                       (void *) HANDLER_FUNC_NAME(pthread_create), nullptr);
        xhook_register(".*/.*\\.so$", "pthread_setname_np",
                       (void *) HANDLER_FUNC_NAME(pthread_setname_np), nullptr);

        xhook_enable_debug(enable_debug ? 1 : 0);
        xhook_enable_sigsegv_protection(enable_debug ? 0 : 1);
        xhook_refresh(0);
    }
    matrix::ResumeLoadSo();
}
The code above replaces the native thread-creation functions. When pthread_create or pthread_setname_np is called, the call is redirected to the replacement native handler.
Because the call is hooked, the handler can do whatever extra work it needs and then call the original function.
Wechat-Backtrace
If the thread is created from Java, we can simply print the current thread's Java stack. But how do we get the stack on the native side?
The WeChat team has open-sourced a fast, high-performance stack unwinder, Wechat-Backtrace, as part of Matrix.
I did not actually read this part of the code (mainly because I do not understand it, and honestly it is not my thing), but in practice I found the stack data it produces trustworthy.
Matrix-backtrace repository address
struct pthread_meta_t {
    pid_t tid;
    char *thread_name;
    // char *parent_name;
    wechat_backtrace::BacktraceMode unwind_mode;

    uint64_t hash;

    wechat_backtrace::Backtrace native_backtrace;

    std::atomic<char *> java_stacktrace;

    pthread_meta_t() : tid(0),
                       thread_name(nullptr),
                       // parent_name(nullptr),
                       unwind_mode(wechat_backtrace::FramePointer),
                       hash(0),
                       native_backtrace(BACKTRACE_INITIALIZER(m_pthread_backtrace_max_frames)),
                       java_stacktrace(nullptr) {
    }

    ~pthread_meta_t() = default;

    pthread_meta_t(const pthread_meta_t &src) {
        tid = src.tid;
        thread_name = src.thread_name;
        // parent_name = src.parent_name;
        unwind_mode = src.unwind_mode;
        hash = src.hash;
        native_backtrace = src.native_backtrace;
        java_stacktrace.store(src.java_stacktrace.load(std::memory_order_acquire),
                              std::memory_order_release);
    }
};
Still, the final stack data tells you a fair amount about where threads are created.
(null) (+0); (null) (+0); _ZN7android6Thread3runEPKcim (+224); _ZN7android12ProcessState17spawnPooledThreadEb (+220); (null) (+0); (null) (+0); (null) (+0);
As you can see, native stack unwinding works well and is fast enough. At this point the data collection is essentially done.
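The frame names in the dumped stack above are mangled C++ symbols, which are hard to read when triaging. As a small sketch, they can be turned back into readable signatures with abi::__cxa_demangle from <cxxabi.h>, available with GCC and Clang. The demangle wrapper is a hypothetical helper, not something Matrix provides.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI C++ symbol, or the input
// unchanged if demangling fails (for example for the "(null)" frames).
std::string demangle(const std::string &symbol) {
    int status = 0;
    char *readable = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return symbol;
    }
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

With this, _ZN7android6Thread3runEPKcim reads as android::Thread::run(char const*, int, unsigned long) and _ZN7android12ProcessState17spawnPooledThreadEb as android::ProcessState::spawnPooledThread(bool), so this particular thread was spawned by the Binder thread pool in ProcessState.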
Simple usage of PthreadHook
For this part I followed the official demo. The difference is that I dump the thread data after startup.
When the number of Java threads exceeds a threshold, we analyze the pthread JSON file, report the data in segments, and observe how threads behave around that threshold.
This code is integrated into DoKit as part of the debug component.
try {
    val config = ThreadStackShrinkConfig()
        .setEnabled(true)
        .addIgnoreCreatorSoPatterns(".*/app_tbs/.*")
        .addIgnoreCreatorSoPatterns(".*/libany\\.so$")
    PthreadHook.INSTANCE
        .addHookThread(".*")
        .setThreadStackShrinkConfig(config)
        .setThreadTraceEnabled(true)
        .enableQuicken(true)
    PthreadHook.INSTANCE.hook()
} catch (e: HookFailedException) {
    e.printStackTrace()
}
This is the initialization logic for the pthread hook. Because it relies on PLT hooking, it is provided only in test builds, as a debug component. The test environment has relatively few devices, but that is still enough to analyze a fair number of problems.
class AutoDumpListener : Application.ActivityLifecycleCallbacks {

    init {
        DoKit.APPLICATION.registerActivityLifecycleCallbacks(this)
    }

    private fun getThreadCount(): Int {
        return getAllThreads().size
    }

    private var count = 0

    /**
     * ps -p `self` -t
     * See http://man7.org/linux/man-pages/man1/ps.1.html
     */
    private fun getAllThreads(): MutableList<Thread> {
        // Walk up to the root thread group, then enumerate every thread in it.
        var group = Thread.currentThread().threadGroup
        var system: ThreadGroup?
        do {
            system = group
            group = group?.parent
        } while (group != null)
        val count = system?.activeCount() ?: 0
        val threads = arrayOfNulls<Thread>(count)
        system?.enumerate(threads)
        val list = mutableListOf<Thread>()
        threads.forEach {
            it?.let { it1 -> list.add(it1) }
        }
        return list
    }

    override fun onActivityCreated(activity: Activity, savedInstanceState: Bundle?) {
    }

    override fun onActivityStarted(activity: Activity) {
    }

    override fun onActivityResumed(activity: Activity) {
    }

    override fun onActivityPaused(activity: Activity) {
    }

    override fun onActivityStopped(activity: Activity) {
    }

    override fun onActivitySaveInstanceState(activity: Activity, outState: Bundle) {
    }

    override fun onActivityDestroyed(activity: Activity) {
        if (count > 3) {
            return
        }
        if (getThreadCount() > Threshold) {
            // Truncated in the source text ("dokit.application. "): a call on
            // DoKit.APPLICATION that triggers the dump and the data report.
        }
    }

    companion object {
        const val Threshold = 200
    }
}
Next, register a global ActivityLifecycleCallbacks. On page switches it checks the current number of Java threads. Once the count exceeds the threshold, it dumps a complete snapshot of the current stacks and reports the data.
The pthread hook records each thread creation, the thread count, and the call stack in native memory. When a dump is triggered, it generates JSON from that native data and writes it to a file. We can then deserialize the JSON file to trace back where the threads came from.
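As a rough picture of that bookkeeping, the sketch below groups thread-creation records by a hash of the creating stack and serializes the groups to JSON. It is not Matrix's actual data layout or file format. The ThreadRecord, StackGroup and ThreadRegistry types, the field names, and the JSON shape are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One record per created thread, as a hook handler might capture it.
struct ThreadRecord {
    int tid;
    std::string name;
    std::uint64_t stack_hash;  // hash of the creating call stack
    std::string stack;         // symbolized frames separated by ';'
};

// All threads created from the same call stack.
struct StackGroup {
    std::string stack;
    std::vector<ThreadRecord> threads;
};

class ThreadRegistry {
public:
    // Called once per thread creation; threads sharing a stack hash are grouped.
    void on_thread_created(const ThreadRecord &record) {
        StackGroup &group = groups_[record.stack_hash];
        group.stack = record.stack;
        group.threads.push_back(record);
    }

    // Serializes the groups ordered by hash. Strings are assumed to contain
    // no characters that would need JSON escaping.
    std::string dump_json() const {
        std::ostringstream out;
        out << "[";
        const char *group_sep = "";
        for (const auto &entry : groups_) {
            const StackGroup &group = entry.second;
            out << group_sep << "{\"hash\":" << entry.first
                << ",\"count\":" << group.threads.size()
                << ",\"stack\":\"" << group.stack << "\",\"threads\":[";
            const char *thread_sep = "";
            for (const ThreadRecord &t : group.threads) {
                out << thread_sep << "{\"tid\":" << t.tid
                    << ",\"name\":\"" << t.name << "\"}";
                thread_sep = ",";
            }
            out << "]}";
            group_sep = ",";
        }
        out << "]";
        return out.str();
    }

private:
    std::map<std::uint64_t, StackGroup> groups_;
};
```

Grouping by stack hash puts the count per creator front and center, which is usually the first thing to look at when hunting a thread leak. A real hook handler runs concurrently on many threads, so it would also need locking, which this sketch leaves out.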
fun Context.dump(invoke: MutableList<PThreadEntity>.() -> Unit = {}) {
    try {
        val parent = "$cacheDir/pthread"
        val output = "$parent/pthread_hook_${System.currentTimeMillis() / 1000L}.log"
        parent.createDirectory()
        // Ask the native side to write its thread records to the output file.
        PthreadHook.INSTANCE.dump(output)
        fdLimit()
        // Read the JSON back and deserialize it into entities.
        val pthreads = getJson(output).parserPThread()
        pthreads.forEach {
            Log.i(TAG, "pthread error :$it\n")
        }
        invoke.invoke(pthreads)
    } catch (e: Exception) {
        // Log the failure instead of silently swallowing it.
        Log.e(TAG, "pthread dump failed", e)
    }
}

fun String.createDirectory() {
    val file = File(this)
    if (!file.exists()) {
        file.mkdir()
    }
}

fun fdLimit() {
    Log.i(TAG, "FD limit = " + FDDumpBridge.getFDLimit())
}

fun getJson(path: String): String {
    val stream = File(path).bufferedReader().readText()
    return stream.apply {
        Log.i(TAG, "pThread:$this")
    }
}

const val TAG = "PThreadDumpHelper"
This is the dump code. It is fairly simple: pass in a file path, call the JNI method to generate the JSON file, then deserialize it.
Working with DoKit
Because this is a debug-component feature, we report the fluctuating part of the data automatically. We also provide a manual entry point so that colleagues can inspect abnormal data themselves.
I find DoKit very, very useful here. You can debug freely and extend it yourself, and none of it ships to production.
Under UI big wet is a wave name.
Closing the loop
In fact, I have been thinking for years about how to get this kind of problem under control. After all, once an application grows large, all sorts of strange things can happen.
My personal view is that solving a problem is not that hard. The really hard part is discovering the problem and locating it.
More often than not, when we see a crash in production, everything goes dark and our minds fill with questions: who am I, where am I, what am I doing, and where is the edge of the universe? Having an effective tool or some other means to locate the problem quickly is the key.
This is my endgame.