Usage
The module Matrix uses for I/O monitoring is IOCanary, a tool that helps find I/O problems during development, testing, or gray (canary) release. It currently covers file I/O monitoring and Closeable leak monitoring.
It detects four types of problems:
- I/O performed on the main thread
- Buffer too small
- Same file read repeatedly
- Resource leaks
IOCanary uses ELF hooks (PLT/GOT hooking) to collect I/O information without intrusive code changes, so integration is transparent to developers. Configure and start the IOCanaryPlugin:
```java
IOCanaryPlugin ioCanaryPlugin = new IOCanaryPlugin(new IOConfig.Builder()
        .dynamicConfig(dynamicConfig)
        .build());
builder.plugin(ioCanaryPlugin);
```
IO configuration options are as follows:
```java
enum ExptEnum {
    // Monitor main-thread I/O
    clicfg_matrix_io_file_io_main_thread_enable,
    clicfg_matrix_io_main_thread_enable_threshold, // Read/write time

    // Monitor buffer size problem
    clicfg_matrix_io_small_buffer_enable,
    clicfg_matrix_io_small_buffer_threshold, // Minimum buffer size

    // Monitor repeated reads
    clicfg_matrix_io_repeated_read_enable,
    clicfg_matrix_io_repeated_read_threshold,

    // Monitor Closeable leaks
    clicfg_matrix_io_closeable_leak_enable,
}
```
The following is an example report for a resource leak (for example, a read/write stream that was never closed):
```json
{
    "tag": "io",
    "type": 4,
    "process": "sample.tencent.matrix",
    "time": 1590410170122,
    "stack": "sample.tencent.matrix.io.TestIOActivity.leakSth(TestIOActivity.java:190)\nsample.tencent.matrix.io.TestIOActivity.onClick(TestIOActivity.java:103)\njava.lang.reflect.Method.invoke(Native Method)\nandroid.view.View$DeclaredOnClickListener.onClick(View.java:4461)\nandroid.view.View.performClick(View.java:5212)\nandroid.view.View$PerformClick.run(View.java:21214)\nandroid.app.ActivityThread.main(ActivityThread.java:5619)\njava.lang.reflect.Method.invoke(Native Method)\ncom.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:853)\ncom.android.internal.os.ZygoteInit.main(ZygoteInit.java:737)\n"
}
```
The following is an example report for a file written with too many operations and too small a buffer:
```
{
    "tag": "io",
    "type": 2,                      // Problem type
    "process": "sample.tencent.matrix",
    "time": 1590409786187,
    "path": "/sdcard/a_long.txt",   // File path
    "size": 40960000,               // File size
    "op": 80000,                    // Number of reads/writes
    "buffer": 512,                  // Buffer size
    "cost": 1453,                   // Time cost
    "opType": 2,                    // 1 = read, 2 = write
    "opSize": 40960000,             // Total bytes read/written
    "thread": "main",
    "stack": "sample.tencent.matrix.io.TestIOActivity.writeLongSth(TestIOActivity.java:129)\nsample.tencent.matrix.io.TestIOActivity.onClick(TestIOActivity.java:99)\njava.lang.reflect.Method.invoke(Native Method)\nandroid.view.View$DeclaredOnClickListener.onClick(View.java:4461)\nandroid.view.View.performClick(View.java:5212)\nandroid.view.View$PerformClick.run(View.java:21214)\nandroid.app.ActivityThread.main(ActivityThread.java:5619)\njava.lang.reflect.Method.invoke(Native Method)\ncom.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:853)\ncom.android.internal.os.ZygoteInit.main(ZygoteInit.java:737)\n",
    "repeat": 0
}
```
Note that the repeat field has a different meaning in main-thread I/O events: 1 means a single read/write took too long; 2 means a continuous run of reads/writes took too long (longer than the configured threshold); 3 means both problems occurred.
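Because the three values behave like bit flags (3 = 1 | 2), the main-thread repeat value can be sketched as below. This is a minimal C++ illustration, not IOCanary's code: the helper name `MainThreadIssueFlags` and both threshold values are hypothetical stand-ins for the configured thresholds.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative thresholds in microseconds (assumptions, not Matrix defaults).
constexpr int64_t kSingleOpThresholdUs = 10 * 1000;
constexpr int64_t kContinualThresholdUs = 500 * 1000;

// Flag values mirror the "repeat" field of a main-thread I/O report.
constexpr int kSingleOpTooLong = 1;   // one read/write call was too slow
constexpr int kContinualTooLong = 2;  // a continuous run of calls was too slow

// Hypothetical helper: combine both checks into the 0/1/2/3 value.
int MainThreadIssueFlags(int64_t max_single_cost_us, int64_t max_continual_cost_us) {
    int flags = 0;
    if (max_single_cost_us > kSingleOpThresholdUs) flags |= kSingleOpTooLong;
    if (max_continual_cost_us > kContinualThresholdUs) flags |= kContinualTooLong;
    return flags;  // 3 means both problems are present
}
```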
How it works
IOCanary collects information about all file I/O performed by the application and aggregates statistics. It then checks the data against a set of detection rules and reports problems to the Matrix backend for analysis and display. The flow chart is as follows:
IOCanary collects I/O information with xHook, hooking four key POSIX file operation interfaces:
```c
int open(const char *pathname, int flags, mode_t mode);
ssize_t read(int fd, void *buf, size_t size);
ssize_t write(int fd, const void *buf, size_t size);
int close(int fd);
```
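A proxy has exactly the same signature as the function it replaces, calls the original through a saved pointer, and records what it observed. The sketch below shows that pattern for `read()`. It assumes a Linux/Android POSIX environment and does no real hooking: `original_read` is simply initialized to libc's `read`, and the global counters are hypothetical simplifications of IOCanary's per-file `IOInfo`.

```cpp
#include <cassert>
#include <chrono>
#include <sys/types.h>
#include <unistd.h>

// Pointer to the "real" read(). A hook library fills this in when it
// rewrites the GOT entry; here it simply points at libc's read().
static ssize_t (*original_read)(int, void*, size_t) = ::read;

// Illustrative counters (hypothetical); IOCanary keeps per-fd state instead.
static long g_read_calls = 0;
static long g_read_bytes = 0;
static long long g_read_cost_us = 0;

// Same signature as read(): time the real call, record successful reads,
// and return the original result unchanged so callers notice nothing.
ssize_t ProxyRead(int fd, void* buf, size_t size) {
    auto start = std::chrono::steady_clock::now();
    ssize_t ret = original_read(fd, buf, size);
    auto cost = std::chrono::duration_cast<std::chrono::microseconds>(
                    std::chrono::steady_clock::now() - start).count();
    if (ret > 0) {
        ++g_read_calls;
        g_read_bytes += ret;
        g_read_cost_us += cost;
    }
    return ret;
}
```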
The hook targets are libopenjdkjvm.so, libjavacore.so, and libopenjdk.so. The exact set may differ slightly across Android versions; the implementation is currently compatible up to Android P.
In addition, unlike the other I/O events, resource leak monitoring builds on a mechanism Android already provides: the utility class dalvik.system.CloseGuard. Resource leak monitoring can therefore be implemented in the Java layer by hooking the relevant API via reflection.
Introduction to hooking
To understand hooking, you first need to understand dynamic linking, and to understand dynamic linking, it helps to start with static linking.
Static linking lets developers build program modules independently and then link them together, but it wastes memory and disk space and makes updates difficult. For example, if program1 and program2 both depend on the lib.o module, each executable ends up with its own copy of lib.o, wasting memory. Likewise, whenever any module is updated, the entire program must be relinked and redistributed to users.
The simplest way to solve both the space and the update problems is to keep a program's modules in separate files instead of linking them statically into one. In other words, linking is deferred until the program runs; this is the basic idea of dynamic linking.
Dynamic linking brings many benefits, but it also raises a new problem: how can a shared object's code work correctly when its load address in the process's virtual address space is not known until it is loaded?
The idea is to pull the address-dependent parts out of the instructions and place them with the data, so the code itself never needs to be modified (position-independent code).
For data access and function calls within a module, these instructions do not need to be relocated because their relative positions are fixed.
For data accesses and function calls outside the module, the basic idea is to move the address-dependent parts into the data segment as an array of pointers to those variables and functions. This array is called the Global Offset Table (GOT). When a module is loaded, the dynamic linker looks up the address of each symbol and fills in the corresponding GOT entries so that each pointer refers to the correct address.
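The GOT indirection can be modeled in ordinary C++ as a table of function pointers that a "loader" fills in by symbol name. This is a toy model with obvious simplifications: `LookupSymbol`, `RelocateGot`, and the two exported functions are hypothetical, and a real dynamic linker processes relocation records rather than walking a name list like this.

```cpp
#include <cassert>
#include <cstring>

using BinaryFn = int (*)(int, int);

// Functions "exported" by a hypothetical shared library.
int LibAdd(int a, int b) { return a + b; }
int LibMul(int a, int b) { return a * b; }

struct Symbol { const char* name; BinaryFn addr; };
const Symbol kLibExports[] = {{"add", &LibAdd}, {"mul", &LibMul}};

// Symbol lookup by name, as the dynamic linker would do.
BinaryFn LookupSymbol(const char* name) {
    for (const Symbol& s : kLibExports) {
        if (std::strcmp(s.name, name) == 0) return s.addr;
    }
    return nullptr;
}

// The calling module's GOT: one slot per external symbol it references.
struct Got {
    const char* names[2] = {"add", "mul"};
    BinaryFn slots[2] = {nullptr, nullptr};
};

// Load-time relocation: fill every slot with the resolved address.
void RelocateGot(Got& got) {
    for (int i = 0; i < 2; ++i) got.slots[i] = LookupSymbol(got.names[i]);
}

// Module code never hard-codes LibAdd's address; it calls through the slots.
int ComputeViaGot(const Got& got, int a, int b) {
    return got.slots[1](got.slots[0](a, b), b);  // (a + b) * b
}
```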
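Dynamic linking is slower than static linking: accesses to global and static data and calls into other modules must first load an address from the GOT and then jump or access indirectly, and the dynamic linker must also perform symbol lookup and relocation at load time.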
Moreover, many functions in a program, such as error handlers, may never be called during a given run, so binding all of them at startup is wasteful. ELF therefore uses lazy binding: the dynamic linker binds a function (symbol lookup, relocation, and so on) only when it is first called. Lazy binding is implemented with the PLT (Procedure Linkage Table). In other words, ELF adds another level of indirect jump on top of the GOT.
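Lazy binding can be modeled the same way: the slot initially points to a resolver stub, and the first call replaces it with the real target. The sketch below is a hypothetical simulation (`ResolverStub`, `g_got_square`, `CallSquare`), not a description of how PLT entries are actually encoded in machine code.

```cpp
#include <cassert>

using UnaryFn = int (*)(int);

int RealSquare(int x) { return x * x; }  // the function being bound lazily

static int g_resolve_count = 0;
int ResolverStub(int x);

// GOT slot initially points at the resolver path, not at the target.
static UnaryFn g_got_square = &ResolverStub;

// Stand-in for the dynamic linker's resolver: on the first call it
// "looks up" the real symbol, patches the slot, then forwards the call.
int ResolverStub(int x) {
    ++g_resolve_count;
    g_got_square = &RealSquare;
    return g_got_square(x);
}

// Every call site jumps through the slot (the role of a PLT entry).
int CallSquare(int x) { return g_got_square(x); }
```

Only the first call pays for resolution; later calls go straight to `RealSquare`.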
PLT/GOT hooking, then, simply modifies the entries of the PLT/GOT tables so that they point to replacement functions.
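In the same toy model, a PLT/GOT hook is just a write to the slot that first saves the old value, the same role the `(void**)&original_close` argument plays in IOCanary's `xhook_hook_symbol` calls. `HookSlot`, `RealClose`, and the counters are hypothetical. A real hook must also make the GOT page writable (typically with `mprotect`) before writing; this sketch skips that because its "GOT" is an ordinary variable.

```cpp
#include <cassert>

using CloseFn = int (*)(int);

// Hypothetical stand-in for libc close(): succeeds for non-negative fds.
int RealClose(int fd) { return fd >= 0 ? 0 : -1; }

// The module's GOT slot for "close", already resolved by the linker.
static CloseFn g_got_close = &RealClose;

// Toy GOT hook: remember the old target, then install the replacement.
void HookSlot(CloseFn* slot, CloseFn replacement, CloseFn* original_out) {
    *original_out = *slot;
    *slot = replacement;
}

static CloseFn original_close = nullptr;
static int g_close_events = 0;

int ProxyClose(int fd) {
    ++g_close_events;           // record the event...
    return original_close(fd);  // ...then delegate to the real function
}

// Code inside the module always calls through the slot.
int ModuleCallsClose(int fd) { return g_got_close(fd); }
```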
Source code analysis
IOCanary's source code is clearly structured. The overall flow is:
- Hook the open, read, write, and close functions in the target .so files
- While I/O is in progress, record information such as I/O time, operation count, and buffer size in an IOInfo
- When close is called at the end of the I/O, insert the IOInfo into a queue
- A background thread takes IOInfo items from the queue in a loop and passes them to the Detectors
- If a Detector finds a problem, report it
Hook
IOCanary's hook targets are libopenjdkjvm.so, libjavacore.so, and libopenjdk.so. In all three, open, open64, and close are hooked; read and write (along with __read_chk and __write_chk) are hooked only in libjavacore.so. The source code is shown below:
```cpp
const static char* TARGET_MODULES[] = {
    "libopenjdkjvm.so",
    "libjavacore.so",
    "libopenjdk.so"
};
const static size_t TARGET_MODULE_COUNT = sizeof(TARGET_MODULES) / sizeof(char*);

JNIEXPORT jboolean JNICALL
Java_com_tencent_matrix_iocanary_core_IOCanaryJniBridge_doHook(JNIEnv *env, jclass type) {
    for (size_t i = 0; i < TARGET_MODULE_COUNT; ++i) {
        const char* so_name = TARGET_MODULES[i];
        void* soinfo = xhook_elf_open(so_name);
        if (!soinfo) {
            continue;  // Module not available in this process; skip it
        }

        // Replace the target functions with IOCanary's own implementations,
        // saving the original function pointers so the proxies can call them
        xhook_hook_symbol(soinfo, "open", (void*)ProxyOpen, (void**)&original_open);
        xhook_hook_symbol(soinfo, "open64", (void*)ProxyOpen64, (void**)&original_open64);

        bool is_libjavacore = (strstr(so_name, "libjavacore.so") != nullptr);
        if (is_libjavacore) {
            // read/write are hooked only in libjavacore.so
            xhook_hook_symbol(soinfo, "read", (void*)ProxyRead, (void**)&original_read);
            xhook_hook_symbol(soinfo, "__read_chk", (void*)ProxyReadChk, (void**)&original_read_chk);
            xhook_hook_symbol(soinfo, "write", (void*)ProxyWrite, (void**)&original_write);
            xhook_hook_symbol(soinfo, "__write_chk", (void*)ProxyWriteChk, (void**)&original_write_chk);
        }

        xhook_hook_symbol(soinfo, "close", (void*)ProxyClose, (void**)&original_close);
        xhook_elf_close(soinfo);
    }
    return JNI_TRUE;
}
```
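The per-module choice in `doHook` can be restated as a small table-driven function, which makes it easy to see what ends up hooked in each library. `SymbolsToHook` is a hypothetical helper written for illustration, not part of IOCanary; it only reproduces the selection logic, not the hooking itself.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: the symbols the doHook loop hooks in a given module.
std::vector<std::string> SymbolsToHook(const char* so_name) {
    std::vector<std::string> symbols = {"open", "open64"};
    if (std::strstr(so_name, "libjavacore.so") != nullptr) {
        // read/write and their FORTIFY "_chk" variants, libjavacore.so only
        for (const char* s : {"read", "__read_chk", "write", "__write_chk"}) {
            symbols.push_back(s);
        }
    }
    symbols.push_back("close");
    return symbols;
}
```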
Collecting I/O statistics
To determine whether I/O happens on the main thread, whether the buffer is too small, or whether the same file is read repeatedly, IOCanary needs statistics for every I/O operation, including I/O time, operation count, and buffer size.
This information is ultimately kept by the collector. When open is executed, an IOInfo object is created and stored in a map keyed by the file descriptor:
```cpp
static void DoProxyOpenLogic(const char *pathname, int flags, mode_t mode, int ret);

int ProxyOpen(const char *pathname, int flags, mode_t mode) {
    int ret = original_open(pathname, flags, mode);  // Call the real open()
    if (ret != -1) {
        DoProxyOpenLogic(pathname, flags, mode, ret);
    }
    return ret;
}

static void DoProxyOpenLogic(const char *pathname, int flags, mode_t mode, int ret) {
    // ... (obtain the Java context, java_context)
    iocanary::IOCanary::Get().OnOpen(pathname, flags, mode, ret, java_context);
}

void IOCanary::OnOpen(...) {
    collector_.OnOpen(pathname, flags, mode, open_ret, java_context);
}

void IOInfoCollector::OnOpen(...) {
    // One IOInfo per opened file, keyed by the returned file descriptor
    std::shared_ptr<IOInfo> info = std::make_shared<IOInfo>(pathname, java_context);
    info_map_.insert(std::make_pair(open_ret, info));
}
```
Then, when a read/write operation is performed, the IOInfo information is updated:
```cpp
void IOInfoCollector::OnWrite(...) {
    CountRWInfo(fd, FileOpType::kWrite, size, write_cost);
}

void IOInfoCollector::CountRWInfo(int fd, const FileOpType &fileOpType, long op_size, long rw_cost) {
    info_map_[fd]->op_cnt_++;               // One more read/write call
    info_map_[fd]->op_size_ += op_size;     // Accumulated bytes
    info_map_[fd]->rw_cost_us_ += rw_cost;  // Accumulated time
    // ...
}
```
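Putting the open, read/write, and close steps together, the collector's bookkeeping can be sketched as a map from file descriptor to a per-file record. This is a simplified, single-threaded illustration: the `IOInfo` fields here (including `max_buffer`) are hypothetical, and because hooked calls arrive from many threads, a real collector would need to guard the map with a mutex.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified per-file record (hypothetical fields, not Matrix's definition).
struct IOInfo {
    explicit IOInfo(std::string p) : path(std::move(p)) {}
    std::string path;
    long op_cnt = 0;
    long op_size = 0;
    long rw_cost_us = 0;
    long max_buffer = 0;
};

class IOInfoCollector {
public:
    void OnOpen(const std::string& path, int fd) {
        info_map_[fd] = std::make_shared<IOInfo>(path);
    }
    void CountRWInfo(int fd, long op_size, long cost_us) {
        auto it = info_map_.find(fd);
        if (it == info_map_.end()) return;  // fd not tracked: ignore
        IOInfo& info = *it->second;
        ++info.op_cnt;
        info.op_size += op_size;
        info.rw_cost_us += cost_us;
        if (op_size > info.max_buffer) info.max_buffer = op_size;
    }
    // Detach the record so it can be queued for detection.
    std::shared_ptr<IOInfo> OnClose(int fd) {
        auto it = info_map_.find(fd);
        if (it == info_map_.end()) return nullptr;
        std::shared_ptr<IOInfo> info = it->second;
        info_map_.erase(it);
        return info;
    }

private:
    std::map<int, std::shared_ptr<IOInfo>> info_map_;
};
```

Using `find()` instead of `info_map_[fd]` avoids default-inserting a null `shared_ptr` for a descriptor that was opened before the hooks were installed.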
Finally, when close is executed, IOInfo is inserted into the queue:
```cpp
void IOCanary::OnClose(int fd, int close_ret) {
    std::shared_ptr<IOInfo> info = collector_.OnClose(fd, close_ret);
    if (info == nullptr) {
        return;  // No record for this fd, nothing to queue
    }
    OfferFileIOInfo(info);
}

void IOCanary::OfferFileIOInfo(std::shared_ptr<IOInfo> file_io_info) {
    std::unique_lock<std::mutex> lock(queue_mutex_);
    queue_.push_back(file_io_info);
    queue_cv_.notify_one();  // Wake up the background thread: the queue has new data
    lock.unlock();
}
```
Detecting I/O Events
When the background thread is woken up, it first gets an IOInfo from the queue:
```cpp
int IOCanary::TakeFileIOInfo(std::shared_ptr<IOInfo> &file_io_info) {
    std::unique_lock<std::mutex> lock(queue_mutex_);
    while (queue_.empty()) {
        queue_cv_.wait(lock);  // Sleep until OfferFileIOInfo signals
    }
    file_io_info = queue_.front();
    queue_.pop_front();
    return 0;
}
```
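The producer (`OfferFileIOInfo`) and the consumer (`TakeFileIOInfo`) together form a classic blocking queue built on a mutex and a condition variable. A self-contained sketch of the same pattern, using a hypothetical `IOInfoQueue` class and a placeholder `IOInfo`, might look like this:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

struct IOInfo { std::string path; };  // placeholder payload

class IOInfoQueue {
public:
    // Producer side (called from the close() hook).
    void Offer(std::shared_ptr<IOInfo> info) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(info));
        }
        cv_.notify_one();  // wake the detector thread
    }
    // Consumer side: block until an item is available, then pop it (FIFO).
    std::shared_ptr<IOInfo> Take() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        std::shared_ptr<IOInfo> info = queue_.front();
        queue_.pop_front();
        return info;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::shared_ptr<IOInfo>> queue_;
};
```

The predicate overload of `wait` is equivalent to the `while (queue_.empty())` loop in `TakeFileIOInfo` and likewise guards against spurious wakeups.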
Next, the IOInfo is passed to every registered Detector. If the Detectors produce any Issues, IOCanary calls back into the upper-layer Java interface to report them:
```cpp
void IOCanary::Detect() {
    std::vector<Issue> published_issues;
    std::shared_ptr<IOInfo> file_io_info;
    while (true) {
        published_issues.clear();
        // Blocks until the close() hook offers a new IOInfo
        int ret = TakeFileIOInfo(file_io_info);
        // Let every registered detector inspect the record
        for (auto detector : detectors_) {
            detector->Detect(env_, *file_io_info, published_issues);
        }
        // Report to the Java layer if anything was found
        if (issued_callback_ && !published_issues.empty()) {
            issued_callback_(published_issues);
        }
    }
}
```
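The dispatch step fans one `IOInfo` out to every detector and reports at most once per record. The sketch below mirrors that shape with a hypothetical `Detector` interface, a made-up `ManyOpsDetector`, and an arbitrary issue type value; the real detectors also receive the environment/config object (`env_`), which is omitted here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct IOInfo { std::string path; long op_cnt; long op_size; };
struct Issue { int type; std::string path; };

// Each detector inspects one IOInfo and appends any issues it finds.
class Detector {
public:
    virtual ~Detector() = default;
    virtual void Detect(const IOInfo& info, std::vector<Issue>& issues) = 0;
};

// Hypothetical detector used only to exercise the dispatch loop.
class ManyOpsDetector : public Detector {
public:
    explicit ManyOpsDetector(long threshold) : threshold_(threshold) {}
    void Detect(const IOInfo& info, std::vector<Issue>& issues) override {
        if (info.op_cnt > threshold_) issues.push_back({1, info.path});
    }
private:
    long threshold_;
};

// One iteration of the detect loop: run every detector, then report once.
template <typename Callback>
void DetectOnce(const std::vector<std::unique_ptr<Detector>>& detectors,
                const IOInfo& info, Callback&& report) {
    std::vector<Issue> issues;
    for (const auto& d : detectors) d->Detect(info, issues);
    if (!issues.empty()) report(issues);
}
```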
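Taking FileIOSmallBufferDetector as an example, it reports a problem when the number of read/write operations exceeds a threshold, the average buffer size (op_size_ / op_cnt_) is below the configured minimum, and the maximum continuous read/write time reaches a threshold: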
```cpp
void FileIOSmallBufferDetector::Detect(...) {
    if (file_io_info.op_cnt_ > env.kSmallBufferOpTimesThreshold                        // Number of reads/writes
        && (file_io_info.op_size_ / file_io_info.op_cnt_) < env.GetSmallBufferThreshold() // Average buffer size
        && file_io_info.max_continual_rw_cost_time_μs_ >= env.kPossibleNegativeThreshold) { // Continuous read/write time
        PublishIssue(Issue(kType, file_io_info), issues);
    }
}
```
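The three conditions can be written as a standalone predicate. `IsSmallBufferIssue` and `SmallBufferConfig` are hypothetical names, and the default thresholds are illustrative assumptions rather than Matrix's configured values. Checking the operation count first also guarantees that the average-buffer division never divides by zero, provided the threshold is non-negative.

```cpp
#include <cassert>

struct IOInfo {
    long op_cnt;
    long op_size;
    long max_continual_rw_cost_time_us;
};

// Illustrative thresholds (assumptions, not Matrix defaults).
struct SmallBufferConfig {
    long op_times_threshold = 20;
    long buffer_threshold = 4096;                  // bytes
    long continual_cost_threshold_us = 13 * 1000;  // microseconds
};

bool IsSmallBufferIssue(const IOInfo& info, const SmallBufferConfig& cfg) {
    if (info.op_cnt <= cfg.op_times_threshold) return false;  // too few ops
    long avg_buffer = info.op_size / info.op_cnt;             // op_cnt > 0 here
    return avg_buffer < cfg.buffer_threshold &&
           info.max_continual_rw_cost_time_us >= cfg.continual_cost_threshold_us;
}
```

With the sample report shown earlier (80000 writes totalling 40960000 bytes, a 512-byte average buffer), the count and size conditions hold, so the outcome depends only on the continuous-write time.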
Resource leak monitoring
The Android framework already implements resource leak monitoring based on the utility class dalvik.system.CloseGuard. Taking FileInputStream as an example: when the GC is about to reclaim a FileInputStream, its finalize method calls guard.warnIfOpen() to check whether the stream was closed:
```java
public class FileInputStream extends InputStream {
    private final CloseGuard guard = CloseGuard.get();

    public FileInputStream(File file) {
        // ...
        guard.open("close");
    }

    public void close() {
        guard.close();
    }

    protected void finalize() throws IOException {
        if (guard != null) {
            guard.warnIfOpen();
        }
    }
}
```
The source code for CloseGuard is as follows:
```java
final class CloseGuard {
    public void warnIfOpen() {
        // ... (returns early if close() was already called)
        REPORTER.report(message, allocationSite);
    }
}
```
As you can see, if the stream has not been closed when warnIfOpen is executed, REPORTER's report method is called.
IOCanary therefore uses reflection to replace REPORTER with its own implementation:
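```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public final class CloseGuardHooker {
    // ... (fields such as sOriginalReporter and issueListener elided)

    private boolean tryHook() {
        try {
            Class<?> closeGuardCls = Class.forName("dalvik.system.CloseGuard");
            Class<?> closeGuardReporterCls = Class.forName("dalvik.system.CloseGuard$Reporter");
            Method methodGetReporter = closeGuardCls.getDeclaredMethod("getReporter");
            Method methodSetReporter = closeGuardCls.getDeclaredMethod("setReporter", closeGuardReporterCls);
            Method methodSetEnabled = closeGuardCls.getDeclaredMethod("setEnabled", boolean.class);

            // Keep the original reporter so the proxy can still delegate to it
            sOriginalReporter = methodGetReporter.invoke(null);
            // Make sure CloseGuard actually tracks open resources
            methodSetEnabled.invoke(null, true);

            // Install a dynamic proxy implementing CloseGuard$Reporter
            ClassLoader classLoader = closeGuardReporterCls.getClassLoader();
            methodSetReporter.invoke(null, Proxy.newProxyInstance(classLoader,
                    new Class<?>[]{closeGuardReporterCls},
                    new IOCloseLeakDetector(issueListener, sOriginalReporter)));
            return true;
        } catch (Throwable e) {
            // Hidden CloseGuard API missing or changed on this device
            return false;
        }
    }
}
```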
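CloseGuard's idea translates into C++ as an RAII analog: a guard that is armed on open, disarmed on close, and reports through a replaceable global reporter if it is destroyed while still armed. This is an analogy, not Android code: `CloseGuardLike`, `FileStreamLike`, and `g_reporter` are hypothetical. Unlike Java's finalize, which runs at an unpredictable time after GC, the C++ destructor runs deterministically at scope exit. Reassigning `g_reporter` plays the part of `setReporter` in `CloseGuardHooker`, although IOCanary's proxy additionally keeps the original reporter (`sOriginalReporter`).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Replaceable reporter, analogous to CloseGuard's REPORTER.
using Reporter = std::function<void(const std::string&)>;
static Reporter g_reporter = [](const std::string&) {};

class CloseGuardLike {
public:
    void Open(std::string site) { site_ = std::move(site); open_ = true; }
    void Close() { open_ = false; }
    // Plays the role of finalize() -> warnIfOpen(): report if never closed.
    ~CloseGuardLike() { if (open_) g_reporter(site_); }
private:
    std::string site_;
    bool open_ = false;
};

class FileStreamLike {
public:
    explicit FileStreamLike(const std::string& path) { guard_.Open("opened " + path); }
    void close() { guard_.Close(); }
private:
    CloseGuardLike guard_;
};
```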
CloseGuard is used throughout the framework code, so problems such as unclosed files or unclosed Cursors can be detected with it.
Summary
IOCanary is a tool that helps find I/O problems during development, testing, or gray release. It currently covers file I/O monitoring and Closeable leak monitoring, and detects four types of problems:
- I/O performed on the main thread
- Buffer too small
- Same file read repeatedly
- Resource leaks
Using xHook, IOCanary collects information about all of the application's file I/O, aggregates statistics, checks them against detection rules, and reports problems to the Matrix backend for analysis and display.
The process is as follows:
- Hook the open, read, write, and close functions in the target .so files
- While I/O is in progress, record information such as I/O time, operation count, and buffer size in an IOInfo
- When close is called at the end of the I/O, insert the IOInfo into a queue
- A background thread takes IOInfo items from the queue in a loop and passes them to the Detectors
- If a Detector finds a problem, report it
Unlike the other I/O events, resource leak monitoring relies on functionality Android already provides through the utility class dalvik.system.CloseGuard, so it can be implemented in the Java layer by hooking CloseGuard via reflection. Because CloseGuard is used widely in the Android framework, problems such as unclosed files or unclosed Cursors can be detected with it.