Another half year has passed and the end of the year is here, so it is time to write something. This article shares a few problems that left a deep impression on me this year.

1. SIGBUS and SIGSEGV

First, a description of these two terms:

  1. SIGBUS (bus error) means the pointer refers to a valid address, but the bus cannot access it properly. This is usually the result of unaligned data access.

  2. SIGSEGV (segmentation fault) means the pointer refers to an invalid address: no physical memory is mapped to it.

Most developers do not do NDK development themselves, so the likely source of these signals is the .so libraries we depend on.

The SIGBUS problems I encountered were concentrated in the integrated Aurora push SDK. A post in the Aurora community describes the same problem as mine. The reports I collected were concentrated on OPPO phones with an arm64-v8a CPU running Android 5.x: OPPO R9M, OPPO R7SM, OPPO A59M, OPPO A59S, and so on, as shown in the figure below:

The cause was that, in order to slim down our APK, I had packaged only the .so files for the armeabi-v7a architecture. Since most current devices are already armeabi-v7a or arm64-v8a, I could have shipped only armeabi, but I stuck with armeabi-v7a for performance reasons.

In theory, arm64-v8a devices can run arm64-v8a, armeabi-v7a, and armeabi libraries. In practice, OPPO phones seem not to be fully compatible, or are simply stricter, which results in misaligned data access. Why do I say so? I observed that these crashes decreased after the Aurora SDK was updated. Of course, if you ship the arm64-v8a libraries directly, the problem does not occur.

Several factors contributed to this issue, from the third-party SDK we use to the phones themselves. Since we cannot change the phones, the fix has to come from our side, so try not to slim the APK this way. (If you really must, compromise and keep both armeabi-v7a and arm64-v8a.)
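To make the fallback concrete, here is a simplified model in plain Java of how a library directory gets chosen when the APK does not ship every ABI. The class `AbiPicker` and its `pick` method are hypothetical names written for illustration; the real selection is done by the Android package installer, and on a device the preference-ordered list would come from `Build.SUPPORTED_ABIS`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: a simplified model of ABI fallback, not Android's installer code.
final class AbiPicker {
    private AbiPicker() {}

    // deviceAbis is ordered from most to least preferred, like Build.SUPPORTED_ABIS.
    // packagedAbis lists the lib/<abi> directories present in the APK.
    static Optional<String> pick(List<String> deviceAbis, List<String> packagedAbis) {
        for (String abi : deviceAbis) {
            if (packagedAbis.contains(abi)) {
                return Optional.of(abi); // first match wins
            }
        }
        return Optional.empty(); // no native library can be loaded
    }
}
```

With only armeabi-v7a packaged, a 64-bit device falls back to the 32-bit library, which is exactly the configuration in which the crashes above were reported.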

The SIGSEGV problems, once architecture compatibility is ruled out, are concentrated on devices running versions below Android 5.0. This kind of problem is relatively complex. I ran into the following one:

I searched for related problems and found a solution: WebView crash on Samsung Android 4.3 models

If you are interested, have a look. Many different cases lead to such problems, so we can only accumulate experience and solve them one by one. For developers who do not work with the NDK, such problems are hard to avoid.

2. TimeoutException

This problem is truly hard to avoid. According to Bugly statistics, it occurs mainly on OPPO phones running Android 5.0 to 6.0 and on some Huawei models running Android 5.0. OPPO again; OPPO really is strict, and I nearly became an anti-fan... (Admittedly, Android 7, 8, and 9 look fine.)

There were far more reports than the screenshots show; I captured only a small portion. The new version has "solved" this problem, so most reports now come from old versions.

The Bugly exception information is as follows:

Error stack message:

FinalizerWatchdogDaemon
java.util.concurrent.TimeoutException
android.os.BinderProxy.finalize() timed out after 120 seconds
android.os.BinderProxy.destroy(Native Method)
android.os.BinderProxy.finalize(Binder.java:547)
java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:214)
java.lang.Daemons$FinalizerDaemon.run(Daemons.java:193)
java.lang.Thread.run(Thread.java:818)

To explain the cause: in order to reduce application pauses, the runtime starts four GC-related daemon threads. One of them is FinalizerWatchdogDaemon, which monitors the execution of the FinalizerDaemon thread.

FinalizerDaemon is the finalizer daemon thread. When the GC decides to collect an object that overrides finalize, the object is not reclaimed immediately. Instead it is placed in a queue, waits for the FinalizerDaemon thread to call its finalize method, and is reclaimed afterwards.

If the watchdog detects that a finalize call has been running for longer than a set time, it makes the VM exit. We can think of this as a GC timeout. The default limit is 10s. Looking through the OPPO and Huawei framework source code, I found that some models change it to 120s or 30s.

Even with the longer limit, the timeout still occurs. We cannot know why OPPO is so slow, but we can be sure the cause is that there are too many Finalizer objects. Once the cause is known, the problem is easy to reproduce: hold references to instances of a class that overrides finalize, put a time-consuming operation in finalize, and then trigger GC manually. Just a few days ago, in Zhang Shaowen's "Android Development Master Class", which I subscribe to, the teacher mentioned this problem and shared a demo that reproduces and fixes it. Anyone interested can try it.
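The relationship between the two daemons can be sketched in plain Java. This is a simplified model under stated assumptions, not the Android implementation: `FinalizerWatchdogModel` and its methods are hypothetical names, the model only reports a timeout instead of killing the VM, and the poll interval of 5 ms is arbitrary.

```java
// Hypothetical model of FinalizerDaemon + FinalizerWatchdogDaemon; not Android source.
final class FinalizerWatchdogModel {
    private final long timeoutNanos;
    // Start time of the finalizer currently running, or null when idle.
    private volatile Long startedAtNanos;

    FinalizerWatchdogModel(long timeoutMillis) {
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    // Plays the role of FinalizerDaemon calling finalize() on one object.
    void runFinalizer(Runnable finalizer) {
        startedAtNanos = System.nanoTime();
        try {
            finalizer.run();
        } finally {
            startedAtNanos = null;
        }
    }

    // Plays the role of the watchdog's check: has the current finalizer run too long?
    boolean isTimedOut() {
        Long started = startedAtNanos;
        return started != null && System.nanoTime() - started > timeoutNanos;
    }

    // Runs one finalizer on a daemon thread and polls like a watchdog until it ends.
    static boolean tripsWatchdog(long timeoutMillis, long finalizerMillis) {
        FinalizerWatchdogModel model = new FinalizerWatchdogModel(timeoutMillis);
        Thread daemon = new Thread(
                () -> model.runFinalizer(() -> sleepQuietly(finalizerMillis)),
                "FinalizerDaemonModel");
        daemon.setDaemon(true);
        daemon.start();
        boolean tripped = false;
        while (daemon.isAlive()) {
            if (model.isTimedOut()) {
                tripped = true; // the real watchdog would throw TimeoutException here
            }
            sleepQuietly(5);
        }
        return tripped;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model a slow finalizer trips the check regardless of how many objects are queued behind it, which is why raising the limit to 30s or 120s only delays the failure.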

The fix from the demo, applied in attachBaseContext, is as follows:

try {
    final Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
    final Field field = clazz.getDeclaredField("INSTANCE");
    field.setAccessible(true);
    final Object watchdog = field.get(null);
    try {
        // Preferred path: clear the watchdog's thread field so it stops monitoring.
        final Field thread = clazz.getSuperclass().getDeclaredField("thread");
        thread.setAccessible(true);
        thread.set(watchdog, null);
    } catch (final Throwable t) {
        Log.e(TAG, "stopWatchDog, set null occur error:" + t);
        t.printStackTrace();
        try {
            // Fallback: call stop() directly (not thread-safe before Android 6.0).
            final Method method = clazz.getSuperclass().getDeclaredMethod("stop");
            method.setAccessible(true);
            method.invoke(watchdog);
        } catch (final Throwable e) {
            Log.e(TAG, "stopWatchDog, stop occur error:" + e);
            e.printStackTrace();
        }
    }
} catch (final Throwable t) {
    Log.e(TAG, "stopWatchDog, get object occur error:" + t);
    t.printStackTrace();
}

I myself had been using the method from this Stack Overflow post:

public static void fix() {
    try {
        Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
        Method method = clazz.getSuperclass().getDeclaredMethod("stop");
        method.setAccessible(true);
        Field field = clazz.getDeclaredField("INSTANCE");
        field.setAccessible(true);
        method.invoke(field.get(null));
    } catch (Throwable e) {
        e.printStackTrace();
    }
}

Both methods use reflection to clear the thread held by FinalizerWatchdogDaemon so that the watchdog no longer runs and the timeout exception is no longer thrown. I recommend the teacher's method, which is more thorough: before Android 6.0, stop has a thread-safety issue, so calling stop directly can still trigger this exception. The Android 5.0 source is as follows:

private static abstract class Daemon implements Runnable {
    private Thread thread;

    public synchronized void start() {
        if (thread != null) {
            throw new IllegalStateException("already running");
        }
        thread = new Thread(ThreadGroup.systemThreadGroup, this, getClass().getSimpleName());
        thread.setDaemon(true);
        thread.start();
    }

    public abstract void run();

    protected synchronized boolean isRunning() {
        return thread != null;
    }

    public synchronized void interrupt() {
        if (thread == null) {
            throw new IllegalStateException("not running");
        }
        thread.interrupt();
    }

    public void stop() {
        Thread threadToStop;
        synchronized (this) {
            threadToStop = thread;
            thread = null;
        }
        if (threadToStop == null) {
            throw new IllegalStateException("not running");
        }
        threadToStop.interrupt();
        while (true) {
            try {
                threadToStop.join();
                return;
            } catch (InterruptedException ignored) {
            }
        }
    }

    public synchronized StackTraceElement[] getStackTrace() {
        return thread != null ? thread.getStackTrace() : EmptyArray.STACK_TRACE_ELEMENT;
    }
}

The thread-safety problem lies in the threadToStop.interrupt() call in the stop method. Starting with Android 6.0 this call was changed to interrupt(threadToStop), and that interrupt method is synchronized:

public synchronized void interrupt(Thread thread) {
     if (thread == null) {
         throw new IllegalStateException("not running");
     }
     thread.interrupt();       
}

The crash no longer happens, but the underlying problem remains; this treats the symptom rather than the cause. The lesson is to avoid overriding finalize wherever possible and never to do time-consuming work inside finalize. In fact, all of our Android Views have a finalize method, so creating fewer Views is also part of the solution.
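One common alternative to overriding finalize is explicit release through `AutoCloseable`. The following is a minimal sketch, assuming the resource can be released by a callback; `NativeHandle` and its `releaser` parameter are hypothetical names introduced only for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource wrapper: releases in close(), so nothing is queued for FinalizerDaemon.
final class NativeHandle implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaser;

    NativeHandle(Runnable releaser) {
        this.releaser = releaser;
    }

    boolean isClosed() {
        return closed.get();
    }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent: the releaser runs at most once.
        if (closed.compareAndSet(false, true)) {
            releaser.run();
        }
    }
}
```

Used in a try-with-resources statement, the release happens on the calling thread at a predictable point, so a slow release can never stall the finalizer queue.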

Further reading: Finalizer objects, the hidden killer of memory and performance in large apps

3. SchedulerPoolFactory

When I used Android Studio’s memory analysis tool to check the App recently, I found that more than 20 new instances were allocated every second. After tracing it, I found that it was created by SchedulerPoolFactory in RxJava2.



In general, once a page has been created and loaded, there should be no new memory allocation unless the page has animations, carousels, a blinking EditText cursor, and so on. Of course, we stop such tasks when the app is in the background or the page is not visible, so that no useless work is done. In this case, however, the thread pool kept running while the app was in the background, which means the CPU was being loaded periodically and power was being consumed. So I had to find a way to optimize it.

SchedulerPoolFactory manages the creation and purging of ScheduledExecutorServices.

Part of the SchedulerPoolFactory code is as follows:

static void tryStart(boolean purgeEnabled) {
    if (purgeEnabled) {
        for (;;) {
            ScheduledExecutorService curr = PURGE_THREAD.get();
            if (curr != null) {
                return;
            }
            ScheduledExecutorService next = Executors.newScheduledThreadPool(1, new RxThreadFactory("RxSchedulerPurge"));
            if (PURGE_THREAD.compareAndSet(curr, next)) {
                // The RxSchedulerPurge thread runs ScheduledTask at a fixed rate.
                next.scheduleAtFixedRate(new ScheduledTask(), PURGE_PERIOD_SECONDS, PURGE_PERIOD_SECONDS, TimeUnit.SECONDS);
                return;
            } else {
                next.shutdownNow();
            }
        }
    }
}

static final class ScheduledTask implements Runnable {
    @Override
    public void run() {
        for (ScheduledThreadPoolExecutor e : new ArrayList<ScheduledThreadPoolExecutor>(POOLS.keySet())) {
            if (e.isShutdown()) {
                POOLS.remove(e);
            } else {
                // purge() removes cancelled futures from the executor's queue.
                e.purge();
            }
        }
    }
}
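What purge() does can be seen with the JDK alone. The sketch below is not RxJava code; `PurgeDemo` is a hypothetical class that schedules tasks far in the future, cancels them, and compares the queue size before and after calling `ScheduledThreadPoolExecutor.purge()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows why cancelled tasks need purging.
final class PurgeDemo {
    private PurgeDemo() {}

    // Returns {queue size after cancelling, queue size after purge()}.
    static int[] cancelThenPurge(int taskCount) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        try {
            List<ScheduledFuture<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // Delay of one hour so the tasks never actually run in the demo.
                futures.add(pool.schedule(() -> { }, 1, TimeUnit.HOURS));
            }
            for (ScheduledFuture<?> future : futures) {
                future.cancel(false);
            }
            // With the default remove-on-cancel policy (false), cancelled tasks stay queued.
            int afterCancel = pool.getQueue().size();
            pool.purge();
            int afterPurge = pool.getQueue().size();
            return new int[] {afterCancel, afterPurge};
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Cancelled tasks stay in the queue until their delay elapses unless something removes them, which is the job the periodic RxSchedulerPurge task performs, at the cost of waking up on every period.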

I looked into related issues and found this question on Stack Overflow. I also filed an issue with RxJava and was told that the following setting can be used:

// Change the purge period to one hour
System.setProperty("rx2.purge-period-seconds", "3600");

Of course, you can also turn off periodic purging entirely:

System.setProperty("rx2.purge-enabled", "false");

These properties take effect in the following code:

static final class PurgeProperties {
    boolean purgeEnable;
    int purgePeriod;

    void load(Properties properties) {
        if (properties.containsKey(PURGE_ENABLED_KEY)) {
            purgeEnable = Boolean.parseBoolean(properties.getProperty(PURGE_ENABLED_KEY));
        } else {
            purgeEnable = true; // The default is true
        }
        if (purgeEnable && properties.containsKey(PURGE_PERIOD_SECONDS_KEY)) {
            try {
                purgePeriod = Integer.parseInt(properties.getProperty(PURGE_PERIOD_SECONDS_KEY));
            } catch (NumberFormatException ex) {
                purgePeriod = 1; // Default is 1s
            }
        } else {
            purgePeriod = 1; // Default is 1s
        }
    }
}

The 1s purge period was too frequent for my taste, and I finally decided to change the period to 60s. The property should be set before RxJava is used for the first time, preferably in the Application class.
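A small helper makes the 60s setting explicit and lets you check the value that the parsing rules quoted above would produce. This is a sketch only: `RxPurgeConfig` and its methods are hypothetical names, the property keys are the ones quoted earlier in this section, and the helper reimplements the quoted parsing rules rather than calling into RxJava.

```java
// Hypothetical helper; call setPeriodSeconds(60) early, e.g. from Application.onCreate().
final class RxPurgeConfig {
    static final String PERIOD_KEY = "rx2.purge-period-seconds";
    static final String ENABLED_KEY = "rx2.purge-enabled";

    private RxPurgeConfig() {}

    static void setPeriodSeconds(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive: " + seconds);
        }
        // System properties are strings, so the number must be converted.
        System.setProperty(PERIOD_KEY, Integer.toString(seconds));
    }

    // Applies the same rules as the quoted load() method: falls back to 1 second.
    static int effectivePeriodSeconds() {
        String enabled = System.getProperty(ENABLED_KEY);
        boolean purgeEnabled = enabled == null || Boolean.parseBoolean(enabled);
        String period = System.getProperty(PERIOD_KEY);
        if (!purgeEnabled || period == null) {
            return 1;
        }
        try {
            return Integer.parseInt(period);
        } catch (NumberFormatException ex) {
            return 1;
        }
    }
}
```

Because the properties are read when the scheduler classes are first loaded, setting them after the first RxJava call would presumably have no effect, hence the advice to do it in Application.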

4. Other

  • When adapting to Android 8.0, pay attention to how a Service is started. Otherwise an IllegalStateException is thrown:
java.lang.IllegalStateException: Not allowed to start service Intent {xxx.MyService}: app is in background uid null
  • Some phones (OPPO is a known case) automatically clear files in the cache directory when storage space runs low, so avoid keeping important data in the cache. Otherwise, when the app is reopened, reading that data again produces a null pointer. One example is using DiskLruCache as a disk cache to store data.
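The cache caveat in the last bullet can be handled defensively by treating a missing cache file as normal rather than exceptional. Below is a minimal plain-Java sketch; `CacheReader`, `readOrReload`, and the `reload` supplier are hypothetical names, and it uses `java.nio.file` for brevity where an Android app on older API levels would use `java.io.File` instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper: never assumes the cache file still exists.
final class CacheReader {
    private CacheReader() {}

    static String readOrReload(Path cacheFile, Supplier<String> reload) {
        try {
            if (Files.isRegularFile(cacheFile)) {
                return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            // Unreadable cache: fall through and reload from the source.
        }
        String fresh = reload.get();
        try {
            Path parent = cacheFile.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(cacheFile, fresh.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Caching is best effort; the caller still gets the fresh value.
        }
        return fresh;
    }
}
```

The caller always receives a non-null value as long as the supplier returns one, whether or not the system has cleared the cache in the meantime.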

Finally, your support is much appreciated!

5. References

  • SIGSEGV and SIGBUS & GDB look at the assembly

  • ART runtime garbage collection (GC) process analysis