This analysis is based on Android S (12).

Preface
Memory leaks are one of the leading causes of OOM. In a memory leak, it is not the GC algorithm that fails to reclaim unreferenced objects; rather, objects that are no longer logically needed by the program are still referenced. In Binder communication, Binder objects sent by the Server to the Client process are converted into BinderProxy objects. If the Server sends many different Binder objects to the Client process in a short period of time, many BinderProxy objects are created in the Client process. These objects consume the Client process's memory and, on earlier versions, can cause "global reference table overflow" errors.
This overcreation of BinderProxy objects is known as a Binder proxy leak, and it affects system_server in particular: system_server (as the Client) receives Binder objects from many APP processes, so if each process sends a large number of them, system_server runs out of memory. Why would an APP process send many Binder objects to system_server? It is either malicious behavior or a programming bug, and either way it must be stopped.
With this in mind, detection was added to the BpBinder and BinderProxy construction paths to print a warning when too many proxy objects are created from one UID. In the system_server process, the offending APP is also killed.
1. The origin
Prior to Android P, Binder proxy objects were created without any detection. Detection was added after Xiaomi reported an issue to Google (thank you, Xiaomi). The issue (link) is now public, so anyone can read it. Let me briefly analyze the problem.
1.1 Problem Scenario
Insert the following code in your APP and call registerContentObserver in a loop; after roughly 24,000 iterations system_server crashes and the system restarts.
public static void triggerWithContent(final Context context) {
    final ContentResolver contentResolver = context.getContentResolver();
    final Handler handler = new Handler(Looper.getMainLooper());
    new Thread(new Runnable() {
        Uri uri = Uri.parse("content://sms/sent");

        @Override
        public void run() {
            while (true) {
                ContentObserver contentObserver = new MyObserver(handler);
                contentResolver.registerContentObserver(uri, true, contentObserver);
            }
        }
    }).start();
}

class MyObserver extends ContentObserver {
    public MyObserver(Handler handler) {
        super(handler);
    }

    @Override
    public void onChange(boolean selfChange) {
        this.onChange(selfChange, null);
    }

    @Override
    public void onChange(boolean selfChange, Uri uri) {}
}
The conditions to reproduce this problem are simple, so let's look at the system_server error message (tombstone file).
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'xiaomi/mido/mido:7.0/NRD90M/1.1.1:user/test-keys'
Revision: '0'
ABI: 'arm64'
pid: 1633, tid: 4450, name: Binder:1633_1B  >>> system_server <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'art/runtime/indirect_reference_table.cc:116] JNI ERROR (app bug): global reference table overflow (max=51200)'
    x0 0000000000000000  x1 0000000000001162  x2 0000000000000006  x3 0000000000000008
    x4 000000000000008c  x5 0000007f78f83880  x6 4437f8787f000000  x7 0000007f78f83744
    x8 0000000000000083  x9 ffffffffffffffdf  x10 0000000000000000 x11 0000000000000001
    x12 ffffffffffffffff  x13 00000000ffffffff x14 000000000008cc40 x15 0000000000001fe3
    x16 0000007f78fc1ed8 x17 0000007f78f6f4d0 x18 0000000000000000 x19 0000007f41b1d4f8
    x20 0000000000000006 x21 0000007f41b1d450 x22 000000000000000b x23 000000000000389b
    x24 ffffffffffffffff  x25 0000007f779fc730 x26 0000007f77985ec8 x27 0000007f41b1b341
    x28 0000007f7794c09b x29 0000007f41b1b270 x30 0000007f78f6c960
    sp 0000007f41b1b250  pc 0000007f78f6f4d8  pstate 0000000000000000
The stack is irrelevant to the cause of the problem. Note the above Abort message, which indicates that the current number of global references alive in system_server has exceeded the upper limit (51200) and is the direct cause of the system_server crash.
Surprisingly, an APP calling a system method in a loop can reboot the phone, simply because too many Binder proxy objects are created.
1.2 Causes
First let's answer a question: why does Android limit the number of global references at all? Many people know about this limit but haven't thought about why it exists. My guess is as follows.
- A global reference lets the Native layer reference a Java object. Since objects in the Native layer are not scanned by the GC, we must explicitly tell the collector, via the global reference, that the Java object is still in use and should not be reclaimed.
- Java objects referenced by a global reference are scanned as GC roots during GC and therefore are not reclaimed.
- Releasing a global reference happens outside the GC system and must be done explicitly in code. If the developer is careless, the referenced objects leak.
- Capping the number of global references lets developers discover such leaks early, before they quietly exhaust memory.
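The point of the cap can be illustrated with a toy model. This is not the real ART implementation (all names here are hypothetical); it only shows why a table of strong references, invisible to the GC, must fail loudly rather than leak silently.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a bounded global-reference table (hypothetical names).
// Entries are strong references, so the GC can never reclaim them; the
// hard cap turns a silent leak into a loud, debuggable error.
class GlobalRefTable {
    private final int max;
    private final List<Object> refs = new ArrayList<>();

    GlobalRefTable(int max) { this.max = max; }

    // Analogous to NewGlobalRef: fails hard at the cap, like
    // "JNI ERROR (app bug): global reference table overflow (max=51200)".
    int add(Object o) {
        if (refs.size() >= max) {
            throw new IllegalStateException(
                    "global reference table overflow (max=" + max + ")");
        }
        refs.add(o);
        return refs.size() - 1;
    }

    // Analogous to DeleteGlobalRef: release is always explicit, never automatic.
    void delete(int handle) { refs.set(handle, null); }
}
```

A leak in this model is simply an `add` that is never paired with a `delete`; the overflow exception is what surfaces it.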
Now that we understand the global reference limit, let's examine why the code above hits the ceiling.
public final void registerContentObserver(Uri uri, boolean notifyForDescendents,
        ContentObserver observer, @UserIdInt int userHandle) {
    try {
        getContentService().registerContentObserver(uri, notifyForDescendents,
                observer.getContentObserver(), userHandle, mTargetSdkVersion);
    } catch (RemoteException e) {
    }
}
The ContentResolver.registerContentObserver method has two important points worth noting:
- getContentService returns the ContentService proxy object, so the call initiates Binder communication with system_server as the peer.
- observer.getContentObserver() returns a Binder object (a Transport, a class that extends IContentObserver.Stub). Passing it as a parameter causes system_server to create a BinderProxy object and an IContentObserver.Stub.Proxy object.
During the creation of BinderProxy, a global reference needs to be added to a field (WeakReference type) of the object, so that BpBinder objects in Native layer can find BinderProxy objects in Java layer.
jobject javaObjectForIBinder(JNIEnv* env, const sp<IBinder>& val)
{
    ...
    if (object != NULL) {
        ...
        // The native object needs to hold a weak reference back to the
        // proxy, so we can retrieve the same proxy if it is still active.
        jobject refObject = env->NewGlobalRef(
                env->GetObjectField(object, gBinderProxyOffsets.mSelf));
        val->attachObject(&gBinderProxyOffsets, refObject,
                jnienv_to_javavm(env), proxy_cleanup);
        ...
    }
    return object;
}
In addition, the IContentObserver.Stub.Proxy object created in system_server is wrapped into an ObserverEntry object and added to an ArrayList, as shown in the following code.
private void addObserverLocked(Uri uri, int index, IContentObserver observer,
        boolean notifyForDescendants, Object observersLock,
        int uid, int pid, int userHandle) {
    // If this is the leaf node add the observer
    if (index == countUriSegments(uri)) {
        mObservers.add(new ObserverEntry(observer, notifyForDescendants, observersLock,
                uid, pid, userHandle));
        return;
    }
    ...
The constructor of ObserverEntry performs a linkToDeath operation, registering a listener for the death notification of the Transport object in the APP process (system_server normally receives this notification only when the APP process dies).
public ObserverEntry(IContentObserver o, boolean n, Object observersLock,
        int _uid, int _pid, int _userHandle) {
    this.observersLock = observersLock;
    observer = o;
    uid = _uid;
    pid = _pid;
    userHandle = _userHandle;
    notifyForDescendants = n;
    try {
        observer.asBinder().linkToDeath(this, 0);
    } catch (RemoteException e) {
        binderDied();
    }
}
linkToDeath eventually adds a global reference to the ObserverEntry object, ensuring that the death-notification handler cannot be collected while the BpBinder object exists.
static void android_os_BinderProxy_linkToDeath(JNIEnv* env, jobject obj,
        jobject recipient, jint flags) // throws RemoteException
{
    ...
    if (!target->localBinder()) {
        DeathRecipientList* list = (DeathRecipientList*)
                env->GetLongField(obj, gBinderProxyOffsets.mOrgue);
        sp<JavaDeathRecipient> jdr = new JavaDeathRecipient(env, recipient, list);
        status_t err = target->linkToDeath(jdr, NULL, flags);
        ...
    }
}
JavaDeathRecipient(JNIEnv* env, jobject object, const sp<DeathRecipientList>& list)
    : mVM(jnienv_to_javavm(env)), mObject(env->NewGlobalRef(object)), // add global reference
      mObjectWeak(NULL), mList(list)
{
    ...
}
As a result, each IContentObserver.Stub.Proxy object created is accompanied by two global references: one from BinderProxy creation and one from linkToDeath. And because these proxy objects end up in an ArrayList, they can never be reclaimed.
When the APP calls registerContentObserver about 24,000 times in a loop, system_server accumulates about 48,000 global references; add the global references it normally holds, and the 51,200 ceiling is easily hit.
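A quick sanity check of that arithmetic. The 51,200 cap and the two-references-per-call cost come from the analysis above; the baseline count of normally-held references used in the test is a hypothetical placeholder.

```java
// Back-of-the-envelope check: each registerContentObserver call costs
// system_server two global references (BinderProxy weak-ref + JavaDeathRecipient),
// so the number of calls needed to overflow is roughly (limit - baseline) / 2.
class OverflowEstimate {
    static final int LIMIT = 51_200;     // global reference table cap from the tombstone
    static final int REFS_PER_CALL = 2;  // references leaked per registerContentObserver call

    static long callsToOverflow(int baselineRefs) {
        // ceiling division: the call that crosses the cap also counts
        return (LIMIT - baselineRefs + REFS_PER_CALL - 1) / REFS_PER_CALL;
    }
}
```

With a baseline of a few thousand references, the estimate lands right around the ~24,000 iterations observed in practice.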
2. Detection scheme
2.1 Detection during the creation of BpBinder
To guard against such "badly behaving apps", the system needs proxy-creation detection in core system processes. Proxy objects in both the Java layer and the Native layer require a BpBinder object, so detection can be added to its construction.
sp<BpBinder> BpBinder::create(int32_t handle) {
    int32_t trackedUid = -1;
    if (sCountByUidEnabled) {
        trackedUid = IPCThreadState::self()->getCallingUid();
        AutoMutex _l(sTrackingLock);
        uint32_t trackedValue = sTrackingMap[trackedUid];
        if (CC_UNLIKELY(trackedValue & LIMIT_REACHED_MASK)) {
            if (sBinderProxyThrottleCreate) {
                return nullptr;
            }
        } else {
            if ((trackedValue & COUNTING_VALUE_MASK) >= sBinderProxyCountHighWatermark) {
                ALOGE("Too many binder proxy objects sent to uid %d from uid %d (%d proxies held)",
                        getuid(), trackedUid, trackedValue);
                sTrackingMap[trackedUid] |= LIMIT_REACHED_MASK;
                if (sLimitCallback) sLimitCallback(trackedUid);
                if (sBinderProxyThrottleCreate) {
                    ALOGI("Throttling binder proxy creates from uid %d in uid %d until binder proxy"
                          " count drops below %d",
                          trackedUid, getuid(), sBinderProxyCountLowWatermark);
                    return nullptr;
                }
            }
        }
        sTrackingMap[trackedUid]++;
    }
    return sp<BpBinder>::make(BinderHandle{handle}, trackedUid);
}
Detection is performed only when sCountByUidEnabled is true. By default this check is enabled only in the system_server and SystemUI processes, because these two deal with a large number of APP processes. SystemUI enables it only in debuggable builds, so it is off on ordinary users' phones; in system_server the check is always on.
[/frameworks/base/packages/SystemUI/src/com/android/systemui/SystemUIService.java]
if (Build.IS_DEBUGGABLE) {
    // b/71353150 - looking for leaked binder proxies
    BinderInternal.nSetBinderProxyCountEnabled(true);
    BinderInternal.nSetBinderProxyCountWatermarks(1000, 900);
    BinderInternal.setBinderProxyCountCallback(
            new BinderInternal.BinderProxyLimitListener() {
                @Override
                public void onLimitReached(int uid) {
                    Slog.w(SystemUIApplication.TAG,
                            "uid " + uid + " sent too many Binder proxies to uid "
                            + Process.myUid());
                }
            }, mMainHandler);
}
[/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java]
BinderInternal.nSetBinderProxyCountWatermarks(BINDER_PROXY_HIGH_WATERMARK,
        BINDER_PROXY_LOW_WATERMARK);
BinderInternal.nSetBinderProxyCountEnabled(true);
BinderInternal.setBinderProxyCountCallback(
        (uid) -> {
            Slog.wtf(TAG, "Uid " + uid + " sent too many Binders to uid "
                    + Process.myUid());
            BinderProxy.dumpProxyDebugInfo();
            if (uid == Process.SYSTEM_UID) {
                Slog.i(TAG, "Skipping kill (uid is SYSTEM)");
            } else {
                killUid(UserHandle.getAppId(uid), UserHandle.getUserId(uid),
                        "Too many Binders sent to SYSTEM");
            }
        }, mHandler);
/**
 * The number of binder proxies we need to have before we start warning and
 * dumping debug info.
 */
private static final int BINDER_PROXY_HIGH_WATERMARK = 6000;

/**
 * Low watermark that needs to be met before we consider dumping info again,
 * after already hitting the high watermark.
 */
private static final int BINDER_PROXY_LOW_WATERMARK = 5500;
Taking system_server as an example, here is how the detection algorithm behaves at runtime. Detection occurs in BpBinder::create, but whether it stays tripped also depends on the BpBinder destruction path: the per-UID count is decremented there, and creation is re-enabled once the count drops below the low watermark. The purpose of the low watermark is to prevent onLimitReached from firing repeatedly and hurting performance.
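Those rules can be sketched in a few lines of Java. This is an illustrative simplification, not the real code: the actual logic lives in native BpBinder::create and packs the count and the LIMIT_REACHED flag into one map value with bit masks.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified re-creation of the per-UID proxy counting with high/low
// watermarks and throttling, mirroring the behavior of BpBinder::create.
class ProxyCounter {
    private final int high;
    private final int low;
    private final Map<Integer, Integer> count = new HashMap<>();
    private final Map<Integer, Boolean> limited = new HashMap<>();

    ProxyCounter(int high, int low) {
        this.high = high;
        this.low = low;
    }

    /** Returns true if proxy creation is allowed for this uid. */
    boolean onCreate(int uid) {
        if (limited.getOrDefault(uid, false)) {
            return false;               // LIMIT_REACHED set: throttle until count < low
        }
        int c = count.getOrDefault(uid, 0);
        if (c >= high) {
            limited.put(uid, true);     // would fire the onLimitReached(uid) callback here
            return false;
        }
        count.put(uid, c + 1);
        return true;
    }

    /** Called when a proxy from this uid is destroyed. */
    void onDestroy(int uid) {
        int c = count.merge(uid, -1, Integer::sum);
        if (c < low) {
            limited.put(uid, false);    // creation re-enabled below the low watermark
        }
    }
}
```

Note how the callback fires only once per episode: the `limited` flag short-circuits further checks, which is exactly what the low watermark exists to protect.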
The system_server onLimitReached callback does two things:

- Calls BinderProxy.dumpProxyDebugInfo to output information about the BinderProxy objects created in system_server.

public static void dumpProxyDebugInfo() {
    if (Build.IS_DEBUGGABLE) {
        sProxyMap.dumpProxyInterfaceCounts();
        sProxyMap.dumpPerUidProxyCounts();
    }
}

- If the UID is not the system UID, kills the APP that the UID belongs to.
2.2 Detection during BinderProxy creation
In addition to the per-UID creation caps in BpBinder, BinderProxy creation limits the total number of proxy objects for the entire process (regardless of which UID they come from).
BinderProxy objects are managed by a ProxyMap, keyed by the address of the Native-layer BinderProxyNativeData, with a WeakReference<BinderProxy> as the value. This establishes a one-to-one mapping between BinderProxyNativeData objects and BinderProxy objects. When a BinderProxy is created, an entry is added to the ProxyMap; during insertion the total number of values is checked, and an exception is thrown if it exceeds a threshold.
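Before looking at the real check, here is a minimal sketch of the mapping idea. A plain long stands in for the BinderProxyNativeData pointer; this is not the real android.os.BinderProxy code, just the data structure it describes.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of the ProxyMap idea: native address -> WeakReference<proxy>.
// The weak values mean the map itself never keeps a proxy alive, so the
// same native object always resolves to the same Java proxy while it lives.
class ProxyMapSketch {
    private final Map<Long, WeakReference<Object>> map = new HashMap<>();

    void put(long nativePtr, Object proxy) {
        map.put(nativePtr, new WeakReference<>(proxy));
    }

    Object get(long nativePtr) {
        WeakReference<Object> ref = map.get(nativePtr);
        return ref == null ? null : ref.get();
    }

    int size() { return map.size(); }          // all entries, cleared or not

    int unclearedSize() {                      // entries whose proxy is still alive
        int n = 0;
        for (WeakReference<Object> r : map.values()) {
            if (r.get() != null) n++;
        }
        return n;
    }
}
```

The size()/unclearedSize() distinction is exactly what the real check below relies on.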
if (size >= mWarnBucketSize) {
    final int totalSize = size();
    Log.v(Binder.TAG, "BinderProxy map growth! bucket size = " + size
            + " total = " + totalSize);
    mWarnBucketSize += WARN_INCREMENT;
    if (Build.IS_DEBUGGABLE && totalSize >= CRASH_AT_SIZE) {
        // Use the number of uncleared entries to determine whether we should
        // really report a histogram and crash. We don't want to fundamentally
        // change behavior for a debuggable process, so we GC only if we are
        // about to crash.
        final int totalUnclearedSize = unclearedSize();
        if (totalUnclearedSize >= CRASH_AT_SIZE) {
            dumpProxyInterfaceCounts();
            dumpPerUidProxyCounts();
            Runtime.getRuntime().gc();
            throw new AssertionError("Binder ProxyMap has too many entries: "
                    + totalSize + " (total), " + totalUnclearedSize + " (uncleared), "
                    + unclearedSize() + " (uncleared after GC). BinderProxy leak?");
        } else if (totalSize > 3 * totalUnclearedSize / 2) {
            Log.v(Binder.TAG, "BinderProxy map has many cleared entries: "
                    + (totalSize - totalUnclearedSize) + " of " + totalSize
                    + " are cleared");
        }
    }
}
Because each value in the ProxyMap is a WeakReference, the corresponding BinderProxy object may already be gone; this can be checked with weakReference.get() != null. The variables above therefore mean:
- totalSize: the number of WeakReference objects, which is ≥ the number of live BinderProxy objects.
- totalUnclearedSize: the number of live BinderProxy objects.
- unclearedSize() after GC: some BinderProxy objects without strong references may be reclaimed by the GC just triggered, so this value is ≤ totalUnclearedSize.
The value of CRASH_AT_SIZE is 20,000, so an exception is thrown when a process holds more than 20,000 BinderProxy objects. However, this check is enabled only on Build.IS_DEBUGGABLE builds, so ordinary users never see it.
2.3 Debugging Information
In debug builds, both the BpBinder and BinderProxy checks output debugging information when a watermark is exceeded, to help developers understand where to look. The output falls into two categories: interface counts, grouped by interface name, and per-UID counts, grouped by UID.
2.3.1 dumpProxyInterfaceCounts
The dumped information looks like the following; it records the top 10 interface names, and the number after the x is the count of BinderProxy objects for that interface.
Binder : BinderProxy descriptor histogram (top 10):
Binder : #1: <proxy to dead node> x13298
Binder : #2: android.view.IWindow x3732
Binder : #3: android.view.accessibility.IAccessibilityManagerClient x1109
Binder : #4: android.content.IIntentReceiver x707
Binder : #5: android.database.IContentObserver x676
Binder : #6: x512
Binder : #7: <cleared weak-ref> x222
Binder : #8: com.android.internal.textservice.ISpellCheckerService x127
Binder : #9: android.view.accessibility.IAccessibilityInteractionConnection x112
Binder : #10: android.content.IContentProvider x69
First, how the interface name is obtained: the call chain eventually reaches BpBinder's getInterfaceDescriptor method. On the first call, a Binder transaction is initiated to retrieve the name, since the interface name is normally stored in the Stub object on the remote side.
const String16& BpBinder::getInterfaceDescriptor() const
{
    if (isDescriptorCached() == false) {
        sp<BpBinder> thiz = sp<BpBinder>::fromExisting(const_cast<BpBinder*>(this));

        Parcel data;
        data.markForBinder(thiz);
        Parcel reply;
        // do the IPC without a lock held.
        status_t err = thiz->transact(INTERFACE_TRANSACTION, data, &reply);
        if (err == NO_ERROR) {
            String16 res(reply.readString16());
            Mutex::Autolock _l(mLock);
            // mDescriptorCache could have been assigned while the lock was
            // released.
            if (mDescriptorCache.size() == 0) mDescriptorCache = res;
        }
    }

    // we're returning a reference to a non-static object here. Usually this
    // is not something smart to do, however, with binder objects it is
    // (usually) safe because they are reference-counted.
    return mDescriptorCache;
}
Since obtaining the interface name initiates Binder communication, the call can be blocked by the remote process. dumpProxyInterfaceCounts therefore does this work on a separate thread with a 20-second timeout.
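The pattern is a generic one: push the potentially blocking work onto its own thread and bound the wait. A sketch, with names of my own choosing (the 20-second figure comes from the text above; the real AOSP code uses a raw Thread plus wait/notify rather than an ExecutorService):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run a task that may block on a remote process, giving up after a timeout
// so the dumping thread itself can never hang indefinitely.
class TimedDump {
    static String runWithTimeout(Callable<String> task, long seconds) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            return ex.submit(task).get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            return "<dump timed out>";
        } catch (Exception e) {
            return "<dump failed: " + e + ">";
        } finally {
            ex.shutdownNow(); // interrupt the worker if it is still blocked
        }
    }
}
```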
Next, the meaning of the interface names. Entries like android.view.IWindow are easy to read, but a few odd names appear in the list above:
- #1: <proxy to dead node> x13298: BinderProxy objects whose Binder objects live in already-dead processes are all counted under this entry, regardless of their actual interface name (in fact, the real names cannot be known, since the information can no longer be fetched over Binder).
- #6: x512: the name is empty, usually because the corresponding Binder object was created directly with new Binder() rather than a subclass of Binder.
- #7: <cleared weak-ref> x222: the WeakReference object still exists, but the corresponding BinderProxy object has been reclaimed.
To follow up a problem like this, check logcat to see which process was just killed and why; it is not the focus of this article.
2.3.2 dumpPerUidProxyCounts
The per-UID BpBinder counts are stored in BpBinder::sTrackingMap, so dumpPerUidProxyCounts essentially reads that information back from the native layer. The output is sorted by UID and lists every UID that has interacted with system_server (i.e., sent Binder objects to it).
Binder : Per Uid Binder Proxy Counts:
Binder : UID : 0 count = 40
Binder : UID : 1000 count = 576
Binder : UID : 1001 count = 155
...
Binder : UID : 10128 count = 4
Binder : UID : 10131 count = 212
Binder : UID : 10138 count = 566
Binder : UID : 10144 count = 4
Binder : UID : 10148 count = 5194
Binder : UID : 10154 count = 164
Binder : UID : 10161 count = 5
Binder : UID : 10164 count = 13
Binder : UID : 10166 count = 893
Binder : UID : 10174 count = 9
Binder : UID : 10175 count = 636
Binder : UID : 10178 count = 3
Binder : UID : 10180 count = 4
Binder : UID : 10183 count = 5
...
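The shape of that dump is easy to reproduce on the Java side. A sketch with a hypothetical helper (a TreeMap gives the by-UID ordering seen above):

```java
import java.util.Map;
import java.util.TreeMap;

// Produce a per-UID dump from a map of uid -> proxy count, sorted by UID
// to match the log format shown above. Illustrative helper, not AOSP code.
class PerUidDump {
    static String dump(Map<Integer, Integer> counts) {
        StringBuilder sb = new StringBuilder("Per Uid Binder Proxy Counts:\n");
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(counts).entrySet()) {
            sb.append("UID : ").append(e.getKey())
              .append(" count = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```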
Conclusion
This is still a niche topic, but it is easy enough to read over dinner. There is also some room to improve the debugging information for this class of problem, though the gain would likely be modest. As an aside, I plan to write a slightly "hotter" article next: why art::GoToRunnable appears so often in call stacks, which relates to how signal_catcher works.