1. Introduction

As the Android system keeps upgrading, IM and push developers in the technical community have grown more and more pessimistic about process keep-alive: the system imposes ever-tighter restrictions on the various keep-alive "black technologies", and getting around those restrictions keeps getting harder.

But keep-alive, like the aftertaste of passion, keeps people hooked: they want to give up but cannot bring themselves to. So, besides the serious, whitelist-based approach of the earlier article “In 2020, Is Android Background Keep-Alive Still Viable? See How I Implement It Elegantly!”, is there still room for the less respectable "black technology"?

The answer is yes, there is still room. It is not that "black technology" no longer works; it is that the technique has not been taken far enough.

A chance to study TIM's keep-alive behavior showed me that, even with its auto-start permission disabled in the Security Center, neither one-click cleanup nor forced cleanup could kill TIM for good, and the system's auto-start interception could not stop TIM from resurrecting itself. That aroused my strong interest, and this article is the result.

This article dissects, from the Android system level, the extraordinary keep-alive ability of Tencent's TIM IM application, and I hope it brings you some new inspiration on Android.

* Special disclaimer: the technical research and analysis in this article are for technology enthusiasts only; do not use them for illegal purposes.

Background knowledge: what is Tencent TIM? (The following text is from Baidu Baike.)

TIM is a multi-platform IM client released by Tencent in November 2016. On top of a lightweight QQ chat client it adds support for collaborative office work; it is logged into with a QQ account, synchronizes friends and messages, and is well suited to office scenarios.

(This article is simultaneously published at: www.52im.net/thread-2893…)

2. The author of this article

Yuan Huihui: joined ByteDance's mobile platform division in May 2019. Graduated from Xidian University; previously worked at Xiaomi, Lenovo and IBM.

He has long been engaged in Android phone system R&D. While in Xiaomi's MIUI system group he was mainly responsible for framework optimization, system stability, technical pre-research and platform building for Xiaomi's Android phones. Keen on Android system and kernel technology, he has a deep understanding of the Android framework and rich hands-on experience, has written nearly 200 high-quality articles, and has been invited many times to speak at industry Android technology conferences.

3. A review of keep-alive techniques

The evolution of Android keep-alive techniques can be divided into several stages.

The first stage: the era of all kinds of "black technology", such as QQ's 1-pixel Activity, playing silent music in the background (a pedometer app actually did this), and so on.

Typical techniques of this stage can be found in the following articles:

  1. “Application Keep-Alive (1): Dual-Process Daemon Practice under Android 6.0”
  2. “How to Keep the Android Process Alive: One Article to Solve All Your Questions”
  3. “WeChat Team Original Sharing: Android WeChat Background Keep-Alive Practice (Process Keep-Alive)”

The second stage: after Android 6.0, keep-alive began to require real technical skill, and the earlier brute-force methods gradually stopped working.

For some typical techniques at this stage, read the following articles:

  1. “Android 6.0+ Keep-Alive Practice (Preventing the Process from Being Killed)”
  2. “Android 6.0+ Keep-Alive Practice (Resurrection after Being Killed)”

The third stage: in the Android 8.0 era, the system exerts ever stricter control directly at the system level, and fewer and fewer keep-alive tricks survive. Keep-alive technology has diverged in two directions: either take the serious whitelist route, or go darker and darker down the "black technology" road (the TIM approach analyzed in this article).

At this stage, not many keep-alive methods remain. The following articles review what is currently technically feasible:

  1. “Android P Is Coming: The Real Nightmare of Background Apps and Push Notifications”
  2. “A Comprehensive Review of the Real-World Effectiveness of Current Android Background Keep-Alive Schemes (as of 2019)”
  3. “In 2020, Is Android Background Keep-Alive Still Viable? See How I Implement It Elegantly!”

4. What is keep-alive?

Keep-alive means that when a user kills an app's process, or the system cleans up processes because memory is running low, the process tries to avoid being killed, or to resurrect itself immediately after being killed.

Keep-alive is "honey to the application but a tumor to the system". An app that keeps itself alive gains online time for itself, and may even do things the app wants but the user does not expect, bringing unnecessary power drain and an extra performance burden to the system.

App developers keep racking their brains to make their apps live longer, and there are two main lines of thought.

Idea 1: raise the process priority to reduce the chance of being killed:

  • 1) For example, listen for SCREEN_ON/OFF broadcasts and start a transparent 1-pixel Activity;
  • 2) Start an empty notification to promote the app to a foreground service;
  • …

Idea 2: when the process is killed, pull it back up:

  • 1) Listen for system or third-party broadcasts to pull the process up; nowadays, however, the Security Center / Whetstone intercepts this;
  • 2) Fork native processes that watch each other, and when the parent process is killed, use the am command to start it again (a minimal sketch follows below); force-stop kills the entire process group, so this method is now almost useless.
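Below is a minimal C++ sketch of that classic "native fork" trick from 2), purely to illustrate the idea; the component name is a made-up placeholder and this is not code from any real app. The child detects the parent's death through a pipe (the kernel closes the parent's write end when it dies) and then shells out to am:

    // Sketch: the forked child watches for parent death via a pipe, then
    // restarts the service with the am command. Component name is hypothetical.
    #include <unistd.h>
    #include <stdlib.h>

    int main() {
        int fds[2];
        pipe(fds);                      // parent keeps the write end open for life
        if (fork() == 0) {              // child: the watcher process
            close(fds[1]);              // keep only the read end
            char buf;
            while (read(fds[0], &buf, 1) > 0) {}  // returns 0 (EOF) once parent dies
            execlp("am", "am", "startservice", "-n",
                   "com.example.app/.KeepAliveService", (char*) nullptr);
            _exit(1);                   // exec failed
        }
        close(fds[0]);                  // parent: keep only the write end
        // ... parent goes on with its normal work ...
        pause();
        return 0;
    }

Force-stop defeats exactly this scheme: because it kills the whole process group, the watcher child dies together with its parent before it ever gets to run am.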

5. Preliminary analysis

5.1 First acquaintance with TIM

Run the command adb shell ps | grep tencent.tim. TIM has four processes in total, and their parent is Zygote:

root@gityuan:/ # ps | grep tencent.tim
u0_a146 27965 551 1230992 43964 SyS_epoll_ 00f6df4bf0 S com.tencent.tim:Daemon
u0_a146 27996 551 1252492 54032 SyS_epoll_ 00f6df4bf0 S com.tencent.tim:MSF
u0_a146 28364 551 1348616 89204 SyS_epoll_ 00f6df4bf0 S com.tencent.tim:mail
u0_a146 31587 551 1406128 147976 SyS_epoll_ 00f6df4bf0 S com.tencent.tim

5.2 One-click cleanup: observe the phenomenon and check initial suspicions

Here is the log after a one-click cleanup of TIM:

12-21 21:12:20.265 1053 1075 I am_kill : [2,0489,com.tencent.tim:Daemon,5,stop com.tencent.tim: from pid 4617]
12-21 21:12:20.272 1053 1075 I am_kill : [6,0527,com.tencent.tim:mail,2,stop com.tencent.tim: from pid 4617]
12-21 21:12:20.305 1053 1075 I am_kill : [8,0492,com.tencent.tim,2,stop com.tencent.tim: from pid 4617]
12-21 21:12:20.330 1053 1075 I am_kill : [0,0491,com.tencent.tim:MSF,0,stop com.tencent.tim: from pid 4617]
12-21 21:13:59.920 1053 1466 I am_proc_start: [0,5487,10146,com.tencent.tim:MSF,service,com.tencent.tim/com.tencent.mobileqq.app.DaemonMsfService]
12-21 21:13:59.984 1053 1604 I am_proc_start: [0,5516,10146,com.tencent.tim,content provider,com.tencent.tim/com.tencent.mqq.shared_file_accessor.ContentProviderImpl]

Force-stop is the most thorough way the system provides to kill a process (see the article "forceStop for Android Processes" for details). The log shows that one-click cleanup force-stopped all four TIM processes, yet com.tencent.tim:MSF was immediately pulled back up by a start of DaemonMsfService.

Question 1: the Security Center is configured to forbid TIM from self-starting, and both the Security Center and the system strictly restrict self-starts and cascaded starts. Why can the system not stop TIM?

Suspicion 1: the Security Center's self-start restriction did not take effect, or WeChat/QQ cascade-started TIM; after all, the service name com.tencent.mobileqq.app.DaemonMsfService begins with QQ's package name com.tencent.mobileqq.

After dumpsys and repeated verification, this possibility was ruled out; the restriction did take effect, as the log shows:

12-21 21:12:20.266 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146
12-21 21:12:20.291 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146
12-21 21:12:20.323 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146
12-21 21:12:20.323 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146
12-21 21:12:20.331 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146
12-21 21:12:20.332 1053 1075 I AutoStartManagerService: MIUILOG- Reject RestartService packageName :com.tencent.tim uid : 10146

Suspicion 2: after a TIM process is killed, the binderDied() death callback pulls the Service back up. This was quickly ruled out: force-stop is a cold-blooded killer that does not wait for death callbacks to clean up the process but uproots it directly, so AMS's death-callback path is never taken.

Suspicion 3: TIM set an alarm. callApp being null fits that characteristic, but analysis showed the call is a plain startService(), not startServiceInPackage(), so this possibility is excluded as well:

// When DaemonAssistService is started, callApp is null, which normally happens only with a PendingIntent
12-21 21:56:54.653 3181 3195 I am_start_service: [-1,NULL,10146,com.tencent.tim:Daemon,com.tencent.tim/com.tencent.mobileqq.app.DaemonAssistService,{cmp=com.tencent.tim/com.tencent.mobileqq.app.DaemonAssistService}]
12-21 21:56:56.666 3181 3827 I am_start_service: [-1,NULL,10146,com.tencent.tim:MSF,com.tencent.tim/com.tencent.mobileqq.app.DaemonMsfService,{cmp=com.tencent.tim/com.tencent.mobileqq.app.DaemonMsfService}]

With these three possibilities ruled out, let's go straight to breakpoints.

5.3 Breakpoint analysis in Android Studio

The very first breakpoint revealed an unexpected scene:

Question 2: How can the startService() callingPid be equal to 0?

5.3.1) Analyzing callingPid=0:

Why unexpected? It takes a fairly deep understanding of binder internals to spot the clue: a callingPid of 0 simply should not happen here. startService() is a synchronous binder call, and callingPid is 0 only for asynchronous (oneway) binder calls, where no reply needs to be sent back to the caller. For a synchronous call, callingPid must be set, otherwise the reply data could not be routed back to the sender. This is decided inside the Binder driver, as the core driver code below shows.

(1) Binder sender side: whether the from thread is recorded depends on the TF_ONE_WAY flag:

binder_transaction(...) {
    ...
    if (!reply && !(tr->flags & TF_ONE_WAY))
        t->from = thread;
    else
        t->from = NULL;
    ...
}

(2) Binder receiver side: sender_pid is set to the real pid or to 0 depending on whether the from thread is NULL; this is what the Java layer reads as callingPid:

binder_thread_read(...) {
    ...
    t_from = binder_get_txn_from(t);
    if (t_from) {
        struct task_struct *sender = t_from->proc->tsk;
        tr.sender_pid = task_tgid_nr_ns(sender, task_active_pid_ns(current));
    } else {
        tr.sender_pid = 0;
    }
    ...
}

The code above shows that for a synchronous binder call, callingPid can never be 0.

The last parameter of the transact() call is the flags field: 0 means a synchronous call, and 1 (FLAG_ONEWAY) means asynchronous.

This is framework code that every startService() call eventually passes through, so callingPid should never be 0; yet here we cannot tell which process pulled up com.tencent.tim:Daemon.

5.3.2) The revelation:

By the previous analysis, callingPid cannot be 0, yet it plainly is 0. Where there is a contradiction, there must be something odd. Is there a synchronous binder call with callingPid=0? The answer is no.

Checking the flags showed that this startService binder call indeed carries the ONE_WAY flag, which confirms that an asynchronous binder call was initiated.


Although callingPid=0, one thing can be read from callingUid=10146: the com.tencent.tim:Daemon process was pulled up by a process of the TIM app itself.

5.4 Summary

Based on the preliminary analysis above, let's sort out the clues. The tentative conclusions so far:

  • 1) TIM has at least 4 processes, all forked by the Zygote process, and all of them are started via startService;
  • 2) A failure of the Security Center's self-start restriction on TIM is ruled out;
  • 3) A binder death callback restarting the Service after a TIM process is killed is ruled out;
  • 4) The alarm mechanism pulling up the process is ruled out;
  • 5) callingPid=0 shows that TIM does not start its services through the ordinary startService() interface of the system framework, but in some customized way;
  • 6) callingUid=10146 shows that TIM rescues itself by itself, not through the system or a third-party app.

A guess is not hard to make: first, TIM can somehow learn that its processes have been killed; second, TIM has replaced or re-implemented part of the binder call path so that it interacts with the binder driver on its own.

6. In-depth analysis

6.1 Looking for patterns

The TIM app has 4 processes. Repeatedly kill each TIM process in turn and observe the self-restart behavior. The pattern: whenever either com.tencent.tim:Daemon or com.tencent.tim:MSF is killed, the other pulls it back up, then kills itself and gets restarted as well.

Next, focus on these two processes and start by tracing signal handling.

6.2 Analysis from the perspective of Signal

Enable signal tracing:

root@gityuan:/ # echo 1 > /d/tracing/events/signal/enable

root@gityuan:/ # echo 1 > /d/tracing/tracing_on

Run the following command to grab the tracing log:

root@cancro:/ # cat /d/tracing/trace_pipe

The log is as follows:

// Run "adb shell kill -9 10649" to kill com.tencent.tim:Daemon
sh-22775 [000] d..2 18844.276419: signal_generate: sig=9 errno=0 code=0 comm=cent.tim:Daemon pid=10649 grp=1 res=0
// com.tencent.tim:MSF is then killed by the thread Thread-89
Thread-89-10712 [000] dn.2 18844.340735: signal_generate: sig=9 errno=0 code=0 comm=tencent.tim:MSF pid=10669 grp=1 res=0
Binder:14682_4-14845 [000] d..2 18844.340779: signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
Binder:14682_1-14694 [000] d..2 18844.341418: signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
Binder:14682_2-14697 [000] d..2 18844.345075: signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
tencent.tim:MSF-14682 [000] dn.2 18844.345115: signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0

So after the Daemon process is killed, the com.tencent.tim:MSF process is killed by one of its own threads, Thread-89; but a name like Thread-xxx is clearly just automatic numbering generated by the system.

Question 3: what is special about the thread named "Thread-89"? Why would it kill its own process?

As you can see from the stack below, this special thread of the MSF process is performing a flock_lock operation, obviously a file-lock operation, and that immediately caught my attention. The Daemon process has such a thread too. One step closer to the truth.

Let’s look at the call stack:

Cmd line: com.tencent.tim:Daemon

"Thread-89" prio=10 tid=12 Native
  | group="main" sCount=1 dsCount=0 obj=0x32c07460 self=0xf3382000
  | sysTid=10712 nice=-8 cgrp=bg_non_interactive sched=0/0 handle=0xee824930
  | state=S schedstat=( 44972457 14188383 124 ) utm=1 stm=3 core=0 HZ=100
  | stack=0xee722000-0xee724000 stackSize=1038KB
  | held mutexes=
  kernel: __switch_to+0x74/0x8c
  kernel: flock_lock_file_wait+0x2a4/0x318
  kernel: SyS_flock+0x19c/0x1a8
  kernel: el0_svc_naked+0x20/0x28
  native: #00 pc 000423d4 /system/lib/libc.so (flock+8)
  native: #01 pc 0000195d /data/app/com.tencent.tim-1/lib/arm/libdaemon_acc.so (_Z9lock_filePc+64)
  ...
  native: #29 pc 0000191f /data/app/com.tencent.tim-1/lib/arm/libdaemon_acc.so (_Z9lock_filePc+2)
  native: #30 pc 0000191d /data/app/com.tencent.tim-1/lib/arm/libdaemon_acc.so (_Z9lock_filePc)
  native: #31 pc 0000191b /data/app/com.tencent.tim-1/lib/arm/libdaemon_acc.so (_Z18notify_and_waitforPcS_+102)
  ...
  native: #63 pc 000018d1 /data/app/com.tencent.tim-1/lib/arm/libdaemon_acc.so (_Z18notify_and_waitforPcS_+28)
  at com.libwatermelon.WaterDaemon.doDaemon2(Native method)
  at com.libwatermelon.strategy.WaterStrategy2$2.run(WaterStrategy2.java:111)

The symbol notify_and_waitfor in this call stack suggests the thread is listening on files to learn whether some process is alive. To see exactly what this special thread does, there is no need for gdb; a bit of strace will do the trick.

6.3 Analysis using strace

root@gityuan:/ # strace -CttTip 22829 -CttTip 22793

The result shows the thread blocked in a flock() system call and, on waking up, issuing an ioctl() to the binder driver.

Flock basics:

flock is a Linux file lock used to protect data integrity when multiple processes operate on the same file concurrently; one of its usage scenarios is detecting whether a process exists. flock is an advisory lock rather than a mandatory one: a process can still read and write a file on which another process holds a flock, because the kernel only records whether the file is locked and does not force other processes to block on reads and writes. That is the kernel's advisory-locking policy.

int flock(int fd, int operation);

The first argument is a file descriptor; the second specifies the lock type and has three possible values:

  • 1) LOCK_SH: shared lock; multiple processes may hold a shared lock on the same file at the same time;
  • 2) LOCK_EX: exclusive lock; only one process may hold it;
  • 3) LOCK_UN: releases the lock this process holds on the file.

The watcher thread of the MSF process calls flock with LOCK_EX, trying to take an exclusive lock on a file whose lock is already held by the com.tencent.tim:Daemon process, so it blocks until that lock is released. Once the Daemon is killed, the system reclaims all its resources (including open files); this is done by the Linux kernel.

When the Daemon's files are reclaimed, its flock is released; MSF then acquires the lock and prints "lock file success". That is how MSF learns the Daemon process has been killed, after which it executes ioctl(11, BINDER_WRITE_READ, 0xffffffffee823ed0) = 0 <0.0067>.
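Before moving on, here is a minimal C++ sketch of this flock-based liveness detection under the same assumptions (the indicator paths are TIM's; the function names are mine, for illustration only):

    // Sketch of flock-based liveness detection. The watched process holds an
    // exclusive lock on its indicator file for its whole lifetime; the watcher
    // blocks on the same lock and wakes up only when the holder dies, because
    // the kernel releases a dead process's file locks automatically.
    #include <sys/file.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    void hold_indicator(const char* path) {      // run in the watched process
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        flock(fd, LOCK_EX);                      // acquired at once, held until death
        // fd stays open on purpose for the life of the process
    }

    void wait_for_peer_death(const char* path) { // run in the watcher process
        int fd = open(path, O_RDWR);
        flock(fd, LOCK_EX);                      // blocks while the peer is alive
        printf("lock file success >> %s\n", path);
        close(fd);                               // peer died: pull it back up here
    }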

Back to the strace output: the ioctl shows that TIM implements the startService binder call itself, sending the BINDER_WRITE_READ ioctl command to the binder driver, and then sends SIGKILL to its own MSF process so that it, too, is pulled up again by the same mechanism.
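What that ioctl amounts to can be sketched with the public libbinder C++ API. To be clear, this is my own illustrative reconstruction, not TIM's code: the transaction code 34 (START_SERVICE_TRANSACTION) is the one confirmed later in section 6.7, and the Intent marshalling is elided because its parcel layout differs across Android versions.

    // Illustrative reconstruction: start a service from native code by sending
    // AMS a START_SERVICE_TRANSACTION with the ONEWAY flag set, which is why
    // AMS sees callingPid=0. Not TIM's actual code; Intent marshalling elided.
    #include <binder/IServiceManager.h>
    #include <binder/Parcel.h>

    using namespace android;

    void start_service_oneway() {
        // Equivalent of ServiceManager.getService("activity"): the AMS proxy.
        sp<IBinder> am = defaultServiceManager()->getService(String16("activity"));

        Parcel data, reply;
        data.writeInterfaceToken(String16("android.app.IActivityManager"));
        data.writeStrongBinder(nullptr);   // caller's ApplicationThread: null
        // ... write the Intent, resolvedType, etc., exactly as this OS version's
        //     IActivityManager.Proxy.startService() would ...

        // FLAG_ONEWAY makes the call asynchronous, so Binder.getCallingPid()
        // on the AMS side reports 0: the anomaly observed in section 5.3.
        am->transact(34 /* START_SERVICE_TRANSACTION on this build */,
                     data, &reply, IBinder::FLAG_ONEWAY);
    }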

At this point in the analysis, writev operations appeared, which should be Log writes; the keyword "Watermelon" in them caught my attention, and searching for it opened up a whole new world.

6.4 TIM log

// Old MSF process
24538 24562 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_p2
24538 24562 E Watermelon: Watch >>>>Daemon<<<<< Daed !!
24538 24562 E Watermelon: java_callback:onDaemonDead
24538 24562 V Watermelon: onDaemonDead
24576 24576 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_d1
24576 24576 E Watermelon: Watch >>>>Daemon<<<<< Daed !!
24576 24576 E Watermelon: process exit

// New Daemon process
25103 25103 V Watermelon: initDaemon processName=com.tencent.tim:Daemon
25103 25103 E Watermelon: onDaemonAssistantCreate
25134 25134 D Watermelon: start daemon24=/data/user/0/com.tencent.tim/app_bin/daemon2

// app_d process
25137 25137 D Watermelon: pipe read datasize >> 316 <<
25137 25137 D Watermelon: indicator_self_path >> /data/user/0/com.tencent.tim/app_indicators/indicator_d1
25137 25137 D Watermelon: observer_daemon_path >> /data/user/0/com.tencent.tim/app_indicators/observer_p1
25137 25137 I Watermelon: sIActivityManager==NULL
25137 25137 I Watermelon: BpActivityManager init

// New Daemon process
25103 25120 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_p2
25103 25120 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_p2
25137 25137 I Watermelon: BpActivityManager init end

// app_d process
25137 25137 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_d1
25137 25137 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_d1

// New MSF process
25119 25119 V Watermelon: initDaemon processName=com.tencent.tim:MSF
25119 25119 V Watermelon: mConfigurations.PERSISTENT_CONFIG.PROCESS_NAME=com.tencent.tim:MSF
25119 25119 E Watermelon: onPersistentCreate
25153 25153 D Watermelon: start daemon24=/data/user/0/com.tencent.tim/app_bin/daemon2
25119 25144 D Watermelon: pipe write len=324
25159 25159 D Watermelon: pipe read datasize >> 324 <<
25159 25159 D Watermelon: indicator_self_path >> /data/user/0/com.tencent.tim/app_indicators/indicator_p1
25159 25159 D Watermelon: observer_daemon_path >> /data/user/0/com.tencent.tim/app_indicators/observer_d1
25159 25159 I Watermelon: sIActivityManager==NULL
25159 25159 I Watermelon: BpActivityManager init
25119 25144 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_d2
25119 25144 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_d2
25159 25159 I Watermelon: BpActivityManager init end

// All processes enter the listening-ready state
25159 25159 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_p1
25159 25159 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_p1
25119 25144 E Watermelon: Watched >>>>OBSERVER<<<< has been ready...
25119 25144 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_p2
25159 25159 E Watermelon: Watched >>>>OBSERVER<<<< has been ready...
25159 25159 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_d1
25137 25137 E Watermelon: Watched >>>>OBSERVER<<<< has been ready...
25137 25137 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_p1
25103 25120 E Watermelon: Watched >>>>OBSERVER<<<< has been ready...
25103 25120 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_d2

Then extract the core fragment from it:

25159 25159 I Watermelon: BpActivityManager init
25119 25144 D Watermelon: start try to lock file >> /data/user/0/com.tencent.tim/app_indicators/indicator_d2
25119 25144 D Watermelon: lock file success >> /data/user/0/com.tencent.tim/app_indicators/indicator_d2

It is not difficult to see:

  • 1) TIM itself queries servicemanager to obtain the AMS proxy BpActivityManager, and then writes the startService transaction data itself;
  • 2) TIM uses flock to monitor the liveness of its peer processes;
  • 3) The watched files include, for example, /data/user/0/com.tencent.tim/app_indicators/indicator_d2.

6.5 The indicator files

Looking more closely at the directory TIM watches, /data/user/0/com.tencent.tim/app_indicators/, there are 4 indicator files:

Question 4: why are 4 indicator files needed?

Besides the Daemon and MSF processes, there are two app_d processes, with init as their parent, also waiting on file locks:

gityuan@13203:~/gityuan$ adb shell ps -t | grep -i flock
u0_a146 10668 10649 1143304 85876 flock_lock 00f6e1e3d8 S Thread-85
u0_a146 10712 10669 1158552 89664 flock_lock 00f6e1e3d8 S Thread-89
u0_a146 10687 1 12768 564 flock_lock 00f73113d8 S app_d
u0_a146 10717 1 12768 560 flock_lock 00f74353d8 S app_d

All four of these lock-waiters run under uid=10146, so they all belong to the TIM app.

This refreshes our picture of TIM once more: it actually has 6 processes, 2 of which hang under the init process with names that have nothing to do with Tencent. I almost missed these two special processes.

The two app_d processes, which likewise watch each other, look like a fallback: should Daemon and MSF ever be killed at the same time, the TIM processes can still be pulled up through the app_d emergency channel. Put another way, four of the six processes keep one another alive to guarantee TIM's immortality.

Question 5: how are the four processes paired, and how do they watch one another?

By repeatedly analyzing the patterns before and after each kill and restart, the map between processes and watched files emerges:

Lifting the veil further, the conclusions are:

  • 1) The Daemon and MSF processes wait on locks held by each other, and the two app_d processes wait on locks held by each other;
  • 2) When app_d1 is killed, app_d2 observes it and starts the MSF process by pulling up DaemonMsfService, and is then killed itself;
  • 3) When app_d2 is killed, app_d1 observes it and starts the Daemon process by pulling up DaemonAssistService, and is then killed itself;
  • 4) As for Daemon and MSF: when either is killed, the other observes it, starts it by pulling up the corresponding service, and is then killed itself; after that, both app_d processes restart.

In addition, the process watching indicator_p1 is paired with the one watching indicator_p2, and likewise for indicator_d1 and indicator_d2; this is verified later.

A new question arises: MSF can detect the kill through flock, but how does app_d learn of it? And why does app_d kill itself and restart after learning of it?

6.6 Analysis from the perspective of Cgroup

root@gityuan:/acct/uid_10146/pid_10649 # cat cgroup.procs
10649 // Daemon
10687 // app_d

root@gityuan:/acct/uid_10146/pid_10669 # cat cgroup.procs
10669 // MSF
10717 // app_d

Digging into TIM's deeper associations by looking at cgroups: Daemon and app_d1 belong to one group, and MSF and app_d2 belong to another.

Question 6: how exactly is app_d created, and how does it become a child of init?

Take a look at this from the perspective of process creation and exit:

// 5170 (the MSF process) -> 5192 -> 5201 (exits) -> 5211 (survives as app_d)
tencent.tim:MSF-5170 [001] ...1 55659.446062: sched_process_fork: comm=tencent.tim:MSF pid=5170 child_comm=tencent.tim:MSF child_pid=5192
Thread-300-5192 [000] ...1 55659.489621: sched_process_fork: comm=Thread-300 pid=5192 child_comm=Thread-300 child_pid=5201
<...>-5201 [003] ...1 55659.501074: sched_process_exec: filename=/data/user/0/com.tencent.tim/app_bin/daemon2 pid=5201 old_pid=5201
daemon2-5201 [009] ...1 55659.533492: sched_process_fork: comm=daemon2 pid=5201 child_comm=daemon2 child_pid=5211
daemon2-5201 [009] ...1 55659.535169: sched_process_exit: comm=daemon2 pid=5201 prio=120
daemon2-5201 [009] d..3 55659.535341: signal_generate: sig=17 errno=0 code=262145 comm=Thread-300 pid=5192 grp=1 res=1

Explanation: this app_d process is created by two successive forks starting from the MSF process, with the intermediate parent exiting right away; the survivor thus becomes an orphan and is adopted by the init process, which is guaranteed by the Linux process mechanism. The other app_d process is forked from the Daemon process in the same way. It is now clear where app_d comes from, and the cgroup grouping explains why app_d's fate is tied to the Daemon (or MSF) process. It restarts itself in order to re-establish the watch relationships.
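The trace above can be condensed into a small C++ sketch of the double-fork trick (the daemon2 path is the one seen in the trace; everything else is illustrative, and in TIM the two forks actually straddle the exec of daemon2):

    // Sketch of double-fork orphaning: the app process forks an intermediate
    // child, that child forks the future app_d and exits at once, so app_d
    // loses its parent and is re-parented to init by the kernel.
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    void spawn_orphan_watcher() {
        pid_t first = fork();
        if (first == 0) {                  // intermediate generation
            if (fork() == 0) {             // grandchild: the future app_d
                execl("/data/user/0/com.tencent.tim/app_bin/daemon2",
                      "daemon2", (char*) nullptr);
                _exit(1);                  // exec failed
            }
            _exit(0);                      // exit immediately: the grandchild is
                                           // orphaned and adopted by init
        }
        waitpid(first, nullptr, 0);        // reap the intermediate child
    }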

Question 7: why does killing just the Daemon process also kill its app_d process?

Answer: it is killProcessGroup() that kills the app_d process. When the Daemon is killed with adb kill -9 pid, the binder death callback fires, and inside binderDied() the system executes killProcessGroup(), which takes the same-group app_d down with it.
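In essence, killProcessGroup() walks the same cgroup.procs file we inspected in the previous section and signals every member. A rough sketch of the idea only; the real libprocessgroup implementation adds retries and cleanup:

    // Rough sketch of what killProcessGroup(uid, pid) boils down to: read the
    // pids listed in the group's cgroup.procs and SIGKILL each of them, which
    // is how the same-group app_d dies together with the Daemon.
    #include <fstream>
    #include <string>
    #include <signal.h>

    void kill_process_group(int uid, int initial_pid) {
        std::string path = "/acct/uid_" + std::to_string(uid) +
                           "/pid_" + std::to_string(initial_pid) + "/cgroup.procs";
        std::ifstream procs(path);
        int pid;
        while (procs >> pid)
            kill(pid, SIGKILL);
    }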

For those who have studied binder's death-callback mechanism down at the Linux kernel level, a new question arises.

Question 8: app_d is forked (indirectly) from the Daemon process, so the two share binder fds, in which case the death callback should not fire even when the Daemon is killed. How is it triggered?

Answer: because app_d calls exec() immediately after being forked, and ProcessState opens the binder driver with the crucial flag O_CLOEXEC.

With O_CLOEXEC, the file descriptor is closed automatically as soon as the newly created process successfully calls exec().
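A tiny demonstration of that flag's effect (illustrative only; in the real framework the open happens inside ProcessState's driver-open code):

    // O_CLOEXEC in action: a binder fd inherited across fork() disappears as
    // soon as the child exec()s, so the kernel drops the child's reference to
    // the binder connection, and the Daemon's death can still be detected
    // even though app_d was originally forked from it.
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        int fd = open("/dev/binder", O_RDWR | O_CLOEXEC);
        (void) fd;
        // After a successful exec(), fd is closed automatically by the kernel;
        // without O_CLOEXEC it would survive into the new program image.
        execl("/system/bin/true", "true", (char*) nullptr);
        return 1;  // reached only if exec failed
    }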

6.7 Getting to the root of the problem

Question 9: at what level did TIM modify the binder framework? And given callingPid=0, is there any way to find out who pulls up whom?

As established, TIM forces its calls onto the ONEWAY path. For debugging, we can clear that flag in the framework when TIM (uid 10146) issues code=34 (START_SERVICE_TRANSACTION):

With IPCThreadState.cpp modified this way, every move TIM makes can be captured.
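For reference, the debugging change is along these lines (a sketch of a patch inside IPCThreadState::transact() in frameworks/native, not a drop-in diff; uid 10146 is TIM's uid on this test device):

    // Sketch of the debug patch: when TIM issues START_SERVICE_TRANSACTION
    // (code 34) as a oneway call, strip TF_ONE_WAY so the call becomes
    // synchronous and the real callingPid shows up on the AMS side.
    status_t IPCThreadState::transact(int32_t handle, uint32_t code,
                                      const Parcel& data, Parcel* reply,
                                      uint32_t flags)
    {
        if (code == 34 && getuid() == 10146 && (flags & TF_ONE_WAY)) {
            flags &= ~TF_ONE_WAY;   // force synchronous, for debugging only
        }
        // ... original body of transact() continues unchanged ...
    }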

It turns out TIM queries servicemanager at both the Java layer and the native layer to obtain the BpActivityManager proxy object, but it still uses the framework's IPCThreadState to interact with the binder driver; it did not replace libbinder.so.

Binder interaction code can in fact be fully self-contained, in the style of servicemanager, without using the framework's IPCThreadState at all: one can wrap ioctl() and talk to the driver directly. So TIM's keep-alive still has room to grow; such a variant would slip past the debugging code above, which works by intercepting the flag change to ONEWAY. Even then, everything stays under control: the calls can still be intercepted and located inside the binder driver. However advanced the tricks get, they mostly play out in user space, while kernel space remains comparatively safe ground.

In addition, by adding the temporary code above and comparing repeated experiments, the following diagram can be drawn:

Processes created by a second fork belong to the same group as the original process and are killed in cascade with it.

  • 1) Kill the Daemon, and MSF observes it and pulls the Daemon back up; meanwhile app_d1 dies with the Daemon (same group), and app_d2 observes that and starts the Daemon as well. Double insurance.
  • 2) Kill the app_d1 process, and the app_d2 process observes it and pulls up the MSF process;
  • 3) Run force-stop directly, and all 6 processes are killed; however, they are not all killed at the same instant.

6.8 Recap of the analysis

Let’s review the process above:

  • 1) The preliminary analysis ruled out several routine tricks and spotted the abnormal behavior callingPid=0;
  • 2) Following that clue, keep killing the processes repeatedly, look for more patterns, and keep posing questions;
  • 3) Answer those questions with signal tracing, strace, stack traces, ps, binder knowledge, Linux process mechanics, kill, and so on.

Solving system-level problems feels like detective work. One needs a keen nose, a grip on the clues, and the habit of "guessing boldly and verifying carefully"; eventually the truth of the case surfaces. Points join into lines, lines into planes, planes into solids: from scattered fragments, the full three-dimensional truth is sketched out.

In summary, the main questions raised along the way were:

  • Question 1: the Security Center is configured to forbid TIM from self-starting, and both the Security Center and Whetstone strictly restrict self-starts and cascaded starts. Why can the system not stop TIM?
  • Question 2: how can the callingPid of startService() be 0?
  • Question 3: what is special about the thread named "Thread-89"? Why would it kill its own process?
  • Question 4: why are 4 indicator files needed?
  • Question 5: how are the four processes paired, and how do they watch one another?
  • Question 6: how exactly is app_d created, and how does it become a child of init?
  • Question 7: why does killing just the Daemon process also kill its app_d process?
  • Question 8: app_d is forked indirectly from the Daemon and shares its binder fds, so the death callback should not fire when the Daemon is killed. How is it triggered?
  • Question 9: at what level did TIM modify the binder framework? Given callingPid=0, is there a way to know who pulls up whom?

7. Conclusion

Summing up TIM's keep-alive techniques, we can draw the following lessons:

  • 1) Use flock's exclusive file locks to monitor process liveness:
  • 1.1) first, an ordinary pair of processes, Daemon and MSF, watch each other's indicator files to learn whether the other is alive;
  • 1.2) at the same time, a pair of app_d processes orphaned onto init, forked indirectly by the Daemon and MSF processes, watch each other's files in the same way; double insurance;
  • 2) Do not use the framework's startService binder path; instead, query the native layer for the BpActivityManager proxy object, implement the startService interface independently, and mark the call ONEWAY. This both raises the difficulty of analysis and hides the strategy more deeply;
  • 3) When a watcher sees its peer die, it pulls the peer back up with its self-implemented startService binder call; the system has no interception mechanism for processes started this way.

This flock approach is at least a good deal smarter than the polling-loop monitoring commonly seen on the Web.

More potent still, TIM has six processes (note: some of these are processes created by other processes), four of which form two pairs of mutual watchers, and one pair exploits Linux's orphan-adoption behavior; the scheme is hidden deep enough that the processes can guarantee one another's immortality.

Of course, if all four processes are made to exit at the same time upon receiving the kill signal, TIM can still be killed for good.

Implementing the startService binder call without the framework code is another highlight, and there is an even more advanced variant: wrap ioctl() yourself and talk to the driver directly. Against this problem we built an anti-keep-alive scheme, and later relaxed the restriction for the sake of certain features, so I will not expand on it here.

Appendix: articles on IM/push process keep-alive and connection keep-alive

“Application Keep-Alive (1): Dual-Process Daemon Practice under Android 6.0”
“Android 6.0+ Keep-Alive Practice (Preventing the Process from Being Killed)”
“Android 6.0+ Keep-Alive Practice (Resurrection after Being Killed)”
“How to Keep the Android Process Alive: One Article to Solve All Your Questions”
“Summary of Message Push on Android: Implementation Principles, Heartbeat Keep-Alive, Problems Encountered, etc.”
“An In-Depth Look at the Little Matter of Android Message Push”
“Why Does TCP-Based Mobile IM Still Need a Heartbeat Keep-Alive Mechanism?”
“WeChat Team Original Sharing: Android WeChat Background Keep-Alive Practice (Process Keep-Alive)”
“WeChat Team Original Sharing: Android WeChat Background Keep-Alive Practice (Network Keep-Alive)”
“Mobile IM Practice: Implementing WeChat's Intelligent Heartbeat Mechanism on Android”
“Mobile IM Practice: Analysis of the Heartbeat Strategies of WhatsApp, Line and WeChat”
“Android P Is Coming: The Real Nightmare of Background Apps and Push Notifications”
“A Comprehensive Review of the Real-World Effectiveness of Current Android Background Keep-Alive Schemes (as of 2019)”
“Understanding Network Heartbeat Packets in Instant Messaging Applications: Functions, Principles, Implementation Ideas, etc.”
“RongCloud Technology Sharing: Network Link Keep-Alive Practice in RongCloud's Android IM Products”
“Correctly Understanding IM Long-Connection Heartbeats and Reconnection, and Implementing Them (with Complete IM Source Code)”
“In 2020, Is Android Background Keep-Alive Still Viable? See How I Implement It Elegantly!”
“The Strongest Android Keep-Alive Idea Yet: An In-Depth Analysis of Tencent TIM's Process Immortality Technology”

>> More similar articles……

(This article is simultaneously published at: www.52im.net/thread-2893…)