Handler: one who handles things. It is Android's message courier, carrying the burden of delivering messages between threads. Many people who first try to read the framework source code start with the Handler messaging mechanism, and Handler shows up in many parts of the source. Yet it is not as simple as it looks; if we want to dig deeper, we can trace it all the way down to the Linux kernel.
Preparation
This article analyzes the source code of API 29, from the official AOSP: cs.android.com
The question
The Handler API is relatively simple to use. We often use the following method to send a Runnable to the Handler thread:
```java
new Handler().post(() -> { /* do something */ });
new Handler().postDelayed(() -> { /* do something after 3s */ }, 3000);
```
Anonymous classes are used here for brevity, but you can also subclass Handler and override handleMessage to handle messages. post actually calls Handler's sendMessageDelayed, which wraps the Runnable in a Message and enqueues it into the MessageQueue; Looper's loop then retrieves it and hands it back to the Handler for processing. So when a child thread sends a message through the main thread's Handler, we achieve our goal: execute a task asynchronously, then tell the main thread the result.
If you are curious, a question mark pops up: **how is the delay above implemented (today's focus)?** A timer? Periodic polling? Event driven? With these questions, let's go deep into the source code.
Java layer
Handler source code:
```java
public final boolean sendMessageDelayed(@NonNull Message msg, long delayMillis) {
    if (delayMillis < 0) {
        delayMillis = 0;
    }
    return sendMessageAtTime(msg, SystemClock.uptimeMillis() + delayMillis);
}

public boolean sendMessageAtTime(@NonNull Message msg, long uptimeMillis) {
    MessageQueue queue = mQueue;
    ...
    return enqueueMessage(queue, msg, uptimeMillis);
}
```
One detail here: sendMessageAtTime does not take the delay directly; it receives an absolute point in time, obtained by adding the delay to the current time. The reason is this: wherever the remaining delay is needed in later calls down to the lower layers of the system, it is recomputed by subtracting the then-current time from the absolute uptime, which keeps the deadline accurate and reduces error (the call path from the application layer down to the bottom layer itself takes time).
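To make the arithmetic concrete, here is a minimal, self-contained sketch (plain C, not Android source; now_ms is a hypothetical stand-in for SystemClock.uptimeMillis()) of why storing an absolute deadline beats carrying a relative delay through multiple layers:

```c
#include <stdio.h>
#include <time.h>

/* Monotonic "uptime" in milliseconds, comparable to SystemClock.uptimeMillis(). */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

int main(void) {
    long long delay = 3000;                  /* like postDelayed(..., 3000) */
    long long deadline = now_ms() + delay;   /* sendMessageAtTime: absolute target, computed once */

    /* ... imagine the call sinking through several framework layers here ... */

    long long timeout = deadline - now_ms(); /* the (when - now) subtraction done later in next() */
    if (timeout < 0) timeout = 0;            /* already overdue: fire immediately */
    printf("remaining timeout: %lld ms\n", timeout);
    return 0;
}
```

However long the intermediate calls take, the remaining timeout shrinks by exactly the elapsed time instead of drifting.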
Handler's enqueueMessage does a little bookkeeping and then calls the MessageQueue method of the same name. Before reading that source, keep one rough idea in mind: when the MessageQueue is empty, or the delay of the message at the head has not yet expired, the relevant code blocks (with the thread releasing the CPU and going to sleep), and it wakes up only when a new message arrives. We will explain how later. Back to MessageQueue's enqueueMessage:
```java
boolean enqueueMessage(Message msg, long when) { // "when" is the uptimeMillis from above
    ...
    synchronized (this) {
        ...
        msg.when = when; // The uptimeMillis is stored on the Message object
        Message p = mMessages; // Why the plural name mMessages? Because it is a linked list holding the whole queue
        boolean needWake; // Wake up what, and why? We'll get to that later
        if (p == null || when == 0 || when < p.when) {
            // Insert at the head of the queue; wake up if blocked
            msg.next = p;
            mMessages = msg;
            needWake = mBlocked;
        } else {
            // The asynchronous-message logic (isAsynchronous etc.) can be ignored for now;
            // just know a new message is being inserted into the list
            needWake = mBlocked && p.target == null && msg.isAsynchronous();
            Message prev;
            for (;;) {
                prev = p;
                p = p.next;
                // This for loop, combined with the if condition below, keeps messages sorted by "when"
                if (p == null || when < p.when) {
                    break;
                }
                if (needWake && p.isAsynchronous()) {
                    needWake = false;
                }
            }
            msg.next = p; // p was prev's original next node
            prev.next = msg; // Update prev's next; msg is now inserted into the queue
        }
        if (needWake) {
            // mPtr is actually a reference to the native-layer MessageQueue, discussed later
            nativeWake(mPtr);
        }
    }
    return true;
}
```
This code is fairly simple: new messages are inserted into the queue in order of their due time, and a wake-up is performed when necessary.
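If the framework code above feels noisy, here is a stripped-down model of just the ordered insert (plain C, not the Android source; Msg and enqueue are invented for illustration):

```c
#include <stdio.h>

/* A toy message: just a due time and a next pointer. */
typedef struct Msg { long long when; struct Msg* next; } Msg;

static Msg* enqueue(Msg* head, Msg* msg) {
    if (head == NULL || msg->when < head->when) {
        msg->next = head;   /* new queue head: this is the case that may need a wake-up */
        return msg;
    }
    Msg* prev = head;
    while (prev->next != NULL && prev->next->when <= msg->when) {
        prev = prev->next;  /* walk forward to the insertion point */
    }
    msg->next = prev->next;
    prev->next = msg;       /* splice msg in; the list stays sorted by "when" */
    return head;
}

int main(void) {
    Msg a = {100, NULL}, b = {50, NULL}, c = {75, NULL};
    Msg* head = NULL;
    head = enqueue(head, &a);
    head = enqueue(head, &b);
    head = enqueue(head, &c);
    for (Msg* m = head; m != NULL; m = m->next) printf("%lld ", m->when);
    printf("\n"); /* prints: 50 75 100 */
    return 0;
}
```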
Two questions arise:
- What exactly does this "wake-up" wake up, and why does it require a native method call?
- What is the when field assigned to the msg actually used for?
With these questions in mind, let's move on. Looper is the engine of the entire message pipeline, powered by its loop method:
```java
public static void loop() {
    final Looper me = myLooper(); // Each thread has its own Looper
    ...
    final MessageQueue queue = me.mQueue; // Each Looper owns one message queue
    ...
    for (;;) {
        Message msg = queue.next(); // might block
        ...
    }
    ...
}
```
Looper is rather lazy: it just calls queue.next() instead of dealing with message delays itself, so MessageQueue does the heavy lifting. Of course, that is by design; Looper's job is only to dispatch messages.
Take a look at MQ’s next method, and we are about to enter Wonderland:
```java
Message next() {
    final long ptr = mPtr; // Native-layer handle of this MessageQueue
    ...
    int nextPollTimeoutMillis = 0; // Timeout for the next poll; our first sighting of "poll"
    for (;;) {
        ...
        // [Key] Another native method; this blocking call is the real reason messages can be delayed
        nativePollOnce(ptr, nextPollTimeoutMillis);
        synchronized (this) {
            // Blocking is over; try to fetch a Message and return it to Looper
            final long now = SystemClock.uptimeMillis();
            Message prevMsg = null;
            Message msg = mMessages;
            if (msg != null && msg.target == null) { ... }
            if (msg != null) {
                if (now < msg.when) {
                    // Update the blocking timeout for the next message; the next loop pass uses it
                    // This is the (when - now) subtraction mentioned above
                    nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                } else {
                    // Got a message.
                    mBlocked = false;
                    if (prevMsg != null) {
                        prevMsg.next = msg.next;
                    } else {
                        mMessages = msg.next;
                    }
                    msg.next = null;
                    ...
                    return msg;
                }
            } else {
                // No more messages.
                nextPollTimeoutMillis = -1;
            }
            ...
        }
        ...
    }
}
```
nativePollOnce(ptr, nextPollTimeoutMillis) is a native method whose call blocks. Combined with the nativeWake(ptr) method we saw earlier, we can already get an idea of how message latency works:
- When a delayed message is sent via `postDelayed`, the target time is eventually passed to `nativePollOnce`, which blocks with a timeout to achieve the delay; when the timeout expires, the blocking ends and `next` returns the message object to Looper.
- In its `enqueueMessage` method, MessageQueue checks whether the newly inserted message is due earlier than the message at the head of the queue to decide whether to wake up immediately, i.e. to break a block that has not yet timed out via `nativeWake`.
That is about it for the Java layer, but the curious among you may wonder: doesn't this blocking consume CPU resources the whole time? Could this be the true cause of Android's power drain? If you think so, you are still underestimating Linux. To understand this native blocking further, we have to dig into the Android system's native source code.
Native layer
Following the source layout, we can go straight to the MessageQueue C++ code, frameworks/base/core/jni/android_os_MessageQueue.cpp, and its nativePollOnce function:
```cpp
static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj, jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    // pollOnce is what actually gets called
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis); // It turns out Looper is still working at the native layer
    mPollObj = NULL;
    mPollEnv = NULL;
    ...
}
```
Going one level further, we find pollOnce in system/core/libutils/Looper.cpp:
```cpp
int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        ...
        if (result != 0) {
            ...
            return result;
        }
        // The timeout is passed straight through to pollInner
        result = pollInner(timeoutMillis);
    }
}

int Looper::pollInner(int timeoutMillis) {
    ...
    // If the next message is due earlier than the passed-in timeout,
    // use the earlier deadline so that no message is missed
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
    }
    // The initial result value is "woken up"
    int result = POLL_WAKE;
    ...
    // epoll_wait suspends the thread, releasing CPU resources
    mPolling = true;
    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    // [Key] The epoll_wait system call is where the whole message mechanism actually blocks,
    // waiting for the pipe's notification to become readable, as described below
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    // epoll_wait has returned; leave the idle state and compete for the CPU again
    mPolling = false;
    ...
    // Return value -1: an error; goto Done
    if (eventCount < 0) {
        ...
        result = POLL_ERROR;
        goto Done;
    }
    // Return value 0: the timeout expired
    if (eventCount == 0) {
        result = POLL_TIMEOUT;
        goto Done;
    }
    // eventCount > 0: new events were written to the pipe before the timeout
    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd.get()) {
            if (epollEvents & EPOLLIN) {
                awoken(); // [Key] woken up
            } else { ... }
        } else { ... }
    }
Done: ;
    ...
    return result;
}
```
pollInner is long, but its core is the epoll_wait system call. Let's look at its function definition (see the epoll_wait(2) Linux manual page and the epoll wiki):
```c
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
```
epoll_wait is where the entire Android messaging mechanism actually blocks. The blocking wait puts the thread to sleep, so it consumes no CPU resources while still listening for the registered events.
Also notice the `for (int i = 0; i < eventCount; i++)` loop over the returned events: epoll is evidently an event-driven mechanism.
Let's not rush into the awoken() function above, because before we get there we need a basic understanding of Linux file descriptors (fd), pipes, and the epoll mechanism; with those in hand, everything falls into place.
Introduction to Kernel Knowledge
File Descriptor
Hereafter abbreviated as FD. From the Wikipedia definition:
In computer science, a file descriptor is an abstract concept used to describe a reference to a file. Formally it is a non-negative integer: in fact, an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process.
The concept seems a bit abstract, but before we can understand FDs we need to understand a design philosophy of Linux: everything is a file. Everything in Linux can be treated as a file, including regular files, links, sockets, device drivers, and so on, and operating on them may create a corresponding file descriptor. File descriptors are indexes the kernel creates to manage opened files efficiently; they are used to refer to the opened files, and all system calls related to file I/O (such as read and write) go through file descriptors.
As you can see, the FD is a valuable system resource in Linux, like oil in the industrial age; without it our file systems could not function. In essence, when a Linux process starts, a file descriptor table is created for it in kernel space, recording all FDs available to the current process; in other words, it maps all of the process's open files.
An FD is essentially an index into that table (hence a non-negative integer). Colloquially, it is the key with which the system operates on I/O resources. For more detail, see the links at the end of the article.
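As a minimal sketch of the idea (plain C on a POSIX system; the file path is made up for illustration), here is a descriptor in action:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* The kernel returns a small non-negative integer: an index into
       this process's file descriptor table. */
    int fd = open("/tmp/fd_demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    printf("got fd = %d (0, 1, 2 are stdin, stdout, stderr)\n", fd);

    /* All file I/O system calls take the descriptor, not the file name. */
    write(fd, "hello\n", 6);
    close(fd); /* releases the table slot for reuse */
    return 0;
}
```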
From this we can already form a preliminary idea: given cheap file operations, we can build cross-process communication on top of them.
Pipe
So what is a pipe? Again, let's start from the encyclopedia definition:
In Unix-like operating systems (and some others that borrow the design, such as Windows), a pipeline is a chain of processes connected by their standard streams, in which the output of each process is fed directly as input to the next. The concept was invented by Douglas McIlroy for the Unix command line and is named for its resemblance to a physical pipe.
The concept is fairly vivid: a pipe is a mechanism commonly used for inter-process communication. As the name suggests, it is like a water pipe carrying water from one end to the other. The inventor of the pipe observed that when running commands, we often need to pass one program's output to another program for processing. This can be done with intermediate files, for example:
```shell
ls > abc.txt        # write the listing of the current directory into abc.txt
grep xxx abc.txt    # use abc.txt as input and let grep search for xxx
```
That is cumbersome. The advent of the pipe simplified the operation: we can now connect two commands with a vertical bar, the pipe character:
```shell
ls | grep xxx
```
This achieves the same effect without materializing a file. The shell uses a pipe to connect the output of one process with the input of another, enabling cross-process communication. So we can think of a pipe as, in essence, a file: the upstream process opens it for writing, and the downstream process opens it for reading. The system call for creating a pipe looks like this:
```c
int pipe(int pipefd[2]);
```
The call creates two file descriptors that populate the pipefd array: pipefd[0] is opened for reading and pipefd[1] for writing, serving as the pipe's read and write descriptors. Although a pipe is a file, it occupies no disk storage, only memory: a pipe is a kernel buffer in memory that is operated on just like a file (so do not understand "file" in Linux narrowly; not everything called a file lives on disk). Data written into the pipe is buffered until the other end reads it, which is why, in the command above, grep blocks and does not run until ls has produced output.
In practice, we usually have one process close the read end and the other close the write end, giving simplex communication.
In code it takes this general form (see the links at the end of the article for details):

```c
// The parent process forks a child process
// Parent process:
read(pipefd[0], ...);
// Child process:
write(pipefd[1], ...);
```
Also note that pipes are not limited to cross-process communication; they certainly work within a single process too. A complete, runnable version of the sketch above follows.
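Here is a minimal runnable version (plain C, POSIX calls only; the message text is arbitrary). The parent reads, the child writes, and each side closes the end it does not use:

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {           /* child: the writing end */
        close(pipefd[0]);     /* close the unused read end */
        const char* msg = "hello from the child\n";
        write(pipefd[1], msg, strlen(msg));
        close(pipefd[1]);
        _exit(0);
    }
    /* parent: the reading end */
    close(pipefd[1]);         /* close the unused write end */
    char buf[64];
    ssize_t n = read(pipefd[0], buf, sizeof(buf)); /* blocks until the child writes */
    if (n > 0) write(STDOUT_FILENO, buf, (size_t) n);
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```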
epoll
Now that we understand file descriptors and pipes, we can finally talk about the epoll mechanism. Again, the definition first:
epoll is a scalable I/O event notification mechanism of the Linux kernel. First appearing in Linux 2.5.44, it was designed to replace the older select(2) and poll(2) system functions and achieves better performance for programs that operate on large numbers of file descriptors. Like poll, epoll listens for events on multiple file descriptors; it manages the monitored descriptors with a red-black tree. When an event is registered on an epoll instance, epoll adds it to the instance's red-black tree and registers a callback that appends the event to a ready list when it occurs.
In short, epoll is an I/O event notification mechanism (event-driven, rather like a watchman). We said the pipe mechanism requires one end to write and the other to read, but in practice we rarely want to sit and wait forever; we want a listener that tells us "data has been written, come and read it now".
Before epoll there were monitoring mechanisms such as select and poll, but they were relatively inefficient: some had to traverse the FDs indiscriminately, polling the I/O after every wake-up even when nothing had happened, and some had a hard upper limit on the number of monitored FDs. We won't elaborate here.
epoll addresses these issues and enables high-performance I/O multiplexing. Its system call surface is small, just three functions:
```c
// Create an epoll instance in the kernel and return an epoll file descriptor
int epoll_create(int size);
int epoll_create1(int flags);
// Add, modify, or delete the listening of events on fd (the third argument)
// in the epoll instance referred to by epfd (created above)
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// Mentioned above: wait for events registered on epfd; ready events are carried out via the events argument
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
```
The internal implementation of epoll is comparatively complex: a red-black tree manages the FDs and a doubly linked list manages the ready events; see the links at the end of the article for details. epoll_ctl stores the given FD and event and hooks them up to the corresponding device driver; when the event occurs, an internal callback appends it to the ready list, the waiting thread is notified and woken, and epoll_wait can return. When no event occurs, epoll_wait stays suspended.
Don't confuse the two descriptors here: epfd is the descriptor of the epoll instance itself, while fd is the descriptor of the event source you want to listen on; the actual pipe reads and writes go through that fd.
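Before returning to the Android sources, here is a minimal, self-contained sketch (plain C, not Android code) wiring the three calls together the way Looper does, with an eventfd standing in for Looper's mWakeEventFd and a plain write playing the role of nativeWake:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void) {
    /* 1. An eventfd plays the role of Looper's mWakeEventFd ("wake pipe"). */
    int wake_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

    /* 2. An epoll instance, like mEpollFd in rebuildEpollLocked(). */
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    /* 3. Register interest in readability (EPOLLIN) of wake_fd. */
    struct epoll_event item;
    item.events = EPOLLIN;
    item.data.fd = wake_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, wake_fd, &item);

    /* Simulate nativeWake(): write a 1 into the eventfd. */
    uint64_t inc = 1;
    write(wake_fd, &inc, sizeof(inc));

    /* 4. Wait with a 3000 ms timeout, like nativePollOnce/pollInner. */
    struct epoll_event events[8];
    int n = epoll_wait(epfd, events, 8, 3000);
    if (n > 0 && events[0].data.fd == wake_fd) {
        uint64_t counter;
        read(wake_fd, &counter, sizeof(counter)); /* awoken(): drain the counter */
        printf("woken by event, counter = %llu\n", (unsigned long long) counter);
    } else if (n == 0) {
        printf("timed out after 3000 ms (the POLL_TIMEOUT path)\n");
    }

    close(epfd);
    close(wake_fd);
    return 0;
}
```

Because the write happens before epoll_wait, the wait returns immediately; comment out the write and the call instead sleeps the full 3000 ms and returns 0, which corresponds to the POLL_TIMEOUT path.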
Back in the Native layer
With epoll understood, let's go back to the native source of MessageQueue and Looper; it now reads very clearly.
Remember the mPtr variable in MQ? It is initialized in MQ's constructor:
```java
MessageQueue(boolean quitAllowed) {
    mQuitAllowed = quitAllowed;
    mPtr = nativeInit(); // Native call
}
```
So there is a MessageQueue object in the native layer too; mPtr is the upper layer's handle to it, making native access easy:
```cpp
static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    ...
    return reinterpret_cast<jlong>(nativeMessageQueue); // A jlong that is actually an address
}

NativeMessageQueue::NativeMessageQueue() : mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false); // The native MQ also creates a Looper instance at initialization
        Looper::setForThread(mLooper);
    }
}
```
Take a look at Looper’s initialization:
```cpp
Looper::Looper(bool allowNonCallbacks) {
    ...
    // The eventfd system call creates a file descriptor, assigned to mWakeEventFd;
    // all the wake-up "pipe" reads and writes happen on it
    mWakeEventFd.reset(eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC));
    ...
    // Create the epoll instance
    rebuildEpollLocked();
}

void Looper::rebuildEpollLocked() {
    // If a descriptor for an epoll instance already exists, reset it first
    if (mEpollFd >= 0) {
        ...
        mEpollFd.reset();
    }
    // Allocate the new epoll instance and wake pipe:
    // epoll_create1 creates the instance; the return value goes into the mEpollFd descriptor
    mEpollFd.reset(epoll_create1(EPOLL_CLOEXEC));
    struct epoll_event eventItem;
    memset(&eventItem, 0, sizeof(epoll_event));
    eventItem.events = EPOLLIN; // IN: listen for input on the pipe (i.e. writes to it)
    // The wake event fd is stored in the event data, for the matching done later in pollInner
    eventItem.data.fd = mWakeEventFd.get();
    // Register: EPOLL_CTL_ADD adds the mWakeEventFd listener to the epoll instance
    int result = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, mWakeEventFd.get(), &eventItem);
    ...
}
```
epoll_create1 and epoll_ctl are now accounted for; connecting the flow back to epoll_wait:
```cpp
int Looper::pollInner(int timeoutMillis) {
    ...
    // [Key] epoll_wait is where the message mechanism actually blocks, waiting to read the pipe's notification
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
    ...
    // eventCount > 0: a new event was written to the pipe before the timeout
    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd.get()) { // Match the wake event fd stored earlier
            if (epollEvents & EPOLLIN) {
                awoken(); // [Key] woken up
            } else { ... }
        } else { ... }
    }
Done: ;
    ...
    return result;
}

void Looper::awoken() {
    uint64_t counter;
    // Perform the read
    TEMP_FAILURE_RETRY(read(mWakeEventFd.get(), &counter, sizeof(uint64_t)));
}
```
So the awoken function is nothing more than a read on the wake pipe: the epoll event machinery wakes the thread precisely so this read can happen. Who performs the corresponding write, then? The wake function:
```cpp
void Looper::wake() {
    uint64_t inc = 1;
    // Write a 1 into the wake pipe to trigger the wake-up
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd.get(), &inc, sizeof(uint64_t)));
    ...
}
```
As an aside, older versions of the source created the wake descriptors with a real pipe:
```cpp
int wakeFds[2];
int result = pipe(wakeFds);
mWakeReadPipeFd = wakeFds[0];
mWakeWritePipeFd = wakeFds[1];
```
That is two descriptors, one for reading and one for writing, whereas the current source uses only the single mWakeEventFd descriptor, probably because the Handler message mechanism never crosses processes, so a full pipe is unnecessary; this remains to be explored. A small sketch of eventfd's semantics follows.
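As a minimal sketch of why one descriptor suffices (plain C; behavior as documented in the eventfd(2) man page), an eventfd wraps a single 64-bit counter: each write adds to it, and a read returns the total and resets it to zero:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void) {
    int efd = eventfd(0, 0);       /* blocking mode, counter starts at 0 */
    uint64_t one = 1;
    write(efd, &one, sizeof(one));
    write(efd, &one, sizeof(one)); /* two wake-ups coalesce into one */
    uint64_t counter;
    read(efd, &counter, sizeof(counter)); /* returns the total and resets to 0 */
    printf("counter = %llu\n", (unsigned long long) counter); /* prints 2 */
    close(efd);
    return 0;
}
```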
The Looper::wake() function is called from MQ’s nativeWake function:
```cpp
static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake(); // Internally this goes on to call Looper::wake()
}
```
Recall that in the Java layer, nativeWake(mPtr) is called by enqueueMessage when a message is enqueued at the head. At this point we have finally covered the whole chain, from the Java layer all the way down to Linux kernel system calls.
To summarize, and to return to the title: why "no epoll, no Handler"? Handler + Looper + MessageQueue can process delayed messages and achieve an event-driven effect without occupying CPU resources essentially because the native layer relies on the epoll I/O event notification mechanism of the Linux kernel. Two scenarios are covered:
- When a delayed message is sent via postDelayed, the target time eventually ends up as a timed block inside nativePollOnce, in essence the suspension of the epoll_wait function, which achieves the delay. When epoll_wait returns, the thread is woken up and regains the CPU; if the event count is 0 (the timeout expired), nativePollOnce simply returns, and MQ's next method hands the message object up to Looper.
- MessageQueue's enqueueMessage checks whether the newly inserted message is due earlier than the message at the head of the queue to decide whether to wake up immediately, that is, whether to break a not-yet-expired block via nativeWake. Because epoll listens on the mWakeEventFd wake event descriptor, epoll_wait leaves its suspended state and returns an event count greater than 0; awoken is then called to drain the pipe, nativePollOnce returns with result POLL_WAKE, and the upper layer goes on processing messages. Remember that Looper's loop keeps calling next: if the native layer never woke up, the upper layer would stay blocked.
So a wake-up is either automatic, when the timeout expires, or active, triggered by the insertion of a new head message. The toy model below walks through both paths.
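As a closing sketch (plain C, not Android source; a single hard-coded pending "message"), here is a toy model of how next()'s timeout computation and pollInner's epoll_wait interlock. A second thread writing to wake_fd would model nativeWake; here the timeout path simply fires:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

int main(void) {
    int wake_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    struct epoll_event item;
    item.events = EPOLLIN;
    item.data.fd = wake_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, wake_fd, &item);

    long long head_when = now_ms() + 2000; /* one pending "message", due in 2 s */

    for (;;) {
        /* nextPollTimeoutMillis = when - now, as in MessageQueue.next() */
        long long timeout = head_when - now_ms();
        if (timeout <= 0) {
            printf("message due, dispatching\n"); /* next() would return the msg here */
            break;
        }
        struct epoll_event events[8];
        int n = epoll_wait(epfd, events, 8, (int) timeout); /* nativePollOnce */
        if (n > 0) {
            uint64_t c;
            read(wake_fd, &c, sizeof(c)); /* awoken(): an earlier message was enqueued */
            /* real code would re-read the queue head here; this toy just loops */
        }
        /* n == 0: the timeout expired; the loop re-checks the head message */
    }
    close(epfd);
    close(wake_fd);
    return 0;
}
```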
Postscript
The main thread blocks inside the nativePollOnce call in queue.next(), releasing the CPU and sleeping rather than spinning, so it does not burn CPU resources. Even in the foreground, as long as your UI has no animation or touch interaction in flight, this is essentially its state. This also answers the old question of why Looper's infinite loop does not cause abnormal power consumption.
How does a Linux system call suspend without consuming CPU time slices, and how are CPU timers and interrupts implemented? That gets into hardware territory; time to review computer organization, haha. I will add more on this later when I have time.
In fact, it was only at the end of this study that I realized why epoll is called epoll: the biggest difference from the original poll mechanism is that it was reworked to be event-driven. If I'm guessing right, the "e" stands for event.
References
- Research on Native layer of Handler in Android
- What exactly is a Linux file descriptor?
- Understand Linux file descriptors FD and Inode
- Description of File Descriptor FD (File Descriptor)
- Interprocess communication for Linux: Pipes
- Linux pipe command (PIPE)
- Linux I/O multiplexing and epoll details
- Asynchronous blocking IO — epoll
- epoll in the Handler mechanism