Think Twice series: a thorough read-through of the Android messaging mechanism

The Think Twice series is my latest format for learning and summarizing, focusing on problem analysis, technical accumulation, and broadening perspective.

This time, you can really read it all the way through:

The design of the message queue in the Java layer
Looper distribution in the Java layer
The relationship between the Native-layer message queue and the Java-layer message queue
Looper distribution in the Native layer
Messages
epoll

Preface

As one of the most important mechanisms in Android, the messaging mechanism has been analyzed for more than a decade, and a great deal has already been written about it. So:

Readers who are already fairly familiar with this mechanism will not find anything new in this article.
Readers who are not yet familiar with the messaging mechanism can use this article as a starting point and keep digging.

However, a quick search and review shows that most articles revolve around:

Handler, Looper, and MQ
Source analysis of the upper-layer Handler, Looper, and MQ

Learning from these perspectives alone is not enough to fully understand the messaging mechanism.

This article is essentially a brainstorm. To keep the brainstorm from wandering, and to help readers follow the thread of the content, here is the mind map first:

Brainstorm: how the OS solves inter-process communication

In the world of programs there are many communication scenarios. Going through what we know, there are several ways to solve the inter-process communication problem:

This section can be skimmed; a rough understanding is enough, and skipping the details will not affect the rest of the article.

Pipes

Pipe: a half-duplex mode of communication in which data flows in only one direction; it can only be used between related processes.

Stream pipe (s_pipe): full-duplex; data can be transmitted in both directions at the same time.

Named pipe (FIFO): a half-duplex mode of communication that also allows communication between unrelated processes.

Message queue (MessageQueue):

A linked list of messages, stored in the kernel and identified by a message queue identifier. Message queues overcome the drawbacks of signals, which carry little information, and of pipes, which carry only unformatted byte streams and have a limited buffer size.

Shared memory (SharedMemory):

A section of memory is mapped so that other processes can access it. The shared memory is created by one process but can be accessed by several. Shared memory is the fastest IPC method and was designed specifically to address the inefficiency of the other inter-process communication methods. It is often combined with other mechanisms, such as semaphores, to achieve synchronization and communication between processes.

Semaphore:

A counter that can be used to control access to a shared resource by multiple processes. It is often used as a locking mechanism that keeps other processes away from a shared resource while one process is accessing it. It is therefore mainly used as a means of synchronization between processes, and between threads within the same process.

Socket:

Unlike the other communication mechanisms, it also enables communication between processes on different machines over a network.

Signal:

A relatively complex mechanism that notifies the receiving process that an event has occurred.

As we can imagine, Android also has plenty of inter-process communication scenarios, so the OS must adopt at least one mechanism to enable it.

On closer inspection, it turns out that the Android OS uses more than one. In addition, Android developed Binder, based on OpenBinder, for inter-process communication in user space.

Why not just use the existing inter-process communication mechanisms in Linux?

This article also briefly touches on "message queues in kernel space".

Here is a question we will explore later:

Does Android make use of the message queue in the Linux kernel?

The design of message mechanism based on message queue has many advantages, and Android has adopted this design idea in many communication scenarios.

Three elements of the message mechanism

Wherever we talk about messaging mechanisms, there are three elements:

The message queue
Message loop (distribution)
Message processing

A message queue is a queue of message objects, and its basic rule is FIFO.

The message loop (distribution) is basically a generic mechanism: an infinite loop keeps fetching messages from the head of the message queue and dispatches them for execution.

As for message processing, it has to be mentioned that messages come in two forms:

Enrichment: the message itself carries complete information
Query-back: the message is incomplete and the receiver has to query back for the rest

The choice between the two mainly depends on the trade-off between the cost of producing the message and the cost of querying back for information.

Once the information is complete, the receiver can process the message.
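To make the three elements concrete, here is a minimal plain-Java sketch. It is not Android code: `MiniMessageLoop`, `Msg`, and the `QUIT` sentinel are names I made up for illustration, and the loop simply returns on an empty queue where a real implementation would block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical names throughout; this is not Android's API.
class MiniMessageLoop {
    /** An "enrichment"-style message: it carries everything the receiver needs. */
    static final class Msg {
        final int what;
        final String payload;
        Msg(int what, String payload) { this.what = what; this.payload = payload; }
    }

    /** Sentinel message that tells the loop to stop. */
    static final Msg QUIT = new Msg(-1, null);

    // Element 1: the message queue (FIFO).
    private final Queue<Msg> queue = new ArrayDeque<>();

    void send(Msg m) { queue.add(m); }

    /** Element 2: the loop keeps taking messages from the head and dispatching them. */
    void loop(Consumer<Msg> handler) {
        for (;;) {
            Msg m = queue.poll();
            // A real loop would block on an empty queue; this sketch just stops.
            if (m == null || m == QUIT) return;
            handler.accept(m);   // Element 3: message processing
        }
    }

    /** Runs a small scenario and returns the payloads in processing order. */
    static List<String> demo() {
        MiniMessageLoop l = new MiniMessageLoop();
        List<String> seen = new ArrayList<>();
        l.send(new Msg(1, "first"));
        l.send(new Msg(2, "second"));
        l.send(QUIT);
        l.send(new Msg(3, "never"));   // queued after QUIT, so never processed
        l.loop(m -> seen.add(m.payload));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [first, second]
    }
}
```

The handler is passed in from outside on purpose: the loop only distributes, and what happens to each message is the receiver's business.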

Message queues in the Android Framework

There are two message queues in the Android Framework:

Java layer: frameworks/base/core/java/android/os/MessageQueue.java
Native layer: frameworks/base/core/jni/android_os_MessageQueue.cpp

The Java-layer MQ is not built on a JDK data structure such as List or Queue.

I downloaded a copy of the Android 10 source code. The files are not long, so you can read them in full.

This is not hard to understand: user space receives messages from kernel space and, as shown in the figure below, these messages reach the Native layer first. So:

A message queue is set up in the Native layer, with the basic capabilities of a message queue.

JNI is used to get through the runtime barrier between the Java layer and the Native layer, and the message queue is mapped into the Java layer.

Applications are built on top of the Java layer, where message distribution and processing are implemented.

PS: In the Android 2.3 era, the message queue was implemented in the Java layer. As for why it was changed to a native implementation some 10 years ago, my guess is that it has to do with CPU idling, but I have not explored it further. If any reader knows, please leave a comment and help me out.

PS: There is also a classic system startup architecture diagram that I could not find; it would be more intuitive.

Code parsing

Let’s briefly read and analyze the MQ source code in the Native layer.

Native layer message queue creation:

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }

    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}

This simply creates a message queue in the Native layer. If creation fails, it throws an exception and returns 0; otherwise it converts the pointer to a Java long and returns it. That value is, of course, held by the MQ in the Java layer.

Constructor of the NativeMessageQueue class

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

The Looper here is the Native-layer Looper. The instance is obtained through the static method Looper::getForThread(); if there is none yet, one is created and registered through the static method Looper::setForThread().

Take a look at the Native methods used in the Java layer MQ

class MessageQueue {
    private long mPtr; // used by native code

    private native static long nativeInit();

    private native static void nativeDestroy(long ptr);

    private native void nativePollOnce(long ptr, int timeoutMillis); /*non-static for callbacks*/

    private native static void nativeWake(long ptr);

    private native static boolean nativeIsPolling(long ptr);

    private native static void nativeSetFileDescriptorEvents(long ptr, int fd, int events);
}

The corresponding JNI signatures:

static const JNINativeMethod gMessageQueueMethods[] = {
    /* name, signature, funcPtr */
    { "nativeInit", "()J", (void*)android_os_MessageQueue_nativeInit },
    { "nativeDestroy", "(J)V", (void*)android_os_MessageQueue_nativeDestroy },
    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
    { "nativeIsPolling", "(J)Z", (void*)android_os_MessageQueue_nativeIsPolling },
    { "nativeSetFileDescriptorEvents", "(JII)V",
            (void*)android_os_MessageQueue_nativeSetFileDescriptorEvents },
};

mPtr is the memory address of the Native-layer MQ, mapped into the Java layer.

The Java layer determines if MQ is still working:

private boolean isPollingLocked() {
    // If the loop is quitting then it must not be idling.
    // We can assume mPtr != 0 when mQuitting is false.
    return !mQuitting && nativeIsPolling(mPtr);
}

static jboolean android_os_MessageQueue_nativeIsPolling(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    return nativeMessageQueue->getLooper()->isPolling();
}

/**
 * Returns whether this looper's thread is currently polling for more work to do.
 * This is a good signal that the loop is still alive rather than being stuck
 * handling a callback.  Note that this method is intrinsically racy, since the
 * state of the loop can change before you get the result back.
 */
bool isPolling() const;

Wake up Native layer MQ:

static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}

Native layer Poll:

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj, jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;

    if (mExceptionObj) {
        env->Throw(mExceptionObj);
        env->DeleteLocalRef(mExceptionObj);
        mExceptionObj = NULL;
    }
}

How does the Looper at the Native layer distribute messages?

//Looper.h

int pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData);
inline int pollOnce(int timeoutMillis) {
    return pollOnce(timeoutMillis, NULL, NULL, NULL);
}

// implementation

int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) {
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p", this, ident, fd, events, data);
#endif
                if (outFd != NULL) *outFd = fd;
                if (outEvents != NULL) *outEvents = events;
                if (outData != NULL) *outData = data;
                return ident;
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;
        }

        result = pollInner(timeoutMillis);
    }
}

Pending Responses in the Native layer are handled first, and then pollInner is called. The details are fairly involved; we will brainstorm them later in the analysis of the Native Looper.

Before getting into those details, recall that a method call blocks: in plain terms, the caller waits until the method returns.

So native void nativePollOnce(long ptr, int timeoutMillis) blocks while it runs.

With that in mind, let’s read the Java-layer MQ’s message fetching again. The code is fairly long, so the main points are commented directly in the code.

Before looking at it, let’s think through the main scenarios purely from a TDD perspective. Of course, they do not necessarily match Android’s existing design:

Is the message queue working?
- Working: a message is expected to be returned
- Not working: null is expected
For a working message queue, is there currently any message?
- No message: block, or return null? If null is returned, the caller needs its own idle-wait or wake-up mechanism to keep operating normally. From an encapsulation perspective, the queue should keep idling and solve the problem itself
- There is a message
  - Special internal functional messages are expected to be handled inside MQ
  - A message that has reached its processing time is returned
  - Before the processing time, given that the messages are all sorted: keep idling and stay blocked, or return nothing and set up a wake-up? As discussed above, the expectation is to keep idling

class MessageQueue {
    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        // 1. If the mapped native message queue pointer is 0, i.e. it points to nothing,
        // the message queue has already quit and there are no messages, so return null.
        final long ptr = mPtr;
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        
        // 2. Infinite loop: keep idling until a message that needs to be dispatched is obtained
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }

            // 3. Call into the native layer to poll; note that the blocking wait happens in the native layer
            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message. Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                
                // 4. If a barrier is found, look for the next asynchronous message in the queue
                if (msg != null && msg.target == null) {
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }

                if (msg != null) {
                    // 5.
                    // If the message has not arrived at the agreed time, set a maximum time difference for 'next wake up'
                    // Otherwise 'maintain singly linked list information' and return message
                    
                    if (now < msg.when) {
                        // Next message is not ready. Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // The 'time to process' message was found. 'Maintain singly linked list information' and return message
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Processes whether the message queue needs to be stopped
                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }

                // Maintain the IdleHandler information that needs to be processed next.
                // If there are no IdleHandlers, go straight to the next round of message fetching;
                // otherwise run the IdleHandlers.
                // If first time idle, then get the number of idlers to run.
                // Idle handles only run if the queue is empty or if the first message
                // in the queue (possibly a barrier) is due to be handled in the future.
                if (pendingIdleHandlerCount < 0
                        && (mMessages == null || now < mMessages.when)) {
                    pendingIdleHandlerCount = mIdleHandlers.size();
                }
                if (pendingIdleHandlerCount <= 0) {
                    // No idle handlers to run. Loop and wait some more.
                    mBlocked = true;
                    continue;
                }

                if (mPendingIdleHandlers == null) {
                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
                }
                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
            }

            // IdleHandler processing
            // Run the idle handlers.
            // We only ever reach this code block during the first iteration.
            for (int i = 0; i < pendingIdleHandlerCount; i++) {
                final IdleHandler idler = mPendingIdleHandlers[i];
                mPendingIdleHandlers[i] = null; // release the reference to the handler

                boolean keep = false;
                try {
                    keep = idler.queueIdle();
                } catch (Throwable t) {
                    Log.wtf(TAG, "IdleHandler threw exception", t);
                }

                if (!keep) {
                    synchronized (this) {
                        mIdleHandlers.remove(idler);
                    }
                }
            }

            // Reset the idle handler count to 0 so we do not run them again.
            pendingIdleHandlerCount = 0;

            // While calling an idle handler, a new message could have been delivered
            // so go back and look again for a pending message without waiting.
            nextPollTimeoutMillis = 0;
        }
    }
}

Enqueuing messages in the Java layer

This is simpler. Assuming the message itself is valid and the message queue is still working, the TDD view is:

If the message queue has no head, the message is expected to become the head directly
If there is a head
- A message whose processing time is earlier than the first message's, or that needs to be processed immediately, becomes the new head
- Otherwise, it is inserted at the appropriate position according to its processing time

 boolean enqueueMessage(Message msg, long when) {
        if (msg.target == null) {
            throw new IllegalArgumentException("Message must have a target.");
        }

        synchronized (this) {
            if (msg.isInUse()) {
                throw new IllegalStateException(msg + " This message is already in use.");
            }

            if (mQuitting) {
                IllegalStateException e = new IllegalStateException(
                        msg.target + " sending message to a Handler on a dead thread");
                Log.w(TAG, e.getMessage(), e);
                msg.recycle();
                return false;
            }

            msg.markInUse();
            msg.when = when;
            Message p = mMessages;
            boolean needWake;
            if (p == null || when == 0 || when < p.when) {
                // New head, wake up the event queue if blocked.
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // Inserted within the middle of the queue. Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                Message prev;
                for (;;) {
                    prev = p;
                    p = p.next;
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }

The barrier gets its own brainstorm later; the rest needs no further explanation.
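The ordering rules for enqueuing can be tried outside Android with a stripped-down sketch. This is my own simplification under stated assumptions: `SortedMessageList` and `Node` are hypothetical names, and locking, barriers, and the wake-up decision are all left out so that only the "new head or sorted insert" logic remains.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch with hypothetical names; no locking, barriers or wake-ups.
class SortedMessageList {
    static final class Node {
        final String name;
        final long when;   // processing time; 0 means "process immediately"
        Node next;
        Node(String name, long when) { this.name = name; this.when = when; }
    }

    private Node head;

    void enqueue(String name, long when) {
        Node msg = new Node(name, when);
        Node p = head;
        if (p == null || when == 0 || when < p.when) {
            // Empty queue, immediate message, or earlier than the head: new head.
            msg.next = p;
            head = msg;
            return;
        }
        // Walk until the following node is due strictly later than the new message,
        // so messages with equal times keep their insertion order.
        Node prev;
        for (;;) {
            prev = p;
            p = p.next;
            if (p == null || when < p.when) {
                break;
            }
        }
        msg.next = p;
        prev.next = msg;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.name);
        }
        return out;
    }

    static List<String> demo() {
        SortedMessageList q = new SortedMessageList();
        q.enqueue("b", 200);
        q.enqueue("d", 400);
        q.enqueue("a", 100);   // earlier than the head, becomes the new head
        q.enqueue("c", 200);   // same time as "b", lands right after "b"
        q.enqueue("now", 0);   // when == 0 always becomes the new head
        return q.names();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [now, a, b, c, d]
    }
}
```

Because the list is kept sorted on insert, fetching only ever has to look at the head, which is what lets the reader side stay cheap.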

Java layer message distribution

In this section, we start with message distribution. We’ve already looked at MessageQueue. Message distribution is the process of constantly pulling messages from MessageQueue and assigning them to handlers. Looper did the job.

We already know that the Native layer also has a Looper, but the following is not hard to understand:

The message queue needs a bridge to connect the Java layer and the Native layer
Each Looper only needs to handle the distribution of the message queue on its own side

So, when we look at message distribution in the Java layer, we look at the Looper in the Java layer.

Focus on three main methods:

Going to work
Working
Coming home from work

Going to work: prepare

class Looper {

    public static void prepare() {
        prepare(true);
    }

    private static void prepare(boolean quitAllowed) {
        if (sThreadLocal.get() != null) {
            throw new RuntimeException("Only one Looper may be created per thread");
        }
        sThreadLocal.set(new Looper(quitAllowed));
    }
}

There are two points to note:

Once you have gone out the door, you cannot go out again until you have come back in. Likewise, one Looper per thread is enough; as long as it is alive there is no need to create another.
Responsibility is assigned to an individual: a Looper serves one Thread, which requires registration to record that this Thread is already being served. ThreadLocal is used for this, because a collection accessed from multiple threads always has to take thread safety into account.

Rather than letting threads compete with each other, it is simpler to isolate them from each other, and ThreadLocal encapsulates exactly that.
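The registration idea is easy to verify in plain Java. The sketch below is an assumption-laden stand-in, not the real Looper: `PerThreadLooper` and its helper methods are hypothetical, and only the ThreadLocal bookkeeping is modeled.

```java
// Hypothetical stand-in for Looper; only the per-thread registration is modeled.
class PerThreadLooper {
    private static final ThreadLocal<PerThreadLooper> sThreadLocal = new ThreadLocal<>();

    private PerThreadLooper() {}

    /** Registers a looper for the calling thread; a second call on the same thread fails. */
    static void prepare() {
        if (sThreadLocal.get() != null) {
            throw new RuntimeException("Only one Looper may be created per thread");
        }
        sThreadLocal.set(new PerThreadLooper());
    }

    /** Returns the calling thread's looper, or null if prepare() was never called on it. */
    static PerThreadLooper myLooper() {
        return sThreadLocal.get();
    }

    /** Returns true if calling prepare() again on an already prepared thread is rejected. */
    static boolean secondPrepareFails() {
        try {
            prepare();
            return false;
        } catch (RuntimeException expected) {
            return true;
        }
    }

    /** Prepares a looper on a fresh thread and reports whether it differs from the caller's. */
    static boolean otherThreadGetsItsOwn() {
        PerThreadLooper mine = myLooper();
        PerThreadLooper[] theirs = new PerThreadLooper[1];
        Thread t = new Thread(() -> {
            prepare();               // succeeds: this thread has no looper yet
            theirs[0] = myLooper();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return theirs[0] != null && theirs[0] != mine;
    }

    public static void main(String[] args) {
        prepare();
        System.out.println(secondPrepareFails());      // true
        System.out.println(otherThreadGetsItsOwn());   // true
    }
}
```

No lock appears anywhere in the sketch: each thread only ever touches its own slot, which is the point of isolating instead of competing.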

Working: loop

Note that the work here is distribution; the Looper does not process the messages itself.

Without registration, naturally nobody can be found to take on the job.
If already at work, do not push to start again; that leads to mistakes and ordering problems.
The job is to take the instructions (Messages) from the boss (MQ), hand them to the person responsible (the Handler) to process, and record the information.
It is a 007 schedule with no sleep: only when MQ stops handing out messages is there no work left, and everyone goes home.

class Looper {
    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        if (me.mInLoop) {
            Slog.w(TAG, "Loop again would have the queued messages be executed"
                    + " before this one completed.");
        }

        me.mInLoop = true;
        final MessageQueue queue = me.mQueue;

        // Make sure the identity of this thread is that of the local process,
        // and keep track of what that identity token actually is.
        Binder.clearCallingIdentity();
        final long ident = Binder.clearCallingIdentity();

        // Allow overriding a threshold with a system prop. e.g.
        // adb shell 'setprop log.looper.1000.main.slow 1 && stop && start'
        final int thresholdOverride =
                SystemProperties.getInt("log.looper."
                        + Process.myUid() + "."
                        + Thread.currentThread().getName()
                        + ".slow", 0);

        boolean slowDeliveryDetected = false;

        for (;;) {
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }

            // This must be in a local variable, in case a UI event sets the logger
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " +
                        msg.callback + ": " + msg.what);
            }
            // Make sure the observer won't change while processing a transaction.
            final Observer observer = sObserver;

            final long traceTag = me.mTraceTag;
            long slowDispatchThresholdMs = me.mSlowDispatchThresholdMs;
            long slowDeliveryThresholdMs = me.mSlowDeliveryThresholdMs;
            if (thresholdOverride > 0) {
                slowDispatchThresholdMs = thresholdOverride;
                slowDeliveryThresholdMs = thresholdOverride;
            }
            final boolean logSlowDelivery = (slowDeliveryThresholdMs > 0) && (msg.when > 0);
            final boolean logSlowDispatch = (slowDispatchThresholdMs > 0);

            final boolean needStartTime = logSlowDelivery || logSlowDispatch;
            final boolean needEndTime = logSlowDispatch;

            if (traceTag != 0 && Trace.isTagEnabled(traceTag)) {
                Trace.traceBegin(traceTag, msg.target.getTraceName(msg));
            }

            final long dispatchStart = needStartTime ? SystemClock.uptimeMillis() : 0;
            final long dispatchEnd;
            Object token = null;
            if (observer != null) {
                token = observer.messageDispatchStarting();
            }
            long origWorkSource = ThreadLocalWorkSource.setUid(msg.workSourceUid);
            try {
                // Notice here
                msg.target.dispatchMessage(msg);
                if (observer != null) {
                    observer.messageDispatched(token, msg);
                }
                dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
            } catch (Exception exception) {
                if (observer != null) {
                    observer.dispatchingThrewException(token, msg, exception);
                }
                throw exception;
            } finally {
                ThreadLocalWorkSource.restore(origWorkSource);
                if (traceTag != 0) {
                    Trace.traceEnd(traceTag);
                }
            }
            if (logSlowDelivery) {
                if (slowDeliveryDetected) {
                    if ((dispatchStart - msg.when) <= 10) {
                        Slog.w(TAG, "Drained");
                        slowDeliveryDetected = false;
                    }
                } else {
                    if (showSlowLog(slowDeliveryThresholdMs, msg.when, dispatchStart, "delivery",
                            msg)) {
                        // Once we write a slow delivery log, suppress until the queue drains.
                        slowDeliveryDetected = true;
                    }
                }
            }
            if (logSlowDispatch) {
                showSlowLog(slowDispatchThresholdMs, dispatchStart, dispatchEnd, "dispatch", msg);
            }

            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }

            // Make sure that during the course of dispatching the
            // identity of the thread wasn't corrupted.
            final long newIdent = Binder.clearCallingIdentity();
            if (ident != newIdent) {
                Log.wtf(TAG, "Thread identity changed from 0x"
                        + Long.toHexString(ident) + " to 0x"
                        + Long.toHexString(newIdent) + " while dispatching to "
                        + msg.target.getClass().getName() + " "
                        + msg.callback + " what=" + msg.what);
            }

            msg.recycleUnchecked();
        }
    }
}

Coming home from work: quit/quitSafely

This is a fairly blunt operation. MQ cannot work normally without its Looper, so coming home from work here really means resigning.

class Looper {
    public void quit() {
        mQueue.quit(false);
    }

    public void quitSafely() {
        mQueue.quit(true);
    }
}

Message Handler

Things are a little clearer here. The APIs basically fall into the following categories:

User-facing:

Creating a Message, using Message's flyweight pattern
Sending messages; note that posting a Runnable also sends a message
Removing messages
Quitting, and so on

Message-processing-facing:

class Handler {
    /** Subclasses must implement this to receive messages. */
    public void handleMessage(@NonNull Message msg) {
    }

    /** Handle system messages here. Called by Looper when it dispatches a message. */
    public void dispatchMessage(@NonNull Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (mCallback != null) {
                if (mCallback.handleMessage(msg)) {
                    return;
                }
            }
            handleMessage(msg);
        }
    }
}

If handleMessage is not overridden and nothing else handles the message, the message is simply dropped.
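The three-level priority in dispatchMessage is worth seeing in action. The following is a plain-Java miniature under my own assumptions: `DispatchOrderSketch`, `MiniHandler`, and `Msg` are hypothetical stand-ins that only copy the branching, not the real Handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the three-level dispatch order; not Android's classes.
class DispatchOrderSketch {
    interface Callback {
        boolean handleMessage(Msg msg);
    }

    static final class Msg {
        final int what;
        Runnable callback;   // set when the message wraps a Runnable
        Msg(int what) { this.what = what; }
    }

    static class MiniHandler {
        final List<String> log = new ArrayList<>();
        Callback mCallback;

        void handleMessage(Msg msg) {
            log.add("handleMessage:" + msg.what);
        }

        void dispatchMessage(Msg msg) {
            if (msg.callback != null) {
                msg.callback.run();        // 1. the message's own Runnable wins
            } else {
                if (mCallback != null && mCallback.handleMessage(msg)) {
                    return;                // 2. the handler-level callback consumed it
                }
                handleMessage(msg);        // 3. fall back to the overridable method
            }
        }
    }

    static List<String> demo() {
        MiniHandler h = new MiniHandler();
        h.mCallback = msg -> {
            h.log.add("callback:" + msg.what);
            return msg.what == 2;          // consume only what == 2
        };

        Msg runnableMsg = new Msg(1);
        runnableMsg.callback = () -> h.log.add("runnable:1");

        h.dispatchMessage(runnableMsg);    // handled by its own Runnable
        h.dispatchMessage(new Msg(2));     // consumed by the callback
        h.dispatchMessage(new Msg(3));     // callback declines, handleMessage runs
        return h.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // [runnable:1, callback:2, callback:3, handleMessage:3]
    }
}
```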

The message-sending part can be sorted out with the following figure:

Summary: at this point we have a fairly complete picture of the messaging mechanism in the Framework layer. So far we have covered:

Both the Native layer and the Java layer have message queues, and the two are linked through JNI and a pointer mapping

The general process of message retrieval in the Native-layer and Java-layer MQ

How the Java-layer Looper works

An overview of the Java-layer Handler

From what we have covered, we can summarize the following from the perspective of the Java runtime:

The message queue mechanism serves the thread level; that is, a thread may or may not have a working message queue.

In other words, a Thread has at most one working Looper.

Looper and the Java-layer MQ correspond one to one.

Handler is the entry point to MQ and also the handler of messages.

Message: an application of the flyweight pattern. A Message carries enough information to be self-consistent, and creating messages is costly, so the flyweight pattern is used to reuse message objects.
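The reuse idea behind Message can be sketched as a small free list. This is an illustration under my own assumptions rather than Android's code: `PooledMsg` is a hypothetical class, the pool cap of 4 is arbitrary, and there is no guard against recycling the same object twice.

```java
// Hypothetical pooled message; mirrors the obtain/recycle idea, not Android's exact code.
class PooledMsg {
    private static final int MAX_POOL_SIZE = 4;   // arbitrary small cap for this sketch
    private static PooledMsg sPool;               // head of the free list
    private static int sPoolSize;

    int what;
    Object obj;
    private PooledMsg next;                        // link used only while the object is pooled

    private PooledMsg() {}

    /** Returns a pooled instance if one is available, otherwise allocates a new one. */
    static synchronized PooledMsg obtain() {
        if (sPool != null) {
            PooledMsg m = sPool;
            sPool = m.next;
            m.next = null;
            sPoolSize--;
            return m;
        }
        return new PooledMsg();
    }

    /** Clears the fields and returns the object to the pool if there is room. */
    void recycle() {
        what = 0;
        obj = null;
        synchronized (PooledMsg.class) {
            if (sPoolSize < MAX_POOL_SIZE) {
                next = sPool;
                sPool = this;
                sPoolSize++;
            }
        }
    }

    /** Returns true if a recycled object is handed out again with its fields reset. */
    static boolean demoReuse() {
        PooledMsg first = obtain();
        first.what = 42;
        first.recycle();
        PooledMsg second = obtain();
        return second == first && second.what == 0;
    }

    public static void main(String[] args) {
        System.out.println(demoReuse());   // true
    }
}
```

The private constructor is the design choice that matters: callers cannot bypass the pool, so every instance can be reused.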

Now let’s continue with the details and clear up the ambiguities left over from earlier:

The types and nature of messages
pollInner in the Native-layer Looper

The types and nature of messages

Several important member variables of Message:

class Message {
   
    public int what;
    
    public int arg1;
    
    public int arg2;
    
    public Object obj;

    public Messenger replyTo;

    /*package*/ int flags;
    
    public long when;

    /*package*/ Bundle data;

    /*package*/ Handler target;

    /*package*/ Runnable callback;

}

target is the Handler the message is delivered to. A message without a target is a special message: a synchronization barrier (barrier).

what is the message identifier. arg1 and arg2 are inexpensive data fields; if they are not enough to carry the information, put it in the Bundle data.

replyTo is used when sending messages across processes, and obj is subject to extra restrictions there, so we leave both aside for now.

flags records the state of the message, such as whether it is in use and whether it is synchronous or asynchronous.

The synchronization barrier mentioned above prevents the synchronous messages behind it from being retrieved, as we saw earlier in the next() method of the Java-layer MQ.

We also remember that next() uses an infinite loop to try to read a message that satisfies the processing conditions. As long as no such message can be read, the loop keeps the caller (Looper) blocked.

At this point we can confirm a conclusion: by function, messages fall into three types:

Ordinary (synchronous) messages
Sync barrier messages
Asynchronous messages

The sync barrier is an internal mechanism. After a barrier is set, it must be removed at a suitable time; otherwise ordinary messages will never be processed. To remove the barrier, use the token returned when it was set.
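The interplay of the three message types can be sketched in a few lines. This is a simplified C++ model, not the Java-layer implementation: QueuedMsg and BarrierQueue are hypothetical names, there are no timestamps, barriers are simply appended at the tail, and next() returns false at the point where a real queue would block.

```cpp
#include <cassert>
#include <list>

// Hypothetical, simplified message: no timestamps, no handler object.
struct QueuedMsg {
    int what;
    bool hasTarget;  // false marks a sync barrier
    bool async;      // asynchronous messages are not held back by a barrier
    int token;       // only meaningful for barriers
};

class BarrierQueue {
public:
    void enqueue(int what, bool async) {
        mQueue.push_back(QueuedMsg{what, true, async, 0});
    }

    // Append a barrier and return the token needed to remove it later.
    int postSyncBarrier() {
        int token = mNextToken++;
        mQueue.push_back(QueuedMsg{0, false, false, token});
        return token;
    }

    // Returns false when no barrier with this token is queued.
    bool removeSyncBarrier(int token) {
        for (auto it = mQueue.begin(); it != mQueue.end(); ++it) {
            if (!it->hasTarget && it->token == token) {
                mQueue.erase(it);
                return true;
            }
        }
        return false;
    }

    // Copies the next deliverable message into *out and returns true,
    // or returns false where a real queue would block.
    bool next(QueuedMsg* out) {
        auto it = mQueue.begin();
        if (it != mQueue.end() && !it->hasTarget) {
            // Stalled by a barrier: skip ahead to the first asynchronous message.
            do {
                ++it;
            } while (it != mQueue.end() && !it->async);
        }
        if (it == mQueue.end()) return false;
        *out = *it;
        mQueue.erase(it);
        return true;
    }

private:
    std::list<QueuedMsg> mQueue;
    int mNextToken = 1;
};
```

With the queue holding an ordinary message, a barrier, another ordinary message and an asynchronous message, next() delivers the first ordinary message and then the asynchronous one. The ordinary message behind the barrier only comes out after removeSyncBarrier() is called with the token.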

pollInner of the Native-layer Looper

It is worth looking at what the Looper does in the Native layer.

Those interested in the full source code can see it here, or read the excerpt below.

As mentioned earlier, Looper’s pollOnce first handles any pending Responses and then calls pollInner to retrieve messages.

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %lldns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = ALOOPER_POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];

    // note 1
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // Acquire lock.
    mLock.lock();

    // note 2
    // Check for poll error.
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error, errno=%d", errno);
        result = ALOOPER_POLL_ERROR;
        goto Done;
    }

    // note 3
    // Check for poll timeout.
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = ALOOPER_POLL_TIMEOUT;
        goto Done;
    }

    // note 4
    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif
#endif

    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeReadPipeFd) {
            if (epollEvents & EPOLLIN) {
                awoken();
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake read pipe.", epollEvents);
            }
        } else {
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= ALOOPER_EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= ALOOPER_EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= ALOOPER_EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= ALOOPER_EVENT_HANGUP;
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

    // note 5
    // Invoke pending message callbacks.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes. Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message);
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = ALOOPER_POLL_CALLBACK;
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

    // note 6
    // Invoke all response callbacks.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == ALOOPER_POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            int callbackResult = response.request.callback->handleEvent(fd, events, data);
            if (callbackResult == 0) {
                removeFd(fd);
            }
            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = ALOOPER_POLL_CALLBACK;
        }
    }
    return result;
}

The notes in the code mark the following steps:

1: The epoll mechanism: wait for an event on mEpollFd, with a timeout.
2, 3, 4: The three outcomes of the wait; the goto statements jump directly to the Done label.
2: Check whether the poll failed; if so, jump to Done.
3: Check whether the poll timed out; if so, jump to Done.
4: Process all events returned by epoll.
5: Invoke the callbacks of pending messages.
6: Invoke all Response callbacks.
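Step 5 is the part that drains the native message list, so it is worth a small sketch. The following C++ is a simplified model under stated assumptions: Envelope and dispatchDue are hypothetical names, the queue is assumed to be sorted by due time, `now` is passed in once instead of being re-read on every iteration, and locking is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical, simplified envelope: a due time plus the work to run.
struct Envelope {
    int64_t uptime;                 // when the message becomes due
    std::function<void()> handler;  // stands in for handler->handleMessage()
};

// Runs every envelope at the head of the time-sorted queue that is due at
// `now`. Returns the due time of the first envelope left in the queue, or
// INT64_MAX when the queue is empty (the role LLONG_MAX plays in pollInner).
int64_t dispatchDue(std::deque<Envelope>& queue, int64_t now) {
    while (!queue.empty()) {
        if (queue.front().uptime > now) {
            // The head is not due yet: it determines the next wake-up time.
            return queue.front().uptime;
        }
        // Take the envelope out of the queue before running it.
        Envelope e = std::move(queue.front());
        queue.pop_front();
        e.handler();
    }
    return std::numeric_limits<int64_t>::max();
}
```

The returned value plays the part of mNextMessageUptime: the next poll uses it to shorten its timeout so that the loop wakes up in time for the first message that is not yet due.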

We can also see which results are returned:

ALOOPER_POLL_CALLBACK

A pending message, or a Response whose request.ident is ALOOPER_POLL_CALLBACK, was processed. Otherwise one of:

ALOOPER_POLL_WAKE: woken up normally
ALOOPER_POLL_ERROR: epoll error
ALOOPER_POLL_TIMEOUT: epoll timeout

The enumeration values are:

ALOOPER_POLL_WAKE = -1,
ALOOPER_POLL_CALLBACK = -2,
ALOOPER_POLL_TIMEOUT = -3,
ALOOPER_POLL_ERROR = -4

Stage summary: we brainstormed about messages and the Native layer’s pollInner, which led us to the epoll mechanism.

There is a lot more to Looper dispatch in the Native layer, but we can’t wait to get to epoll.

Brain burst: the I/O models in Linux

Using Libevent and LibeV to Improve network application Performance — A history of I/O Model evolution by Hguisu

PS: Some of the images in this section are taken directly from that article. I took the liberty of citing them from there without tracking down their original sources.

Blocking I/O model diagram: what happens in the kernel when recv() is called, namely waiting for the data and then copying it

The implementation is very simple, but there is a problem: while blocked, the thread cannot perform any other computation. In network programming, you need multiple threads to handle concurrency better.

Note: do not map Android’s hardware events triggered by tapping the screen onto the network concurrency discussed here. They are two different things.

If multiple processes or multiple threads are used to realize concurrent responses, the model is as follows:

So far, we’ve been looking at the I/O blocking model.

Brain burst: blocking means calling a method and waiting for its return value, as if whatever is executing in the thread were stuck at that point.

To eliminate this stall, the call must not wait for the I/O result; it returns immediately!

Here’s an example:

You go to a suit shop to have a suit made. After your measurements are taken, you sit in the shop and wait until the suit is finished and handed to you. That is blocking, and the wait can kill you.
You go to a suit shop to have a suit made. After your measurements are taken, you are told not to wait around: it will take several days, so drop by to check whenever you have time. That is non-blocking.

After switching to the non-blocking model, the response model is as follows:

Understandably, this approach requires customers to poll. It is not customer friendly, but it costs the shop nothing, and it makes the waiting area less crowded.
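In code, "drop by and check" looks roughly like the sketch below: a descriptor is switched to non-blocking mode, and a read on it returns at once even when nothing is ready. The helper names setNonBlocking and tryReadByte and the ReadState enum are made up for this illustration; the system calls are the standard POSIX ones.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

enum class ReadState { Data, NotReady, Closed, Error };

// Switch a descriptor to non-blocking mode while keeping its other flags.
bool setNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Try to read one byte without waiting. On a non-blocking descriptor an
// empty pipe makes read() fail at once with EAGAIN instead of sleeping.
ReadState tryReadByte(int fd, char* out) {
    ssize_t n = read(fd, out, 1);
    if (n == 1) return ReadState::Data;
    if (n == 0) return ReadState::Closed;  // all write ends are closed
    if (errno == EAGAIN || errno == EWOULDBLOCK) return ReadState::NotReady;
    return ReadState::Error;
}
```

The caller has to come back and call tryReadByte() again later, which is exactly the customer polling the shop.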

Some suit shops have reformed to be more customer friendly:

You go to the suit shop to order a suit, settle the style and size, and leave your contact information. When the suit is done, the shop contacts you to come and pick it up.

This becomes the select or poll model:

Note: the reformed suit shop needs one more employee, the user thread marked in the figure, whose job is to:

Record customer orders and contact information at the front desk
Take the order book to the production room and keep checking whether each order is finished; for any finished order, pick it up and contact the customer.

Also, while he is checking on orders, he cannot record customer information at the front desk. In other words he is blocked, and his other work has to be put on hold.

From the production room’s point of view, this approach is not very different from the non-blocking model. We added one clerk, but that single clerk takes over what used to be many customers walking into the production room to ask “Is my order ready?”

It is worth mentioning that, to improve service quality, this employee needs to record some information every time he goes to the production room to ask about an order:

Whether the question about order completion was answered;

Whether the answer was truthful; and so on.

Some shops keep a separate record book for each assessment item, which resembles the select model.

Some shops use a single record book whose pages use forms to record all the items, which resembles the poll model.

The select model and the poll model are very similar.
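The "several record books" versus "one book with forms" difference shows up directly in the two APIs. Here is a minimal sketch with one descriptor; readableWithSelect and readableWithPoll are hypothetical helper names, timeoutMillis is assumed to be non-negative, and for select the descriptor is assumed to be below FD_SETSIZE.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

// select(): one fd_set per kind of condition (readable, writable,
// exceptional). Only the readable set is used here.
bool readableWithSelect(int fd, int timeoutMillis) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    struct timeval tv;
    tv.tv_sec = timeoutMillis / 1000;
    tv.tv_usec = (timeoutMillis % 1000) * 1000;
    int ready = select(fd + 1, &readSet, nullptr, nullptr, &tv);
    return ready > 0 && FD_ISSET(fd, &readSet);
}

// poll(): one array of records; each record carries the requested events
// and the returned events as bit fields.
bool readableWithPoll(int fd, int timeoutMillis) {
    struct pollfd record;
    record.fd = fd;
    record.events = POLLIN;
    record.revents = 0;
    int ready = poll(&record, 1, timeoutMillis);
    return ready > 0 && (record.revents & POLLIN) != 0;
}
```

With many descriptors, both calls hand the whole set to the kernel every time, and the caller then walks the whole set to find the ready ones. That walk is the clerk going through the order book from top to bottom.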

Before long, the boss found the clerk’s efficiency a little low: he always took the order book and asked about every order all over again. The employee was diligent enough; the problem was the model.

So the boss made another reform:

Add a message pipeline between the front desk and the production room.
When the production room has progress to report, it sends a note to the front desk with the order number on it.
The front-desk employee goes straight to the corresponding order.

This becomes the epoll model, which solves the traversal inefficiency of the select/poll model.

As a result, the front-desk employee no longer needs to walk through the order book from top to bottom. Efficiency improves, and as long as nothing happens, the front-desk employee can slack off in peace.
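The "note with the order number on it" corresponds to the ready list that epoll_wait fills in. Below is a minimal Linux-only sketch; watchReadable, waitReadable and the batch size of 16 are choices made for this illustration.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Register `fd` for readability on the epoll instance `epollFd`.
bool watchReadable(int epollFd, int fd) {
    struct epoll_event item = {};
    item.events = EPOLLIN;
    item.data.fd = fd;  // echoed back by epoll_wait so we know which fd fired
    return epoll_ctl(epollFd, EPOLL_CTL_ADD, fd, &item) == 0;
}

// Wait up to timeoutMillis and return only the descriptors that are readable.
// The kernel fills in the ready entries; nothing else has to be scanned.
std::vector<int> waitReadable(int epollFd, int timeoutMillis) {
    const int kMaxEvents = 16;
    struct epoll_event items[kMaxEvents];
    std::vector<int> ready;
    int count = epoll_wait(epollFd, items, kMaxEvents, timeoutMillis);
    for (int i = 0; i < count; i++) {
        if (items[i].events & EPOLLIN) ready.push_back(items[i].data.fd);
    }
    return ready;
}
```

Registration happens once, and each wait returns only the descriptors that have something to report, however many are being watched.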

Let’s look at the constructor of the Native Looper:

Looper::Looper(bool allowNonCallbacks) :
        mAllowNonCallbacks(allowNonCallbacks), mSendingMessage(false),
        mResponseIndex(0), mNextMessageUptime(LLONG_MAX) {
    int wakeFds[2];
    int result = pipe(wakeFds);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not create wake pipe. errno=%d", errno);

    mWakeReadPipeFd = wakeFds[0];
    mWakeWritePipeFd = wakeFds[1];

    result = fcntl(mWakeReadPipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake read pipe non-blocking. errno=%d",
            errno);

    result = fcntl(mWakeWritePipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake write pipe non-blocking. errno=%d",
            errno);

    // Allocate the epoll instance and register the wake pipe.
    mEpollFd = epoll_create(EPOLL_SIZE_HINT);
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance. errno=%d", errno);

    struct epoll_event eventItem;
    memset(&eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeReadPipeFd;
    result = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mWakeReadPipeFd, &eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake read pipe to epoll instance. errno=%d",
            errno);
}
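The constructor wires a pipe into the epoll instance so that a sleeping epoll_wait can be woken by writing a byte. The sketch below isolates that trick in a small Linux-only class. Waker and its method names are hypothetical, and it is not the AOSP Looper: it watches only the wake pipe and carries no messages.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical class that keeps only the wake pipe and the epoll instance.
class Waker {
public:
    Waker() {
        int fds[2];
        if (pipe(fds) != 0) return;
        mReadFd = fds[0];
        mWriteFd = fds[1];
        fcntl(mReadFd, F_SETFL, O_NONBLOCK);
        fcntl(mWriteFd, F_SETFL, O_NONBLOCK);
        mEpollFd = epoll_create1(0);
        if (mEpollFd < 0) return;
        struct epoll_event item = {};
        item.events = EPOLLIN;
        item.data.fd = mReadFd;
        mOk = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mReadFd, &item) == 0;
    }
    ~Waker() {
        if (mEpollFd >= 0) close(mEpollFd);
        if (mReadFd >= 0) close(mReadFd);
        if (mWriteFd >= 0) close(mWriteFd);
    }
    Waker(const Waker&) = delete;
    Waker& operator=(const Waker&) = delete;

    bool ok() const { return mOk; }

    // Called by whoever has new work: one byte makes the read end readable.
    void wake() {
        ssize_t n;
        do {
            n = write(mWriteFd, "W", 1);
        } while (n == -1 && errno == EINTR);
    }

    // Sleeps until woken or until the timeout expires. Returns true when woken.
    bool waitForWake(int timeoutMillis) {
        struct epoll_event item;
        int count = epoll_wait(mEpollFd, &item, 1, timeoutMillis);
        if (count <= 0) return false;
        drain();
        return true;
    }

private:
    // Empty the pipe so that the next wait sleeps again.
    void drain() {
        char buffer[16];
        ssize_t n;
        do {
            n = read(mReadFd, buffer, sizeof(buffer));
        } while ((n == -1 && errno == EINTR) || n == static_cast<ssize_t>(sizeof(buffer)));
    }

    int mReadFd = -1;
    int mWriteFd = -1;
    int mEpollFd = -1;
    bool mOk = false;
};
```

In a real loop, waitForWake() would run on the looping thread with the timeout derived from the next due message, while wake() would be called from the thread that enqueues a new one.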

conclusion

I believe that by this point you have formed your own understanding of these questions. As usual, let’s still summarize: because this is a brainstorm, the train of thought jumps around, and the connections between sections are not obvious.

Let’s use one question to tie the threads together.

Why can the Java-layer Looper and MQ use infinite loops without “blocking” the UI thread, without causing ANR, and while still responding to click events?

Android is event-driven and has built a complete message mechanism on top of that.
The Java-layer message mechanism is only one part of it. It works around the message queue and is responsible for message queue management, message dispatch, and message handling.
Looper’s infinite loop keeps the dispatch of the message queue running at all times; without the loop, dispatch would stop.
MessageQueue’s infinite loop ensures that Looper obtains a valid message, so Looper keeps running as long as there are messages. Once a valid message is found, the loop exits.
In the infinite loop of its next() method, the Java-layer MessageQueue calls the Native-layer MQ’s pollOnce through JNI, driving the Native layer to process Native-layer messages.
It is worth mentioning that everything the UI thread handles is also message-based, whether updating the UI or responding to click events.

Therefore, it is precisely the infinite loop started by Looper’s loop() that keeps the UI thread’s tasks executing normally.

Then there is ANR, which is Android’s way of checking that the main thread’s message mechanism is working properly and healthily.

The main thread’s Looper relies on the message mechanism to drive UI rendering and the handling of interaction events. If a message, or the work it triggers, occupies the main thread for a long time, the main thread is blocked and the user experience suffers.

ANR detection therefore plants time bombs: only a Looper that runs efficiently can defuse each previously planted bomb in time. The interesting part is that such a bomb only detonates once it is discovered.
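The time-bomb metaphor can be sketched as a small bookkeeping class. Everything here is hypothetical (TimeBombs, plant, defuse, detonate, and the use of plain integers for task ids and time); it illustrates the idea described above and is not the framework's ANR implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical watchdog bookkeeping; names and structure are illustrative only.
class TimeBombs {
public:
    // Plant a bomb for a task that must finish before `deadline`.
    void plant(int taskId, int64_t deadline) { mBombs[taskId] = deadline; }

    // The task finished in time: remove its bomb.
    void defuse(int taskId) { mBombs.erase(taskId); }

    // Called by the checker: returns the tasks whose deadline has passed
    // without being defused, and removes them from the bookkeeping.
    std::vector<int> detonate(int64_t now) {
        std::vector<int> expired;
        for (auto it = mBombs.begin(); it != mBombs.end();) {
            if (it->second <= now) {
                expired.push_back(it->first);
                it = mBombs.erase(it);
            } else {
                ++it;
            }
        }
        return expired;
    }

private:
    std::map<int, int64_t> mBombs;  // taskId -> deadline
};
```

A bomb that is defused in time never shows up. An overdue bomb only has an effect when the checker calls detonate(), which matches the remark that it goes off only once it is discovered.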

When it comes to responding to click events, such events always start from the hardware, go to the kernel, and then reach user space through inter-process communication. They exist as messages in the Native layer, and after processing the outcome is:

The ViewRootImpl receives the input from the InputManager and handles the event

Here we borrow a diagram to summarize the entire message mechanism flow:

Image from Android7.0 MessageQueue in detail by Gaugamela

PS: This article is long and took a lot of time, about 10 days to write, and there is still plenty that I did not get to cover in full. For example: “In which cases does the Java layer call into the Native layer through JNI to wake it up, and why?” and so on.

But given the space, I decided not to dig any further.