Android: Why doesn't Looper.loop() block the main thread in an endless loop?

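The main reason is that MessageQueue blocks on epoll under the hood: when there is nothing to do the thread sleeps, and it is woken up when a message arrives. The analysis below follows two paths through MessageQueue: enqueueing a message, and the next() method.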

The MessageQueue constructor looks like this:

    MessageQueue(boolean quitAllowed) {
        mQuitAllowed = quitAllowed;
        mPtr = nativeInit();
    }

The constructor initializes mPtr by calling nativeInit(), which is implemented as follows:

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }

    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}

The return value is simply the NativeMessageQueue pointer converted to a jlong with reinterpret_cast. The NativeMessageQueue constructor is:

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
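    // The Looper that drives the message loop also exists in the native layer. In a
    // message-driven design, each thread has one Looper that processes its message queue.
    // The next line fetches the Looper stored in thread-local storage (TLS).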
    mLooper = Looper::getForThread();
    // On first use the thread has nothing in TLS yet, so create a Looper and store it
    // there. This is a common per-thread singleton pattern.
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

So what nativeInit() ultimately returns is a pointer to the native-side NativeMessageQueue, which the Java MessageQueue keeps in mPtr.
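The per-thread singleton idea used by Looper::getForThread() and Looper::setForThread() can be sketched in Java with ThreadLocal. This is only an analogy: FakeLooper and forCurrentThread() are hypothetical names, not Android APIs, and the native code uses its own TLS mechanism.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the native Looper; it only carries an id.
final class FakeLooper {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    // One slot per thread, playing the role of the TLS entry.
    private static final ThreadLocal<FakeLooper> PER_THREAD = new ThreadLocal<>();

    final int id = NEXT_ID.incrementAndGet();

    // Mirrors the get/set pair in the constructor above:
    // create the instance on first use, then reuse it on the same thread.
    static FakeLooper forCurrentThread() {
        FakeLooper looper = PER_THREAD.get();
        if (looper == null) {
            looper = new FakeLooper();
            PER_THREAD.set(looper);
        }
        return looper;
    }
}
```

Calling FakeLooper.forCurrentThread() twice on one thread returns the same object, while a different thread gets its own instance.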

Message enqueueing and the wake-up mechanism inside MessageQueue

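enqueueMessage() inserts a message into the queue. MessageQueue is a singly linked list kept sorted by delivery time (when), so messages are handled in time order, and messages with the same delivery time are handled first in, first out.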

    boolean enqueueMessage(Message msg, long when) {
        // Reject messages that have no target Handler
        if (msg.target == null) {
            throw new IllegalArgumentException("Message must have a target.");
        }
        // Reject messages that are already in use
        if (msg.isInUse()) {
            throw new IllegalStateException(msg + " This message is already in use.");
        }

        // Synchronize on the queue so that only one thread modifies it at a time
        synchronized (this) {
            // Check whether the queue is quitting. Looper.quit() calls mQueue.quit(false), which sets mQuitting to true
            if (mQuitting) {
                IllegalStateException e = new IllegalStateException(
                        msg.target + " sending message to a Handler on a dead thread");
                Log.w(TAG, e.getMessage(), e);
                // Recycle the message. recycle() refuses to recycle (and may throw) if the message is still in use
                msg.recycle();
                return false;
            }
            // Mark the message as in use
            msg.markInUse();
            // Record the delivery time of the message
            msg.when = when;
            Message p = mMessages;
            boolean needWake;
            // p is the current head of the queue
            if (p == null || when == 0 || when < p.when) {
                // New head, wake up the event queue if blocked.
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // Inserted within the middle of the queue.  Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                Message prev;
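                // Walk the list to find the insertion point: stop when p is null (end of the list)
                // or when p is due later than the new message. prev is the node before p,
                // and the new message goes between prev and p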
                for (;;) {
                    prev = p;
                    p = p.next;
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                // Link the new message in: its next node is p (null at the end of the list),
                // and the previous node now points to it
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
            // Wake the thread only if that is needed
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }
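The insertion logic is easier to follow once the locking, recycling, and barrier handling are stripped away. The following is a minimal sketch of the same sorted insertion, assuming a hypothetical Node type that carries only a delivery time; the return value plays the role of the "new head" branch, the case in which a blocked reader may need a wake-up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node: a delivery time and a link.
final class Node {
    final long when;
    Node next;
    Node(long when) { this.when = when; }
}

final class TimedList {
    Node head;

    // Inserts the node so the list stays sorted by 'when'.
    // Nodes with equal times keep their arrival order.
    // Returns true when the node became the new head.
    boolean insert(Node node) {
        Node p = head;
        if (p == null || node.when == 0 || node.when < p.when) {
            node.next = p;
            head = node;
            return true;
        }
        Node prev;
        for (;;) {
            prev = p;
            p = p.next;
            if (p == null || node.when < p.when) {
                break;
            }
        }
        node.next = p; // invariant: p == prev.next
        prev.next = node;
        return false;
    }

    // Collects the delivery times in list order, for inspection.
    List<Long> times() {
        List<Long> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.when);
        }
        return out;
    }
}
```

Inserting nodes with times 30, 10, 20, 20 in that order leaves the list ordered 10, 20, 20, 30, with the second 20 placed behind the first.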

nativeWake() maps to the following function in frameworks/base/core/jni/android_os_MessageQueue.cpp:

static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

NativeMessageQueue::wake() is implemented in the same file, android_os_MessageQueue.cpp:

void NativeMessageQueue::wake() {
    mLooper->wake();
}

This in turn calls wake() in /system/core/libutils/Looper.cpp:

void Looper::wake() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ wake", this);
#endif

    uint64_t inc = 1;
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
    if (nWrite != sizeof(uint64_t)) {
        if (errno != EAGAIN) {
            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d: %s",
                    mWakeEventFd, strerror(errno));
        }
    }
}

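This calls the write() system call to write a 64-bit 1 to mWakeEventFd. In this version of the code mWakeEventFd is an eventfd (older Android versions used a pipe for the same purpose). The write makes the descriptor readable, which wakes the thread blocked in epoll_wait(). Note that this wakes another thread in the same process, not another process.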

The NativeMessageQueue constructor in frameworks/base/core/jni/android_os_MessageQueue.cpp creates the Looper when mLooper is NULL. The Looper constructor is in system/core/libutils/Looper.cpp:

Looper::Looper(bool allowNonCallbacks) :
        mAllowNonCallbacks(allowNonCallbacks), mSendingMessage(false),
        mPolling(false), mEpollFd(-1), mEpollRebuildRequired(false),
        mNextRequestSeq(0), mResponseIndex(0), mNextMessageUptime(LLONG_MAX) {
    // Create the wake event.
    // eventfd() returns a file descriptor dedicated to event notification
    mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    LOG_ALWAYS_FATAL_IF(mWakeEventFd < 0, "Could not make wake event fd: %s",
                        strerror(errno));

    AutoMutex _l(mLock);
    rebuildEpollLocked();
}

The Looper constructor ends by calling rebuildEpollLocked():

void Looper::rebuildEpollLocked() {
    // Close old epoll instance if we have one.
    if (mEpollFd >= 0) {
#if DEBUG_CALLBACKS
        ALOGD("%p ~ rebuildEpollLocked - rebuilding epoll set", this);
#endif
        close(mEpollFd);
    }

    // Allocate the new epoll instance and register the wake pipe.
    // (The "wake pipe" here is the wake event fd created in the constructor.)
    // mEpollFd is the handle of the new epoll instance
    mEpollFd = epoll_create(EPOLL_SIZE_HINT);
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance: %s", strerror(errno));

    struct epoll_event eventItem;
    memset(& eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeEventFd;
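    // Register mWakeEventFd with the epoll instance through epoll_ctl():
    //   2nd argument: EPOLL_CTL_ADD adds a descriptor to the watched set
    //   3rd argument: the descriptor to watch, here the wake event fd
    //   4th argument: the event description; events = EPOLLIN means "readable"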
    int result = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mWakeEventFd, & eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake event fd to epoll instance: %s",
                        strerror(errno));

    for (size_t i = 0; i < mRequests.size(); i++) {
        const Request& request = mRequests.valueAt(i);
        struct epoll_event eventItem;
        // Fill in the epoll_event for this registered request
        request.initEventItem(&eventItem);

        // Re-register the request's fd with the new epoll instance so that events on it also wake up epoll_wait()
        int epollResult = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, request.fd, & eventItem);
        if (epollResult < 0) {
            ALOGE("Error adding epoll events for fd %d while rebuilding epoll set: %s",
                  request.fd, strerror(errno));
        }
    }
}

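So a wake-up ultimately comes from epoll watching file descriptors for I/O events. The waiting itself happens in MessageQueue.next(), which calls nativePollOnce(); that call blocks until the wake event fd (or another registered fd) becomes ready or the timeout expires.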
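The same "block on a descriptor, wake by writing to it" pattern can be tried out in plain Java with java.nio. The sketch below is an analogy rather than what Android does: it uses a Selector in place of the epoll instance and a Pipe in place of the wake event fd, and WakeDemo and waitForWake are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class WakeDemo {
    // Blocks in select() for up to timeoutMillis (must be > 0) and returns the
    // number of ready channels. With sendWake, a second thread writes one byte
    // to the pipe after about 50 ms, which ends the wait early.
    static int waitForWake(boolean sendWake, long timeoutMillis)
            throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open()) {
            // Register the read end, similar in spirit to epoll_ctl(EPOLL_CTL_ADD, ..., EPOLLIN).
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            Thread waker = null;
            if (sendWake) {
                waker = new Thread(() -> {
                    try {
                        Thread.sleep(50);
                        // Plays the role of Looper::wake(): make the descriptor readable.
                        pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
                    } catch (IOException | InterruptedException e) {
                        throw new IllegalStateException(e);
                    }
                });
                waker.start();
            }

            // The calling thread sleeps here instead of spinning.
            int ready = selector.select(timeoutMillis);
            if (ready > 0) {
                // Drain the byte, as awoken() does for the wake event.
                pipe.source().read(ByteBuffer.allocate(8));
            }
            if (waker != null) {
                waker.join();
            }
            return ready;
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }
}
```

With sendWake set to true the call returns as soon as the byte is written; with false it returns 0 once the timeout expires, which corresponds to the timeout case of epoll_wait().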

Message processing and the waiting mechanism in MessageQueue

    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        final long ptr = mPtr;
        // mPtr is 0 once the queue has quit and been disposed
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        // Loop forever
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                // About to block (the next message is not due yet, or there is none):
                // flush any pending Binder commands to the kernel driver first
                Binder.flushPendingCommands();
            }
            // Wait here, with a timeout. Delayed messages (postDelayed and similar) work by
            // waiting until their delivery time before execution continues
            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message.  Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                if (msg != null && msg.target == null) {
                    // A head message whose target is null is a synchronization barrier:
                    // skip the synchronous messages behind it and look for the next asynchronous one
                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }
                if (msg != null) {
                    // Is the message due later than the current time?
                    if (now < msg.when) {
                        // Next message is not ready.  Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // Got a message.
                        mBlocked = false;
                        // Unlink the message: if it has a predecessor, point the predecessor at the message's successor
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            // No predecessor, so the message is the head of the queue
                            mMessages = msg.next;
                        }
                        // Detach the returned message from the list
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }

                // If first time idle, then get the number of idlers to run.
                // Idle handles only run if the queue is empty or if the first message
                // in the queue (possibly a barrier) is due to be handled in the future.
                if (pendingIdleHandlerCount < 0
                        && (mMessages == null || now < mMessages.when)) {
                    pendingIdleHandlerCount = mIdleHandlers.size();
                }
                if (pendingIdleHandlerCount <= 0) {
                    // No idle handlers to run.  Loop and wait some more.
                    mBlocked = true;
                    continue;
                }

                if (mPendingIdleHandlers == null) {
                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
                }
                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
            }

            // Run the idle handlers.
            // We only ever reach this code block during the first iteration.
            for (int i = 0; i < pendingIdleHandlerCount; i++) {
                final IdleHandler idler = mPendingIdleHandlers[i];
                mPendingIdleHandlers[i] = null; // release the reference to the handler

                boolean keep = false;
                try {
                    keep = idler.queueIdle();
                } catch (Throwable t) {
                    Log.wtf(TAG, "IdleHandler threw exception", t);
                }

                if (!keep) {
                    synchronized (this) {
                        mIdleHandlers.remove(idler);
                    }
                }
            }

            // Reset the idle handler count to 0 so we do not run them again.
            pendingIdleHandlerCount = 0;

            // While calling an idle handler, a new message could have been delivered
            // so go back and look again for a pending message without waiting.
            nextPollTimeoutMillis = 0;
        }
    }
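The way next() chooses between "return now", "sleep until the head is due" and "sleep until woken" can be modelled in plain Java. This is a rough sketch under stated assumptions: MiniQueue is a hypothetical class, Object.wait()/notifyAll() stand in for nativePollOnce()/nativeWake(), and barriers, idle handlers and quitting are left out.

```java
// Hypothetical miniature of the Java-side queue; not an Android API.
final class MiniQueue {
    private static final class Msg {
        final long when;
        final String what;
        Msg next;
        Msg(long when, String what) { this.when = when; this.what = what; }
    }

    private Msg head;

    // Monotonic clock in milliseconds, standing in for SystemClock.uptimeMillis().
    static long now() {
        return System.nanoTime() / 1_000_000L;
    }

    // Sorted insertion followed by a wake-up. Unlike the real queue, this sketch
    // always notifies instead of working out whether a wake-up is needed.
    synchronized void enqueue(String what, long when) {
        Msg msg = new Msg(when, what);
        if (head == null || when < head.when) {
            msg.next = head;
            head = msg;
        } else {
            Msg prev = head;
            while (prev.next != null && prev.next.when <= when) {
                prev = prev.next;
            }
            msg.next = prev.next;
            prev.next = msg;
        }
        notifyAll();
    }

    // Blocks until the head message is due, then unlinks and returns its payload.
    synchronized String next() throws InterruptedException {
        for (;;) {
            long now = now();
            if (head == null) {
                wait();                 // like a timeout of -1: sleep until woken
            } else if (now < head.when) {
                wait(head.when - now);  // like a positive timeout: sleep until due or woken
            } else {
                Msg msg = head;
                head = msg.next;
                msg.next = null;
                return msg.what;
            }
        }
    }
}
```

A thread calling next() on an empty MiniQueue is parked by the JVM and uses no CPU until another thread calls enqueue(), which is the behaviour the article attributes to the main thread.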

nativePollOnce() maps to android_os_MessageQueue_nativePollOnce() in android_os_MessageQueue.cpp. That function calls NativeMessageQueue::pollOnce() in the same file, which in turn calls pollOnce() in Looper.cpp.

pollOnce() in Looper.cpp:

int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) {
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p",
                        this, ident, fd, events, data);
#endif
                if (outFd != NULL) *outFd = fd;
                if (outEvents != NULL) *outEvents = events;
                if (outData != NULL) *outData = data;
                return ident;
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;
        }

        result = pollInner(timeoutMillis);
    }
}

There is not much to analyze in this function; what matters is that it ends up calling pollInner().

pollInner() in Looper.cpp:

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    // We are about to idle.
    mPolling = true;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
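    // Point 1
    // epoll_wait() makes mEpollFd wait for events on the descriptors registered with it,
    // including mWakeEventFd. Its four arguments:
    //   1st: the epoll handle
    //   2nd: eventItems, the array that receives the ready events
    //   3rd: the maximum number of events returned per call
    //   4th: the timeout in milliseconds; -1 blocks indefinitely until an I/O event arrives
    // When the Java queue has no more messages it passes -1, so the thread blocks here.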
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // No longer idling.
    mPolling = false;

    // Acquire lock.
    mLock.lock();

    // Rebuild epoll set if needed.
    if (mEpollRebuildRequired) {
        mEpollRebuildRequired = false;
        rebuildEpollLocked();
        goto Done;
    }

    // Check for poll error.
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
        result = POLL_ERROR;
        goto Done;
    }

    // Check for poll timeout.
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = POLL_TIMEOUT;
        goto Done;
    }

    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif
    // Point 2
    // eventCount > 0 means that eventCount file descriptors have events pending
    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        // Woken up through the wake event fd
        if (fd == mWakeEventFd) {
            // EPOLLIN: the descriptor is readable
            if (epollEvents & EPOLLIN) {
                // Read the wake event to reset it; by this point the thread has already been woken up
                awoken();
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
            }
        } else {
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

    // Invoke pending message callbacks.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes.  Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message);
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = POLL_CALLBACK;
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

    // Invoke all response callbacks.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            // Invoke the callback.  Note that the file descriptor may be closed by
            // the callback (and potentially even reused) before the function returns so
            // we need to be a little careful when removing the file descriptor afterwards.
            int callbackResult = response.request.callback->handleEvent(fd, events, data);
            if (callbackResult == 0) {
                removeFd(fd, response.request.seq);
            }

            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = POLL_CALLBACK;
        }
    }
    return result;
}

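The code is long, but only two points matter. The first is the call to epoll_wait(): this is where the thread goes to sleep via epoll.
The second is the for loop that runs after the lock is acquired. eventCount is the return value of epoll_wait(), the number of file descriptors with pending events. If it is greater than 0, the loop walks over those events; for the wake event fd it calls awoken(), which reads the value written by wake() so that the descriptor is reset for the next wait.
In the following cases the event loop is skipped and execution jumps straight to Done:

mEpollRebuildRequired is set: the epoll set is rebuilt, then jump to Done
eventCount < 0: an error (or an interruption by a signal, EINTR), jump to Done
eventCount = 0: the wait timed out, jump to Done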
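The early exits listed above can be summarized as a small decision function. The sketch below is illustrative only: PollResult and classify() are hypothetical names, and the numeric values are assumed to mirror the POLL_* constants declared in Looper.h, which this article does not show.

```java
// Hypothetical summary of how pollInner() picks its initial result.
final class PollResult {
    // Assumed to match the constants in Looper.h.
    static final int POLL_WAKE = -1;
    static final int POLL_CALLBACK = -2;
    static final int POLL_TIMEOUT = -3;
    static final int POLL_ERROR = -4;

    // eventCount is the return value of epoll_wait();
    // interrupted is true when errno was EINTR.
    static int classify(int eventCount, boolean interrupted) {
        if (eventCount < 0) {
            // EINTR keeps the default result; any other error is reported.
            return interrupted ? POLL_WAKE : POLL_ERROR;
        }
        if (eventCount == 0) {
            return POLL_TIMEOUT;
        }
        // Events were delivered. Callbacks that run after Done may
        // still change the final result to POLL_CALLBACK.
        return POLL_WAKE;
    }
}
```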

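The Done section mainly processes native-layer messages: each message that is due is passed to handleMessage() of its MessageHandler.
After that, all response callbacks are invoked, that is, the responses whose request ident is POLL_CALLBACK.
The responses are collected in the event loop, where pushResponse() records the events for each registered request; they are then processed in pollInner() after the Done section.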

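So the reason the Android main thread can keep calling Looper.loop() without spinning or freezing is the epoll mechanism. The Looper registers a wake event fd with epoll. When loop() asks for the next message and none is due, the thread waits in epoll_wait() and consumes no CPU. When a message that needs attention is enqueued, wake() writes to the wake event fd, epoll_wait() returns, and the main thread goes on to process the message.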

Why doesn't the Looper's endless loop freeze the application?

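First, what is an ANR?
ANR: an input event or a Message was not handled in time. For a click event, for example, a response deadline is recorded; if handling has not finished after 5 s, an ANR message is sent through a Handler.
Click (input) event: no response within 5 s
Broadcast: no response within 10 s
Service: no response within 20 s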
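All of these events end up as Messages. A click event, for example, is packaged by Choreographer, in the setVsync step of doFrame, and delivered as a callback message through the corresponding doCallbacks.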
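ANRs are triggered by messages sent through a Handler, so the Looper's endless loop has nothing to do with this kind of blocking and cannot itself cause an ANR.
The Looper's loop simply means that the thread has nothing to do: it gives up the CPU and sleeps, and it is woken when a message arrives.
So the Looper loop and ANR are two unrelated things. An ANR happens when handling one message takes too long, not when the loop is idle. The only connection between ANR and Looper/Handler is that the ANR itself is a Message sent by a Handler, and a Looper is what takes that message out of the queue and processes it.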
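The difference between an idle loop and a loop stuck on one slow message can be illustrated with a single-threaded executor standing in for the main Looper. This is only an analogy: ResponsivenessProbe and isResponsive() are hypothetical names, and Android's real ANR detection runs in system services and is more involved.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ResponsivenessProbe {
    // Posts an empty task to the looper-like executor and reports whether
    // it was executed within timeoutMillis.
    static boolean isResponsive(ExecutorService looperLike, long timeoutMillis)
            throws InterruptedException {
        Future<?> probe = looperLike.submit(() -> { });
        try {
            probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;   // the loop picked the probe up in time
        } catch (TimeoutException e) {
            return false;  // an earlier task is still running: the "ANR" case
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An executor whose thread is idle, waiting for work, answers the probe almost immediately. One that is busy with a task sleeping for 500 ms fails a 50 ms probe, even though its loop is the same in both cases.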

https://www.jianshu.com/p/7bc2b86c4d89
https://www.cnblogs.com/renhui/p/12875396.html