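How the Android Watchdog Works

Overview

Understanding how the Watchdog works gives a clearer picture of how system services run.

Analysis

1. Watchdog extends Thread

Watchdog is a thread.

2. It is started in SystemServer.java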

private void startOtherServices() {
    ······
    traceBeginAndSlog("InitWatchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    traceEnd();
    ······
    traceBeginAndSlog("StartWatchdog");
    Watchdog.getInstance().start();
    traceEnd();
}
Because it is a thread, calling start() is all that is needed to run it.

3. The Watchdog constructor

private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mOpenFdMonitor = OpenFdMonitor.create();

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        // mtk enhance
        exceptionHWT = new ExceptionLog();
    }
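1. Two objects matter most here: mMonitorChecker and mHandlerCheckers.

2. Where the elements of mHandlerCheckers come from:
1) Added in the constructor: FgThread, the main thread, UiThread, IoThread and DisplayThread
2) Registered from outside: Watchdog.getInstance().addThread(handler);

3. Where the monitors held by mMonitorChecker come from:
Registered from outside: Watchdog.getInstance().addMonitor(monitor);
Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());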
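
The two registration paths above can be modelled without any Android classes. The following is only a sketch under stated assumptions: CheckerRegistry, Checker and the count helpers are hypothetical names introduced for illustration, and threads are represented by plain strings rather than Handlers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Android-free model of how checkers and monitors are registered.
final class CheckerRegistry {
    interface Monitor { void monitor(); }

    static final class Checker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();
        Checker(String name) { this.name = name; }
    }

    private final List<Checker> mHandlerCheckers = new ArrayList<>();
    private final Checker mMonitorChecker;

    CheckerRegistry() {
        // The foreground-thread checker doubles as the monitor checker.
        mMonitorChecker = new Checker("foreground thread");
        mHandlerCheckers.add(mMonitorChecker);
        mHandlerCheckers.add(new Checker("main thread"));
        mHandlerCheckers.add(new Checker("ui thread"));
        mHandlerCheckers.add(new Checker("i/o thread"));
        mHandlerCheckers.add(new Checker("display thread"));
        // Stand-in for addMonitor(new BinderThreadMonitor()).
        addMonitor(() -> { });
    }

    // External registration of an extra thread to watch.
    synchronized void addThread(String name) {
        mHandlerCheckers.add(new Checker(name));
    }

    // External registration of a monitor; it always lands on mMonitorChecker.
    synchronized void addMonitor(Monitor monitor) {
        mMonitorChecker.monitors.add(monitor);
    }

    synchronized int checkerCount() { return mHandlerCheckers.size(); }

    synchronized int monitorCount() { return mMonitorChecker.monitors.size(); }
}
```

In this model every other checker keeps an empty monitor list, which is the property the later scheduleCheckLocked() shortcut relies on.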

4. The Watchdog run() method

public void run() {
        boolean waitedHalf = false;
        boolean mSFHang = false;
        while (true) {
            ······
            synchronized (this) {
                ······
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }
                ······
            }
            ······
        }
}
Every element of mHandlerCheckers is scheduled for a check.

5. HandlerChecker.scheduleCheckLocked()
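
The excerpt elides what the loop does once the checks are scheduled. The sketch below shows one way each checker's progress could be classified. It is an illustration under assumptions: of the state names used, only WAITED_HALF appears elsewhere in this article; the timeout is supplied by the caller; and CompletionState is a hypothetical helper, not framework code.

```java
// Hypothetical helper that classifies how far a scheduled check has progressed.
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // completed: whether the checker's runnable has finished.
    // startTime / now: timestamps in milliseconds.
    // waitMax: the checker's timeout in milliseconds.
    static int of(boolean completed, long startTime, long now, long waitMax) {
        if (completed) {
            return COMPLETED;
        }
        long latency = now - startTime;
        if (latency < waitMax / 2) {
            return WAITING;
        }
        if (latency < waitMax) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // The overall state is the worst state among all checkers.
    static int worst(int[] states) {
        int result = COMPLETED;
        for (int state : states) {
            result = Math.max(result, state);
        }
        return result;
    }
}
```

Taking the maximum over all checkers means a single overdue thread is enough to put the whole watchdog into the overdue state.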

public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
        }

        if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
        }
        
        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
}

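1. When mMonitors.size() == 0 (a checker with no monitors):
the only question is whether the thread's message loop is stuck. If mHandler.getLooper().getQueue().isPolling() returns true, the looper is idle and waiting for messages, so the checker is marked completed without posting anything to the queue.

2. mMonitorChecker always holds at least one monitor (BinderThreadMonitor is added in the constructor), so for it the interesting part is mHandler.postAtFrontOfQueue(this), which runs the following on the target thread: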
public void run() {
       final int size = mMonitors.size();
       for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
       }

       synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
       }
}
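The technique here is to call each monitor's monitor() method.
1) The loop runs over mMonitors, and the only checker whose list is non-empty is mMonitorChecker. System services add themselves to it through addMonitor, for example: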
ActivityManagerService.java
    Watchdog.getInstance().addMonitor(this); 

InputManagerService.java
    Watchdog.getInstance().addMonitor(this); 

PowerManagerService.java
    Watchdog.getInstance().addMonitor(this); 

WindowManagerService.java
    Watchdog.getInstance().addMonitor(this); 
The monitor() method that gets called is very simple. In ActivityManagerService, for example:
public void monitor() {
     synchronized (this) { }
}
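This only checks whether the service's lock can be acquired, that is, whether some other thread has been holding it for too long.

2) Special case: how is BinderThreadMonitor checked?
It is an inner class of Watchdog: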
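
A small plain-Java experiment makes the effect of the empty synchronized block visible. This is a sketch, not framework code: FakeService, LockMonitorDemo and the timeout values are hypothetical, and a CountDownLatch stands in for the watchdog's own bookkeeping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockMonitorDemo {
    // Stand-in for a service whose monitor() only takes and releases its lock.
    static final class FakeService {
        void monitor() {
            synchronized (this) { }
        }
    }

    // Runs service.monitor() on a checker thread and reports whether it finished
    // within timeoutMs. holdLock simulates another thread stuck inside the lock.
    static boolean monitorCompletes(boolean holdLock, long timeoutMs) {
        final FakeService service = new FakeService();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(1);
        try {
            if (holdLock) {
                Thread holder = new Thread(() -> {
                    synchronized (service) {
                        lockHeld.countDown();
                        try {
                            release.await();
                        } catch (InterruptedException ignored) {
                            // Fall through and release the lock.
                        }
                    }
                });
                holder.start();
                lockHeld.await(); // make sure the lock is really taken first
            }
            Thread checker = new Thread(() -> {
                service.monitor();
                done.countDown();
            });
            checker.setDaemon(true);
            checker.start();
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the holder thread exit
        }
    }
}
```

With a free lock the monitor call returns almost immediately. With the lock held, the checker thread stays blocked until the timeout expires, which is the condition the Watchdog treats as a hang.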
private static final class BinderThreadMonitor implements Watchdog.Monitor {
        @Override
        public void monitor() {
            Binder.blockUntilThreadAvailable();
        }
}

android.os.Binder.java
public static final native void blockUntilThreadAvailable();

android_util_Binder.cpp
static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
{
    return IPCThreadState::self()->blockUntilThreadAvailable();
}

IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}
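This only checks that the number of binder threads currently executing in the process stays below mMaxThreads. Once the count reaches the maximum (31 for system_server), the call waits until a thread becomes free.
Why 31: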
ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15
but SystemServer.java overrides the default:
// maximum number of binder threads used for system_server
// will be higher than the system default
private static final int sMaxBinderThreads = 31;
private void run() {
    ······
    BinderInternal.setMaxThreads(sMaxBinderThreads);
    ······
}
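
The native wait loop above can be restated in plain Java with a lock and a condition variable. This is a rough analogue under assumptions, not the Binder implementation: ThreadPoolGate and its methods are hypothetical, and unlike the native call this version gives up after a timeout so that the example always terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical Java analogue of the "block until a pool thread is free" check.
final class ThreadPoolGate {
    private final ReentrantLock mLock = new ReentrantLock();
    private final Condition mThreadCountDecrement = mLock.newCondition();
    private final int mMaxThreads;
    private int mExecutingThreadsCount;

    ThreadPoolGate(int maxThreads) {
        mMaxThreads = maxThreads;
    }

    // Called when a pool thread starts handling a transaction.
    void threadStarted() {
        mLock.lock();
        try {
            mExecutingThreadsCount++;
        } finally {
            mLock.unlock();
        }
    }

    // Called when a pool thread finishes; wakes up any waiter.
    void threadFinished() {
        mLock.lock();
        try {
            mExecutingThreadsCount--;
            mThreadCountDecrement.signalAll();
        } finally {
            mLock.unlock();
        }
    }

    // Returns true once fewer than mMaxThreads threads are executing,
    // or false if that does not happen within timeoutMs.
    boolean awaitThreadAvailable(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        mLock.lock();
        try {
            while (mExecutingThreadsCount >= mMaxThreads) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = mThreadCountDecrement.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            mLock.unlock();
        }
    }
}
```

The comparison is >=, as in the native loop: the wait starts as soon as every thread in the pool is busy, not only after the limit has been exceeded.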

6. What does the Watchdog do after a timeout?

public void run() {
    ······
    Process.killProcess(Process.myPid());
    System.exit(10);
    ······
}
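It kills the process it runs in (system_server) and exits.

7. Questions

1) What logs does the Watchdog produce?

(1) process stack traces

The output location is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr/.
ActivityManagerService.dumpStackTraces(true, pids, null, null, getInterestingNativePids());
Note: process stack traces are also dumped when a checker has been blocked for half of the timeout (WAITED_HALF).

(2) slog

System log, written through android.util.Slog (a hidden class).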

Slog.e(TAG, "**SWT happen **" + subject); 

Slog.v(TAG, "** save all info before killnig system server **"); 

Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject); 

Slog.w(TAG, "*** GOODBYE!");

(3)event log

EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);

(4)kernel stack traces

The output location is controlled by dalvik.vm.stack-trace-file and is normally /data/anr/.
if (RECORD_KERNEL_THREADS) {
   dumpKernelStackTraces();
}
private File dumpKernelStackTraces() {
        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
        if (tracesPath == null || tracesPath.length() == 0) {
            return null;
        }

        native_dumpKernelStacks(tracesPath);
        return new File(tracesPath);
}

(5)dropbox

Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
     public void run() {
            Slog.v(TAG, "** start addErrorToDropBox **");
            mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name.isEmpty() ? subject : name, null, stack, null);
            }
};
dropboxThread.start();
Note:
DropBox entries are normally stored under /data/system/dropbox, because of this constructor:
DropBoxManagerService.java
public DropBoxManagerService(final Context context) {
        this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}

2) Why monitor UiThread, IoThread, DisplayThread and FgThread?

First, all four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

/**
 * Shared singleton thread for showing UI.  This is a foreground thread, and in
 * additional should not have operations that can take more than a few ms scheduled
 * on it to avoid UI jank.
 */
public final class UiThread extends ServiceThread {
    private static final long SLOW_DISPATCH_THRESHOLD_MS = 100;
    private static UiThread sInstance;
    private static Handler sHandler;

    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    @Override
    public void run() {
        // Make sure UiThread is in the fg stune boost group
        Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
        super.run();
    }

    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new UiThread();
            sInstance.start();
            final Looper looper = sInstance.getLooper();
            looper.setTraceTag(Trace.TRACE_TAG_ACTIVITY_MANAGER);
            looper.setSlowDispatchThresholdMs(SLOW_DISPATCH_THRESHOLD_MS);
            sHandler = new Handler(sInstance.getLooper());
        }
    }

    public static UiThread get() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    public static Handler getHandler() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sHandler;
        }
    }
}
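1. get() returns the singleton thread object.
2. getHandler() returns the Handler bound to that thread.
3. Note that ensureThreadLocked() calls start() as soon as it creates the instance, so the thread is already running by the time the object is first handed out.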
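
The lazy "create and start on first access" pattern can be reduced to a few lines of plain Java. This is a sketch under assumptions: SharedWorker and its thread name are hypothetical, and the thread simply parks instead of running a Looper as a real ServiceThread would.

```java
// Hypothetical, Android-free version of a lazily started singleton thread.
final class SharedWorker extends Thread {
    private static SharedWorker sInstance;

    private SharedWorker() {
        super("example.shared");
        setDaemon(true); // do not keep the JVM alive in this example
    }

    @Override
    public void run() {
        // Park until interrupted; a real ServiceThread would loop on its Looper.
        try {
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException ignored) {
            // Exit quietly.
        }
    }

    // Caller must hold the class lock.
    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new SharedWorker();
            sInstance.start(); // started in the same step that creates it
        }
    }

    static SharedWorker get() {
        synchronized (SharedWorker.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }
}
```

Because creation and start() happen together under the class lock, no caller can ever observe an instance that has not been started.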

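Second, all four classes extend ServiceThread, which in turn extends HandlerThread. The Handler on each thread is what matters most, because system services such as ActivityManagerService, WMS and PMS all post work to these threads.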

final class UiHandler extends Handler {
        public UiHandler() {
            super(com.android.server.UiThread.get().getLooper(), null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
            case SHOW_ERROR_UI_MSG: {
                mAppErrors.handleShowAppErrorUi(msg);
                ensureBootCompleted();
            } break;
            ······
            }
        }
}
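1. UiHandler takes its Looper directly from UiThread. A thread has exactly one Looper and one MessageQueue, but any number of Handlers can be attached to it.
2. The handleMessage code shows that UI updates are not necessarily limited to the main thread.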
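
The "one thread, one queue, many handlers" relationship can be modelled without Android classes. The sketch below is an illustration only: MiniLooper and MiniHandler are hypothetical stand-ins built on a BlockingQueue, not the real Looper and Handler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model: one worker thread drains one queue, and any number
// of handler objects post into that same queue.
final class MiniLooper {
    private final BlockingQueue<Runnable> mQueue = new LinkedBlockingQueue<>();

    MiniLooper(String name) {
        Thread thread = new Thread(() -> {
            try {
                while (true) {
                    mQueue.take().run();
                }
            } catch (InterruptedException ignored) {
                // Exit quietly.
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
    }

    // Each handler is just another entry point into the shared queue.
    final class MiniHandler {
        void post(Runnable task) {
            mQueue.add(task);
        }
    }

    MiniHandler newHandler() {
        return new MiniHandler();
    }

    // Posts through two different handlers and records the executing thread.
    static List<String> demo() {
        MiniLooper looper = new MiniLooper("example.ui");
        MiniHandler first = looper.newHandler();
        MiniHandler second = looper.newHandler();
        List<String> threadNames = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch done = new CountDownLatch(2);
        Runnable record = () -> {
            threadNames.add(Thread.currentThread().getName());
            done.countDown();
        };
        first.post(record);
        second.post(record);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames;
    }
}
```

Both tasks run on the same thread regardless of which handler posted them, which is why one slow message on a shared thread delays every service that uses it.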

Finally, how do UiThread, IoThread, DisplayThread and FgThread differ?

a. Thread names:
android.ui, android.io, android.display and android.fg respectively

b. Thread priorities:
UiThread --> Process.THREAD_PRIORITY_FOREGROUND
IoThread, FgThread --> android.os.Process.THREAD_PRIORITY_DEFAULT
DisplayThread --> Process.THREAD_PRIORITY_DISPLAY + 1

c. Typical users (these differ slightly):
UiThread --> ActivityManagerService
DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService

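8. Summary

1. The core objects in Watchdog are mHandlerCheckers and mMonitorChecker.

mHandlerCheckers: detects blocked message queues.

mMonitorChecker: detects core system services holding a lock for too long.

2. Checkers in mHandlerCheckers use mHandler.getLooper().getQueue().isPolling() as a quick test of whether the thread is idle, and otherwise post themselves to the front of the queue. mMonitorChecker detects a timeout through synchronized (this) in each service's monitor(). BinderThreadMonitor is a special case: it blocks while the number of executing binder threads has reached the process maximum.

3. After a timeout the system writes a series of logs, which can be used for analysis.

4. After a timeout the Watchdog kills its own process, so the system_server process ID changes once it restarts.

5. Extension: could the same approach be used to detect similar problems in our own apps?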
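
One possible answer to the last question, as a sketch only: post a no-op "ping" to the thread being watched and see whether it runs within a deadline. AppWatchdog and its methods are hypothetical names. The example uses a single-thread ExecutorService so that it runs on a plain JVM; in an app the ping would presumably be posted to the thread's Handler instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical app-level watchdog built on the same "post and wait" idea.
final class AppWatchdog {
    // Posts a no-op ping and reports whether it ran within timeoutMs.
    static boolean isResponsive(ExecutorService monitored, long timeoutMs) {
        CountDownLatch ping = new CountDownLatch(1);
        monitored.execute(ping::countDown);
        try {
            return ping.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns { responsive while idle, responsive while blocked }.
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            boolean idle = isResponsive(worker, 2000);
            worker.execute(() -> sleepQuietly(1000)); // simulate a blocked thread
            boolean blocked = isResponsive(worker, 200);
            return new boolean[] { idle, blocked };
        } finally {
            worker.shutdownNow();
        }
    }
}
```

The ping queues behind whatever the thread is currently doing, so a late ping means the thread is busy or stuck. Unlike the system Watchdog, this sketch only reports the condition and leaves the reaction to the caller.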

9. References

https://blog.csdn.net/xiaosayidao/article/details/75453195
