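I have recently run into quite a few problems caused by the framework dying, so here is a summary. Before looking at the specific bugs, a quick look at the Android boot flow helps with locating and analyzing them:
System boot flow
The boot flow chart is as follows:
The rough steps are: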
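- Start the BootLoader: the boot loader initializes hardware, sets up the memory map, and so on, then brings up the Linux kernel
- Start the Linux kernel: it sets up caches, loads drivers, and so on, then starts the init process
- init initializes the system according to init.rc: init.rc can be seen as a script that changes file permissions, sets properties, launches processes, and so on; system processes such as zygote, servicemanager and surfaceflinger are launched from it
- Start zygote: when zygote starts, it forks the system_server process
- Start system_server: system_server starts system services such as PMS, WMS and AMS
- Start AMS: when AMS starts, it launches UI-related processes such as SystemUI and Launcher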
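System crash and restart flow
Now let's look at the restart flow when system_server dies:
- Java-layer processes are all forked from zygote, so zygote listens for the child-exit signal (SIGCHLD); if the child that exited is system_server, zygote kills itself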
// https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/jni/com_android_internal_os_Zygote.cpp?q=com_android_internal_os_Zygote.cpp
static jint com_android_internal_os_Zygote_nativeForkSystemServer(
        JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
        jint runtime_flags, jobjectArray rlimits, jlong permitted_capabilities,
        jlong effective_capabilities) {
    ...
    // Remember the pid of system_server
    gSystemServerPid = pid;
    ...
}
// This signal handler is for zygote mode, since the zygote must reap its children
static void SigChldHandler(int /*signal_number*/, siginfo_t* info, void* /*ucontext*/) {
    pid_t pid;
    ...
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        ...
        // If system_server has died, zygote kills itself
        if (pid == gSystemServerPid) {
            async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG,
                                  "Exit zygote because system server (pid %d) has terminated", pid);
            kill(getpid(), SIGKILL);
        }
    }
    ...
}
- After zygote dies, the init process brings it back up
zygote's rc file declares that audioserver, cameraserver and other processes are restarted whenever zygote restarts, so they restart as well:
service zygote /system/bin/app_process64 -Xzygote /system/bin --zygote --start-system-server --socket-name=zygote
    class main
    priority -20
    user root
    group root readproc reserved_disk
    socket zygote stream 660 root system
    socket usap_pool_primary stream 660 root system
    onrestart exec_background - system system -- /system/bin/vdc volume abort_fuse
    onrestart write /sys/power/state on
    onrestart restart audioserver
    onrestart restart cameraserver
    onrestart restart media
    onrestart restart netd
    onrestart restart wificond
    task_profiles ProcessCapacityHigh MaxPerformance
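When zygote starts again it starts system_server again, and from there we are back on the normal boot path: system services such as PMS, WMS and AMS, plus SystemUI and Launcher, are started.
Dying after the system has booted
Death 1: killed by the watchdog
Problem: one of our apps would not open
Direct cause: the ANR trace showed that on startup the app calls into a .so library provided by the embedded team, and a method in that library hung, causing the ANR.
Normally the issue would be handed over to the low-level team at this point, but they said they could not find the cause either and asked us to help analyze it.
After some searching, I found a trace file for the system itself in the anr directory: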
// Header of the trace file: this stack was dumped because system_server hung
----- pid 4129 at 2021-11-04 19:08:03 -----
Cmd line: system_server
Build fingerprint: 'ViewSonic/IFP8650-5/IFP8650-5:11.0.0/20220907.130307/release-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes=21210 post zygote classes=3385
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
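This was the first time I had seen a stack dump of a hung system_server in the anr directory, which was an eye-opener.
The existence of this dump suggests that system_server died and then restarted automatically, so let's look at the log around 19:08:03. It shows that system_server was killed by the watchdog because StorageManagerService was blocked: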
...
11-04 19:08:10.436312 4129 4154 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.StorageManagerService on foreground thread (android.fg)
11-04 19:08:10.436985 4129 4154 W Watchdog: android.fg annotated stack trace:
11-04 19:08:10.437073 4129 4154 W Watchdog: at android.os.MessageQueue.nativePollOnce(Native Method)
11-04 19:08:10.437120 4129 4154 W Watchdog: at android.os.MessageQueue.next(MessageQueue.java:335)
11-04 19:08:10.437161 4129 4154 W Watchdog: at android.os.Looper.loop(Looper.java:183)
11-04 19:08:10.437202 4129 4154 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
11-04 19:08:10.437244 4129 4154 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
11-04 19:08:10.437272 4129 4154 W Watchdog: *** GOODBYE!
--------- switch to main
11-04 19:08:10.437306 4129 4154 I Process : Sending signal. PID: 4129 SIG: 9
...
11-04 19:08:11.591232 3900 3900 E Zygote : Exit zygote because system server (pid 4129) has terminated
It then restarts automatically, bringing the framework back up:
11-04 19:08:11.990854 10300 10300 D AndroidRuntime: >>>>>> START com.android.internal.os.ZygoteInit uid 0 <<<<<<
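So why would a system restart make a .so method hang?
According to our system engineer, the .so actually talks to a service process started from init.rc, and that process needs to call some methods of system_server.
When system_server crashes and restarts, this service process is neither restarted nor notified, so it keeps holding the communication channel to the old, dead system_server; the calls fail, and that is where the problem comes from.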
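To spot a watchdog kill like the one above in a large capture, a small host-side script can help. This is only a sketch and not part of the original investigation: it assumes the logcat layout seen in the excerpts (timestamp, pid, tid, level, tag, message), and the name find_watchdog_kills is made up for illustration.

```python
import re

# Assumed logcat layout, as in the excerpts above:
#   MM-DD HH:MM:SS.ffffff PID TID LEVEL TAG: message
LOG_LINE = re.compile(
    r"(\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\s+(\d+)\s+(\d+)\s+([VDIWEF])\s+(.*?)\s*:\s(.*)"
)
WATCHDOG = re.compile(r"WATCHDOG KILLING SYSTEM PROCESS: (.*)")


def find_watchdog_kills(lines):
    """Return (timestamp, pid, reason) for each watchdog kill line."""
    kills = []
    for line in lines:
        parsed = LOG_LINE.search(line)
        if not parsed:
            continue
        timestamp, pid, _tid, _level, tag, message = parsed.groups()
        hit = WATCHDOG.search(message)
        if tag == "Watchdog" and hit:
            kills.append((timestamp, int(pid), hit.group(1)))
    return kills
```

The pid in each result is the system_server that was killed, which can then be matched against the "Exit zygote" line that follows it.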
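Death 2: killed by an uncaught exception
A few days later the same hang showed up on another platform, and it was natural to suspect that system_server had died again. Normally the system_server pid is below 1000; ps showed a fairly large pid, so it had most likely died and been restarted: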
ps -A | grep system_server
We can also search the logs directly for the keyword "Exit zygote" to confirm whether it really died:
> grep -rn "Exit zygote"
./logd/logcat.099:12670:09-19 06:19:39.800634 301 301 E Zygote : Exit zygote because system server (pid 618) has terminated
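The same check can be scripted when there are many log files to go through. The sketch below assumes the message text printed by the zygote code quoted earlier and the timestamp format seen in these logs; find_system_server_deaths is a hypothetical helper name.

```python
import re

# Timestamp format as it appears in the log excerpts (MM-DD HH:MM:SS.ffffff).
TIMESTAMP = r"(\d\d-\d\d \d\d:\d\d:\d\d\.\d+)"
EXIT_ZYGOTE = re.compile(
    TIMESTAMP + r".*Exit zygote because system server \(pid (\d+)\) has terminated"
)


def find_system_server_deaths(lines):
    """Return (timestamp, system_server_pid) for every "Exit zygote" line."""
    deaths = []
    for line in lines:
        match = EXIT_ZYGOTE.search(line)
        if match:
            deaths.append((match.group(1), int(match.group(2))))
    return deaths
```

Because the pattern is not anchored to the start of the line, it also works on grep output that prefixes each hit with a file name and line number.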
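Searching upward from the crash time for logs from process 618 shows that it crashed because creating the IpClient failed (the underlying cause in the stack is a DeadObjectException on the binder call):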
--------- switch to crash
09-19 06:19:38.510179 618 700 E AndroidRuntime: *** FATAL EXCEPTION IN SYSTEM PROCESS: WifiHandlerThread
09-19 06:19:38.510179 618 700 E AndroidRuntime: java.lang.IllegalStateException: Could not create IpClient
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.networkstack.NetworkStackClientBase.lambda$makeIpClient$1(NetworkStackClientBase.java:74)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.networkstack.-$$Lambda$NetworkStackClientBase$vgsHk-RCpPUAYmE-7YTwKKaAuFA.accept(Unknown Source:6)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.networkstack.NetworkStackClientBase.requestConnector(NetworkStackClientBase.java:119)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.networkstack.NetworkStackClientBase.makeIpClient(NetworkStackClientBase.java:70)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.ip.IpClientUtil.makeIpClient(IpClientUtil.java:80)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.server.wifi.FrameworkFacade.makeIpClient(FrameworkFacade.java:202)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.server.wifi.ClientModeImpl.setupClientMode(ClientModeImpl.java:3606)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.server.wifi.ClientModeImpl.access$3600(ClientModeImpl.java:164)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.server.wifi.ClientModeImpl$ConnectModeState.enter(ClientModeImpl.java:3790)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.com.android.internal.util.StateMachine$SmHandler.invokeEnterMethods(StateMachine.java:1037)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.com.android.internal.util.StateMachine$SmHandler.performTransitions(StateMachine.java:879)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:819)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:106)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at android.os.Looper.loop(Looper.java:223)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at android.os.HandlerThread.run(HandlerThread.java:67)
09-19 06:19:38.510179 618 700 E AndroidRuntime: Caused by: android.os.DeadObjectException
09-19 06:19:38.510179 618 700 E AndroidRuntime: at android.os.BinderProxy.transactNative(Native Method)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at android.os.BinderProxy.transact(BinderProxy.java:550)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.INetworkStackConnector$Stub$Proxy.makeIpClient(INetworkStackConnector.java:226)
09-19 06:19:38.510179 618 700 E AndroidRuntime: at com.android.wifi.x.android.net.networkstack.NetworkStackClientBase.lambda$makeIpClient$1(NetworkStackClientBase.java:72)
09-19 06:19:38.510179 618 700 E AndroidRuntime: ... 14 more
--------- switch to events
09-19 06:19:38.510843 618 700 I am_crash: [618,0,system_server,-1,android.os.DeadObjectException,Could not create IpClient,BinderProxy.java,-2]
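Once the dead pid is known, pulling out the thread name and exception line can also be automated. This is a rough sketch that assumes the AndroidRuntime lines look like the ones above; find_fatal_exception is a hypothetical name.

```python
import re

# Matches error-level AndroidRuntime lines and captures the pid and the message.
RUNTIME_LINE = re.compile(
    r"\d\d:\d\d:\d\d\.\d+\s+(\d+)\s+\d+\s+E\s+AndroidRuntime\s*:\s(.*)"
)
FATAL = re.compile(r"\*\*\* FATAL EXCEPTION IN SYSTEM PROCESS: (.*)")


def find_fatal_exception(lines, pid):
    """Return (thread_name, exception_line) for the first fatal exception
    logged by `pid`, or None if there is none."""
    thread = None
    for line in lines:
        parsed = RUNTIME_LINE.search(line)
        if not parsed or int(parsed.group(1)) != pid:
            continue
        message = parsed.group(2)
        if thread is None:
            hit = FATAL.search(message)
            if hit:
                thread = hit.group(1)
        else:
            # The line right after the FATAL banner carries the exception.
            return thread, message
    return None
```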
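Death 3: a crash in a critical system service restarts the system
A week later the same hang appeared yet again. This time filtering for "Exit zygote" found nothing, but the system_server pid was 4677, so it had most likely still died: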
system 4677 4416 6 02:47:25 ? 00:05:48 system_server
Searching for the zygote startup keyword "START com.android.internal.os.ZygoteInit" confirms that the system did restart in between:
./logd/logcat.028:3181:09-29 02:47:22.978333 4416 4416 D AndroidRuntime: >>>>>> START com.android.internal.os.ZygoteInit uid 0 <<<<<<
Looking upward from that time, there are native crash stacks:
--------- switch to crash
09-29 02:47:21.762310 4403 4403 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
09-29 02:47:21.762564 4403 4403 F DEBUG : Build fingerprint: 'Droidlogic/t982_ar301/AVS-7500:11/RD2A.211001.002/eng.user5.20220811.094633:user/test-keys'
09-29 02:47:21.762637 4403 4403 F DEBUG : Revision: '0'
09-29 02:47:21.762742 4403 4403 F DEBUG : ABI: 'arm'
09-29 02:47:21.763405 4403 4403 F DEBUG : Timestamp: 2022-09-29 02:47:21-0400
09-29 02:47:21.763549 4403 4403 F DEBUG : pid: 343, tid: 3098, name: composer@2.4-se >>> /vendor/bin/hw/android.hardware.graphics.composer@2.4-service.droidlogic <<<
09-29 02:47:21.763584 4403 4403 F DEBUG : uid: 1000
09-29 02:47:21.763614 4403 4403 F DEBUG : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xf26ea000
09-29 02:47:21.763695 4403 4403 F DEBUG : Cause: [GWP-ASan]: Buffer Overflow, 0 bytes right of a 24-byte allocation at 0xf26e9fe8
09-29 02:47:21.763743 4403 4403 F DEBUG : r0 f26e9fe8 r1 0000000c r2 00000000 r3 00000000
09-29 02:47:21.763774 4403 4403 F DEBUG : r4 00000000 r5 0000001a r6 e87d9088 r7 f0444160
09-29 02:47:21.763803 4403 4403 F DEBUG : r8 e87d9080 r9 e87d8ec8 r10 f26e9fe8 r11 f0e3f760
09-29 02:47:21.763836 4403 4403 F DEBUG : ip f0e3db20 sp e87d8ea0 lr f0e18db3 pc f1d9ceca
09-29 02:47:21.776427 4403 4403 F DEBUG : backtrace:
09-29 02:47:21.776600 4403 4403 F DEBUG : #00 pc 00002eca /vendor/lib/libamgralloc_ext.so (am_gralloc_get_width(native_handle const*)+8) (BuildId: e6b2c270ca2b92162da0e931af324ab6)
09-29 02:47:21.776802 4403 4403 F DEBUG : #01 pc 00058daf /vendor/lib/hw/hwcomposer.amlogic.so (NnProcessor::asyncProcess(std::__1::shared_ptr<DrmFramebuffer>&, std::__1::shared_ptr<DrmFramebuffer>&, int&)+310) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.776907 4403 4403 F DEBUG : #02 pc 0004a82f /vendor/lib/hw/hwcomposer.amlogic.so (MultiplanesWithDiComposition::runProcessor(MultiplanesWithDiComposition::DisplayPair&, int&, int&)+210) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.776977 4403 4403 F DEBUG : #03 pc 0004d9e1 /vendor/lib/hw/hwcomposer.amlogic.so (MultiplanesWithDiComposition::commit(bool)+976) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.777048 4403 4403 F DEBUG : #04 pc 0003d1bb /vendor/lib/hw/hwcomposer.amlogic.so (Hwc2Display::presentVideo(int*)+58) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.777102 4403 4403 F DEBUG : #05 pc 0004370f /vendor/lib/hw/hwcomposer.amlogic.so (VideoTunnelThread::handleGameMode()+154) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.777154 4403 4403 F DEBUG : #06 pc 00043375 /vendor/lib/hw/hwcomposer.amlogic.so (VideoTunnelThread::gameModeThreadMain(void*)+56) (BuildId: 0ed81d2b4576c8a8b13c166d0020f67a)
09-29 02:47:21.777213 4403 4403 F DEBUG : #07 pc 000808b3 /apex/com.android.runtime/lib/bionic/libc.so (__pthread_start(void*)+40) (BuildId: 7bc8508bdbcc8163b9a5fbf3443efa72)
09-29 02:47:21.777261 4403 4403 F DEBUG : #08 pc 00039d23 /apex/com.android.runtime/lib/bionic/libc.so (__start_thread+30) (BuildId: 7bc8508bdbcc8163b9a5fbf3443efa72)
09-29 02:47:21.777295 4403 4403 F DEBUG : deallocated by thread 445:
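From this stack we can see that process pid 343 hit a wild pointer in am_gralloc_get_width in libamgralloc_ext.so (GWP-ASan reports it as a buffer overflow right at the end of a 24-byte allocation)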
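Reading these dumps by eye gets tedious, so the key fields can be pulled out with a script. The sketch below assumes the crash dump lines carry the DEBUG tag at level F as in the excerpt above; summarize_native_crash and the dictionary keys are names chosen for illustration.

```python
import re

DEBUG_LINE = re.compile(r"\sF\s+DEBUG\s*:\s(.*)")
PROCESS = re.compile(r"pid: (\d+), tid: (\d+), name: .*? >>> (.*?) <<<")
SIGNAL = re.compile(r"signal \d+ \((\w+)\)")
CAUSE = re.compile(r"(?:Cause|Abort message): (.*)")
TOP_FRAME = re.compile(r"#00 pc [0-9a-f]+\s+(\S+)(?:\s+\((.+)\+\d+\))?")


def summarize_native_crash(lines):
    """Collect pid, process, signal, cause and top frame of the first crash dump."""
    summary = {}
    for line in lines:
        debug = DEBUG_LINE.search(line)
        if not debug:
            continue
        text = debug.group(1)
        match = PROCESS.search(text)
        if match and "pid" not in summary:
            summary["pid"] = int(match.group(1))
            summary["tid"] = int(match.group(2))
            summary["process"] = match.group(3)
            continue
        for key, pattern in (("signal", SIGNAL), ("cause", CAUSE)):
            match = pattern.search(text)
            if match and key not in summary:
                summary[key] = match.group(1)
        match = TOP_FRAME.search(text)
        if match and "library" not in summary:
            summary["library"] = match.group(1)
            summary["symbol"] = match.group(2)
    return summary
```

Only the first value of each field is kept, so feeding it a log with several crash dumps reports the earliest one.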
A little further down, a whole batch of system services report deaths:
09-29 02:47:22.551747 394 3624 E csound : [HIDLServer]:serviceDied, droid tvserver daemon a client died cookie:2
09-29 02:47:22.551817 394 3624 E csound : tvserver daemon client:2 died
09-29 02:47:22.551865 394 3624 E csound : handleServiceDeath, client size:5
09-29 02:47:22.551956 323 411 E SystemControl: systemcontrol daemon client died cookie:1
09-29 02:47:22.637123 493 621 W AudioSystem: AudioFlinger server died!
09-29 02:47:22.079547 393 393 E HwcComposer: executeCommands failed because of Status(EX_TRANSACTION_FAILED): 'DEAD_OBJECT: '
09-29 02:47:22.114335 1020 1358 W SurfaceComposerClient: ComposerService remote (surfaceflinger) died [0xf24d2c10]
09-29 02:47:22.450807 323 411 E SystemControl: systemcontrol daemon client died cookie:0
09-29 02:47:22.451031 394 3624 E csound : [HIDLServer]:serviceDied, droid tvserver daemon a client died cookie:4
09-29 02:47:22.451104 394 3624 E csound : tvserver daemon client:4 died
09-29 02:47:22.451150 394 3624 E csound : handleServiceDeath, client size:5
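Moreover, the last output from the previous system_server (pid 777) is about handling the tombstone file for pid 343, right before zygote starts again, so the crash of this system service most likely took the whole system down: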
09-29 02:47:22.009985 777 996 W NativeCrashListener: Couldn't find ProcessRecord for pid 343
--------- switch to main
09-29 02:47:22.010605 291 291 E tombstoned: Tombstone written to: /data/tombstones/tombstone_08
--------- switch to system
09-29 02:47:22.017120 777 811 I BootReceiver: Copying /data/tombstones/tombstone_08 to DropBox (SYSTEM_TOMBSTONE)
09-29 02:47:22.018136 777 811 I DropBoxManagerService: add tag=SYSTEM_TOMBSTONE isTagEnabled=true flags=0x2
--------- switch to events
09-29 02:47:22.042284 777 811 I dropbox_file_copy: [/data/tombstones/tombstone_08,65536,SYSTEM_TOMBSTONE]
09-29 02:47:22.047667 777 811 I commit_sys_config_file: [log-files,5]
...
09-29 02:47:22.978333 4416 4416 D AndroidRuntime: >>>>>> START com.android.internal.os.ZygoteInit uid 0 <<<<<<
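The log contains no zygote exit or crash message; one possibility is that those lines were flushed out of the log buffer.
Another possibility is that zygote was restarted by init. The rc file of the crashed process shows that when it restarts, it restarts surfaceflinger: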
service vendor.hwcomposer-2-4 /vendor/bin/hw/android.hardware.graphics.composer@2.4-service.droidlogic
    class hal animation
    user system
    group graphics drmrpc
    capabilities SYS_NICE
    onrestart restart surfaceflinger
    ...
And the rc file of surfaceflinger shows that it in turn restarts zygote:
service surfaceflinger /system/bin/surfaceflinger
    class core animation
    user system
    group graphics drmrpc readproc
    capabilities SYS_NICE
    onrestart restart zygote
    ...
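So the cause of death is clear:
- /vendor/bin/hw/android.hardware.graphics.composer@2.4-service.droidlogic crashes on a wild pointer and restarts
- When /vendor/bin/hw/android.hardware.graphics.composer@2.4-service.droidlogic restarts, it restarts surfaceflinger
- When surfaceflinger restarts, it restarts zygote
To stop this hang from bothering us again, we asked the system engineer to configure zygote's rc file so that the misbehaving service is restarted together with zygote.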
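The chain above was found by reading rc files one by one. As a rough sketch, the same cascade can be computed from the rc text. This is not a real init parser: it only understands `service` headers and `onrestart restart` lines, and the function names are hypothetical.

```python
from collections import deque


def parse_onrestart(rc_text):
    """Map each service name to the services it restarts via `onrestart restart`."""
    graph = {}
    current = None
    for raw in rc_text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "service" and len(words) >= 2:
            current = words[1]
            graph.setdefault(current, [])
        elif words[0] in ("on", "import"):
            # Leaving the service section; later lines belong elsewhere.
            current = None
        elif current and len(words) >= 3 and words[:2] == ["onrestart", "restart"]:
            graph[current].append(words[2])
    return graph


def restart_cascade(graph, crashed):
    """Breadth-first list of services restarted as a consequence of `crashed` restarting."""
    order, seen, queue = [], {crashed}, deque([crashed])
    while queue:
        service = queue.popleft()
        for target in graph.get(service, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

Feeding it the three rc fragments from this article and asking for the cascade of vendor.hwcomposer-2-4 yields surfaceflinger, then zygote, then the services zygote restarts.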
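Dying during boot, so the device cannot start
Death 4: a native crash leaves the device stuck on the logo
Problem: after a software upgrade the device is stuck on the boot logo
The first step is to analyze the log, but running logcat directly on the serial console scrolls too fast to read, so a few log-capturing tricks help:
- Redirect the log to a file, for example under /storage: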
logcat > /storage/log.log
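The file can then be viewed and edited with busybox vi
- Plug in a USB drive and export the log file to it (requires root)
Because boot did not complete, the USB drive may not have been mounted yet, so we mount it by hand.
First use blkid to list all file systems and find the USB drive (ours is labeled PTT); its device node is /dev/sda1: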
console:/storage # blkid
/dev/zram0: UUID="12a21a37-1a08-42e6-97e3-90cb1a1ba60a" TYPE="swap"
/dev/mmcblk0p16: TYPE="squashfs"
/dev/mmcblk0p18: TYPE="squashfs"
/dev/mmcblk0p20: UUID="57f8f4bc-abf4-655f-bf67-946fc0f9f25b" TYPE="ext4"
/dev/mmcblk0p32: SEC_TYPE="msdos" UUID="5278-5278" TYPE="vfat"
/dev/mmcblk0p39: UUID="57f8f4bc-abf4-655f-bf67-946fc0f9f25b" TYPE="ext4"
/dev/block/mmcblk0p56: LABEL="/" UUID="5ac835d7-e53a-59f8-a2c4-b8a6967b849e" TYPE="ext4"
/dev/block/mmcblk0p58: LABEL="vendor" UUID="c7f6b4dc-c6f7-59d6-90cf-bc83aed55ec7" TYPE="ext4"
/dev/block/mmcblk0p60: UUID="cf81e7c0-047f-404a-81da-7d188dd0ccc0" TYPE="ext4"
/dev/sda1: LABEL="PTT" UUID="B4BE-1BCC" TYPE="vfat"
Then pick any location, for example /storage, create a directory, mount the USB drive on it, and copy the log over:
mkdir sda
mount /dev/sda1 sda/
cp /storage/log.log /storage/sda
sync
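If the blkid output has been captured from the serial console, finding the device node for a given label can be done on the host. This is a minimal sketch that assumes the `device: KEY="value" ...` layout shown above; parse_blkid and find_by_label are hypothetical helper names.

```python
import re

# One blkid entry per line: "/dev/xxx: KEY="value" KEY="value" ..."
ENTRY = re.compile(r"^(/dev/\S+):\s*(.*)$")
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_blkid(output):
    """Map each device node to a dict of its blkid attributes."""
    devices = {}
    for line in output.splitlines():
        entry = ENTRY.match(line.strip())
        if entry:
            devices[entry.group(1)] = dict(PAIR.findall(entry.group(2)))
    return devices


def find_by_label(devices, label):
    """Return the device nodes whose LABEL equals `label`."""
    return [dev for dev, attrs in devices.items() if attrs.get("LABEL") == label]
```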
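Now the USB drive can be unplugged and the log analyzed on our own computer. Because zygote never finished starting, the "Exit zygote" keyword used earlier finds nothing in this log, but native stacks are reported over and over.
While creating the Java VM, libaccelerator_base.so cannot be found, which trips an assertion and crashes the system: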
09-22 11:34:20.119 2233 2233 I tombstoned: received crash request for pid 3174
09-22 11:34:20.128 3209 3209 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
09-22 11:34:20.129 3209 3209 F DEBUG : Build fingerprint: 'cvt/mt9950_cn/mt9950_cn:11/RP1A.200720.011/6182:user/release-keys'
09-22 11:34:20.129 3209 3209 F DEBUG : Revision: '0'
09-22 11:34:20.129 3209 3209 F DEBUG : ABI: 'arm64'
09-22 11:34:20.129 3209 3209 F DEBUG : Timestamp: 2022-09-22 11:34:20+0800
09-22 11:34:20.130 3209 3209 F DEBUG : pid: 3174, tid: 3174, name: main >>> zygote64 <<<
09-22 11:34:20.130 3209 3209 F DEBUG : uid: 0
09-22 11:34:20.130 3209 3209 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
09-22 11:34:20.130 3209 3209 F DEBUG : Abort message: 'Error preloading public library libaccelerator_base.so: dlopen failed: library "libaccelerator_base.so" not found'
09-22 11:34:20.130 3209 3209 F DEBUG : x0 0000000000000000 x1 0000000000000c66 x2 0000000000000006 x3 0000007fe9f17ef0
09-22 11:34:20.130 3209 3209 F DEBUG : x4 0000007c1de1e000 x5 0000007c1de1e000 x6 0000007c1de1e000 x7 000000000000168c
09-22 11:34:20.130 3209 3209 F DEBUG : x8 00000000000000f0 x9 0000007c192b17f8 x10 ffffff80fffffbdf x11 0000000000000001
09-22 11:34:20.130 3209 3209 F DEBUG : x12 0000000000000000 x13 0000000000000655 x14 0000007fe9f16d10 x15 00008b1aad54c6e4
09-22 11:34:20.130 3209 3209 F DEBUG : x16 0000007c1934ac80 x17 0000007c1932bf20 x18 0000007c1d608000 x19 00000000000000ac
09-22 11:34:20.130 3209 3209 F DEBUG : x20 0000000000000c66 x21 00000000000000b2 x22 0000000000000c66 x23 00000000ffffffff
09-22 11:34:20.130 3209 3209 F DEBUG : x24 0000007987636000 x25 0000000000000002 x26 0000007987013c07 x27 000000000000002b
09-22 11:34:20.130 3209 3209 F DEBUG : x28 0000007987638000 x29 0000007fe9f17f70
09-22 11:34:20.130 3209 3209 F DEBUG : lr 0000007c192df0c4 sp 0000007fe9f17ed0 pc 0000007c192df0f4 pst 0000000000000000
09-22 11:34:20.151 3209 3209 F DEBUG : backtrace:
09-22 11:34:20.151 3209 3209 F DEBUG : #00 pc 000000000004e0f4 /apex/com.android.runtime/lib64/bionic/libc.so (abort+180) (BuildId: c78cdff5b820a550771130d6bde95081)
09-22 11:34:20.151 3209 3209 F DEBUG : #01 pc 0000000000565bc8 /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+2320) (BuildId: a0e45eb7480266d293a7de84fc1c7a3c)
09-22 11:34:20.151 3209 3209 F DEBUG : #02 pc 0000000000013ab0 /system/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_3::__invoke(char const*)+80) (BuildId: 6d398535cd6d9315930f056432520bb9)
09-22 11:34:20.151 3209 3209 F DEBUG : #03 pc 0000000000006ec8 /system/lib64/liblog.so (__android_log_assert+336) (BuildId: c92329feece7a2d7fa4d9fb6acc815f9)
09-22 11:34:20.151 3209 3209 F DEBUG : #04 pc 000000000000ebcc /apex/com.android.art/lib64/libnativeloader.so (android::nativeloader::LibraryNamespaces::Initialize()+324) (BuildId: 4e6450569b3bdee211e32baf7d0dfba7)
09-22 11:34:20.151 3209 3209 F DEBUG : #05 pc 000000000000e044 /apex/com.android.art/lib64/libnativeloader.so (InitializeNativeLoader+36) (BuildId: 4e6450569b3bdee211e32baf7d0dfba7)
09-22 11:34:20.151 3209 3209 F DEBUG : #06 pc 000000000039066c /apex/com.android.art/lib64/libart.so (JNI_CreateJavaVM+732) (BuildId: a0e45eb7480266d293a7de84fc1c7a3c)
09-22 11:34:20.151 3209 3209 F DEBUG : #07 pc 00000000000a023c /system/lib64/libandroid_runtime.so (android::AndroidRuntime::startVm(_JavaVM**, _JNIEnv**, bool, bool)+9060) (BuildId: 88ac6961382cb34f5fac714acaf48103)
09-22 11:34:20.151 3209 3209 F DEBUG : #08 pc 00000000000a0890 /system/lib64/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+464) (BuildId: 88ac6961382cb34f5fac714acaf48103)
09-22 11:34:20.151 3209 3209 F DEBUG : #09 pc 0000000000003570 /system/bin/app_process64 (main+1320) (BuildId: d4686d3f8282764488eb9ca7cc518583)
09-22 11:34:20.151 3209 3209 F DEBUG : #10 pc 00000000000495b4 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: c78cdff5b820a550771130d6bde95081)
09-22 11:34:20.192 3175 3175 F libc : Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 3175 (main), pid 3175 (main)
For native crashes like this, if logcat does not reveal the root cause, analyze the tombstone files under /data/tombstones, which contain more complete information:
console:/data/tombstones # ls
tombstone_00 tombstone_07 tombstone_14 tombstone_21 tombstone_28
tombstone_01 tombstone_08 tombstone_15 tombstone_22 tombstone_29
tombstone_02 tombstone_09 tombstone_16 tombstone_23 tombstone_30
tombstone_03 tombstone_10 tombstone_17 tombstone_24 tombstone_31
tombstone_04 tombstone_11 tombstone_18 tombstone_25
tombstone_05 tombstone_12 tombstone_19 tombstone_26
tombstone_06 tombstone_13 tombstone_20 tombstone_27
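Death 5: a wrong system app signature crashes the system and prevents boot
Sometimes the device fails to boot after we manually replace a system app. Since the problem appeared right after replacing the app, we can connect to the serial console and filter logcat by the app's package name: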
console:/storage # logcat | grep me.linjw.demo
09-21 23:19:10.932 3611 3611 E AndroidRuntime: java.lang.IllegalStateException: Signature mismatch on system package me.linjw.demo for shared user SharedUserSetting{5a1fdd7 android.uid.system/1000}
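The output above makes it obvious that the wrong signature on me.linjw.demo caused the exception.
If the app was signed incorrectly when the system image was built, the device fails to boot right after the upgrade. Exporting logcat as described earlier and filtering for "Exit zygote" still finds the log: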
09-21 23:19:11.061 3541 3541 E Zygote : Exit zygote because system server (pid 3611) has terminated
Again, searching upward for logs from process 3611 shows that PackageManagerService hit the system-signature mismatch exception while scanning all installed APKs during boot:
09-21 23:19:10.932 3611 3611 E AndroidRuntime: *** FATAL EXCEPTION IN SYSTEM PROCESS: main
09-21 23:19:10.932 3611 3611 E AndroidRuntime: java.lang.IllegalStateException: Signature mismatch on system package me.linjw.demo for shared user SharedUserSetting{5a1fdd7 android.uid.system/1000}
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.reconcilePackagesLocked(PackageManagerService.java:16568)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.addForInitLI(PackageManagerService.java:9537)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.scanDirLI(PackageManagerService.java:9131)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.scanDirTracedLI(PackageManagerService.java:9083)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.<init>(PackageManagerService.java:3111)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.pm.PackageManagerService.main(PackageManagerService.java:2599)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.SystemServer.startBootstrapServices(SystemServer.java:851)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.SystemServer.run(SystemServer.java:590)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.server.SystemServer.main(SystemServer.java:408)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
09-21 23:19:10.932 3611 3611 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:925)
09-21 23:19:10.933 3611 3611 E AndroidRuntime: Error reporting crash
09-21 23:19:10.933 3611 3611 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke interface method 'void android.app.IActivityManager.handleApplicationCrash(android.os.IBinder, android.app.ApplicationErrorReport$ParcelableCrashInfo)' on a null object reference
09-21 23:19:10.933 3611 3611 E AndroidRuntime: at com.android.internal.os.RuntimeInit$KillApplicationHandler.uncaughtException(RuntimeInit.java:158)
09-21 23:19:10.933 3611 3611 E AndroidRuntime: at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1073)
09-21 23:19:10.933 3611 3611 E AndroidRuntime: at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1068)
09-21 23:19:10.933 3611 3611 E AndroidRuntime: at java.lang.Thread.dispatchUncaughtException(Thread.java:2203)
09-21 23:19:10.933 3611 3611 I Process : Sending signal. PID: 3611 SIG: 9
09-21 23:19:11.061 3541 3541 E Zygote : Zygote failed to write to system_server FD: Connection refused
09-21 23:19:11.061 3541 3541 I Zygote : Process 3611 exited due to signal 9 (Killed)
09-21 23:19:11.061 3541 3541 E Zygote : Exit zygote because system server (pid 3611) has terminated
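Summary
1. If the system_server pid is fairly large, it has most likely restarted
2. Search for "Exit zygote" to find when system_server died; if that keyword is not found, search for "START com.android.internal.os.ZygoteInit" to find the restart time. Then search upward for the following keywords to find the actual cause of death:
- DEBUG : a stack is printed when the native layer hits an exception
- crash : crash information
- kill : some abnormal condition caused a forced kill
- tombstone : VM or C-layer errors that triggered a tombstone file
- died : some service died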
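The keyword list above can be turned into a first-pass triage script. The sketch below is deliberately broad: it does a case-insensitive substring match, so "kill" also hits "KILLING" and "Killed", and the hits still need to be read by a human. The name triage is made up for illustration.

```python
# Keywords taken from the summary above.
KEYWORDS = (
    "Exit zygote",
    "START com.android.internal.os.ZygoteInit",
    "DEBUG",
    "crash",
    "kill",
    "tombstone",
    "died",
)


def triage(lines):
    """Map each keyword to the 1-based numbers of the lines that contain it."""
    hits = {keyword: [] for keyword in KEYWORDS}
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for keyword in KEYWORDS:
            if keyword.lower() in lowered:
                hits[keyword].append(number)
    return hits
```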