With exactly the same code, some builds of the APK hit a native crash (NE for short below) right at startup on an old 'OPPO R9, Android 5.x' phone.
NE form 1:
========================================
2020-05-20 11:45:16 data_app_native_crash (text, 1519 bytes)
Process: com.duoxx.xx
Flags: 0xf83e44
Package: com.duoxx.xx v33283 (7.13.2-SNAPSHOT)
Build: OPPO/R9m/R9:5.1/LMY47I/1515760704:user/release-keys
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'OPPO/R9m/R9:5.1/LMY47I/1515760704:user/release-keys'
Revision: '0'
ABI: 'arm'
pid: 27544, tid: 27848, name: Thread-6488 >>> com.duoxx.xx <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x88
Abort message: 'art/runtime/gc/collector/mark_sweep.cc:387] Can't mark invalid object'
r0 00000000 r1 6fff6268 r2 0000001c r3 00000148
r4 75d79d60 r5 00000147 r6 c98f3a48 r7 f47feff4
r8 e0b72400 r9 ed30889a sl 00000007 fp f4889300
ip 00003072 sp c98f3890 lr f4584c77 pc f478c7f6 cpsr a00b0030
backtrace:
#00 pc 002967f6 /system/lib/libart.so (_ZN3art11interpreterL8DoInvokeILNS_10InvokeTypeE4ELb0ELb0EEEbPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+233)
#01 pc 0008ec73 /system/lib/libart.so (_ZN3art11interpreter15ExecuteGotoImplILb0ELb0EEENS_6JValueEPNS_6ThreadERNS_12MethodHelperEPKNS_7DexFile8CodeItemERNS_11ShadowFrameES2_+22762)
#02 pc 0016a173 /system/lib/libart.so (_ZN3art11interpreter24EnterInterpreterFromStubEPNS_6ThreadERNS_12MethodHelperEPKNS_7DexFile8CodeItemERNS_11ShadowFrameE+126)
#03 pc 00291a05 /system/lib/libart.so (artQuickToInterpreterBridge+468)
#04 pc 000a4aeb /system/lib/libart.so (art_quick_to_interpreter_bridge+10)
#05 pc 0099333c /dev/ashmem/dalvik-main space (deleted)
NE form 2:
========================================
2020-05-20 11:50:22 data_app_native_crash (text, 3195 bytes)
Process: com.duoxx.xx
Flags: 0xd83e44
Package: com.duoxx.xx v33283 (7.13.2-SNAPSHOT)
Build: OPPO/R9m/R9:5.1/LMY47I/1515760704:user/release-keys
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'OPPO/R9m/R9:5.1/LMY47I/1515760704:user/release-keys'
Revision: '0'
ABI: 'arm'
pid: 28717, tid: 28735, name: GCDaemon >>> com.duoxx.xx <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x14
Abort message: 'art/runtime/gc/collector/mark_sweep.cc:387] Can't mark invalid object'
r0 fffffa90 r1 f47feff4 r2 f459b9d0 r3 00000000
r4 00000000 r5 f382f0d0 r6 f47fca38 r7 f4889300
r8 d23848a0 r9 f47fca38 sl fffffa94 fp f4889300
ip f763c7c8 sp f382efa8 lr f4723f91 pc f459ba04 cpsr a0070010
backtrace:
#00 pc 000a5a04 /system/lib/libart.so (_ZN3art3arm10ArmContext15FillCalleeSavesERKNS_12StackVisitorE+52)
#01 pc 0022df8f /system/lib/libart.so (_ZN3art12StackVisitor9WalkStackEb+282)
#02 pc 00233dff /system/lib/libart.so (_ZNK3art6Thread13DumpJavaStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+166)
#03 pc 002354eb /system/lib/libart.so (_ZNK3art6Thread4DumpERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+182)
#04 pc 0023eabf /system/lib/libart.so (_ZN3art10ThreadList10DumpLockedERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+106)
#05 pc 00223cfb /system/lib/libart.so (_ZN3art10AbortState4DumpERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEE+242)
#06 pc 00223f4d /system/lib/libart.so (_ZN3art7Runtime5AbortEv+72)
#07 pc 000a8b71 /system/lib/libart.so (_ZN3art10LogMessageD1Ev+1296)
#08 pc 0012ebcb /system/lib/libart.so (_ZN3art2gc10accounting10HeapBitmap3SetINS0_9collector27MarkSweepMarkObjectSlowPathEEEbPKNS_6mirror6ObjectERKT_+294)
#09 pc 0006cb45 /system/lib/libart.so (_ZN3art2gc9collector9MarkSweep17MarkObjectNonNullEPNS_6mirror6ObjectE.part.155+20)
#10 pc 00130313 /system/lib/libart.so (_ZN3art6mirror6Object21VisitFieldsReferencesILb0ELb0ENS_2gc9collector17MarkObjectVisitorEEEvjRKT1_+118)
#11 pc 001303eb /system/lib/libart.so (_ZN3art6mirror6Object15VisitReferencesILb0ELNS_17VerifyObjectFlagsE0ENS_2gc9collector17MarkObjectVisitorENS5_29DelayReferenceReferentVisitorEEEvRKT1_RKT2_+146)
#12 pc 00130b37 /system/lib/libart.so (_ZNK3art2gc10accounting11SpaceBitmapILj8EE16VisitMarkedRangeINS0_9collector17ScanObjectVisitorEEEvjjRKT_+430)
#13 pc 0013111d /system/lib/libart.so (_ZN3art2gc9collector9MarkSweep15ScanGrayObjectsEbh+984)
#14 pc 0013119f /system/lib/libart.so (_ZN3art2gc9collector9MarkSweep25RecursiveMarkDirtyObjectsEbh+6)
#15 pc 00131519 /system/lib/libart.so (_ZN3art2gc9collector9MarkSweep12MarkingPhaseEv+112)
#16 pc 0013167f /system/lib/libart.so (_ZN3art2gc9collector9MarkSweep9RunPhasesEv+310)
#17 pc 00128ba5 /system/lib/libart.so (_ZN3art2gc9collector16GarbageCollector3RunENS0_7GcCauseEb+220)
#18 pc 00146e7f /system/lib/libart.so (_ZN3art2gc4Heap22CollectGarbageInternalENS0_9collector6GcTypeENS0_7GcCauseEb+1330)
#19 pc 00148303 /system/lib/libart.so (_ZN3art2gc4Heap12ConcurrentGCEPNS_6ThreadE+110)
#20 pc 000003df /data/dalvik-cache/arm/system@framework@boot.oat
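Both dumps are long, but only a few lines are needed to compare them: the crashing thread, the signal, the abort message and the top frame. A minimal sketch for pulling those lines out of a saved dump follows; the function name ne_summary and the file name are hypothetical, and it assumes the dump has already been saved to a local text file.

```shell
# Print the thread line, signal line, abort message and frame #00 of every
# crash entry found in a saved dump file.
# Usage (file name is only an example): ne_summary dropbox.txt
ne_summary() {
    grep -E 'name: .*>>>|^[[:space:]]*signal [0-9]+|^[[:space:]]*Abort message:|^[[:space:]]*#00 pc ' "$1"
}
```

For the two entries above this makes it easy to see that the crashing thread and top frame differ (Thread-6488 vs. GCDaemon) while the abort message is identical. If binutils is installed, piping the result through c++filt makes the mangled symbol names readable.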
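Strangely, just repackaging sometimes produces a build that works fine.
The testers first reported the problem as "ANR at startup, then the app exits".
Looking at the device, there was indeed an ANR, but the ANR trace file was empty: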
$ adb shell " ls -al /data/anr/"
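...trace file is 0 KB, i.e. empty
An ANR whose trace capture fails like this is usually not the kind people normally look at, caused by some piece of Java-layer code.
So don't waste effort analyzing it with the usual ANR playbook.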
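The "is the trace empty?" check can be scripted instead of eyeballing the ls output. This is only a sketch: the function name empty_files is hypothetical, and reading /data/anr on a given device may need permissions that a user build does not grant.

```shell
# List zero-byte regular files directly under the given directory.
empty_files() {
    for f in "$1"/*; do
        # -f: regular file, -s: size greater than zero
        [ -f "$f" ] && [ ! -s "$f" ] && echo "$f"
    done
    return 0
}

# The same loop can be run on the device (not executed here):
#   adb shell 'for f in /data/anr/*; do [ -s "$f" ] || echo "empty: $f"; done'
```

Any file it prints is a trace that was created but never written, which matches the situation described above.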
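Based on our company's long experience, what happens here is that our google-breakpad integration fails while collecting the native crash, the process hangs, and that in turn triggers the ANR.
In other words, the root cause is the native crash, plus the failure to capture it.
Since the problem reproduced every time with the affected build, I disabled google-breakpad and let the system capture the crash stacks pasted above.
With that, the symptom went back to a plain crash at startup instead of an ANR.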
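ps: a few small details worth mentioning along the way:
1> On Android 5.x and earlier, the default log format prints neither the tid nor the time, which is unfriendly; add -v threadtime to logcat to get a more useful format.
2> Android 5.x and earlier has no "logcat -b crash", so there is no quick way to view just the crash log.
ps: "adb logcat -b all" from newer versions is not available on <=5.x either; use "adb logcat -b main -b system -b events -v threadtime" as a stand-in.
3> The "F DEBUG" log lines (the crash dumps above) are, surprisingly, not printed in logcat at all on this OPPO phone. logcat only printed the "F libc ..." NE line.
The NE stacks above were obtained from dropbox, i.e. via "adb shell dumpsys dropbox --print".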
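The logcat notes above can be folded into a small helper that picks the buffer arguments from the device's API level. This is a sketch under assumptions: the function name logcat_args is hypothetical, and the cut-off at API 22 (Android 5.1) simply encodes the observation above that "-b all" is missing on <=5.x.

```shell
# Print logcat arguments suited to the given API level.
# Assumption from the notes above: API <= 22 has no "-b all",
# so the buffers are listed explicitly.
logcat_args() {
    if [ "$1" -le 22 ]; then
        echo "-b main -b system -b events -v threadtime"
    else
        echo "-b all -v threadtime"
    fi
}

# Usage with a connected device (not executed here):
#   sdk=$(adb shell getprop ro.build.version.sdk | tr -d '\r')
#   adb logcat $(logcat_args "$sdk")
#   adb shell dumpsys dropbox --print > dropbox.txt   # save the NE stacks
```

The tr -d '\r' is there because older adb versions return lines ending in CR LF, which would otherwise break the numeric comparison.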