Symptom
I recently handled a native crash reported by a customer.
- Reproduction steps
- Phone (Android 6.0)
- The com.qihoo.browser app hits a native crash with low probability
Diagnosis
- Relevant logs
- tombstone
Revision: '0'
ABI: 'arm'
pid: 13055, tid: 13117, name: ss  >>> com.qihoo.browser:loader0 <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xffffffc8
    r0 abcbcc40  r1 ffffffc8  r2 00000008  r3 b4d38c00
    r4 afa6b000  r5 b00ff820  r6 00000001  r7 00000000
    r8 afa6b000  r9 b4dac800  sl b4d47420  fp b4d3f384
    ip ab528018  sp a2057718  lr b4c27037  pc ffffffc8  cpsr 800f0010
    d0  0000000000000000  d1  0000000000000000
    d2  0000000000000000  d3  0000000000000000
    d4  0000000000000000  d5  0000000000000000
    d6  0000000000000000  d7  a205739400000000
    d8  0000000000000000  d9  0000000000000000
    d10 0000000000000000  d11 0000000000000000
    d12 0000000000000000  d13 0000000000000000
    d14 0000000000000000  d15 0000000000000000
    d16 0000000000000000  d17 ab535000abf271d4
    d18 abf2700cafa5ae00  d19 0000500000005000
    d20 0000000000000000  d21 0000000000000000
    d22 0000000000000000  d23 0000000000001000
    d24 0000010200000102  d25 0000000000000102
    d26 0000000000001000  d27 0000015e816a8ca5
    d28 0000000000000001  d29 0000000000000001
    d30 0000000000000002  d31 00000000003d0909
    scr 80000010

backtrace:
    #00 pc ffffffc8  <unknown>
    #01 pc 00338035  /system/lib/libart.so (_ZN3art6ThreadD1Ev+228)
    #02 pc 003478b9  /system/lib/libart.so (_ZN3art10ThreadList10UnregisterEPNS_6ThreadE+232)
    #03 pc 00340411  /system/lib/libart.so (_ZN3art6Thread14CreateCallbackEPv+600)
    #04 pc 0003f87b  /system/lib/libc.so (_ZL15__pthread_startPv+30)
    #05 pc 00019f95  /system/lib/libc.so (__start_thread+6)
Preliminary analysis
- Symbolize and disassemble the relevant .so library
$ arm-linux-androideabi-addr2line -f -e /tim.zhang/tim_share/bugzilla/201709/749779/symbols/libart.so 00338035 003478b9 00340411
_ZN3art6ThreadD2Ev
/art/runtime/thread.cc:1449 (discriminator 1)
_ZN3art10ThreadList10UnregisterEPNS_6ThreadE
/art/runtime/thread_list.cc:1151 (discriminator 1)
_ZN3art6Thread14CreateCallbackEPv
/art/runtime/thread.cc:282 (discriminator 2)
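As a cross-check, the library-relative address of frame #01 can be related to the raw `lr` value in the tombstone. The sketch below assumes the usual relation `relative = absolute - map_start + load_base`, with the values taken from the libart.so mapping listed under Root Cause (start 0xb48fc000, load base 0xd000). The function names `toLibraryRelative` and `clearThumbBit` are hypothetical helpers, not part of any Android tool.

```cpp
#include <cstdint>

// Convert an absolute runtime address into a library-relative address.
// Assumption: relative = absolute - mapStart + loadBase, where mapStart and
// loadBase come from the tombstone's memory map line for the library.
constexpr std::uint32_t toLibraryRelative(std::uint32_t absolute,
                                          std::uint32_t mapStart,
                                          std::uint32_t loadBase) {
    return absolute - mapStart + loadBase;
}

// Clear bit 0, which marks Thumb state in a code address on 32-bit ARM.
constexpr std::uint32_t clearThumbBit(std::uint32_t address) {
    return address & ~std::uint32_t{1};
}
```

With lr = 0xb4c27037, `clearThumbBit(toLibraryRelative(0xb4c27037, 0xb48fc000, 0xd000))` evaluates to 0x338036, the instruction directly after the `blx r1` at 0x338034. The tombstone reports 0x338035 for frame #01, which falls inside that `blx` instruction, presumably because the unwinder backs up from the return address to the call site.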
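- Code analysis
- Code for frame #01
Thread::~Thread() {
  ...
  delete wait_mutex_;
  ...
}
- Disassembly for frame #01
... ...
33802a: f8d4 0454  ldr.w r0, [r4, #1108] ; 0x454, r0 = wait_mutex_
33802e: b110       cbz   r0, 338036 <_ZN3art6ThreadD1Ev+0xe6> ; skip the delete if wait_mutex_ is null
338030: 6803       ldr   r3, [r0, #0]    ; r3 = vptr of *wait_mutex_
338032: 6919       ldr   r1, [r3, #16]   ; r1 = vtable entry at offset 16
338034: 4788       blx   r1              ; ==> crash
... ...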
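The disassembly above is the virtual-destructor dispatch generated for `delete wait_mutex_`: load the pointer, skip if null, load the vptr, load a slot from the vtable, and call it indirectly. The following is a minimal sketch of that mechanism. `BaseMutexSketch` and `MutexSketch` are simplified stand-ins invented for illustration and are not ART's real class definitions.

```cpp
// Simplified stand-ins for illustration only; NOT ART's real classes.
struct BaseMutexSketch {
    virtual bool IsMutex() const { return false; }
    virtual ~BaseMutexSketch() {}
};

struct MutexSketch : BaseMutexSketch {
    explicit MutexSketch(bool* destroyedFlag) : destroyed(destroyedFlag) {}
    bool IsMutex() const override { return true; }
    // Record that the derived destructor actually ran.
    ~MutexSketch() override { *destroyed = true; }
    bool* destroyed;
};

// Delete a derived object through a base pointer, similar in spirit to
// `delete wait_mutex_`. In general the static type does not determine the
// dynamic type, so the destructor is reached through the object's vptr
// (an optimizer may devirtualize an example this small).
bool deleteDispatchesToDerivedDestructor() {
    bool destroyed = false;
    BaseMutexSketch* mutex = new MutexSketch(&destroyed);
    delete mutex;  // indirect call through the destructor slot of the vtable
    return destroyed;
}
```

If the vtable slot used by `delete` holds a garbage value, the indirect call jumps to that value, which is the failure mode seen in frame #00.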
- Analysis
- wait_mutex_ = r0 = abcbcc40
vptr = 0xb4d38c00
abcbcc40 b4d38c00 0000001d b4d177a0 65722e6b  .........w..k.re
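- Inspect the vtable
The vtable contains several invalid function pointers.
b4d38c00 00000000 b49e12bd b49e1391 ffffffc8  ................
b4d38c10 ffffffc8 00000000 b49e1325 b49e13fd  ........%.......
- The 1st virtual function, art::Mutex::IsMutex, has become 0x00000000
- The 4th and 5th virtual functions, art::Mutex::~Mutex(), have become 0xffffffc8
The 5th entry sits at offset 16 (address b4d38c10), which is exactly the entry loaded by ldr r1, [r3, #16]. blx r1 therefore jumps to 0xffffffc8, matching both pc and the fault address in the tombstone.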
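The check performed by eye above can be sketched as a small helper that flags vtable entries not pointing into the library's executable mapping. This is only an illustration under stated assumptions: the vtable is taken to have five function slots (as the description above suggests), the executable range is the `r-x` mapping of libart.so from the tombstone's memory map (0xb48fc000 up to, but not including, 0xb4d32000), and the names `pointsIntoText` and `countBadSlots` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// True when a vtable entry points into the executable mapping
// [textStart, textEnd).
constexpr bool pointsIntoText(std::uint32_t entry,
                              std::uint32_t textStart,
                              std::uint32_t textEnd) {
    return entry >= textStart && entry < textEnd;
}

// Count the slots that do not point into executable code.
template <std::size_t N>
std::size_t countBadSlots(const std::array<std::uint32_t, N>& slots,
                          std::uint32_t textStart,
                          std::uint32_t textEnd) {
    std::size_t bad = 0;
    for (std::uint32_t entry : slots) {
        if (!pointsIntoText(entry, textStart, textEnd)) {
            ++bad;
        }
    }
    return bad;
}
```

Feeding it the first five words dumped at 0xb4d38c00 (0x00000000, 0xb49e12bd, 0xb49e1391, 0xffffffc8, 0xffffffc8) reports three bad slots: the 1st, 4th and 5th.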
- Comparison experiment
- Android 6.0
- wait_mutex_:
(gdb) x /8xw 0xb7c3c2e0
0xb7c3c2e0: 0xb4e6ac10 0x0000001d 0xb4e49404 0x0000fe01
0xb7c3c2f0: 0x00000000 0x00000041 0x00000000 0x00000000
- Inspect the vtable
(gdb) x /8xw 0xb4e6ac10
0xb4e6ac10 <_ZTVN3art5MutexE+8>:  0xb4b16965 0xb4b1314d 0xb4b1736d 0xb4b171ed
0xb4e6ac20 <_ZTVN3art5MutexE+24>: 0xb4b18a45 0x00000000 0x00000000 0x00000000
- Nexus 6P (Android 8.0)
- Assembly code
0x00000074da26e1b4 <+224>: ldr x0, [x19,#2432]
0x00000074da26e1b8 <+228>: cbz x0, 0x74da26e1c8 <art::Thread::~Thread()+244>
0x00000074da26e1bc <+232>: ldr x8, [x0]
0x00000074da26e1c0 <+236>: ldr x8, [x8,#48]
0x00000074da26e1c4 <+240>: blr x8
- wait_mutex_
(gdb) x /4xg 0x00000074da521c00
0x74da521c00: 0x00000074da3dc308 0x0000007400000024
0x74da521c10: 0x00000074da361191 0x00000074da51f000
- Inspect the vtable
(gdb) x /32xg 0x00000074da3dc308
0x74da3dc308 <_ZTVN3art5MutexE+16>: 0x00000074d9ee6068 0x00000074d9ee16bc
0x74da3dc318 <_ZTVN3art5MutexE+32>: 0x00000074d9ee16bc 0x00000074d9ee3a58
0x74da3dc328 <_ZTVN3art5MutexE+48>: 0x00000074d9ee3b58 0x00000074d9ee3004
0x74da3dc338 <_ZTVN3art5MutexE+64>: 0x00000074d9ee323c 0x0000000000000000
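The byte offsets used by the two disassemblies (#16 on 32-bit ARM, #48 on ARM64) can be turned into vtable slot indices by dividing by the pointer size. A minimal sketch, where `vtableSlotIndex` is a hypothetical helper name:

```cpp
#include <cstddef>

// Index of the vtable slot addressed by a byte offset from the vptr.
// pointerSize is 4 on 32-bit ARM and 8 on ARM64.
constexpr std::size_t vtableSlotIndex(std::size_t byteOffset,
                                      std::size_t pointerSize) {
    return byteOffset / pointerSize;
}
```

`vtableSlotIndex(16, 4)` is 4, the fifth entry: 0xb4b18a45 on the healthy Android 6.0 device and 0xffffffc8 on the faulty one. `vtableSlotIndex(48, 8)` is 6, the seventh entry, which on the Nexus 6P is 0x00000074d9ee323c at 0x74da3dc338. In both healthy dumps the slot used by the destructor call holds a plausible code address.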
Root Cause
0xb4d38c00 lies in the read-only part of libart.so's virtual address space:
b48fc000-b4d31fff r-x 0 436000 /system/lib/libart.so (BuildId: 0a4c3feb37d9d2e8f30f11df6163908d) (load base 0xd000)
b4d32000-b4d32fff --- 0 1000
b4d33000-b4d3cfff r-- 436000 a000 /system/lib/libart.so
b4d3d000-b4d3dfff rw- 440000 1000 /system/lib/libart.so
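The lookup that places 0xb4d38c00 in a read-only region can be sketched as follows. The region table is copied from the four map lines above (end addresses are inclusive, as printed); the `MapRegion` type and the `permissionsFor` function are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>
#include <string>

struct MapRegion {
    std::uint32_t start;  // first address of the region
    std::uint32_t end;    // last address of the region (inclusive)
    const char* perms;    // permission string as printed in the tombstone
};

// Regions copied from the libart.so portion of the memory map above.
constexpr MapRegion kLibartRegions[] = {
    {0xb48fc000u, 0xb4d31fffu, "r-x"},
    {0xb4d32000u, 0xb4d32fffu, "---"},
    {0xb4d33000u, 0xb4d3cfffu, "r--"},
    {0xb4d3d000u, 0xb4d3dfffu, "rw-"},
};

// Return the permissions of the region containing `address`, or an empty
// string when the address is outside all listed regions.
std::string permissionsFor(std::uint32_t address) {
    for (const MapRegion& region : kLibartRegions) {
        if (address >= region.start && address <= region.end) {
            return region.perms;
        }
    }
    return std::string();
}
```

`permissionsFor(0xb4d38c00)` returns "r--" for the corrupted vtable, while the bogus jump target 0xffffffc8 is not covered by any of these regions.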
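Because the corrupted vtable lives in a page mapped r--, an ordinary stray write from software would normally fault rather than modify it. For now, software memory corruption therefore looks unlikely, and a hardware fault (for example in the memory) looks more likely. It is worth checking whether this single device, or devices from the same hardware batch, show other memory-related problems.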