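A craftsman who wants to do good work must first sharpen his tools. This article introduces the commonly used commands of the crash utility on Linux and how to use them.
Background
crash was developed by Red Hat engineers, mainly for offline analysis of Linux kernel dump files. It integrates gdb and is very powerful: it can show stack traces, the dmesg log, kernel data structures, disassembly, and more. crash supports dump formats produced by several tools, such as kdump, LKCD, netdump, and diskdump, and it can also analyze kernel dumps generated on Xen and KVM virtual machines. crash can also debug a live system: just run crash without a dump file. On Ubuntu the live kernel memory image is exposed at /proc/kcore.
crash is tightly coupled to the Linux kernel and is updated continuously as the kernel changes. It is backward compatible: a newer crash can analyze dumps from older kernels. If your kernel is too new for your crash to parse, try installing the latest crash.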
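Common commands
The sections below introduce the common commands, drawing mainly on the crash_whitepaper and the help built into crash. The crash_whitepaper covers the motivation for the tool, how to build it, how the commands are categorized and used, and how to add your own commands; it is an excellent reference. I used crash-7.2.6 with gdb-7.6. Run "help command" inside crash for detailed help on a command; the full command list is in the appendix.
When crash loads a kernel dump it prints basic system information, such as the faulting process (bash, PID 2613), the memory size (7.9 GB), and the architecture (x86_64). Here you can see that this dump comes from a panic triggered through sysrq.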
KERNEL: ../kernel-src/linux-4.19.53/vmlinux
DUMPFILE: crash/201907070732/dump.201907070732 [PARTIAL DUMP]
CPUS: 4
DATE: Sun Jul 7 07:31:34 2019
UPTIME: 00:10:27
LOAD AVERAGE: 0.14, 0.16, 0.12
TASKS: 584
NODENAME: glbian-OptiPlex-990
RELEASE: 4.19.53
VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019
MACHINE: x86_64 (3292 Mhz)
MEMORY: 7.9 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 2613
COMMAND: "bash"
TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)
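Viewing the stack
Usually you start by looking at the stack trace (bt) to see where the system died and decide which direction to investigate. Here the exception occurred inside the sysrq handler.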
crash> bt
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
'#0 [ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
'#1 [ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9
'#2 [ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441
'#3 [ffffa0f442cd7b50] oops_end at ffffffff99a32bed
'#4 [ffffa0f442cd7b78] no_context at ffffffff99a7997c
'#5 [ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15
'#6 [ffffa0f442cd7c20] bad_area at ffffffff99a79f86
'#7 [ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486
'#8 [ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d
'#9 [ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286
RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007
R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
'#10 [ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
'#11 [ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf
... ...
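You can also add options to show function offsets, the source file of each function, and the raw contents of each frame, so you can follow along in the source and the disassembly and inspect function arguments and local variables.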
crash> bt -slf
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
'#0 [ffffa0f442cd7a08] machine_kexec+451 at ffffffff99a69313
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/kernel/machine_kexec_64.c: 346
ffffa0f442cd7a10: 0000a0f442cd7a50 ffff8b7c40000000
ffffa0f442cd7a20: 0000000024001000 ffff8b7c64001000
ffffa0f442cd7a30: 0000000024000000 a05cedc0dfb99200
ffffa0f442cd7a40: a05cedc0dfb99200 ffffa0f442cd7cf8
ffffa0f442cd7a50: 0000000000000009 ffffa0f442cd7cf8
ffffa0f442cd7a60: ffffa0f442cd7b28 ffffffff99b3e6b9
... ...
'#8 [ffffa0f442cd7cc0] do_page_fault+45 at ffffffff99a7a60d
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/mm/fault.c: 1470
ffffa0f442cd7cc8: ffff8b7e6500d140 0000000000000000
ffffa0f442cd7cd8: 0000000000000000 0000000000000000
ffffa0f442cd7ce8: ffffa0f442cd7cf9 ffffffff9a6010ae
'#9 [ffffa0f442cd7cf0] page_fault+30 at ffffffff9a6010ae
/home/glbian/data/kernel-src/linux-4.19.53/arch/x86/entry/entry_64.S: 1181
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff9a034066 RSP: ffffa0f442cd7da8 RFLAGS: 00010286
RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
RBP: ffffa0f442cd7da8 R8: 00000000000002f2 R9: 0000000000000007
R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 147
ffffa0f442cd7cf8: ffff8b7de5af9100 ffffffff9afa7300
ffffa0f442cd7d08: 0000000000000000 0000000000000004
ffffa0f442cd7d18: ffffa0f442cd7da8 0000000000000063
ffffa0f442cd7d28: ffffffff9b39c3ed 0000000000000000
ffffa0f442cd7d38: 0000000000000007 00000000000002f2
ffffa0f442cd7d48: ffffffff9a034050 0000000000000006
ffffa0f442cd7d58: 0000000000000000 0000000000000096
ffffa0f442cd7d68: 0000000000000063 ffffffffffffffff
ffffa0f442cd7d78: ffffffff9a034066 0000000000000010
ffffa0f442cd7d88: 0000000000010286 ffffa0f442cd7da8
ffffa0f442cd7d98: 0000000000000018 0000000000000000
ffffa0f442cd7da8: ffffa0f442cd7dd8 ffffffff9a0347e8
'#10 [ffffa0f442cd7db0] __handle_sysrq+136 at ffffffff9a0347e8
/home/glbian/data/kernel-src/linux-4.19.53/drivers/tty/sysrq.c: 583
ffffa0f442cd7db8: 0000000000000002 fffffffffffffffb
ffffa0f442cd7dc8: ffffa0f442cd7ee8 0000563d45717780
ffffa0f442cd7dd8: ffffa0f442cd7df0 ffffffff9a034cbf
... ...
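The raw words that bt -f prints for the exception frame line up one-to-one with the register dump shown above them. A minimal Python sketch of that mapping, assuming the x86_64 `pt_regs` layout (r15 at the lowest address, ss at the highest) and using the words copied from the frame starting at ffffa0f442cd7cf8. The names `PT_REGS_ORDER` and `decode_pt_regs` are made up for this illustration and are not part of crash:

```python
# Field order of struct pt_regs on x86_64, lowest stack address first
# (assumed layout; check it against your kernel's ptrace.h).
PT_REGS_ORDER = [
    "r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10",
    "r9", "r8", "rax", "rcx", "rdx", "rsi", "rdi", "orig_rax",
    "rip", "cs", "eflags", "rsp", "ss",
]

# Stack words from the "bt -slf" output, starting at ffffa0f442cd7cf8.
FRAME_WORDS = [
    0xffff8b7de5af9100, 0xffffffff9afa7300,
    0x0000000000000000, 0x0000000000000004,
    0xffffa0f442cd7da8, 0x0000000000000063,
    0xffffffff9b39c3ed, 0x0000000000000000,
    0x0000000000000007, 0x00000000000002f2,
    0xffffffff9a034050, 0x0000000000000006,
    0x0000000000000000, 0x0000000000000096,
    0x0000000000000063, 0xffffffffffffffff,
    0xffffffff9a034066, 0x0000000000000010,
    0x0000000000010286, 0xffffa0f442cd7da8,
    0x0000000000000018,
]


def decode_pt_regs(words):
    """Pair each saved stack word with its pt_regs field name."""
    return dict(zip(PT_REGS_ORDER, words))


regs = decode_pt_regs(FRAME_WORDS)
for name in ("rip", "rsp", "eflags", "rax", "rdi"):
    print(f"{name:>8}: {regs[name]:016x}")
```

Reading the frame this way is handy when the register dump itself is missing and only raw stack contents are available.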
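Use the dis command to disassemble and examine the code at a given address. dis -r lists from the start of the function up to the address; dis -f lists from the address to the end of the function.
crash> dis -r ffffffff9a6010ae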
0xffffffff9a601090 <page_fault>: data32 xchg %ax,%ax
0xffffffff9a601093 <page_fault+3>: callq 0xffffffff9a601230 <error_entry>
0xffffffff9a601098 <page_fault+8>: mov %rsp,%rdi
0xffffffff9a60109b <page_fault+11>: mov 0x78(%rsp),%rsi
0xffffffff9a6010a0 <page_fault+16>: movq $0xffffffffffffffff,0x78(%rsp)
0xffffffff9a6010a9 <page_fault+25>: callq 0xffffffff99a7a5e0 <do_page_fault>
0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330 <error_exit>
crash> dis -f ffffffff9a6010ae
0xffffffff9a6010ae <page_fault+30>: jmpq 0xffffffff9a601330 <error_exit>
0xffffffff9a6010b3 <page_fault+35>: nopl (%rax)
0xffffffff9a6010b6 <page_fault+38>: nopw %cs:0x0(%rax,%rax,1)
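The `symbol+offset` notation used by bt and dis is plain address arithmetic. A small sketch using addresses that appear in the output above; the helper name `resolve` is hypothetical, and the offsets are decimal, as bt -s and dis print them:

```python
# Function start addresses taken from the crash output above.
SYMBOLS = {
    "sysrq_handle_crash": 0xffffffff9a034050,  # also the value held in RAX
    "do_page_fault": 0xffffffff99a7a5e0,       # callq target inside page_fault
    "page_fault": 0xffffffff9a601090,          # first line of "dis -r"
}


def resolve(expr):
    """Turn 'symbol+offset' (decimal offset) into an absolute address."""
    name, _, offset = expr.partition("+")
    return SYMBOLS[name] + int(offset or 0)


for expr in ("sysrq_handle_crash+22", "do_page_fault+45", "page_fault+30"):
    print(f"{expr:<24} -> {resolve(expr):x}")
```

The three results are the exception RIP and the return addresses of frames #8 and #9 in the backtrace.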
Sometimes the stack is corrupted. In that case use -t/-T to dump the text symbols found across the stack, which often reveals useful clues.
crash> bt -t
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
START: machine_kexec at ffffffff99a69313
[ffffa0f442cd7a08] machine_kexec at ffffffff99a69313
[ffffa0f442cd7a68] __crash_kexec at ffffffff99b3e6b9
[ffffa0f442cd7ac0] sysrq_handle_crash at ffffffff9a034050
[ffffa0f442cd7af0] sysrq_handle_crash at ffffffff9a034066
[ffffa0f442cd7b30] crash_kexec at ffffffff99b3f441
[ffffa0f442cd7b38] __die at ffffffff99a33375
[ffffa0f442cd7b50] oops_end at ffffffff99a32bed
[ffffa0f442cd7b78] no_context at ffffffff99a7997c
[ffffa0f442cd7bd8] __bad_area_nosemaphore at ffffffff99a79d15
[ffffa0f442cd7c20] bad_area at ffffffff99a79f86
[ffffa0f442cd7c48] __do_page_fault at ffffffff99a7a486
[ffffa0f442cd7cc0] do_page_fault at ffffffff99a7a60d
[ffffa0f442cd7cf0] page_fault at ffffffff9a6010ae
[ffffa0f442cd7d48] sysrq_handle_crash at ffffffff9a034050
[ffffa0f442cd7d78] sysrq_handle_crash at ffffffff9a034066
[ffffa0f442cd7db0] __handle_sysrq at ffffffff9a0347e8
[ffffa0f442cd7de0] write_sysrq_trigger at ffffffff9a034cbf
[ffffa0f442cd7df8] proc_reg_write at ffffffff99d2a0ee
[ffffa0f442cd7e18] __vfs_write at ffffffff99ca8a0a
[ffffa0f442cd7e40] apparmor_file_permission at ffffffff99e53a0a
[ffffa0f442cd7e50] security_file_permission at ffffffff99e06cf1
[ffffa0f442cd7e78] _cond_resched at ffffffff9a4153f9
[ffffa0f442cd7ea0] vfs_write at ffffffff99ca8d11
[ffffa0f442cd7ed8] ksys_write at ffffffff99ca8fcc
[ffffa0f442cd7f20] __x64_sys_write at ffffffff99ca906a
[ffffa0f442cd7f30] do_syscall_64 at ffffffff99a0428a
[ffffa0f442cd7f50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ff47e1ef154 RSP: 00007ffee9226298 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154
RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001
RBP: 0000563d45717780 R8: 000000000000000a R9: 0000000000000001
R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760
R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
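By default bt dumps the context of the panic task. bt -a shows the stacks of the active tasks on all CPUs, and bt -c shows the stack of the active task on a specified CPU.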
crash> bt -c 1
PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"
'#0 [fffffe0000034e38] crash_nmi_callback at ffffffff99a5d3d7
'#1 [fffffe0000034e48] nmi_handle at ffffffff99a33691
... ...
'#12 [ffffa0f440cd7f50] secondary_startup_64 at ffffffff99a000d4
crash> bt -a
PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"
... ...
PID: 0 TASK: ffff8b7e64165c00 CPU: 1 COMMAND: "swapper/1"
... ...
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
... ...
PID: 0 TASK: ffff8b7e642c4500 CPU: 3 COMMAND: "swapper/3"
... ...
You can also use the set command to change the task context and then view the stack of another task (here PID 1).
crash> set 1
PID: 1
COMMAND: "systemd"
TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]
CPU: 3
STATE: TASK_INTERRUPTIBLE
crash> bt
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
'#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7
'#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c
'#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691
'#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3
'#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941
'#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0
'#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e
'#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a
'#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7
RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004
RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79
R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb
R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
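In the user-mode register dump at the bottom of a backtrace, ORIG_RAX holds the system call number. A small sketch mapping the two values seen so far, assuming the x86_64 syscall numbering; only the entries needed here are listed, and `syscall_name` is an illustrative helper:

```python
# Excerpt of the x86_64 system call table; deliberately not complete.
X86_64_SYSCALLS = {0: "read", 1: "write", 232: "epoll_wait"}


def syscall_name(orig_rax):
    """Map an ORIG_RAX value to a syscall name if it is in the excerpt."""
    return X86_64_SYSCALLS.get(orig_rax, f"unknown ({orig_rax})")


print(syscall_name(0x1))   # bash (PID 2613) in the panic backtrace
print(syscall_name(0xe8))  # systemd (PID 1) after "set 1"
```

Both names agree with the `__x64_sys_write` and `__x64_sys_epoll_wait` frames in the corresponding backtraces.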
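System log
The log command shows the kernel log. "log -a" dumps the audit records still held in kernel audit buffers that have not yet been copied out to the user-space audit daemon.
The output can also be redirected to a file (log > logfile).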
crash> log
... ...
[ 1610.759133] sysrq: SysRq : Trigger a crash
[ 1610.759147] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1610.759150] PGD 0 P4D 0
[ 1610.759154] Oops: 0002 [#1] SMP PTI
[ 1610.759159] CPU: 2 PID: 2613 Comm: bash Kdump: loaded Not tainted 4.19.53 #1
[ 1610.759161] Hardware name: Dell Inc. OptiPlex 990/0RVG2C, BIOS A13 04/02/2012
[ 1610.759167] RIP: 0010:sysrq_handle_crash+0x16/0x20
[ 1610.759170] Code: e8 9f fb ff ff e9 c0 fe ff ff 90 90 90 90 90 90 90 90 90 90 66 66 66 66 90 55 48 89 e5 c7 05 85 10 36 01 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 c7 05 40 fa e2 00
[ 1610.759173] RSP: 0018:ffffa0f442cd7da8 EFLAGS: 00010286
[ 1610.759176] RAX: ffffffff9a034050 RBX: 0000000000000063 RCX: 0000000000000006
[ 1610.759178] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000063
[ 1610.759180] RBP: ffffa0f442cd7da8 R08: 00000000000002f2 R09: 0000000000000007
[ 1610.759182] R10: 0000000000000000 R11: ffffffff9b39c3ed R12: 0000000000000004
[ 1610.759184] R13: 0000000000000000 R14: ffffffff9afa7300 R15: ffff8b7de5af9100
[ 1610.759186] FS: 00007ff47eb0a740(0000) GS:ffff8b7e65880000(0000) knlGS:0000000000000000
[ 1610.759189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1610.759191] CR2: 0000000000000000 CR3: 0000000205db0003 CR4: 00000000000606e0
[ 1610.759193] Call Trace:
[ 1610.759199] __handle_sysrq+0x88/0x140
[ 1610.759203] write_sysrq_trigger+0x2f/0x40
[ 1610.759208] proc_reg_write+0x3e/0x60
[ 1610.759212] __vfs_write+0x3a/0x190
[ 1610.759216] ? apparmor_file_permission+0x1a/0x20
[ 1610.759220] ? security_file_permission+0x31/0xc0
[ 1610.759224] ? _cond_resched+0x19/0x40
[ 1610.759226] vfs_write+0xb1/0x1a0
[ 1610.759229] ksys_write+0x5c/0xe0
[ 1610.759232] __x64_sys_write+0x1a/0x20
[ 1610.759237] do_syscall_64+0x5a/0x120
[ 1610.759241] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1610.759245] RIP: 0033:0x7ff47e1ef154
[ 1610.759247] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 b1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[ 1610.759249] RSP: 002b:00007ffee9226298 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1610.759252] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff47e1ef154
[ 1610.759254] RDX: 0000000000000002 RSI: 0000563d45717780 RDI: 0000000000000001
[ 1610.759256] RBP: 0000563d45717780 R08: 000000000000000a R09: 0000000000000001
[ 1610.759258] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff47e4cb760
[ 1610.759260] R13: 0000000000000002 R14: 00007ff47e4c72a0 R15: 00007ff47e4c6760
[ 1610.759263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcbc aesni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep aes_x86_64 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi input_leds crypto_simd cryptd snd_seq snd_seq_device snd_timer dcdbas snd glue_helper intel_cstate intel_rapl_perf lpc_ich serio_raw soundcore sch_fq_codel mei_me mei mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage i915 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass i2c_algo_bit cec rc_core drm_kms_helper psmouse syscopyarea sysfillrect video sysimgblt fb_sys_fops ahci drm libahci e1000e
[ 1610.759320] CR2: 0000000000000000
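The `Oops: 0002` line in the log carries the x86 page-fault error code. A hedged sketch of decoding its low bits, assuming the usual x86 meanings (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3 reserved bit set, bit 4 instruction fetch). `decode_page_fault_code` is an illustrative helper, not a crash command:

```python
def decode_page_fault_code(code):
    """Describe the low five bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
        "reserved_bit": bool(code & 0x8),
        "instruction_fetch": bool(code & 0x10),
    }


# "Oops: 0002" from the log above; the field is printed in hexadecimal.
info = decode_page_fault_code(int("0002", 16))
for key, value in info.items():
    print(f"{key}: {value}")
```

For this dump the result is a kernel-mode write to a page that is not present, which fits the NULL pointer write reported with CR2 = 0.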
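Viewing data structures
struct and union display structures and unions and are used the same way; the examples below use struct. Given an address, the command interprets the memory there as a task_struct and prints it; without an address it shows the structure definition and size.
1 Print a task_struct structure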
crash> task_struct ffff8b7df3cdae00 -x
struct task_struct {
thread_info = {
flags = 0x80000000,
status = 0x0
},
state = 0x0,
stack = 0xffffa0f442cd4000,
usage = {
counter = 0x2
},
... ...
2 Print the task_struct definition and size
struct task_struct {
[0x0] struct thread_info thread_info;
[0x10] volatile long state;
[0x18] void *stack;
... ...
[0x1288] void *security;
[0x12c0] struct thread_struct thread;
}
SIZE: 0x23c0
3 View a member
crash> task_struct.stack_refcount ffff8b7df3cdae00 -xo
struct task_struct {
[ffff8b7df3cdc080] atomic_t stack_refcount;
}
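When -o is combined with an address, struct prints each member's virtual address instead of its offset. The offset falls out by subtraction, sketched here with the two addresses from the stack_refcount example; the constant names are only for illustration:

```python
TASK_ADDR = 0xffff8b7df3cdae00    # task_struct address passed to the command
MEMBER_ADDR = 0xffff8b7df3cdc080  # address printed for stack_refcount

# Member offset inside task_struct.
offset = MEMBER_ADDR - TASK_ADDR
print(hex(offset))

# The reverse direction: base address plus an offset from the definition.
STATE_OFFSET = 0x10               # "[0x10] volatile long state;"
state_addr = TASK_ADDR + STATE_OFFSET
print(hex(state_addr))
```

The second address is the kind of value you would pass to rd to read that member directly.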
4 View a pointer member
crash> task_struct.mm ffff8b7df3cdae00
mm = 0xffff8b7e5af06600
crash> task_struct.mm ffff8b7df3cdae00 -p
struct mm_struct *mm = 0xffff8b7e5af06600
-> {
{
mmap = 0xffff8b7dec0520c8,
mm_rb = {
rb_node = 0xffff8b7dec003b78
},
vmacache_seqnum = 17,
get_unmapped_area = 0xffffffff99a35760,
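You can also view array contents, per-cpu variables, and more; see the help documentation for details.
Viewing and searching memory
Besides printing data structures, you sometimes need to view and search raw memory, for example to check whether a specified data pattern is present.
1 View the kernel version string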
crash> rd -a linux_banner
ffffffff9aa00100: Linux version 4.19.53 (glbian@glbian-OptiPlex-990) (gcc vers
ffffffff9aa0013c: ion 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #1 SMP Sun Jun 23
ffffffff9aa00178: 11:01:25 CST 2019
2 View memory contents
crash> rd ffffa0f442cd7a08 32
ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....
ffffa0f442cd7a68: ffffffff99b3e6b9 ffff8b7de5af9100 ............}...
ffffa0f442cd7a78: ffffffff9afa7300 0000000000000000 .s..............
ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8 .........}.B....
ffffa0f442cd7a98: 0000000000000063 ffffffff9b39c3ed c.........9.....
ffffa0f442cd7aa8: 0000000000000000 0000000000000007 ................
ffffa0f442cd7ab8: 00000000000002f2 ffffffff9a034050 ........P@......
ffffa0f442cd7ac8: 0000000000000006 0000000000000000 ................
ffffa0f442cd7ad8: 0000000000000096 0000000000000063 ........c.......
ffffa0f442cd7ae8: ffffffffffffffff ffffffff9a034066 ........f@......
ffffa0f442cd7af8: 0000000000000010 0000000000010286 ................
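The right-hand column of rd shows the same memory as characters. A sketch of that rendering, assuming 64-bit little-endian words (as on x86_64) and a '.' for every non-printable byte; `ascii_column` is a hypothetical helper, not crash code:

```python
import struct


def ascii_column(words):
    """Render 64-bit words as rd's ASCII column would on x86_64."""
    raw = b"".join(struct.pack("<Q", w) for w in words)  # little-endian bytes
    return "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in raw)


# Row at ffffa0f442cd7a08 of the rd output above.
row_a08 = ascii_column([0xffffffff99a69313, 0x0000a0f442cd7a50])
print(row_a08)

# Row at ffffa0f442cd7a98.
row_a98 = ascii_column([0x0000000000000063, 0xffffffff9b39c3ed])
print(row_a98)
```

Because the bytes are little-endian, the characters appear in the reverse order of the hex digits: 0x50 ('P') is the lowest byte of 0000a0f442cd7a50, so it is printed first.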
3 Display memory with symbol translation
crash> rd ffffa0f442cd7a08 32 -s
ffffa0f442cd7a08: machine_kexec+451 0000a0f442cd7a50
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28
ffffa0f442cd7a68: __crash_kexec+105 ffff8b7de5af9100
ffffa0f442cd7a78: sysrq_crash_op 0000000000000000
ffffa0f442cd7a88: 0000000000000004 ffffa0f442cd7da8
ffffa0f442cd7a98: 0000000000000063 text.45672+13
ffffa0f442cd7aa8: 0000000000000000 0000000000000007
ffffa0f442cd7ab8: 00000000000002f2 sysrq_handle_crash
ffffa0f442cd7ac8: 0000000000000006 0000000000000000
ffffa0f442cd7ad8: 0000000000000096 0000000000000063
ffffa0f442cd7ae8: ffffffffffffffff sysrq_handle_crash+22
ffffa0f442cd7af8: 0000000000000010 0000000000010286
4 View a specified memory range
crash> rd ffffa0f442cd7a08 -e ffffa0f442cd7a68
ffffa0f442cd7a08: ffffffff99a69313 0000a0f442cd7a50 ........Pz.B....
ffffa0f442cd7a18: ffff8b7c40000000 0000000024001000 ...@|......$....
ffffa0f442cd7a28: ffff8b7c64001000 0000000024000000 ...d|......$....
ffffa0f442cd7a38: a05cedc0dfb99200 a05cedc0dfb99200 ......\.......\.
ffffa0f442cd7a48: ffffa0f442cd7cf8 0000000000000009 .|.B............
ffffa0f442cd7a58: ffffa0f442cd7cf8 ffffa0f442cd7b28 .|.B....({.B....
5 Search a specified memory range
crash> search -s ffffa0f442cd7a08 -e ffffa0f442cd7db0 ffffffff9b39c3ed
ffffa0f442cd7aa0: ffffffff9b39c3ed
ffffa0f442cd7d28: ffffffff9b39c3ed
6 Search for matching data with a mask
crash> search -p babe0000 -m ffff
1c4cc6530: babec685
21f7d35b8: babe4550
crash>
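In search, the bits set in the -m mask are ignored during the comparison, so `-p babe0000 -m ffff` matches any word in physical memory whose remaining bits equal babe0000. A minimal sketch of that comparison, assuming this reading of the mask; `masked_match` is an illustrative name:

```python
def masked_match(word, value, mask):
    """True if word equals value once the bits set in mask are ignored."""
    return (word & ~mask) == (value & ~mask)


VALUE, MASK = 0xbabe0000, 0xffff

# The two hits reported above, plus one value that should not match.
for word in (0xbabec685, 0xbabe4550, 0xbabf0000):
    print(f"{word:x}: {masked_match(word, VALUE, MASK)}")
```

A mask is useful when only part of a value is known, such as the upper bits of a magic number or a pointer into a known region.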
Viewing task status
1 View the status of all tasks
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd
2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]
2 View the chain of parent tasks
crash> ps -p 2613
PID: 0 TASK: ffffffff9ae13740 CPU: 0 COMMAND: "swapper/0"
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
PID: 1081 TASK: ffff8b7e5dc81700 CPU: 1 COMMAND: "gdm3"
PID: 2114 TASK: ffff8b7e584f2e00 CPU: 0 COMMAND: "gdm-session-wor"
PID: 2136 TASK: ffff8b7e63cc4500 CPU: 1 COMMAND: "gdm-x-session"
PID: 2149 TASK: ffff8b7e5dfaae00 CPU: 0 COMMAND: "gnome-session-b"
PID: 2254 TASK: ffff8b7e5e04dc00 CPU: 0 COMMAND: "gnome-shell"
PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"
PID: 2611 TASK: ffff8b7df3f8ae00 CPU: 0 COMMAND: "sudo"
PID: 2612 TASK: ffff8b7dec3b9700 CPU: 3 COMMAND: "su"
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
3 View child tasks
crash> ps -c 2582
PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
PID: 2592 TASK: ffff8b7dec05ae00 CPU: 1 COMMAND: "bash"
PID: 2600 TASK: ffff8b7df3f88000 CPU: 0 COMMAND: "bash"
PID: 2787 TASK: ffff8b7df9f80000 CPU: 3 COMMAND: "bash"
4 View task run time
crash> ps -t 2613
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
RUN TIME: 00:00:00
START TIME: 1296209749767
UTIME: 36000000
STIME: 16000000
5 View the active tasks
crash> ps -A
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
2613 2612 2 ffff8b7df3cdae00 RU 0.0 28708 4352 bash
6 View kernel threads
crash> ps -k
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9ae13740 RU 0.0 0 0 [swapper/0]
0 0 1 ffff8b7e64165c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffff8b7e64162e00 RU 0.0 0 0 [swapper/2]
0 0 3 ffff8b7e642c4500 RU 0.0 0 0 [swapper/3]
2 0 2 ffff8b7e64138000 IN 0.0 0 0 [kthreadd]
7 View user-space tasks
crash> ps -u
PID PPID CPU TASK ST %MEM VSZ RSS COMM
1 0 3 ffff8b7e6413c500 IN 0.1 225916 9716 systemd
298 1 3 ffff8b7e5879c500 IN 0.4 126508 38028 systemd-journal
318 1 0 ffff8b7e584f5c00 IN 0.1 48004 6360 systemd-udevd
822 1 2 ffff8b7e59c71700 IN 0.1 70756 6176 systemd-resolve
824 1 2 ffff8b7e586e5c00 IN 0.1 146108 5540 systemd-timesyn
834 1 3 ffff8b7e63881700 IN 0.1 146108 5540 sd-resolve
863 1 3 ffff8b7e5d790000 IN 0.1 51612 6112 dbus-daemon
864 1 1 ffff8b7e5d794500 IN 0.1 427264 9404 ModemManager
8 View last-run timestamps
crash> ps -l
[1610759003323] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
[1610758998404] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"
[1610758938747] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
[1610758009873] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"
crash> ps -m
[0 00:00:00.000] [IN] PID: 2582 TASK: ffff8b7dec3bae00 CPU: 0 COMMAND: "terminator"
[0 00:00:00.000] [ID] PID: 211 TASK: ffff8b7e585aae00 CPU: 3 COMMAND: "kworker/u32:5"
[0 00:00:00.000] [RU] PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
[0 00:00:00.000] [IN] PID: 2587 TASK: ffff8b7e06cd5c00 CPU: 2 COMMAND: "gdbus"
[0 00:00:00.001] [IN] PID: 2138 TASK: ffff8b7e26801700 CPU: 0 COMMAND: "Xorg"
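The bracketed values from ps -l appear to be nanosecond timestamps on the same clock as the kernel log, whose panic message is stamped 1610.759133 s. A sketch converting them under that assumption; the dictionary and constant names are only for illustration:

```python
NS_PER_SEC = 1_000_000_000

# Last-run timestamps copied from the "ps -l" output above.
last_run = {
    "terminator (2582)": 1610759003323,
    "bash (2613)": 1610758938747,
}
PANIC_LOG_TIME = 1610.759133  # "[ 1610.759133] sysrq: SysRq : Trigger a crash"

for task, ns in last_run.items():
    seconds = ns / NS_PER_SEC
    delta_us = (PANIC_LOG_TIME - seconds) * 1e6
    print(f"{task}: {seconds:.6f} s, about {delta_us:.0f} us before the panic message")
```

Lining the two time sources up like this helps order scheduling events relative to the messages in the log.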
9 View task resource limits
crash> ps -r 2613
PID: 2613 TASK: ffff8b7df3cdae00 CPU: 2 COMMAND: "bash"
RLIMIT CURRENT MAXIMUM
CPU (unlimited) (unlimited)
FSIZE (unlimited) (unlimited)
DATA (unlimited) (unlimited)
STACK 8388608 (unlimited)
CORE 0 (unlimited)
RSS (unlimited) (unlimited)
NPROC 30393 30393
NOFILE 1024 1048576
MEMLOCK 16777216 16777216
AS (unlimited) (unlimited)
LOCKS (unlimited) (unlimited)
SIGPENDING 30393 30393
MSGQUEUE 819200 819200
NICE 0 0
RTPRIO 0 0
RTTIME (unlimited) (unlimited)
Context switching
Some commands, such as bt, depend on the current task context; use the set command to switch the context.
1 Switch to a specified task
crash> set ffff8b7e6413c500
PID: 1
COMMAND: "systemd"
TASK: ffff8b7e6413c500 [THREAD_INFO: ffff8b7e6413c500]
CPU: 3
STATE: TASK_INTERRUPTIBLE
crash> bt
PID: 1 TASK: ffff8b7e6413c500 CPU: 3 COMMAND: "systemd"
'#0 [ffffa0f440c6fce0] __schedule at ffffffff9a414ba7
'#1 [ffffa0f440c6fd80] schedule at ffffffff9a41519c
'#2 [ffffa0f440c6fd90] schedule_hrtimeout_range_clock at ffffffff9a419691
'#3 [ffffa0f440c6fe20] schedule_hrtimeout_range at ffffffff9a4196b3
'#4 [ffffa0f440c6fe30] ep_poll at ffffffff99cf8941
'#5 [ffffa0f440c6fee0] do_epoll_wait at ffffffff99cf8ae0
'#6 [ffffa0f440c6ff20] __x64_sys_epoll_wait at ffffffff99cf8b0e
'#7 [ffffa0f440c6ff30] do_syscall_64 at ffffffff99a0428a
'#8 [ffffa0f440c6ff50] entry_SYSCALL_64_after_hwframe at ffffffff9a600088
RIP: 00007ffa791c6bb7 RSP: 00007ffc1c00b9d0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffa791c6bb7
RDX: 00000000000000eb RSI: 00007ffc1c00ba10 RDI: 0000000000000004
RBP: 00007ffc1c00ba10 R8: 0000000000000000 R9: 7465677261742e79
R10: 00000000ffffffff R11: 0000000000000293 R12: 00000000000000eb
R13: 00000000ffffffff R14: 00007ffc1c00ba10 R15: 0000000000000001
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
2 Switch back to the panic task
crash> set -p
PID: 2613
COMMAND: "bash"
TASK: ffff8b7df3cdae00 [THREAD_INFO: ffff8b7df3cdae00]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)
Loading module symbols
1 List the currently loaded modules
crash> mod
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01b0b40 rc_core 45056 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]
ffffffffc0207580 libahci 32768 (not loaded) [CONFIG_KALLSYMS]
2 Load the symbols of all modules
crash> mod -S
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 /lib/modules/4.19.53/kernel/drivers/vfio/vfio_iommu_type1.ko
ffffffffc01a4440 uas 24576 /lib/modules/4.19.53/kernel/drivers/usb/storage/uas.ko
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
ffffffffc01e76c0 e1000e 249856 /lib/modules/4.19.53/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
ffffffffc01fcbc0 usbhid 49152 /lib/modules/4.19.53/kernel/drivers/hid/usbhid/usbhid.ko
3 Load the symbols of a specified module
crash> mod -s rc_core /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
MODULE NAME SIZE OBJECT FILE
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
crash> mod
MODULE NAME SIZE OBJECT FILE
ffffffffc019d0c0 vfio_iommu_type1 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01a4440 uas 24576 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01b0b40 rc_core 45056 /lib/modules/4.19.53/kernel/drivers/media/rc/rc-core.ko
ffffffffc01e76c0 e1000e 249856 (not loaded) [CONFIG_KALLSYMS]
ffffffffc01fcbc0 usbhid 49152 (not loaded) [CONFIG_KALLSYMS]
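Other commands
There are many more commands aimed at particular kernel subsystems, such as kmem, vm, tree, list, and pte; see the command list in the appendix. I will study them further as I use them.
Extending the command set
crash also lets users add their own debugging commands. You can add a new command directly to the crash source, but the more common approach is to build a shared library and load it dynamically with extend. The help documentation includes a simple example: create a test.c in the crash source directory, copy the example code into it, and compile it with a command of the following form.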
gcc -nostartfiles -shared -rdynamic -o echo.so echo.c -fPIC -D<machine-type> $(TARGET_CFLAGS)
crash> sys
KERNEL: ../../kernel-src/linux-4.19.53/vmlinux
DUMPFILE: 201907070732/dump.201907070732 [PARTIAL DUMP]
CPUS: 4
DATE: Sun Jul 7 07:31:34 2019
UPTIME: 00:10:27
LOAD AVERAGE: 0.14, 0.16, 0.12
TASKS: 584
NODENAME: glbian-OptiPlex-990
RELEASE: 4.19.53
VERSION: #1 SMP Sun Jun 23 11:01:25 CST 2019
MACHINE: x86_64 (3292 Mhz)
MEMORY: 7.9 GB
PANIC: "sysrq: SysRq : Trigger a crash"
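Use the sys command to check the machine architecture. For my machine the machine-type is X86_64, so the build command is: gcc -shared -rdynamic -o test.so test.c -fPIC -DX86_64 -D_FILE_OFFSET_BITS=64
This produces test.so, which can be loaded directly with extend. After it loads, the help menu lists an additional echo command; the echo example is a good starting point for developing your own commands.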
crash> extend ../../src/crash-7.2.6/test.so
../../src/crash-7.2.6/test.so: shared object loaded
crash> extend
SHARED OBJECT COMMANDS
../../src/crash-7.2.6/test.so echo
crash> help
‘* extend mach runq union
alias files mod search vm
ascii foreach mount set vtop
bpf fuser net sig waitq
bt gdb p struct whatis
btop help ps swap wr
dev ipcs pte sym q
dis irq ptob sys
echo kmem ptov task
eval list rd timer
exit log repeat tree
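Conclusion
System crashes are usually very hard problems. Analyzing them takes solid familiarity with the kernel and the relevant subsystems, combined with the crash tool; above all it takes experience built up through practice.
Appendix
Crash command list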
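Command | Function |
---|---|
* | Pointer-to shortcut |
alias | Command aliases |
ascii | ASCII translation and chart |
bpf | extended Berkeley Packet Filter (eBPF) |
bt | Stack backtrace |
btop | Bytes to page number |
dev | Device data |
dis | Disassemble |
eval | Calculator |
exit | Exit |
extend | Extend the command set |
files | Open files |
foreach | Run a command on multiple tasks |
fuser | File users |
gdb | Pass a command to gdb |
help | Help |
ipcs | System V IPC facilities |
irq | IRQ data |
kmem | Kernel memory |
list | Linked lists |
log | Kernel message buffer |
mach | Machine-specific data |
mod | Module information and symbol loading |
mount | Mounted filesystem data |
net | Network command |
p | Print the value of an expression or data structure |
ps | Process status information |
pte | Translate a page table entry |
ptob | Page number to bytes |
ptov | Physical to virtual address |
rd | Read memory |
repeat | Repeat a command |
runq | Tasks on the run queues |
search | Search memory |
set | Set the task context or crash internal variables |
sig | Task signal handling |
struct | Structure contents |
swap | Swap information |
sym | Translate between symbols and virtual addresses |
sys | System data |
task | task_struct and thread_info contents |
timer | Timer queue data |
tree | Radix tree or red-black tree contents |
union | Union contents |
vm | Virtual memory |
vtop | Virtual to physical address |
waitq | Tasks queued on a wait queue |
whatis | Look up the definition of a type or symbol |
wr | Write memory |
q | Exit |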