Analyzing a Kernel Oops with the Kernel Oops Analyzer Tool

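What is a kernel oops?

An "oops" is an error message that the Linux kernel emits when it runs into a problem during execution. The name is not an acronym, and an oops is not the same thing as a kernel panic, although one can lead to the other.
When the kernel detects a condition it cannot handle correctly, such as an invalid memory access in kernel code, it prints an oops report containing the registers and the call trace to help developers diagnose and fix the problem.
On Linux, an oops is typically caused by faulty hardware, driver bugs, memory management problems, or other abnormal conditions.
After an oops the kernel usually kills the offending task and tries to keep running, but the system may be left in an unreliable state. If kernel.panic_on_oops is enabled, the oops escalates to a panic, and with kdump configured a crash dump is captured for later analysis.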

Kernel Oops Analyzer

Kernel Oops Analyzer is an online oops analysis tool developed by Red Hat. It analyzes a crash dump by comparing the oops message against known issues in a knowledge base.

Example

First, confirm that the OS generated a vmcore-dmesg file and that the file contains the oops message, for example:

[ 2025.570010] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c2
[ 2025.570043] PGD 0 P4D 0 
[ 2025.570054] Oops: 0002 [#1] SMP NOPTI
[ 2025.570069] CPU: 6 PID: 10250 Comm: reboot Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[ 2025.570106] Hardware name: Lenovo ThinkSystem SR650 -[7X06CTO1WW]-/-[7X06CTO1WW]-, BIOS -[IVE180H-3.41]- 10/05/2022
[ 2025.570136] RIP: 0010:i40e_shutdown+0x11/0x120 [i40e]
[ 2025.570169] Code: 07 74 0b 48 83 c0 08 48 39 d0 75 e6 5b c3 5b e9 25 fd ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 fd 53 48 8b 9f 70 01 00 00 <f0> 80 8b c2 06 00 00 04 f0 80 8b c0 06 00 00 08 48 8d bb 10 08 00
[ 2025.570223] RSP: 0018:ffffb3dc094a7d90 EFLAGS: 00010282
[ 2025.570241] RAX: ffffffffc05b8620 RBX: 0000000000000000 RCX: 0000000000000000
[ 2025.570263] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff94a0858cd000
[ 2025.570285] RBP: ffff94a0858cd000 R08: ffffffffffffffff R09: ffffffffb6f7a180
[ 2025.570306] R10: 0000000000000001 R11: 0000000000000003 R12: ffff94a0858cd000
[ 2025.570328] R13: ffffffffb5b540fe R14: ffff94a0858cd138 R15: 0000000000000000
[ 2025.570349] FS:  00007fc1d974a980(0000) GS:ffff94a03fb80000(0000) knlGS:0000000000000000
[ 2025.570375] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2025.570393] CR2: 00000000000006c2 CR3: 0000000162086002 CR4: 00000000007706e0
[ 2025.570415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2025.570436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2025.570458] PKRU: 55555554
[ 2025.570468] Call Trace:
[ 2025.570481]  pci_device_shutdown+0x34/0x60
[ 2025.570499]  device_shutdown+0x165/0x1c5
[ 2025.570516]  kernel_restart+0xe/0x30
[ 2025.570533]  __do_sys_reboot+0x1d2/0x210
[ 2025.570547]  ? __switch_to_asm+0x35/0x70
[ 2025.570564]  ? __switch_to_asm+0x41/0x70
[ 2025.570578]  ? __switch_to_asm+0x35/0x70
[ 2025.570592]  ? __switch_to_asm+0x41/0x70
[ 2025.570606]  ? __switch_to_asm+0x35/0x70
[ 2025.570619]  ? __switch_to_asm+0x41/0x70
[ 2025.570633]  ? __switch_to_asm+0x35/0x70
[ 2025.570647]  ? __switch_to_asm+0x41/0x70
[ 2025.570661]  ? __switch_to_asm+0x35/0x70
[ 2025.570675]  ? __switch_to_asm+0x41/0x70
[ 2025.570689]  ? __switch_to_asm+0x35/0x70
[ 2025.570703]  ? __switch_to_asm+0x41/0x70
[ 2025.570716]  ? __switch_to_asm+0x35/0x70
[ 2025.570730]  ? __switch_to_asm+0x41/0x70
[ 2025.570744]  ? __switch_to+0x10c/0x450
[ 2025.570760]  ? syscall_trace_enter+0x1fb/0x2c0
[ 2025.570778]  do_syscall_64+0x5b/0x1a0
[ 2025.571314]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 2025.571830] RIP: 0033:0x7fc1d89c34b7
[ 2025.572334] Code: 01 b8 ff ff ff ff eb c2 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 a1 79 29 00 f7 d8 64 89 02 b8
[ 2025.573399] RSP: 002b:00007ffd9e0cf698 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
[ 2025.573937] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc1d89c34b7
[ 2025.574472] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 2025.575011] RBP: 00007ffd9e0cf6e0 R08: 0000000000000002 R09: 0000000000000000
[ 2025.575538] R10: 000000000000004b R11: 0000000000000246 R12: 0000000000000001
[ 2025.576047] R13: 00000000fffffffe R14: 0000000000000006 R15: 0000000000000000
[ 2025.576535] Modules linked in: binfmt_misc bonding tls resguard_linux(OE) secmodel_linux(OE) syshook_linux(OE) xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc sunrpc vfat fat sddlmfdrv(POE) sddlmadrv(POE) intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul ghash_clmulni_intel rapl ses intel_cstate enclosure scsi_transport_sas intel_uncore pcspkr joydev ioatdma st mei_me ch mei i2c_i801 ipmi_ssif dca lpc_ich wmi ipmi_si acpi_power_meter acpi_pad xfs libcrc32c sd_mod sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops qla2xxx drm nvme_fc i40e megaraid_sas nvme_fabrics tg3 crc32c_intel nvme_core ahci libahci t10_pi libata scsi_transport_fc i2c_algo_bit dm_mirror dm_region_hash dm_log
[ 2025.576615]  dm_mod ipmi_devintf ipmi_msghandler fuse
[ 2025.580737] CR2: 00000000000006c2
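
Before uploading the log, it can help to read the key fields yourself. The sketch below is only an illustration, not part of the Red Hat tool: it pulls the error code, the faulting function and module, and the fault address out of an oops report, and decodes the error code using the x86 page-fault bit layout. That decoding applies to page-fault oopses such as this one; other exception types use the code differently. The names summarize_oops and decode_oops_code are made up for this example, and the sample consists of three lines copied from the log above.

```python
import re

# x86 page-fault error-code bits as printed after "Oops:" (hexadecimal).
# Each entry: (bit mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

OOPS_RE = re.compile(r"Oops: ([0-9a-fA-F]{4}) \[#\d+\]")
# Kernel-side RIP line: symbol+offset/size, optionally followed by [module].
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)(?: \[(\w+)\])?"
)
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")


def decode_oops_code(code_hex):
    """Translate a page-fault error code into a list of plain descriptions."""
    code = int(code_hex, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            parts.append(text)
    return parts


def summarize_oops(log_text):
    """Collect the fields most useful for a first look at an oops report."""
    summary = {}
    m = OOPS_RE.search(log_text)
    if m:
        summary["error_code"] = m.group(1)
        summary["meaning"] = decode_oops_code(m.group(1))
    m = RIP_RE.search(log_text)
    if m:
        summary["function"] = m.group(1)
        summary["offset"] = m.group(2)
        summary["size"] = m.group(3)
        summary["module"] = m.group(4)  # None for code built into the kernel
    m = CR2_RE.search(log_text)
    if m:
        summary["fault_address"] = int(m.group(1), 16)
    return summary


# Three lines copied from the log above.
sample = """\
[ 2025.570054] Oops: 0002 [#1] SMP NOPTI
[ 2025.570136] RIP: 0010:i40e_shutdown+0x11/0x120 [i40e]
[ 2025.580737] CR2: 00000000000006c2
"""

for key, value in summarize_oops(sample).items():
    print(f"{key}: {value}")
```

For this log the decoded code describes a kernel-mode write to a page that is not present, at a small address just above zero, inside i40e_shutdown in the i40e module. That is consistent with the "NULL pointer dereference" line at the top of the report.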

Accessing the Kernel Oops Analyzer tool

Sign in to the site: https://access.redhat.com/labs/kerneloopsanalyzer/

To diagnose the kernel crash, upload the kernel oops log generated in the vmcore.
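
If the vmcore-dmesg file is large, you may want to upload only the oops report itself. The following is a minimal sketch under stated assumptions: the start markers are a short, non-exhaustive list chosen for this example, and everything from the first marker to the end of the file is treated as the report. The helper names extract_oops and save_oops are hypothetical.

```python
import re

# Lines that commonly open an oops report. This list is illustrative and
# not exhaustive; extend it for the messages seen on your own systems.
START = re.compile(
    r"(?:^|\]\s)(?:BUG: |Oops: |general protection fault|kernel BUG at )"
)


def extract_oops(lines):
    """Return the lines from the first oops marker to the end of the log."""
    for i, line in enumerate(lines):
        if START.search(line):
            return lines[i:]
    return []


def save_oops(src, dst):
    """Copy the oops report found in src into dst; return the line count."""
    with open(src, encoding="utf-8", errors="replace") as f:
        report = extract_oops(f.read().splitlines())
    if report:
        with open(dst, "w", encoding="utf-8") as f:
            f.write("\n".join(report) + "\n")
    return len(report)
```

A call such as save_oops("vmcore-dmesg.txt", "oops.txt") writes the trimmed report to oops.txt and returns the number of lines written, or 0 if no marker was found. Both file names are placeholders; use the path where kdump stored the file on your system.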

Click DETECT to compare the oops message, based on the information from makedumpfile, against known solutions.
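
The idea of comparing an oops message against known issues can be sketched as follows. This is not how the Red Hat tool is implemented, which the article does not describe. It is a toy illustration that assumes a signature made of the faulting module and function from the RIP line. The knowledge-base entries, their identifiers, and the names KnownIssue, signature, and detect are all invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class KnownIssue:
    ident: str     # placeholder identifier, not a real article number
    module: str
    function: str
    note: str


# Made-up entries for illustration only; they do not describe real bugs.
KNOWN_ISSUES = [
    KnownIssue("EXAMPLE-1", "i40e", "i40e_shutdown",
               "placeholder: crash in the i40e shutdown path"),
    KnownIssue("EXAMPLE-2", "examplemod", "example_remove",
               "placeholder: crash in a hypothetical module"),
]

# Kernel-side RIP line: symbol+offset/size, optionally followed by [module].
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def signature(log_text):
    """Build a (module, function) signature from the first kernel RIP line."""
    m = RIP_RE.search(log_text)
    if not m:
        return None
    # Code built into the kernel image has no [module] suffix.
    return (m.group(2) or "vmlinux", m.group(1))


def detect(log_text, known=KNOWN_ISSUES):
    """Return the known issues whose signature matches the oops report."""
    sig = signature(log_text)
    if sig is None:
        return []
    return [k for k in known if (k.module, k.function) == sig]


log = "[ 2025.570136] RIP: 0010:i40e_shutdown+0x11/0x120 [i40e]"
for issue in detect(log):
    print(issue.ident, issue.note)
```

A real matcher would likely also consider the kernel version, the call trace, and the error message, since the same function can crash for unrelated reasons.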
