前景
在开发后处理工作中往往需要确认编译链接后的文件符号信息,常用的命令包括nm,readelf,objdump。一直都很迷惑这几个命令的区别是什么,今天开个blog研究下。
ELF 文件 和 PE文件
为了更深刻的理解三个命令的作用,我们要从二进制文件格式说起:
- ELF 全称 “Executable and Linkable Format”,是UNIX类操作系统中普遍采用的目标文件格式(".o",".so"," .out")。 PS. 静态库实质上是目标文件(.o)的打包,没有对目标文件内容做任何修改(可以理解成tar)。
- PE 全称“Portable Executable",是Windows操作系统所采用的目标文件格式;
ELF和PE都源自于COFF文件格式,算是两兄弟吧。本文重点关注ELF文件格式。
ELF 文件格式
通过以下命令可以查看elf的结构介绍
$ man elf
LF(5) Linux Programmer's Manual ELF(5)
NAME
elf - format of Executable and Linking Format (ELF) files
SYNOPSIS
#include <elf.h>
DESCRIPTION
The header file <elf.h> defines the format of ELF executable binary files. Amongst these files are normal executable files, relocatable object
files, core files, and shared objects.
An executable file using the ELF file format consists of an ELF header, followed by a program header table or a section header table, or both. The
ELF header is always at offset zero of the file. The program header table and the section header table's offset in the file are defined in the ELF
header. The two tables describe the rest of the particularities of the file.
This header file describes the above mentioned headers as C structures and also includes structures for dynamic sections, relocation sections and
symbol tables.
......
......
从我个人理解,ELF可以理解为一本书(或者说档案),总体可分为两个部分:描述信息(书的目录和序)以及实质信息(书的实质内容)。
描述信息
- ELF Header:以Magic Str开头,描述文件类型,abi, version等信息,并提供到Program Header Table*和 Section Header Table的索引。
- Program Header Table: 存在于可执行文件或共享库中,是一个结构体组成的数组。一般在ELF Header之后。每个数组元素描述一个段(segment)或者其他程序执行时需要的信息;该结构用于执行阶段。
- Section Header Table: 索引每个section的位置,该结构用于链接阶段。
一个典型的ELF Header信息如下所示:
$ readelf -h /usr/bin/touch
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x402620
Start of program headers: 64 (bytes into file)
Start of section headers: 62576 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 29
Section header string table index: 28
实质信息——段(segment)和节(section)的联系
- Sections:可能包含代码段,data信息,符号信息等等;
- Segments: 由0或多个sections组成;分为code segments 和data segments。在链接阶段时,链接器会将具有相同属性的sections放在一起形成segment。
考虑如下代码:
// reference from https://web.archive.org/web/20171219120315/http://nairobi-embedded.org:80/040_elf_sec_seg_vma_mappings.html
$ cat sec_seg_vma.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
static uint32_t bss[1024]; /* include 4KB in ".bss" */
static uint32_t data[1024] = {1024}; /* include 4KB in ".data" */
static const int8_t rodata[8192] = {"rodata"}; /* include 8KB in ".rodata" */
int main(void)
{
char cmd[64];
printf("0x%lx\n", (unsigned long)malloc(8192));
sprintf(cmd, "cat /proc/%d/maps", getpid());
system(cmd);
exit(EXIT_SUCCESS);
}
.bss .data .rodata分别是三个section。.bss .data包含在data segment中,.rodata包含在code segment中。
墙裂推荐移步以下链接读一下更详尽的解释
- https://web.archive.org/web/20171219120315/http://nairobi-embedded.org:80/040_elf_sec_seg_vma_mappings.html#fn4
- https://www.entry0.cn/2020/04/16/274.html
nm Command
nm命令用于展示object 文件,可执行文件,共享库中的符号信息。其用法如下:
nm [ -A ] [ -C ] [ -X {32|64|32_64|d64| any}] [ -f ] [ -h ] [ -l ] [ -p ] [ -r ] [ -T ] [ -v ] [ -B | -P ] [ -e | -g | -u ] [ -d | -o | -x | -t Format ] File ...
nm各个option的具体含义可以使用 nm -h进行查看。
$ nm -help
Usage: nm [option(s)] [file(s)]
List symbols in [file(s)] (a.out by default).
The options are:
-a, --debug-syms Display debugger-only symbols
-A, --print-file-name Print name of the input file before every symbol
-B Same as --format=bsd
-C, --demangle[=STYLE] Decode low-level symbol names into user-level names
The STYLE, if specified, can be `auto' (the default),
`gnu', `lucid', `arm', `hp', `edg', `gnu-v3', `java'
or `gnat'
--no-demangle Do not demangle low-level symbol names
-D, --dynamic Display dynamic symbols instead of normal symbols
--defined-only Display only defined symbols
-e (ignored)
-f, --format=FORMAT Use the output format FORMAT. FORMAT can be `bsd',
`sysv' or `posix'. The default is `bsd'
-g, --extern-only Display only external symbols
-l, --line-numbers Use debugging information to find a filename and
line number for each symbol
-n, --numeric-sort Sort symbols numerically by address
-o Same as -A
-p, --no-sort Do not sort the symbols
-P, --portability Same as --format=posix
-r, --reverse-sort Reverse the sense of the sort
--plugin NAME Load the specified plug-in
-S, --print-size Print size of defined symbols
-s, --print-armap Include index for symbols from archive members
--size-sort Sort symbols by size
--special-syms Include special symbols in the output
--synthetic Display synthetic symbols as well
-t, --radix=RADIX Use RADIX for printing symbol values
--target=BFDNAME Specify the target object format as BFDNAME
-u, --undefined-only Display only undefined symbols
-X 32_64 (ignored)
@FILE Read options from FILE
-h, --help Display this information
-V, --version Display this program's version number
nm: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 a.out-i386-linux pei-i386 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-x86-64 pe-bigobj-x86-64 pe-i386 plugin srec symbolsrec verilog tekhex binary ihex
Report bugs to <http://www.sourceware.org/bugzilla/>.
由help输出可以看到nm支持的ELF格式。
reference link: https://www.ibm.com/docs/en/aix/7.2?topic=n-nm-command
示例
nm -A a.out |grep symbols-you-concerned
readelf Command
该命令可用于展示elf文件的各类信息,不局限于符号信息。
$ readelf -h
Usage: readelf <option(s)> elf-file(s)
Display information about the contents of ELF format files
Options are:
-a --all Equivalent to: -h -l -S -s -r -d -V -A -I
-h --file-header Display the ELF file header
-l --program-headers Display the program headers
--segments An alias for --program-headers
-S --section-headers Display the sections' header
--sections An alias for --section-headers
-g --section-groups Display the section groups
-t --section-details Display the section details
-e --headers Equivalent to: -h -l -S
-s --syms Display the symbol table
--symbols An alias for --syms
--dyn-syms Display the dynamic symbol table
-n --notes Display the core notes (if present)
-r --relocs Display the relocations (if present)
-u --unwind Display the unwind info (if present)
-d --dynamic Display the dynamic section (if present)
-V --version-info Display the version sections (if present)
-A --arch-specific Display architecture specific information (if any)
-c --archive-index Display the symbol/file index in an archive
-D --use-dynamic Use the dynamic section info when displaying symbols
-x --hex-dump=<number|name>
Dump the contents of section <number|name> as bytes
-p --string-dump=<number|name>
Dump the contents of section <number|name> as strings
-R --relocated-dump=<number|name>
Dump the contents of section <number|name> as relocated bytes
-z --decompress Decompress section before dumping it
-w[lLiaprmfFsoRt] or
--debug-dump[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
=frames-interp,=str,=loc,=Ranges,=pubtypes,
=gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
=addr,=cu_index]
Display the contents of DWARF2 debug sections
--dwarf-depth=N Do not display DIEs at depth N or greater
--dwarf-start=N Display DIEs starting with N, at the same depth
or deeper
-I --histogram Display histogram of bucket list lengths
-W --wide Allow output width to exceed 80 characters
@<file> Read options from <file>
-H --help Display this information
-v --version Display the version number of readelf
示例
readelf -s a.out|grep symbols-you-concerned
objdump Command
objdump 命令可读取目标文件的各类信息,相较于readelf,不仅仅局限于ELF文件格式。比如在windows系统中,该命令拥有更广泛的用途。根据前文可知,windows目标文件采用PE文件格式,而非ELF。
Usage: objdump <option(s)> <file(s)>
Display information from object <file(s)>.
At least one of the following switches must be given:
-a, --archive-headers Display archive header information
-f, --file-headers Display the contents of the overall file header
-p, --private-headers Display object format specific file header contents
-P, --private=OPT,OPT... Display object format specific contents
-h, --[section-]headers Display the contents of the section headers
-x, --all-headers Display the contents of all headers
-d, --disassemble Display assembler contents of executable sections
-D, --disassemble-all Display assembler contents of all sections
-S, --source Intermix source code with disassembly
-s, --full-contents Display the full contents of all sections requested
-g, --debugging Display debug information in object file
-e, --debugging-tags Display debug information using ctags style
-G, --stabs Display (in raw form) any STABS info in the file
-W[lLiaprmfFsoRt] or
--dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
=frames-interp,=str,=loc,=Ranges,=pubtypes,
=gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
=addr,=cu_index]
Display DWARF info in the file
-t, --syms Display the contents of the symbol table(s)
-T, --dynamic-syms Display the contents of the dynamic symbol table
-r, --reloc Display the relocation entries in the file
-R, --dynamic-reloc Display the dynamic relocation entries in the file
@<file> Read options from <file>
-v, --version Display this program's version number
-i, --info List object formats and architectures supported
-H, --help Display this information
The following switches are optional:
-b, --target=BFDNAME Specify the target object format as BFDNAME
-m, --architecture=MACHINE Specify the target architecture as MACHINE
-j, --section=NAME Only display information for section NAME
-M, --disassembler-options=OPT Pass text OPT on to the disassembler
-EB --endian=big Assume big endian format when disassembling
-EL --endian=little Assume little endian format when disassembling
--file-start-context Include context from start of file (with -S)
-I, --include=DIR Add DIR to search list for source files
-l, --line-numbers Include line numbers and filenames in output
-F, --file-offsets Include file offsets when displaying information
-C, --demangle[=STYLE] Decode mangled/processed symbol names
The STYLE, if specified, can be `auto', `gnu',
`lucid', `arm', `hp', `edg', `gnu-v3', `java'
or `gnat'
-w, --wide Format output for more than 80 columns
-z, --disassemble-zeroes Do not skip blocks of zeroes when disassembling
--start-address=ADDR Only process data whose address is >= ADDR
--stop-address=ADDR Only process data whose address is <= ADDR
--prefix-addresses Print complete address alongside disassembly
--[no-]show-raw-insn Display hex alongside symbolic disassembly
--insn-width=WIDTH Display WIDTH bytes on a single line for -d
--adjust-vma=OFFSET Add OFFSET to all displayed section addresses
--special-syms Include special symbols in symbol dumps
--prefix=PREFIX Add PREFIX to absolute paths for -S
--prefix-strip=LEVEL Strip initial directory names for -S
--dwarf-depth=N Do not display DIEs at depth N or greater
--dwarf-start=N Display DIEs starting with N, at the same depth
or deeper
--dwarf-check Make additional dwarf internal consistency checks.
objdump: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 a.out-i386-linux pei-i386 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-x86-64 pe-bigobj-x86-64 pe-i386 plugin srec symbolsrec verilog tekhex binary ihex
objdump: supported architectures: i386 i386:x86-64 i386:x64-32 i8086 i386:intel i386:x86-64:intel i386:x64-32:intel i386:nacl i386:x86-64:nacl i386:x64-32:nacl iamcu iamcu:intel l1om l1om:intel k1om k1om:intel plugin
The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
x86-64 Disassemble in 64bit mode
i386 Disassemble in 32bit mode
i8086 Disassemble in 16bit mode
att Display instruction in AT&T syntax
intel Display instruction in Intel syntax
att-mnemonic
Display instruction in AT&T mnemonic
intel-mnemonic
Display instruction in Intel mnemonic
addr64 Assume 64bit address size
addr32 Assume 32bit address size
addr16 Assume 16bit address size
data32 Assume 32bit data size
data16 Assume 16bit data size
suffix Always display instruction suffix in AT&T syntax
amd64 Display instruction in AMD64 ISA
intel64 Display instruction in Intel64 ISA
Report bugs to <http://www.sourceware.org/bugzilla/>.
示例
objdump -t a.out|grep syms-you-concerned
以上