Using GDB to debug corrupted function stacks

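Some recent problems were hard to explain at first: a function received an address as a parameter and passed it on to another function, and by the time the callee looked at it the address had changed. This is what a corrupted (smashed) function stack looks like, and it leaves the function unable to be re-entered safely.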

The called function then accessed the invalid address, hit a segmentation fault, and produced a core dump file. A tricky kind of problem.

After some reading, I decided to start with the compiler's stack protection options and then move on to GDB.

1) Add compiler options at build time

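-fstack-protector and -fstack-protector-all tell the compiler (GCC) to enable stack protection: -fstack-protector guards the functions it considers vulnerable, such as those with character arrays on the stack, while -fstack-protector-all guards every function. A guard value (canary) is placed in the stack frame and checked when the function returns; if it has been overwritten, the program aborts right there, so the dump is taken close to where the stack was corrupted rather than at some later, unrelated crash. The options can go into the Makefile. As an aside, anyone working on open-source software really does need to be fluent in Makefiles; I tried writing one for fun and my mind suddenly went blank.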

2) GDB commands for threads and stack frames

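bt shows the call stack of the current thread

bt full shows the call stack together with the local variables of each frame

info threads lists all threads

thread <num> switches to the given thread

f <num> selects the given frame in the call stack

i locals shows the local variables of the selected frame

i register (short for info registers) shows the register values for the selected frame, mainly useful for checking addresses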

With these commands, a great deal can be worked out from a core dump file.

Here is an example:

gdb /lab/testtools/rhel664/dallas/testRelease/R10A06_dynamic_udpport_5/mnsserv/bin/mhlif core-mhlif-18310-1384802382 

(gdb) bt

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

(gdb) bt full

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

No symbol table info available.

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

row = 0x2b31682a433c

answer = LTS_OK

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

No locals.

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

rpsMsg = {msgId = 4501, type = 0 '\000', data = {loadReplayReq = {

fileName = "Ú%\004\000\002\000\000\000\001\000\000\000\235ú\004\000tQ\003\000GP\003\000¸U\000\000Oû\004\000pR\000\000\206ü\004\000\bú\004\000ÅS\000\000vR\000\000\067P\003\000fP\003\000 ü\004\000Úü\004\000¢P\003\000ÿT\000\000\vý\004\000²O\003\000Z\002\002\000Nú\004\000+ú\004\000>ú\004\000\233T\000\000íÿ\001\000ÊT\000\000G\001\002\000M\001\002\000Y\003\002\000£ú\004\000\020ú\004\000\032\000\002\000ÎU\000\000x\000\002\000\035\001\002\000K\002\002\000æù\004\000\206S\000\000\071U\000\000\232ü\004\000õP\003\000ë\000\002\000\202S\003\000Ø\000\002\000xú\004\000\201\001\002\000=T\000\000oR\000\000"..., natType = 48 '0', timeStretch = 11057, 

rpsType = 2156588448}, replayConReq = {msIndex = 271834, contextIndex = 2 '\002', resend = 0 '\000', replayId = 0, 

sessionId = 1, sessionTime = 326301, destIp1 = {addr64 = {932690803249524, 1402216627852728}, 

b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335, 4}, ui = {i1 = 217460, 

i2 = 217159, i3 = 21944, ipv4 = 326479}}, destIp2 = {addr64 = {1403552362680944, 92105573988872}, 

b = "pR\000\000\206ü\004\000\bú\004\000ÅS\000", addr16 = {21104, 0, 64646, 4, 64008, 4, 21445, 0}, ui = {

i1 = 21104, i2 = 326790, i3 = 326152, ipv4 = 21445}}, reqPackets = 21110, timeStretch = 217143, type = 102 'f', 

radiotype = 80 'P', kernelMsId = 933081645382874}, msgQid = {_qId = 0x2000425da}, payloadPropReq = {

payloadPropId = 271834, groupId = 2 '\002', msgLength = 0, userBw = 1}, connectionReq = {msIndex = 271834, 

contextIndex = 2 '\002', payloadPropId = 0, sessionId = 1, addresses = {GiIpAddr = {addr64 = {932690803249524, 

1402216627852728}, b = "tQ\003\000GP\003\000¸U\000\000Oû\004", addr16 = {20852, 3, 20551, 3, 21944, 0, 64335, 

4}, ui = {i1 = 217460, i2 = 217159, i3 = 21944, ipv4 = 326479}}, msPortNo = 21104, GiPortNo = 0}, 

reqPackets = 326152, initiator = 197 'Å', type = 83 'S', radiotype = 0 '\000', kernelMsId = 932622083576438}, 

rpsDeactReq = {msIndex = 271834, contextIndex = 2 '\002', sendMhlResponse = LTS_TRUE, sessionId = {326301, 217460, 

217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874, 217250, 21759, 326923, 

217010, 131674, 326222}, pdpcontextId = 326187, sessionnum = 62 '>'}, moveUpdateDataReq = {msIndex = 271834, 

toDevice = 2 '\002', moveIndex = 1, status = 326301}, suspendResumeReq = {msIndex = 271834, sessionId = {2, 1, 

326301, 217460, 217159, 21944, 326479, 21104, 326790, 326152, 21445, 21110, 217143, 217190, 326688, 326874, 

217250, 21759, 326923, 217010}, sessionnum = 90 'Z', contextIndex = 2 '\002'}, rabCreateReleaseReq = {

msIndex = 271834, contextId = 2 '\002'}, peMoveResp = {msIndex = 271834, toDevice = 2 '\002', moveIndex = 1, 

peIndex = 326301, status = 217460}, scalePayloadReq = {scaleFactor = 271834}, magQid = {_qId = 0x2000425da}}}

count =

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

nowTime = 12394937602

nextTime =

count = 844209533

entry =

pEngine = 0x2b3168518270

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

Rps = {mhlifQId = {_qId = 0x2b315c225000}, magifQId = {_qId = 0x2b319454d000}, initQId = {_qId = 0x2b315c225000}, 

mDeviceNo = 12, mRpsState = RPS_RUNNING_STATE, sessionRepository = {rpsSessionPolymer = {buckets = 100001, 

hash_func = 0x42a5c0 , p_dataRepository = 0x2b316c001070}}, 

log = @0x1538040, apnDev = 10, vpReplayStore = std::vector of length 0, capacity 0, 

mpAlreadyLoaded = std::map with 0 elements}

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

No symbol table info available.

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

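bt full is noisy, but it does show the local variables of every frame. Some of those values cannot be trusted (rpsMsg in frame #3 may be uninitialized or already overwritten), so this alone does not pinpoint the problem. Two things do stand out: msg=0x4fc780004fc71 is not a plausible address, and this in frame #3 (0x2b3100005330) differs from this in frame #4 (0x2b31681ffdf0), even though RPS::Execute presumably calls ReceiveMsg on the same object.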

(gdb) info threads

16 Thread 0x2b3151cb7100 (LWP 18310)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

15 Thread 0x2b315c54b700 (LWP 18428)  0x00000033834df443 in select () from /lib64/libc.so.6

14 Thread 0x2b315c224700 (LWP 18423)  0x00000033834df443 in select () from /lib64/libc.so.6

13 Thread 0x2b31525e5700 (LWP 18422)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

12 Thread 0x2b3151fb0700 (LWP 18313)  0x00000033834df443 in select () from /lib64/libc.so.6

11 Thread 0x2b3194873700 (LWP 18535)  0x00000033834df443 in select () from /lib64/libc.so.6

10 Thread 0x2b319454c700 (LWP 18534)  0x00000033834df443 in select () from /lib64/libc.so.6

9 Thread 0x2b3194225700 (LWP 18533)  0x00000033834df443 in select () from /lib64/libc.so.6

8 Thread 0x2b3188425700 (LWP 18531)  0x00000033834df443 in select () from /lib64/libc.so.6

7 Thread 0x2b3188200700 (LWP 18530)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

6 Thread 0x2b3178602700 (LWP 18529)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

5 Thread 0x2b3178401700 (LWP 18435)  0x00000033834df443 in select () from /lib64/libc.so.6

4 Thread 0x2b3178200700 (LWP 18434)  0x00000033834df443 in select () from /lib64/libc.so.6

3 Thread 0x2b3169f6b700 (LWP 18433)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

2 Thread 0x2b3169d6a700 (LWP 18432)  0x0000003383c0b44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

* 1 Thread 0x2b3168200700 (LWP 18429)  0x0000003383488611 in memcpy () from /lib64/libc.so.6

(gdb) thread 1

[Switching to thread 1 (Thread 0x2b3168200700 (LWP 18429))]#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

(gdb) bt

#0  0x0000003383488611 in memcpy () from /lib64/libc.so.6

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

#2  0x000000000041b552 in OSH_ReceiveMsgQMillisec (q=0x647580, msg=0x4fc780004fc71 , 

size=280, time=21081) at ltsosdep.c:1370

#3  0x000000000042d47d in RPS::ReceiveMsg (this=0x2b3100005330, delay=21081) at rps.cc:590

#4  0x000000000042d731 in RPS::Execute (this=0x2b31681ffdf0) at rps.cc:572

#5  0x000000000042dbe8 in StartRps (arg=0x157a680) at rps.cc:181

#6  0x0000003383c077e1 in start_thread () from /lib64/libpthread.so.0

#7  0x00000033834e68ed in clone () from /lib64/libc.so.6

(gdb) f 1

#1  0x000000000041a9aa in ReadFromQueue (q=0x647580, msg=0x4fc780004fc71 , size=280, 

time=21081) at ltsosdep.c:443

443     ltsosdep.c: No such file or directory.

in ltsosdep.c

(gdb) i locals

row = 0x2b31682a433c

answer = LTS_OK

(gdb) i register

rax            0x2b0000001197   47278999998871

rbx            0x4fc780004fc71  1403492233444465

rcx            0x7      7

rdx            0x118    280

rsi            0x2b31682a4340   47491200992064

rdi            0x4fc780004fc71  1403492233444465

rbp            0x2b31681ffb80   0x2b31681ffb80

rsp            0x2b31681ffb20   0x2b31681ffb20

r8             0x1c0002000ce527 7881307938678055

r9             0x2b310003517c   47489453609340

r10            0x0      0

r11            0x202    514

r12            0x525a00005259   90546500555353

r13            0x2b3100005330   47489453413168

r14            0x20c49ba5e353f7cf       2361183241434822607

r15            0x2b316c106d70   47491266407792

rip            0x41a9aa 0x41a9aa

eflags         0x10203  [ CF IF RF ]

cs             0x33     51

ss             0x2b     43

ds             0x0      0

es             0x0      0

fs             0x0      0

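This only demonstrates a few ways of examining a core dump file. While the process is still alive, we can instead attach to it directly and analyze the code as it runs.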

(gdb) attach 2467

Attaching to process 2467

Reading symbols from /root/algorithm/testBh...done.

Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.

Loaded symbols for /usr/lib/libstdc++.so.6

Reading symbols from /lib/tls/i686/cmov/libm.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libm-2.11.1.so...done.

done.

Loaded symbols for /lib/tls/i686/cmov/libm.so.6

Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.

Loaded symbols for /lib/libgcc_s.so.1

Reading symbols from /lib/tls/i686/cmov/libc.so.6...Reading symbols from /usr/lib/debug/lib/tls/i686/cmov/libc-2.11.1.so...done.

done.

Loaded symbols for /lib/tls/i686/cmov/libc.so.6

Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so...done.

done.

Loaded symbols for /lib/ld-linux.so.2

0x005f7422 in __kernel_vsyscall ()

(gdb) break testBh.cc:38

Breakpoint 1 at 0x80488ff: file testBh.cc, line 38.

(gdb) c

Continuing.

This stops the process so that you can single-step through it or print local variables.


Printing the stacks of all threads

In gdb, use thread apply all bt to see the stack of every thread.

