Finding C++ Thiscalls in Binaries


Background:

  • A calling convention defines which registers are used to pass arguments from caller to callee, and how the stack is used during a function call.
  • A thiscall is a call to a class's (non-static) member function. It needs special treatment because the caller must also pass the object pointer (this) to the callee, and how that is done differs between platforms.
  • This note focuses on x86-64 Linux ELF binaries compiled with GCC.
  • OOAnalyzer is a tool (its reasoning is done in Prolog with tabling) that recovers detailed C++ abstractions such as member functions, class data structures, and virtual function dispatch. However, it only works on Win32 programs.
  • ddisasm is a very precise Datalog-based disassembler that can disassemble x86-64 ELF binaries. Its instruction facts and intermediate analysis results are easy to obtain as .csv files.

Main Goal:
Recover thiscall information from the facts generated by ddisasm, so that class structures can later be recovered using concepts from OOAnalyzer.

Problem:

  • OOAnalyzer only works for Win32 because its logic rules are built on thiscall facts, and the Win32 calling convention is very different from the x86-64 ELF one.
  • ddisasm's internal analyses do not model thiscalls, because it is a general-purpose disassembler, not one specialized for Linux C++ programs.

Thiscall Calling Convention

  • On Win32, the caller passes the this pointer in the ecx register, and the callee reads ecx whenever it needs this.
  • On 32-bit Linux, a thiscall looks like a normal call: arguments are pushed onto the stack. The one difference is that the this pointer is inserted before the first argument, i.e. it is passed as an implicit first stack argument.
  • On 64-bit Linux (the System V AMD64 ABI), the calling convention is different. Integer-sized arguments (the ones that matter here, since pointers are integers too) are passed in the first free register among %rdi, %rsi, %rdx, %rcx, %r8, %r9; if there are more than six, the rest go on the stack. As on 32-bit Linux, this becomes the implicit first argument, so it is normally passed in %rdi. (The main exception is a member function that returns an object in memory: the hidden return-value pointer then takes %rdi and this moves to %rsi.)
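The integer-register rule can be sketched in a few lines. This is a minimal model of the System V AMD64 rule for INTEGER-class arguments only (floating-point, aggregate, varargs and hidden struct-return rules are left out), and the function name place_int_args is hypothetical. It shows how the implicit this shifts every explicit argument by one register.

```python
# Minimal model of System V AMD64 placement for INTEGER-class arguments
# (ints and pointers) only. Floating-point, struct, varargs and hidden
# struct-return pointer rules are deliberately left out.
# Register names are the 64-bit ones; a 32-bit int uses the low half (e.g. edi).
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_int_args(args, is_member=False):
    """Map each argument name to the register or stack slot it is passed in.

    For a non-static member function, `this` is prepended as the implicit
    first argument, so it takes rdi and every explicit argument shifts by one.
    """
    if is_member:
        args = ["this"] + list(args)
    placement = {}
    for i, name in enumerate(args):
        if i < len(INT_ARG_REGS):
            placement[name] = INT_ARG_REGS[i]
        else:
            # The 7th and later integer arguments go on the stack, in order.
            placement[name] = f"stack[{i - len(INT_ARG_REGS)}]"
    return placement


print(place_int_args(["nosense1", "nosense2"]))
# {'nosense1': 'rdi', 'nosense2': 'rsi'}
print(place_int_args(["a", "b"], is_member=True))
# {'this': 'rdi', 'a': 'rsi', 'b': 'rdx'}
```

This matches the sqsum(nosense1, nosense2) call below, where nosense1 ends up in EDI and nosense2 in ESI; a member function taking the same two ints would receive them in ESI and EDX instead.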
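thiscall:
t.getY()
->
            add EAX,EAX                    # belongs to the preceding statement
            mov DWORD PTR [RBP-32],EAX
            lea RAX,QWORD PTR [RBP-16]     # RAX = &t (t is a local object at RBP-16)
            mov RDI,RAX                    # RDI = this, the implicit first argument
            call _ZN5Tuple4getYEv          # Tuple::getY()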

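normal call:
sqsum(nosense1, nosense2)
->
            mov EDX,EAX
            mov EAX,EDX
            add EAX,EAX
            add EAX,EDX                    # EAX = 3 * EDX
            mov DWORD PTR [RBP-28],EAX
            mov EDX,DWORD PTR [RBP-28]
            mov EAX,DWORD PTR [RBP-32]
            mov ESI,EDX                    # 2nd int argument (nosense2) in ESI
            mov EDI,EAX                    # 1st int argument (nosense1) in EDI
            call _Z5sqsumii                # sqsum(int, int)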
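The difference between the two listings can be checked mechanically: walk back from the call to the last write of the rdi register family, following plain register copies. The sketch below uses hand-transcribed (mnemonic, destination, source) tuples with size prefixes such as QWORD PTR dropped; that format and the helper name origin_of_first_arg are assumptions for illustration, not ddisasm's fact schema.

```python
# Walk back from a call to the instruction that produced the value in the
# rdi register family, following plain register-to-register moves.
# Instructions are hand-transcribed from the two listings above as
# (mnemonic, destination, source) tuples with size prefixes dropped.
FAMILY = {"rax": "rax", "eax": "rax", "rdx": "rdx", "edx": "rdx",
          "rsi": "rsi", "esi": "rsi", "rdi": "rdi", "edi": "rdi"}
WIDTH = {"rax": 64, "rdx": 64, "rsi": 64, "rdi": 64,
         "eax": 32, "edx": 32, "esi": 32, "edi": 32}

THISCALL = [
    ("add", "eax", "eax"),
    ("mov", "[rbp-32]", "eax"),
    ("lea", "rax", "[rbp-16]"),
    ("mov", "rdi", "rax"),
    ("call", "_ZN5Tuple4getYEv", None),
]

NORMAL_CALL = [
    ("mov", "edx", "eax"),
    ("mov", "eax", "edx"),
    ("add", "eax", "eax"),
    ("add", "eax", "edx"),
    ("mov", "[rbp-28]", "eax"),
    ("mov", "edx", "[rbp-28]"),
    ("mov", "eax", "[rbp-32]"),
    ("mov", "esi", "edx"),
    ("mov", "edi", "eax"),
    ("call", "_Z5sqsumii", None),
]


def origin_of_first_arg(insns):
    """For a snippet ending in a call, describe how rdi/edi was produced.

    Returns (bit width of the last write to the rdi family,
             mnemonic of the instruction that produced the value,
             that instruction's source operand).
    """
    want, width = "rdi", None
    for mnem, dst, src in reversed(insns[:-1]):   # skip the call itself
        if FAMILY.get(dst) != want:
            continue
        if width is None:
            width = WIDTH[dst]
        if mnem == "mov" and src in FAMILY:
            want = FAMILY[src]   # register copy: keep following the source
            continue
        return width, mnem, src
    return width, None, None


print(origin_of_first_arg(THISCALL))     # (64, 'lea', '[rbp-16]')
print(origin_of_first_arg(NORMAL_CALL))  # (32, 'mov', '[rbp-32]')
```

For the thiscall the value is a 64-bit write whose origin is lea [rbp-16], a stack address; for sqsum it is a 32-bit write of an int loaded from memory. Width alone is not decisive, though: non-PIE code may load the address of a global object with a 32-bit mov edi, OFFSET symbol, so where the value comes from matters more than its width.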

What Might be Useful

The current ddisasm has a def-use analysis that can find a register's value at different instructions. However, it is not precise enough for our purpose, because it only cares about address computation: it tracks which register definitions are potentially used to compute addresses to access memory.
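To see why an address-only def-use misses the this pointer, here is a toy def-use analysis over straight-line code. It is not ddisasm's implementation: each instruction is described by hand as the registers it writes, the registers it reads as values, and the registers it reads inside memory operands, and the call is assumed to read its argument register rdi. As a further simplification, a pair counts as address-related only if the use itself sits inside a memory operand; a real analysis would also follow copies.

```python
# Toy def-use analysis over straight-line code (NOT ddisasm's implementation).
from dataclasses import dataclass, field


@dataclass
class Insn:
    text: str
    defs: set = field(default_factory=set)       # registers written
    uses: set = field(default_factory=set)       # registers read as plain values
    addr_uses: set = field(default_factory=set)  # registers read inside [...]


def def_use_pairs(insns, address_only=False):
    """Return (def index, use index, register) for every reaching definition.

    With address_only=True, only uses inside memory operands are kept,
    a per-instruction simplification of an address-oriented def-use.
    """
    last_def = {}  # register -> index of its most recent definition
    pairs = []
    for i, insn in enumerate(insns):
        reads = insn.addr_uses if address_only else insn.uses | insn.addr_uses
        for reg in sorted(reads):
            if reg in last_def:
                pairs.append((last_def[reg], i, reg))
        for reg in insn.defs:
            last_def[reg] = i
    return pairs


# The thiscall listing, registers normalised to 64-bit names.
THISCALL = [
    Insn("lea rax, [rbp-16]", defs={"rax"}, addr_uses={"rbp"}),
    Insn("mov rdi, rax", defs={"rdi"}, uses={"rax"}),
    # Assumption: the call reads rdi (its argument) and writes rax (its
    # return value); other caller-saved clobbers are omitted.
    Insn("call _ZN5Tuple4getYEv", defs={"rax"}, uses={"rdi"}),
]

print(def_use_pairs(THISCALL))                     # [(0, 1, 'rax'), (1, 2, 'rdi')]
print(def_use_pairs(THISCALL, address_only=True))  # []
```

With all uses tracked, the chain lea, then mov rdi, then call is found; with the address-only filter nothing is left, because in the caller neither rax nor rdi is ever used to form an address.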

In our case, thiscalls will not be detected by ddisasm's register value analysis, because

  1. the analysis is intraprocedural, while the this pointer is actually dereferenced in the callee;
  2. the use of %rdi is logically implicit: at the C++ level a call such as foo.bar() needs the object pointer, but the caller's own code typically does not use that %rdi value to access memory.
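Point 1 can be made concrete from the callee's side: the incoming %rdi is where this arrives, and dereferences of it reveal field offsets. The sketch below tracks copies of rdi through registers and stack slots and records the offsets accessed through them. GETY_BODY is a hand-written body in the style of unoptimised GCC output, an assumption rather than code disassembled from the binary; the helper names are hypothetical, and stack slots are identified by their operand text, which assumes rbp stays fixed after the prologue.

```python
import re

# A memory operand of the form [reg], [reg+N] or [reg-N] (size prefixes dropped).
MEM = re.compile(r"\[(\w+)([+-]\d+)?\]")
# Map 32-bit register names onto their 64-bit registers.
SUB32 = {"eax": "rax", "ecx": "rcx", "edx": "rdx", "esi": "rsi", "edi": "rdi"}

# Hypothetical unoptimised body of Tuple::getY(), written by hand.
GETY_BODY = [
    ("push", None, "rbp"),
    ("mov", "rbp", "rsp"),
    ("mov", "[rbp-8]", "rdi"),   # spill `this` to a stack slot
    ("mov", "rax", "[rbp-8]"),   # reload it into rax
    ("mov", "eax", "[rax+4]"),   # read a 4-byte field at offset 4
    ("pop", "rbp", None),
    ("ret", None, None),
]


def this_field_offsets(body):
    """Return the offsets accessed through the pointer that arrived in rdi."""
    holders = {"rdi"}   # registers / stack slots currently holding `this`
    offsets = []
    for mnem, dst, src in body:
        dst, src = SUB32.get(dst, dst), SUB32.get(src, src)
        # A memory operand based on a holder is an access to the object.
        for op in (dst, src):
            m = MEM.fullmatch(op or "")
            if m and m.group(1) in holders:
                offsets.append(int(m.group(2) or 0))
        if dst is None:
            continue
        if mnem == "mov" and src in holders:
            holders.add(dst)        # dst now holds a copy of `this`
        else:
            holders.discard(dst)    # dst overwritten by something else
    return offsets


print(this_field_offsets(GETY_BODY))  # [4]
```

For the assumed body the result is [4], a single access at offset 4 of the object; this is the kind of per-offset member-access fact that class-layout recovery builds on.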

I think we can build on their work, but propagate all possible values of %rdi, not just those that might be used as addresses.

  • Mark every pointer that looks like a pointer to a struct (a pattern such as lea ..., [base+offset]).
  • Then check whether that address is moved into %rdi before a call.
  • If so, guess that the pointer is an object pointer and that the call is a thiscall.
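The three steps above can be sketched as a single pass over straight-line caller code: remember registers whose value came from lea reg, [base+offset], follow 64-bit register copies, forget caller-saved registers at each call, and report a call if rdi still holds such a value. The tuple format and the name candidate_thiscalls are assumptions; control flow is ignored, and only [register+constant] / [register-constant] memory operands are recognised.

```python
import re

MEM = re.compile(r"\[(\w+)([+-]\d+)?\]")   # [reg], [reg+N], [reg-N]
REG64 = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"}
SUB32 = {"eax": "rax", "ebx": "rbx", "ecx": "rcx", "edx": "rdx",
         "esi": "rsi", "edi": "rdi"}
CALLER_SAVED = {"rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"}


def candidate_thiscalls(insns):
    """Return (call target, lea operand) for each call whose rdi holds a
    value produced by `lea reg, [base+offset]` (a struct-like pointer)."""
    ptr = {}     # 64-bit register -> memory operand of the lea it came from
    found = []
    for mnem, dst, src in insns:
        if mnem == "call":
            if "rdi" in ptr:                   # step 3: pointer reaches rdi
                found.append((dst, ptr["rdi"]))
            for reg in CALLER_SAVED:
                ptr.pop(reg, None)             # the callee may overwrite these
            continue
        if mnem == "lea" and dst in REG64 and MEM.fullmatch(src or ""):
            ptr[dst] = src                     # step 1: struct-like pointer
        elif mnem == "mov" and dst in REG64 and src in ptr:
            ptr[dst] = ptr[src]                # step 2: follow 64-bit copies
        else:
            ptr.pop(SUB32.get(dst, dst), None)  # any other write kills it
    return found


THISCALL = [
    ("add", "eax", "eax"),
    ("mov", "[rbp-32]", "eax"),
    ("lea", "rax", "[rbp-16]"),
    ("mov", "rdi", "rax"),
    ("call", "_ZN5Tuple4getYEv", None),
]
# Tail of the sqsum listing: only 32-bit integer values reach edi/esi.
NORMAL_CALL = [
    ("mov", "edx", "[rbp-28]"),
    ("mov", "eax", "[rbp-32]"),
    ("mov", "esi", "edx"),
    ("mov", "edi", "eax"),
    ("call", "_Z5sqsumii", None),
]

print(candidate_thiscalls(THISCALL))     # [('_ZN5Tuple4getYEv', '[rbp-16]')]
print(candidate_thiscalls(NORMAL_CALL))  # []
```

As the text says, this is only a guess: a free C function whose first parameter is a pointer to a local struct compiles to the same lea / mov rdi / call shape and would be flagged too.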

Is there a more precise way to identify object pointers?

References

OOAnalyzer paper (ACM CCS 2018): https://edmcman.github.io/papers/ccs18.pdf

ddisasm paper (USENIX Security 2020): https://www.usenix.org/system/files/sec20fall_flores-montoya_prepub_0.pdf
