Background:
- A calling convention defines which registers are used to pass arguments from caller to callee, and how the stack is used during a function call.
- thiscall is the convention for calling a class's member function: the caller needs to pass the object pointer (this->...) to the callee. It is implemented differently on different systems; these notes focus on Linux x64 / GCC ELF.
- OOAnalyzer is a tool (using Prolog with tabling) that can recover detailed abstractions such as member functions, data structures, and virtual function dispatch. But it only works for Win32 programs.
- ddisasm is a very precise Datalog disassembler that can disassemble x64 ELF binaries. We can easily get its instruction facts and intermediate analysis results as .csv files.
Main Goal:
Recover thiscall information using the facts generated by ddisasm, so that we can later recover the structure of a class using concepts from OOAnalyzer.
Problem:
- OOAnalyzer only works for Win32 because the Win32 and x64 ELF calling conventions are very different, and its logic rules are based on thiscall facts.
- ddisasm's internal analyses do not track thiscall, because it is a general-purpose disassembler, not one specific to Linux/C++ programs.
Thiscall Calling Convention
- In Win32, the this pointer is passed in the ecx register by the caller, and the callee reads ecx whenever it uses the this pointer.
- In 32-bit Linux, thiscall is similar to a normal call: arguments are pushed onto the stack. The one difference is that the this pointer is inserted before the first argument on the stack.
- In 64-bit Linux, the calling convention is different. For integer-class values (which is what matters in this discussion, because a pointer is also integer-class), the next free register among %rdi, %rsi, %rdx, %rcx, %r8, %r9 is used to pass each argument; if there are more than 6 such arguments, the rest go on the stack. As in 32-bit Linux, the this pointer becomes the first argument, which means it is in %rdi (unless the function returns a class object in memory, in which case the hidden return-value pointer takes %rdi and this moves to %rsi).
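The 64-bit Linux rule above can be sketched as a small lookup. This is only an illustration: the function name `assign_int_args` and the `stack[n]` labels are made up for this note, and the sketch assumes every argument is integer-class (no floating-point values, no structs passed by value, no hidden return-value pointer).

```python
# System V AMD64 integer-argument registers, in passing order.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_int_args(explicit_args, is_member=False):
    """Map integer-class arguments to registers or stack slots.

    For a member function the implicit `this` pointer is prepended,
    so it takes the first register (rdi).
    Hypothetical helper, assuming all arguments are integer-class.
    """
    args = (["this"] if is_member else []) + list(explicit_args)
    placement = {}
    for index, name in enumerate(args):
        if index < len(INT_ARG_REGS):
            placement[name] = INT_ARG_REGS[index]
        else:
            # Arguments beyond the sixth are passed on the stack.
            placement[name] = f"stack[{index - len(INT_ARG_REGS)}]"
    return placement


print(assign_int_args([], is_member=True))   # {'this': 'rdi'}
print(assign_int_args(["a", "b"]))           # {'a': 'rdi', 'b': 'rsi'}
```

The first call corresponds to a member function with no explicit arguments, the second to a free function with two integer arguments.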
thiscall:
t.getY()
->
add EAX,EAX
mov DWORD PTR [RBP-32],EAX
lea RAX,QWORD PTR [RBP-16]
mov RDI,RAX
call _ZN5Tuple4getYEv
normal call:
sqsum(nosense1, nosense2)
->
mov EDX,EAX
mov EAX,EDX
add EAX,EAX
add EAX,EDX
mov DWORD PTR [RBP-28],EAX
mov EDX,DWORD PTR [RBP-28]
mov EAX,DWORD PTR [RBP-32]
mov ESI,EDX
mov EDI,EAX
call _Z5sqsumii
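The two call targets above are Itanium-mangled names, and when symbols are present the name itself hints at whether the callee is a member function. Below is a minimal sketch that splits only the simplest mangled forms. `split_mangled` is a hypothetical helper, not part of ddisasm or OOAnalyzer. A two-part name can also be a namespace-level function, and stripped binaries have no symbols at all, so this is a weak hint at best.

```python
def split_mangled(symbol):
    """Return the name components of a simple Itanium-mangled symbol.

    Handles only `_Z<len><name>...` and `_ZN<len><name>...E...`.
    Returns None for anything else (templates, operators, cv-qualified
    member functions, and so on).
    """
    if not symbol.startswith("_Z"):
        return None
    pos = 2
    nested = symbol.startswith("N", pos)
    if nested:
        pos += 1
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        # Read the decimal length prefix.
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if len(name) != length:
            return None  # truncated symbol
        parts.append(name)
        pos = end + length
        if not nested:
            break  # an unnested name has exactly one component
    if nested and not symbol.startswith("E", pos):
        return None  # unsupported construct inside the nested name
    return parts or None


print(split_mangled("_ZN5Tuple4getYEv"))  # ['Tuple', 'getY']
print(split_mangled("_Z5sqsumii"))        # ['sqsum']
```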
What Might be Useful
Current ddisasm has a def-use analysis that can track a register's value across different instructions. But it is limited for our purpose, because it only cares about used addresses, that is,
which register definitions are potentially used to compute an address to access memory.
In this case, thiscall will not be detected by its register value analysis, because
- it is an intraprocedural analysis, while the this pointer is used in the callee
- it is a logically implicit use of %rdi, because at the C++ language level we write a call like foo.bar(), and the object pointer is needed to make that call.
I think we can build on their work but also try to propagate all possible %rdi values, not just the addresses that might be used.
- Mark every pointer that looks like a struct address (a pattern like lea ... [base+...]),
- then check whether that address is passed into %rdi before a call.
- If so, we can guess that the pointer is an object pointer and that the call is a thiscall.
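The three steps above can be sketched over a straight-line instruction list. This is a rough illustration, not ddisasm's analysis: the `(mnemonic, operands)` tuple format, the function names, and the register normalization are all assumptions made for this note, and a real version would be written as Datalog rules over ddisasm's facts.

```python
import re

# A memory operand of the form [BASE+disp] or [BASE-disp].
STRUCT_LIKE = re.compile(r"\[\s*[A-Za-z0-9]+\s*[+-]\s*\w+\s*\]")


def canon(operand):
    """Normalize a 32-bit register name (EAX) to its 64-bit name (RAX)."""
    operand = operand.upper()
    if len(operand) == 3 and operand.startswith("E"):
        return "R" + operand[1:]
    return operand


def find_thiscall_candidates(instructions):
    """Return targets of calls made while RDI holds a struct-like address.

    `instructions` is a list of (mnemonic, operands) tuples: a hypothetical
    simplified format, not ddisasm's fact schema.
    """
    marked = set()       # registers currently holding a struct-like address
    candidates = []
    for mnemonic, operands in instructions:
        op = mnemonic.lower()
        if op == "call":
            if "RDI" in marked:
                candidates.append(operands[0])
            marked.clear()   # conservatively forget everything after a call
            continue
        if len(operands) != 2:
            continue
        dst, src = canon(operands[0]), canon(operands[1])
        if op == "lea" and STRUCT_LIKE.search(src):
            marked.add(dst)          # step 1: mark the struct-like pointer
        elif op == "mov" and src in marked:
            marked.add(dst)          # step 2: propagate through register moves
        else:
            marked.discard(dst)      # any other write overwrites the pointer
    return candidates


thiscall_example = [
    ("add", ["EAX", "EAX"]),
    ("mov", ["DWORD PTR [RBP-32]", "EAX"]),
    ("lea", ["RAX", "QWORD PTR [RBP-16]"]),
    ("mov", ["RDI", "RAX"]),
    ("call", ["_ZN5Tuple4getYEv"]),
]
print(find_thiscall_candidates(thiscall_example))  # ['_ZN5Tuple4getYEv']
```

Note that this also flags any call that passes the address of a stack local as its first argument, which is exactly the imprecision the guess in the last step accepts.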
Is there some way to identify object pointers more precisely?
Reference
https://edmcman.github.io/papers/ccs18.pdf
https://www.usenix.org/system/files/sec20fall_flores-montoya_prepub_0.pdf