About IFunc

[
** Updated 02/03/2015 **
The patch implementing IFUNC for arm has been submitted here: https://sourceware.org/ml/binutils/2015-01/msg00258.html
]

Scenario

I ran into a nasty bug in the IFUNC implementation, so I am writing down what I understand about IFUNC for future reference.

IFunc is nothing advanced. It is merely a trick for choosing one of several implementations of a function, usually depending on CPU features. The decision is not made on every call to the function, but just once, before the program starts running.

A typical use is to select, for the hardware at hand, one of the following memcpy implementations:

  • memcpy_neon() …
  • memcpy_vfp() …
  • memcpy_generic_arm() …

The naive way

<pre><code>
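// Feature helpers and per-hardware variants, defined elsewhere.
extern unsigned long get_cpu_feature (void);
extern int cpu_has_neon (unsigned long), cpu_has_vfp (unsigned long);
extern void *memcpy_neon (void *, const void *, size_t),
            *memcpy_vfp (void *, const void *, size_t),
            *memcpy_generic_arm (void *, const void *, size_t);

void *memcpy (void *dest, const void *src, size_t size)
{
    // The feature check runs again on every single call.
    unsigned long cpu_features = get_cpu_feature ();
    if (cpu_has_neon (cpu_features))
        return memcpy_neon (dest, src, size);
    else if (cpu_has_vfp (cpu_features))
        return memcpy_vfp (dest, src, size);
    return memcpy_generic_arm (dest, src, size);
}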
</code></pre>

This clearly incurs a big performance penalty: the same selection logic runs on every memcpy call.

The ifunc way

IFunc comes to the rescue here. Define a memcpy resolver function which, instead of doing the actual copy, returns a pointer to the function that will do it, chosen by some logic such as a CPU feature check. Then mark memcpy as an ifunc whose resolver is that “memcpy resolver”, as below.

<pre><code>
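typedef void *memcpy_fn (void *, const void *, size_t);
extern memcpy_fn memcpy_neon, memcpy_vfp, memcpy_generic_arm;
extern int cpu_has_neon (unsigned long), cpu_has_vfp (unsigned long);

void *memcpy (void *, const void *, size_t)
    __attribute__ ((ifunc ("resolve_memcpy")));

// Returns a function pointer. On arm, glibc passes the cpu feature
// value (hwcap) as the first argument, so it is preset in r0.
static memcpy_fn *resolve_memcpy (unsigned long cpu_features)
{
    if (cpu_has_neon (cpu_features))
        return memcpy_neon;
    else if (cpu_has_vfp (cpu_features))
        return memcpy_vfp;
    return memcpy_generic_arm;
}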
</code></pre>

The big difference from “the naive way” is that resolve_memcpy is called only once, at startup before main runs (by the dynamic loader while it applies relocations, or, for a static binary, by the startup code reached from _start), rather than on every call.
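To make the "resolve once, then call directly" property concrete, here is a portable C sketch that mimics what IFUNC buys you without using the ifunc attribute itself: a function-pointer slot is filled once by a resolver at (simulated) startup, and every later call goes through that slot. All names here (copy_fast, copy_generic, resolve_copy, fake_startup, copy_slot, resolver_calls) and the meaning of hwcap bit 0 are hypothetical; in the real mechanism the linker and glibc do this work.

```c
#include <stddef.h>
#include <string.h>

typedef void *copy_fn(void *dest, const void *src, size_t n);

static int resolver_calls;   /* counts how often the resolver ran */

/* Stand-in for a hardware-specific variant; delegates to libc here. */
static void *copy_fast(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}

/* Stand-in for the generic variant: a plain byte-by-byte copy. */
static void *copy_generic(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical resolver: bit 0 of hwcap stands in for "has NEON". */
static copy_fn *resolve_copy(unsigned long hwcap)
{
    resolver_calls++;
    return (hwcap & 1UL) ? copy_fast : copy_generic;
}

/* Plays the role of the GOT entry. */
static copy_fn *copy_slot;

/* Runs once, standing in for the loader's work before main. */
static void fake_startup(unsigned long hwcap)
{
    copy_slot = resolve_copy(hwcap);
}

/* Call sites go through the slot; no feature check happens here. */
static void *copy(void *dest, const void *src, size_t n)
{
    return copy_slot(dest, src, n);
}
```

After fake_startup(1) runs, any number of copy() calls leave resolver_calls at 1.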

Implementation

Compiler side

Whenever the compiler sees “__attribute__((ifunc(...)))”, it marks the function symbol as “IFUNC” in the symbol table. That's it, simple enough.
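In ELF terms, the IFUNC mark is the symbol type STT_GNU_IFUNC (value 10), stored in the low four bits of a symbol's st_info byte, with the binding in the high four bits; `readelf -s` prints such symbols with type IFUNC. Below is a minimal sketch of how a tool might recognize the mark. The constants mirror <elf.h> but are redefined so the snippet compiles anywhere, and st_info, st_type and is_ifunc_symbol are hypothetical helpers.

```c
#include <stdbool.h>

/* Values as defined by the ELF gABI and the GNU extension (see <elf.h>). */
enum { STB_LOCAL = 0, STB_GLOBAL = 1 };
enum { STT_FUNC = 2, STT_GNU_IFUNC = 10 };

/* st_info packs the binding (high nibble) and the type (low nibble). */
static unsigned char st_info(unsigned bind, unsigned type)
{
    return (unsigned char)((bind << 4) + (type & 0xf));
}

static unsigned st_type(unsigned char info)
{
    return info & 0xf;
}

/* Hypothetical helper: true if the compiler marked the symbol as an ifunc. */
static bool is_ifunc_symbol(unsigned char info)
{
    return st_type(info) == STT_GNU_IFUNC;
}
```

For a global ifunc symbol, st_info is (1 << 4) + 10 = 0x1a.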

Static linker side

[
** Updated 02/03/2015 **: note that arm and aarch64 implement this slightly differently. On aarch64, the resolver function address is encoded in the addend field of the relocation, while on arm, the address is written into the GOT entry.

<pre><code>
// This is the aarch64 implementation (aarch64/dl-irel.h)
if (__glibc_likely (r_type == R_AARCH64_IRELATIVE))
{
    // the resolver function address is encoded in the addend field.
    ElfW(Addr) value = elf_ifunc_invoke (reloc->r_addend);
    *reloc_addr = value;
}

// This is the arm implementation (arm/dl-irel.h)
if (__builtin_expect (r_type == R_ARM_IRELATIVE, 1))
{
    // the resolver function address is written into the relocation address (the GOT entry)
    Elf32_Addr value = elf_ifunc_invoke (*reloc_addr);
    *reloc_addr = value;
}
</code></pre>

The example below is based on the arm implementation.
]
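The difference between the two glibc snippets above can be modelled in plain C. This is a hypothetical simulation, not glibc code: a GOT slot is a variable holding a generic code pointer, sim_rela and sim_rel are simplified stand-ins for the real relocation records, and the resolver receives a made-up hwcap value whose bit 0 means "has NEON".

```c
#include <stddef.h>

typedef void (*code_ptr)(void);                      /* any code address */
typedef code_ptr (*resolver_ptr)(unsigned long hwcap);

static int last_called;   /* 1 = neon variant ran, 2 = generic variant ran */
static void variant_neon(void) { last_called = 1; }
static void variant_generic(void) { last_called = 2; }

/* Hypothetical resolver: picks a variant from the hwcap bits. */
static code_ptr pick_variant(unsigned long hwcap)
{
    return (hwcap & 1UL) ? variant_neon : variant_generic;
}

/* aarch64 style (RELA): the resolver address travels in r_addend. */
struct sim_rela { code_ptr *r_offset; code_ptr r_addend; };

static void apply_irelative_rela(const struct sim_rela *r, unsigned long hwcap)
{
    *r->r_offset = ((resolver_ptr)r->r_addend)(hwcap);
}

/* arm style (REL): the resolver address is already stored in the GOT slot. */
struct sim_rel { code_ptr *r_offset; };

static void apply_irelative_rel(const struct sim_rel *r, unsigned long hwcap)
{
    *r->r_offset = ((resolver_ptr)*r->r_offset)(hwcap);
}
```

Either way the slot ends up holding the chosen variant; the only difference is where the loader finds the resolver address.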

Whenever the linker sees a call to an ifunc, it does these three things:

  • routes the call through a PLT entry;
  • sets the GOT entry used by that PLT entry to the address of the resolver function;
  • attaches an IRELATIVE relocation to that GOT entry.

For example:
<pre><code>
memcpy_pltentry:
0 add r12, pc, #4
4 add r12, r12, #0
8 ldr pc, [r12, #0] // transfer pc to 2000, the content of [12]

memcpy_gotentry:
12 2000 // Attach an IRELATIVE relocation here.

a_routine:
1000 bl 0 // call memcpy via plt,
// 0 is the address of memcpy_pltentry
...

memcpy_resolver:
2000 mov r0, #3000
bx lr

memcpy_neon:
3000 ...

memcpy_vfp:
4000 ...

memcpy_generic_arm:
5000 ...
</code></pre>
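The addresses in the PLT entry work out because, in ARM state, an instruction that reads pc sees its own address plus 8. A small C sketch of that arithmetic for the layout above; plt_got_slot is a hypothetical helper, and the immediates #4 and #0 are the ones from the example.

```c
#include <stdint.h>

/* In ARM state, reading pc yields the current instruction's address + 8. */
#define ARM_PC_BIAS 8u

/* Follows the three PLT instructions and returns the address of the GOT
   slot that "ldr pc, [r12, #0]" loads from. */
static uint32_t plt_got_slot(uint32_t plt_entry)
{
    uint32_t r12 = (plt_entry + ARM_PC_BIAS) + 4u; /* add r12, pc, #4   (plt_entry + 0) */
    r12 += 0u;                                     /* add r12, r12, #0  (plt_entry + 4) */
    return r12 + 0u;                               /* ldr pc, [r12, #0] (plt_entry + 8) */
}
```

With the PLT entry at 0, this gives 0 + 8 + 4 = 12, the address of memcpy_gotentry.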

Right before executing main

glibc iterates over all IRELATIVE relocations, and for each one it

  • loads the content of the IRELATIVE address;
  • calls that content as a function, i.e. runs the function whose address is stored at the address the IRELATIVE relocation points to. In the example above, the IRELATIVE address is 12 and its content is 2000, so control goes to 2000, which is memcpy_resolver;
  • rewrites the IRELATIVE address with the value that function returns. In the example above, the IRELATIVE address is 12 and the return value is 3000, so 3000 is written to 12.
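The three steps above can be simulated in portable C. This is a hypothetical model, not glibc's code: got[] stands for the GOT, irelative[] lists the slots carrying an IRELATIVE relocation (arm style, so each slot initially holds the resolver address), memcpy_plt plays the PLT stub that jumps through the slot, and the "has NEON" meaning of hwcap bit 0 is made up.

```c
#include <stddef.h>
#include <string.h>

typedef void (*code_ptr)(void);
typedef code_ptr (*resolver_ptr)(unsigned long hwcap);
typedef void *(*memcpy_ptr)(void *, const void *, size_t);

static int resolver_runs;
static const char *chosen;   /* records which variant actually ran */

static void *memcpy_neon_sim(void *d, const void *s, size_t n)
{
    chosen = "neon";
    return memcpy(d, s, n);
}

static void *memcpy_generic_sim(void *d, const void *s, size_t n)
{
    chosen = "generic";
    return memcpy(d, s, n);
}

static code_ptr memcpy_resolver_sim(unsigned long hwcap)
{
    resolver_runs++;
    return (hwcap & 1UL) ? (code_ptr)memcpy_neon_sim
                         : (code_ptr)memcpy_generic_sim;
}

/* What the static linker left behind: the slot holds the resolver. */
static code_ptr got[1] = { (code_ptr)memcpy_resolver_sim };
static const size_t irelative[] = { 0 };   /* GOT slot indices */

/* The loader's loop: load the slot, call it, write the result back. */
static void apply_irelative(unsigned long hwcap)
{
    for (size_t i = 0; i < sizeof irelative / sizeof irelative[0]; i++) {
        code_ptr *slot = &got[irelative[i]];
        resolver_ptr resolver = (resolver_ptr)*slot;   /* step 1: load */
        *slot = resolver(hwcap);                       /* steps 2 and 3 */
    }
}

/* The PLT stub: every call jumps through the GOT slot. */
static void *memcpy_plt(void *d, const void *s, size_t n)
{
    return ((memcpy_ptr)got[0])(d, s, n);
}
```

After apply_irelative runs once, repeated memcpy_plt calls reach memcpy_neon_sim directly and resolver_runs stays at 1.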

All later calls to memcpy go to memcpy_neon, and memcpy_resolver is **never called again**.

After these three steps, the memory layout above becomes:

<pre><code>
memcpy_pltentry:
0 add r12, pc, #4
4 add r12, r12, #0
8 ldr pc, [r12, #0] // transfer pc to 3000 now,
// the content of [12]

memcpy_gotentry:
12 3000 // 3000 is the value returned by memcpy_resolver.

a_routine:
1000 bl 0 // call memcpy via plt,
// 0 is the address of memcpy_pltentry
...

memcpy_resolver:
2000 mov r0, #3000
bx lr

memcpy_neon:
3000 ...

memcpy_vfp:
4000 ...

memcpy_generic_arm:
5000 ...
</code></pre>
