This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] S/390: Fix two issues with the IFUNC optimized mem* routines
On 29/08/12 16:23, H.J. Lu wrote:
> On Wed, Aug 29, 2012 at 4:44 AM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
>> On 29/08/12 13:05, Andreas Jaeger wrote:
>>> On Wednesday, August 29, 2012 12:44:21 Andreas Krebbel wrote:
>>>> Hi,
>>>>
>>>> the attached patch fixes two problems with the S/390 IFUNC
>>>> optimization of the mem* functions:
>>>>
>>>> 1. In the current implementation the resolver functions reside in a
>>>> different file than the CPU optimized versions. This requires an
>>>> R_390_RELATIVE runtime relocation to be generated when the resolver
>>>> returns the function pointers. This caused a bug with GCJ. libgcj
>>>> calls memcpy via function pointer (R_390_GLOB_DAT). This relocation
>>>> is resolved at load time of libgcj. The dynamic linker in that case
>>>> called the memcpy resolver inside Glibc *before* glibc has been
>>>> relocated causing the resolver to return a bogus value.
>>>>
>>>> This perhaps could also be fixed in the dynamic linker by calling the
>>>> ifunc resolvers only in a second pass over all the relocations?!
>>>
>>> Could this also be an issue on other architectures like x86-64? I had a
>>> few strange bugreports with LD_BIND_NOW=1 in kde that were impossible to
>>> debug but seemed to involve multiarch functions,
>>
>> Not for the Glibc functions I think. The resolver functions for x86_64 use lea to load the
>> address of the optimized functions. This works without generating runtime relocations.
>> Another reason is that, according to H.J.Lu, Glibc on x86_64 is always forced to be loaded
>> first so it wouldn't even be a problem if the resolvers would need runtime relocations.
>
> That is not the issue. There are
>
> /* It doesn't make sense to send libc-internal memcpy calls through a PLT.
> The speedup we get from using SSSE3 instruction is likely eaten away
> by the indirect call in the PLT. */
> # define libc_hidden_builtin_def(name) \
> .globl __GI_memcpy; __GI_memcpy = __memcpy_sse2
>
> versioned_symbol (libc, __new_memcpy, memcpy, GLIBC_2_14);
>
>
>> However, I think this is a general problem which might very well occur with other shared
>> objects defining IFUNC optimized routines. Forcing IFUNC resolvers to never generate any
>> runtime relocations to me appears like a rather non-obvious limitation.
>
> There are some limitations. But you can use relative relocations
> with IFUNC symbols if you fix
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=13302
>
>> Please see the following example on x86-64. The example works fine after making a1 static:
>>
>> a.c:
>> #include <stdio.h>
>>
>> void a (int) __attribute__((ifunc ("resolve_a")));
>>
>> void a1 (int i)
>> {
>> printf("%d\n", i + 1);
>> }
>>
>> void (*resolve_a (void)) (int)
>> {
>> return &a1;
>> }
>>
>> b.c:
>> extern void a (int);
>>
>> void (*ap) (int) = a;
>>
>> void
>> b (int i)
>> {
>> ap (i + 1);
>> }
>>
>> main.c:
>> extern void b (int);
>>
>> int
>> main ()
>> {
>> b (1);
>> }
>>
>> gcc -shared -fpic a.c -o liba.so
>> gcc -shared -fpic b.c -o libb.so
>>
>> gcc -o main main.c -L./ -lb -la
>> export LD_LIBRARY_PATH=./
>> $ ./main
>> 3
>>
>> gcc -o main main.c -L./ -la -lb
>> $ ./main
>> Segmentation fault
>>
>
> This is a bug in your testcase.
>
> ---
> void a (int) __attribute__((ifunc ("resolve_a")));
>
> void a1 (int i)
> {
> printf("%d\n", i + 1);
> }
>
> void (*resolve_a (void)) (int)
> {
> return &a1;
> }
> ----
>
> For all I know, "a" may wipe your data at run-time.
Not sure what you mean. Could you please elaborate?
Btw. the same happens if you make a1 resolve locally with a version script:
$ cat linkmap
{
global:
a;
local:
*;
};
$ gcc -shared -fpic a.c -o liba.so -Wl,--version-script,linkmap
$ gcc -o main main.c -L./ -la -lb
$ ./main
Segmentation fault
The point is that if it is not known at compile time that the symbol will resolve locally, the compiler generates a GOT
access, which for a DSO cannot be completed at final link. So in that case resolve_a requires a runtime relocation:
00000000000005b6 <resolve_a>:
 5b6:   55                      push   %rbp
 5b7:   48 89 e5                mov    %rsp,%rbp
 5ba:   48 8b 05 b7 02 20 00    mov    0x2002b7(%rip),%rax        # 200878 <_DYNAMIC+0x188>
 5c1:   5d                      pop    %rbp
 5c2:   c3                      retq
 5c3:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 5ca:   00 00 00
 5cd:   0f 1f 00                nopl   (%rax)
Relocation section '.rela.dyn' at offset 0x328 contains 7 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000200878 000000000008 R_X86_64_RELATIVE 590
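For comparison, here is a sketch of the a.c variant mentioned above in which a1 is static. Since a1 then has internal
linkage, the compiler should be able to take its address PC-relatively (lea on x86-64) instead of going through the GOT,
so resolve_a should no longer depend on the R_X86_64_RELATIVE relocation. This is the testcase from above with only the
static keyword added; it assumes GCC and a glibc-based system with IFUNC support:

```c
/* a.c variant: a1 is static, so resolve_a needs no GOT entry for it.  */
#include <stdio.h>

/* 'a' is an IFUNC symbol; resolve_a picks its implementation.  */
void a (int) __attribute__ ((ifunc ("resolve_a")));

/* Internal linkage: the symbol is known to resolve locally.  */
static void
a1 (int i)
{
  printf ("%d\n", i + 1);
}

/* The resolver returns the address of the implementation to use.  */
void (*resolve_a (void)) (int)
{
  return &a1;
}
```

Built with the same gcc -shared -fpic command as before, resolve_a would be expected to contain a lea of a1 rather than
a mov from the GOT.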
If, as in the failing link order above, the relocs in b are processed before the relocs in a, the resolver will return an
unrelocated function pointer.
$ LD_DEBUG=reloc ./main
26400: relocation processing: /lib64/libc.so.6
26400: relocation processing: ./libb.so (lazy)
26400: relocation processing: ./liba.so (lazy)
$ ./main
Segmentation fault
If it is done the other way around, so that the relocs in the ifunc resolver are resolved first, everything works fine.
$ LD_DEBUG=reloc ./main
26417: relocation processing: /lib64/libc.so.6
26417: relocation processing: ./liba.so (lazy)
26417: relocation processing: ./libb.so (lazy)
$ ./main
3
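The ordering problem can also be written down as a toy model in C. This is only a sketch and not how ld.so is
implemented: all toy_* names and the load base are made up, the 0x590 offset is the addend from the .rela.dyn dump
above, and modelling the unrelocated GOT slot as holding that addend is an assumption.

```c
#include <stdint.h>

/* Toy model only; this is not how ld.so is implemented.  */

/* Addend of the R_X86_64_RELATIVE reloc shown in the dump.  */
#define TOY_A1_OFFSET 0x590u

struct toy_liba
{
  uintptr_t load_base;  /* hypothetical load address of liba.so */
  uintptr_t got_a1;     /* GOT slot read by resolve_a */
};

/* Map the library.  Before relocation the slot holds only a
   link-time value; using the addend here is an assumption.  */
void
toy_map (struct toy_liba *a, uintptr_t load_base)
{
  a->load_base = load_base;
  a->got_a1 = TOY_A1_OFFSET;
}

/* Models R_X86_64_RELATIVE: slot = load base + addend.  */
void
toy_relocate_liba (struct toy_liba *a)
{
  a->got_a1 = a->load_base + TOY_A1_OFFSET;
}

/* Models resolve_a: returns whatever the GOT slot holds right now.  */
uintptr_t
toy_resolve_a (const struct toy_liba *a)
{
  return a->got_a1;
}

/* Models the data relocation for 'ap' in libb.so: it runs the
   resolver and the result is what gets stored in 'ap'.  */
uintptr_t
toy_relocate_libb (const struct toy_liba *a)
{
  return toy_resolve_a (a);
}

/* Returns the value 'ap' ends up with for the given order.  */
uintptr_t
toy_run (uintptr_t load_base, int liba_first)
{
  struct toy_liba a;
  uintptr_t ap;

  toy_map (&a, load_base);
  if (liba_first)
    {
      toy_relocate_liba (&a);
      ap = toy_relocate_libb (&a);
    }
  else
    {
      ap = toy_relocate_libb (&a);
      toy_relocate_liba (&a);
    }
  return ap;
}
```

With a made-up load base of 0x10000000, toy_run (0x10000000, 0) gives 0x590, which is not a valid address in the
process, while toy_run (0x10000000, 1) gives 0x10000590.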
Bye,
-Andreas-