This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] S/390: Fix two issues with the IFUNC optimized mem* routines


On 29/08/12 16:23, H.J. Lu wrote:
> On Wed, Aug 29, 2012 at 4:44 AM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
>> On 29/08/12 13:05, Andreas Jaeger wrote:
>>> On Wednesday, August 29, 2012 12:44:21 Andreas Krebbel wrote:
>>>> Hi,
>>>>
>>>> the attached patch fixes two problems with the S/390 IFUNC
>>>> optimization of the mem* functions:
>>>>
>>>> 1. In the current implementation the resolver functions reside in a
>>>> different file than the CPU optimized versions.  This requires an
>>>> R_390_RELATIVE runtime relocation to be generated when the resolver
>>>> returns the function pointers. This caused a bug with GCJ. libgcj
>>>> calls memcpy via function pointer (R_390_GLOB_DAT).  This relocation
>>>> is resolved at load time of libgcj.  The dynamic linker in that case
>>>> called the memcpy resolver inside Glibc *before* glibc has been
>>>> relocated causing the resolver to return a bogus value.
>>>>
>>>> This perhaps could also be fixed in the dynamic linker by calling the
>>>> ifunc resolvers only in a second pass over all the relocations?!
>>>
>>> Could this also be an issue on other architectures like x86-64? I had a
>>> few strange bugreports with LD_BIND_NOW=1 in kde that were impossible to
>>> debug but seemed to involve multiarch functions,
>>
>> Not for the Glibc functions I think. The resolver functions for x86_64 use lea to load the
>> address of the optimized functions. This works without generating runtime relocations.
>> Another reason is that, according to H.J.Lu, Glibc on x86_64 is always forced to be loaded
>> first so it wouldn't even be a problem if the resolvers would need runtime relocations.
> 
> That is not the issue.  There are
> 
> /* It doesn't make sense to send libc-internal memcpy calls through a PLT.
>    The speedup we get from using SSSE3 instruction is likely eaten away
>    by the indirect call in the PLT.  */
> # define libc_hidden_builtin_def(name) \
> 	.globl __GI_memcpy; __GI_memcpy = __memcpy_sse2
> 
> versioned_symbol (libc, __new_memcpy, memcpy, GLIBC_2_14);
> 
> 
>> However, I think this is a general problem which might very well occur with other shared
>> objects defining IFUNC optimized routines. Forcing IFUNC resolvers to never generate any
>> runtime relocations to me appears like a rather non-obvious limitation.
> 
> There are some limitations.  But you can use relative relocations
> with IFUNC symbols if you fix
> 
> http://sourceware.org/bugzilla/show_bug.cgi?id=13302
> 
>> Please see the following example on x86-64. The example works fine after making a1 static:
>>
>> a.c:
>> #include <stdio.h>
>>
>> void a (int) __attribute__((ifunc ("resolve_a")));
>>
>> void a1 (int i)
>> {
>>   printf("%d\n", i + 1);
>> }
>>
>> void (*resolve_a (void)) (int)
>> {
>>   return &a1;
>> }
>>
>> b.c:
>> extern void a (int);
>>
>> void (*ap) (int) = a;
>>
>> void
>> b (int i)
>> {
>>   ap (i + 1);
>> }
>>
>> main.c:
>> extern void b (int);
>>
>> int
>> main ()
>> {
>>   b (1);
>> }
>>
>> gcc -shared -fpic a.c -o liba.so
>> gcc -shared -fpic b.c -o libb.so
>>
>> gcc -o main main.c -L./ -lb -la
>> export LD_LIBRARY_PATH=./
>> $ ./main
>> 3
>>
>> gcc -o main main.c -L./ -la -lb
>> $ ./main
>> Segmentation fault
>>
> 
> This is a bug in your testcase.
> 
> ---
> void a (int) __attribute__((ifunc ("resolve_a")));
> 
> void a1 (int i)
> {
>   printf("%d\n", i + 1);
> }
> 
> void (*resolve_a (void)) (int)
> {
>   return &a1;
> }
> ----
> 
> For all I know, "a" may wipe your data at run-time.

Not sure what you mean. Could you please elaborate?

Btw. the same happens if you make a1 resolve locally with a version script:

$ cat linkmap
{
  global:
  a;
  local:
  *;
};
$ gcc -shared -fpic a.c -o liba.so -Wl,--version-script,linkmap
$ gcc -o main main.c -L./ -la -lb
$ ./main
Segmentation fault

The point is that if it is not known at compile time that the symbol will resolve locally, the compiler generates a GOT
access, which for a DSO cannot be resolved at final link time. So in that case resolve_a requires a runtime relocation:

00000000000005b6 <resolve_a>:
 5b6:	55                   	push   %rbp
 5b7:	48 89 e5             	mov    %rsp,%rbp
 5ba:	48 8b 05 b7 02 20 00 	mov    0x2002b7(%rip),%rax        # 200878 <_DYNAMIC+0x188>
 5c1:	5d                   	pop    %rbp
 5c2:	c3                   	retq
 5c3:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
 5ca:	00 00 00
 5cd:	0f 1f 00             	nopl   (%rax)

Relocation section '.rela.dyn' at offset 0x328 contains 7 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200878  000000000008 R_X86_64_RELATIVE                    590
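
For comparison, here is a sketch of the variant mentioned above where a1 is made static. Since the compiler then knows
that a1 binds locally, the resolver can compute its address PC-relative (lea on x86-64), which should not need a
runtime relocation. This assumes a GNU toolchain with ifunc support; apart from the static keyword it is the a.c from
the testcase.

```c
#include <stdio.h>

/* Calls to a are bound at relocation time to whatever resolve_a returns. */
void a (int) __attribute__((ifunc ("resolve_a")));

/* static: known at compile time to resolve locally, so no GOT access
   is needed to take its address. */
static void a1 (int i)
{
  printf("%d\n", i + 1);
}

/* The resolver only returns an address; it does not depend on any
   data that must be relocated first. */
void (*resolve_a (void)) (int)
{
  return &a1;
}
```

Built as liba.so as before, readelf -r liba.so can be used to check whether the R_X86_64_RELATIVE entry for the
resolver's GOT slot is still present.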

If, as in the example above, the relocs in b are processed before the relocs in a, the resolver will return an
unrelocated function pointer.

$ LD_DEBUG=reloc ./main
     26400:	relocation processing: /lib64/libc.so.6
     26400:	relocation processing: ./libb.so (lazy)
     26400:	relocation processing: ./liba.so (lazy)
$ ./main
Segmentation fault

If it is done the other way around, so that the relocs needed by the ifunc resolver are processed first, everything works fine.

$ LD_DEBUG=reloc ./main
     26417:	relocation processing: /lib64/libc.so.6
     26417:	relocation processing: ./liba.so (lazy)
     26417:	relocation processing: ./libb.so (lazy)
$ ./main
3
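
The ordering effect can be reproduced with a toy model in plain C. This is only a sketch and not how ld.so is
implemented: all names are made up, the load base is an arbitrary value, and 0x590 is the addend of the
R_X86_64_RELATIVE entry shown above. It also models the "second pass" idea from earlier in the thread, where relocs
that call a resolver are deferred until everything else has been relocated.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in glibc. */
#define A1_LINK_VALUE  UINT64_C(0x590)          /* addend of the RELATIVE reloc */
#define LIBA_LOAD_BASE UINT64_C(0x7f0000000000) /* made-up load address */

enum dso { LIBA, LIBB };

struct model
{
  uint64_t liba_got_a1; /* GOT slot that resolve_a reads */
  uint64_t libb_ap;     /* the function pointer ap in libb */
};

/* Stand-in for resolve_a: returns whatever the GOT slot holds right now. */
uint64_t model_resolve_a (const struct model *m)
{
  return m->liba_got_a1;
}

/* Relocating one object: liba gets its RELATIVE reloc applied,
   libb fills ap by calling the resolver. */
void model_relocate (struct model *m, enum dso d)
{
  if (d == LIBA)
    m->liba_got_a1 = LIBA_LOAD_BASE + A1_LINK_VALUE;
  else
    m->libb_ap = model_resolve_a (m);
}

/* One pass in the given order; returns the final value of ap. */
uint64_t single_pass (const enum dso *order, size_t n)
{
  struct model m = { A1_LINK_VALUE, 0 };
  for (size_t i = 0; i < n; i++)
    model_relocate (&m, order[i]);
  return m.libb_ap;
}

/* Two passes: relocs that need no resolver call first,
   relocs that call a resolver second. */
uint64_t two_pass (const enum dso *order, size_t n)
{
  struct model m = { A1_LINK_VALUE, 0 };
  for (size_t i = 0; i < n; i++)
    if (order[i] == LIBA)
      model_relocate (&m, LIBA);
  for (size_t i = 0; i < n; i++)
    if (order[i] == LIBB)
      model_relocate (&m, LIBB);
  return m.libb_ap;
}
```

With the order { LIBB, LIBA }, single_pass leaves ap at the unrelocated link-time value, which corresponds to the
segfault case; with { LIBA, LIBB } it yields the load base plus 0x590. two_pass gives the relocated address for both
orders.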

Bye,

-Andreas-

