This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [Mips] Using DT tags for handling local ifuncs


Jack Carter <Jack.Carter@imgtec.com> writes:
> On 12/19/2013 04:35 PM, Richard Sandiford wrote:
>> Jack Carter <Jack.Carter@imgtec.com> writes:
>>>>> I also have a hard time with how the GOT is used for binutils. In my
>>>>> experience and world view, sections have attributes that make them gp
>>>>> relative or not. All these sections get gathered in gp relative
>>>>> regions that are 64k from a value that will be in their $GP. If there
>>>>> are GOT elements that are not gp relative, they should be in another
>>>>> .got that is not marked SHF_MIPS_GPREL. It will not get laid out and
>>>>> calibrated with any of the other GOTs.  Other sections in my life that
>>>>> get bundled up in the equation for multigot are .sbss, .sdata,
>>>>> .lit[4,8,16], .srdata, but only if they are marked SHF_MIPS_GPREL.
>>>>
>>>> Just so I understand, do you think that the ABI GOT should always be 64k
>>>> or smaller?  I.e. DT_MIPS_LOCAL_GOTNO + (DT_MIPS_SYMTABNO - DT_MIPS_GOTSYM)
>>>> should be <= 64 * 1024 / sizeof (void *)?  If so, what should happen
>>>> (under the original or IRIX n32/n64 ABIs) if the number of symbols
>>>> involved in .rel.dyn relocations exceeds the 64k limit?  Is that a
>>>> link error?
>>>>
>>> Yes, because in sgi's case you count all the SHF_MIPS_GPREL sections as
>>> the GP area. .got is only one of them and sgi just put gp-relative
>>> entries in it.
>> 
>> But why then do you think the R_MIPS_GOTHI16/R_MIPS_GOTLO16 relocs
>> and R_MIPS_CALLHI16/R_MIPS_CALLLO16 relocs were defined?  (They were
>> part of the original ABI.)  If the intention really was to limit the
>> ABI GOT to 64k I don't think these "xgot" relocs would be needed.
>
> I believe (remember, this is religious) that it was the first attempt
> to solve the large GP region problem. If we had had our multi-got
> working, I don't think xgot would have seen the light of day. Multi-got
> was invisible to the general user and had no runtime downsides beyond
> more support sections and explicit startup relocations.

OK, I thought it might be something like that.  But IIRC the SGI tools
did continue to support xgot alongside multigot (including for o32,
which IIRC didn't have multigot retrofitted, at least not on the
version of IRIX we were using).  The GNU tools support both too.

I don't think it makes sense to say that a GOT entry must be within the
64k region if all GOT accesses use a {GOT,CALL}HI16/LO16 pair, since it
defeats the point of having the HI16 and LO16 relocs.  And IMO that
includes the specific case of "all" evaluating to zero, i.e. those cases
where we only have a GOT entry for the sake of dynamic R_MIPS_REL32 relocs.
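To illustrate why a HI16/LO16 pair lifts the 64k restriction, the following C sketch (all function names are made up for this illustration) shows how a 32-bit $gp-relative GOT offset would be split across a %got_hi/%got_lo pair and recombined by a lui/addu/lw sequence. It assumes the usual MIPS convention that the high half is rounded by 0x8000 because the load sign-extends the low half. A single R_MIPS_GOT16 or R_MIPS_CALL16 access, by contrast, only reaches signed 16-bit offsets.

```c
#include <stdint.h>

/* A single 16-bit GOT access can only reach offsets that fit in the
   signed 16-bit immediate of the load.  */
static int fits_got16(int32_t gp_offset)
{
    return gp_offset >= INT16_MIN && gp_offset <= INT16_MAX;
}

/* High half, rounded up by 0x8000 to compensate for the sign-extended
   low half.  */
static uint16_t hi16(int32_t off)
{
    return (uint16_t)(((uint32_t)off + 0x8000u) >> 16);
}

/* Low half, used as the load's 16-bit displacement.  */
static uint16_t lo16(int32_t off)
{
    return (uint16_t)((uint32_t)off & 0xffffu);
}

/* What "lui $t,hi; addu $t,$t,$gp; lw $t,lo($t)" adds to $gp:
   (hi << 16) plus the sign-extended low half, modulo 2^32.  */
static int32_t recombine(uint16_t hi, uint16_t lo)
{
    uint32_t v = ((uint32_t)hi << 16) + (uint32_t)(int32_t)(int16_t)lo;
    return (int32_t)v;
}
```

For example, an offset of 100000 splits into a high half of 2 and a low half of 0x86a0, which sign-extends to -31072, so the pair reaches an entry that a plain %got access could not.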

>>>>> The DT_MIPS_LOCAL_GOTNO describes local got entries. Not other
>>>>> partitions that we reserve the right to put non-local got entries.
>>>>
>>>> I'm still not sure which part you're describing as the local GOT here.
>>>> Let's go back to the original 32-bit GOT layout, without any GNU extensions:
>>>>
>>>>        +------------+   +    <--- DT_PLTGOT
>>>>        |   entry 0  |   |
>>>>        +------------+ + B
>>>>        |  ........  | A |
>>>>        +------------+ + +    <--- DT_PLTGOT + DT_MIPS_LOCAL_GOTNO * 4
>>>>        | Global GOT |
>>>>        +------------+
>>>>
>>>> where:
>>>>
>>>>    The zero entry in the global offset table is reserved to hold the
>>>>    address of the entry point in the dynamic linker to call when lazy
>>>>    resolving text symbols. The dynamic linker must always initialize this
>>>>    entry regardless of whether lazy binding is or is not enabled.
>>>>
>>>> Do you see the local GOT as being A or B?  I.e. does it include
>>>> the zero entry?
>>>
>>> It is by definition A and B,
>
> Entry[0] is a cheat, a mistake, an act of carelessness in my humble
> opinion. Not what it does, but the fact that it was allowed to be part
> of the local got region. It should have been pointed to explicitly by
> its own DT tag, with another tag pointing to the start of the local region.
>
> It is an oversight and an exception that the lawyers can use to
> further encroach on the local got region.
>
> In my view of the object format world, dynamic areas need to be
> explicitly called out. This is a dangerous area in which to rely on
> heuristics and exceptions. Currently ld.so assumes the local region is
> DT_MIPS_LOCAL_GOTNO long and starts at PLTGOT.  But wait, we have a
> special entry so we up the loop counter and maybe we will discover
> that there is another exception for slot #2 and we up the counter
> again. Then and only then we run the loop to fix up locals.

Right.  And IMO this means that the local area is effectively A, since
like you say A is the only part that gets relocated as a local area
and is the only part that contains local addresses.  So I don't think
we should get too hung up on the name "DT_MIPS_LOCAL_GOTNO".  IMO it
was always misleading.

Regardless of what it was originally supposed to mean, it is actually
"the GOT index at which the local area ends and the global area starts"
or "the number of GOT entries before the global area" (the two being
equivalent of course).
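The ld.so behaviour described above (skip the reserved slots, then relocate everything up to DT_MIPS_LOCAL_GOTNO) can be sketched in C as follows. This is a simplified illustration, loosely modeled on what the GNU dynamic linker does rather than its actual code; the function names and the GOT1_GNU_MARKER constant are assumptions for a 32-bit GOT.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit GOT, where the GNU toolchain marks got[1] as a
   second reserved entry by setting its most significant bit.  */
#define GOT1_GNU_MARKER 0x80000000u

/* Reserved entries at the start of the GOT: entry 0 always, plus
   entry 1 if it carries the GNU marker.  */
static size_t reserved_entries(const uint32_t *got)
{
    return (got[1] & GOT1_GNU_MARKER) ? 2 : 1;
}

/* Relocate the local area [reserved, local_gotno).  local_gotno is used
   as an end index (where the global area starts), not as a size.
   Returns the number of entries adjusted.  */
static size_t relocate_local_got(uint32_t *got, size_t local_gotno,
                                 uint32_t load_bias)
{
    size_t count = 0;
    for (size_t i = reserved_entries(got); i < local_gotno; i++, count++)
        got[i] += load_bias;
    return count;
}
```

Note that the local area actually adjusted is "A" in the diagram above: the loop starts after the reserved entries and stops at DT_MIPS_LOCAL_GOTNO.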

>> But it was an either-or choice. :-)  Does it include entry 0 or not?
>> If yes, it's B.  If no, it's A.
>> 
>>> here is the quote from the pre-sgi System V
>>> Application Binary Interface Mips Processor Supplement:
>>>
>>> Global Offset Table (5-9, second paragraph)
>>> "The global offset tables split into two locally separate subtables:
>>> local and externals. Local entries reside in the first part of the
>>> global offset table. The value of the dynamic tag DT_MIPS_LOCAL_GOTNO
>>> holds the number of local global offset table entries."
>> 
>> To me this suggests B if taken at face value.
>
> No, the reality is that there should be a pointer to the beginning of
> the local got region and DT_MIPS_LOCAL_GOTNO should represent its size.

Well, for delimiting an area we can either use "start and size" or
"start and end".  Since DT_MIPS_LOCAL_GOTNO is effectively the end
of the local area -- despite the "NO" -- we can keep backward
compatibility by seeing it as an end rather than a size.

But I agree completely about having an explicit start for the local area.
That's what the new tag I was suggesting was.  So going back to the new
GOT region, I was really thinking about the current:

   +------------------+
   | reserved entries |
   +------------------+
   |   local entries  |
   +------------------+  <-- T2
   |  global entries  |
   +------------------+

becoming (with a new name for the new region):

   +------------------+
   | reserved entries |
   +------------------+
   | general GOT data |
   +------------------+  <-- T1
   |   local entries  |
   +------------------+  <-- T2
   |  global entries  |
   +------------------+

It's entirely up to the static linker what goes in the new region.
In our case it would be R_MIPS_IRELATIVE-relocated entries, but it could
be anything really (including .lit4, .lit8, or whatever).  I.e. this region
would be handled like GOTs are on other targets.

T2 is currently called DT_MIPS_LOCAL_GOTNO, but if we had:

T1: DT_MIPS_LOCAL_GOTIDX
T2: DT_MIPS_GLOBAL_GOTIDX

with:

#define DT_MIPS_GLOBAL_GOTIDX DT_MIPS_LOCAL_GOTNO

then would it be more acceptable namewise?  We could throw in a GOTIDX
tag for the new region too for completeness.
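A hedged sketch of how a consumer might classify a GOT index under the layout proposed above. The struct and field names are hypothetical; T1 stands for the suggested DT_MIPS_LOCAL_GOTIDX and T2 for today's DT_MIPS_LOCAL_GOTNO (proposed alias DT_MIPS_GLOBAL_GOTIDX). When the new region is empty, T1 equals the number of reserved entries and the layout degenerates to the current one.

```c
#include <stddef.h>

/* Hypothetical decoded view of the proposed GOT layout.  */
struct got_layout {
    size_t reserved;      /* reserved entries at the start (1 or 2 today) */
    size_t local_gotidx;  /* T1: proposed DT_MIPS_LOCAL_GOTIDX */
    size_t global_gotidx; /* T2: today's DT_MIPS_LOCAL_GOTNO */
    size_t gotno;         /* T2 + (DT_MIPS_SYMTABNO - DT_MIPS_GOTSYM) */
};

enum got_region {
    GOT_RESERVED,     /* dynamic-linker private slots */
    GOT_GENERAL,      /* new region: relocated only via .rel.dyn */
    GOT_LOCAL,        /* implicitly adjusted by the load bias */
    GOT_GLOBAL,       /* implicitly bound to .dynsym entries */
    GOT_OUT_OF_RANGE
};

static enum got_region classify_got_index(const struct got_layout *l,
                                          size_t idx)
{
    if (idx < l->reserved)
        return GOT_RESERVED;
    if (idx < l->local_gotidx)
        return GOT_GENERAL;
    if (idx < l->global_gotidx)
        return GOT_LOCAL;
    if (idx < l->gotno)
        return GOT_GLOBAL;
    return GOT_OUT_OF_RANGE;
}
```

The point of the explicit T1 is that nothing about the general region has to be inferred: everything below T1 and above the reserved slots is left to ordinary relocations.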

>>> For entertainment's sake, here is the comment from the private ELF dumper I wrote back then:
>>>
>>> /**
>>>      @internal
>>>
>>>      Function:	mips_print_got
>>>
>>>      MIPS has 2 different GOT table variants that are
>>>      pretty much the same except one depends on symbol
>>>      table to got table symmetry for runtime fixup purposes
>>>      and the other uses runtime relocations.
>>>      
>>>      If there is multigot there will be entries in the first dynamic section
>>>      of type DT_MIPS_AUX_DYNAMIC which point to the other
>>>      dynamic sections which in turn point to and describe their
>>>      associated gots.
>>>      
>>>      DT_MIPS_LOCAL_GOTNO     	Starting point for DEFAULT symbols
>>>      DT_MIPS_GOTSYM  	    	Index into dsymtab matching DT_MIPS_LOCAL_GOTNO
>>>      DT_MIPS_HIPAGENO		Number of page table entries.
>>>      DT_MIPS_LOCALPAGE_GOTIDX	Starting point for a local got page table
>>>      DT_MIPS_LOCAL_GOTIDX    	Starting point for local full addresses
>>>      DT_MIPS_HIDDEN_GOTIDX   	Starting point for HIDDEN symbols
>>>      DT_MIPS_PROTECTED_GOTIDX	Starting point for PROTECTED symbols
>>>
>>>      If DT_MIPS_LOCAL_GOTIDX == DT_MIPS_HIDDEN_GOTIDX ||
>>>                                 DT_MIPS_PROTECTED_GOTIDX ||
>>>                                 DT_MIPS_LOCAL_GOTNO
>>>      then there are no local entries. Local in this sense
>>>      means addresses that may or may not have associated
>>>      entries in the symbol table or relocation table. If
>>>      they are present in the symbol table they will be marked
>>>      as STO_INTERNAL and must not be referenced outside of the
>>>      defining dso/a.out in any form.
>>>
>>>      If DT_MIPS_HIDDEN_GOTIDX == DT_MIPS_PROTECTED_GOTIDX ||
>>>                                 DT_MIPS_LOCAL_GOTNO
>>>      then there are no hidden entries. Hidden symbols
>>>      are those that are marked STO_HIDDEN in the dynamic
>>>      symbol table and are accessible from outside the defining
>>>      dso only non-symbolically, such as through pointers.
>>>
>>>
>>>      If DT_MIPS_PROTECTED_GOTIDX == DT_MIPS_LOCAL_GOTNO
>>>      then there are no protected entries. Protected symbols
>>>      are those that are marked STO_PROTECTED in the dynamic
>>>      symbol table and are accessible from the outside, but
>>>      cannot be preempted during runtime loading and thus are
>>>      "protected".
>>>      
>>>      @return  void.
>>>   */
>>>
>>> Note, for multigot this resulted in multiple dynamic sections, dynsyms and
>>> relocation fixups for the got entries.
>> 
>> Did it also result in multiple relocation tables, one for each .dynamic
>> section?  Or was there still a single .rel.dyn table?
>> 
>> If just a single .rel.dyn table, did all relocations in the table use
>> the primary GOT's DT_MIPS_GOTSYM as the local/global threshold?  If so,
>> did that mean that there was no specific limit to the number of distinct
>> global symbols that could be stored in GOT entries (thanks to multigot),
>> but that there was a limit of 16k (or 8k for n64) global symbols that
>> could be used in relocations?  (Sorry for the barrage of questions --
>> the downside of doing this by email.)
>> 
>
> I may not understand the question, but will try to answer.
> Let's pretend we had a case where the linker broke up a dso it was making
> into having 3 gp-relative regions (multigot). Each region would have its own
> .dynamic table pointing to its own unique dynsym, got, sdata, sbss, etc. By 
> basic ELF format definition, if any of these sections need relocations they
> will have their own unique relocation sections.
>
> I know, .dynrel has sort of stretched this definition, but we keep to
> the current rule by having the dynamic table for the individual got
> describe where its relocations are and how they are distributed.
>
> The limit on symbol indexes is preserved because we are only looking at
> a sub-region.
>
> I guess the key is that each got/gp-rel region has its own individual
> .dynsym that describes its microcosm independent of the others. The
> main .dynamic section points to all the extra .dynamic sections
> through DT tags.

I was more thinking about a DSO containing something like:

	.data
	.macro	doit
	.word	foo\@
	.endm
	.rept	20000
	doit
	.endr

i.e.:

	.data
	.word   foo0
        ...
	.word   foo19999

where we have 20000 R_MIPS_REL32s against various foos and therefore need
20000 GOT entries.

Is this allowed on its own, without explicit GOT references to the foos?
If it is allowed, do you create 2 GOTs to handle it, so that each GOT is
still within the 64k limit?  If so, do the two .dynamic sections both
have their own .rel.dyn sections, each containing the R_MIPS_REL32s for
the symbols in the associated GOT?
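The arithmetic behind the "two GOTs" question, as a small C sketch. It assumes the 64 * 1024 / sizeof (void *) bound discussed earlier and a fixed number of reserved entries per GOT, and ignores page entries and symbols repeated in several GOTs, so it only gives a lower bound. The function names are illustrative.

```c
#include <stddef.h>

/* Entries that fit in a 64k $gp-addressable window:
   16384 for 4-byte entries, 8192 for 8-byte (n64) entries.  */
static size_t max_got_entries(size_t entry_size)
{
    return (64u * 1024u) / entry_size;
}

/* Lower bound on the number of GOTs needed for n entries when each GOT
   must fit in the window and gives up `reserved` slots.  */
static size_t min_gots_needed(size_t n, size_t entry_size, size_t reserved)
{
    size_t per_got = max_got_entries(entry_size) - reserved;
    return (n + per_got - 1) / per_got;   /* ceiling division */
}
```

With 4-byte entries and two reserved slots per GOT, 20000 entries need at least two GOTs; with 8-byte n64 entries the same 20000 would need at least three.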

Does the answer change if, in addition to the above, there are also
explicit GOT references to each foo, as in:

a.s:
        lw	$4,%got(foo0)($gp)
        ...
        lw	$4,%got(foo9999)($gp)

b.s:
        lw	$4,%got(foo10000)($gp)
        ...
        lw	$4,%got(foo19999)($gp)

so that the symbols fall naturally into two GOTs?  Would b.s's GOT
then have the .data relocations for foo10000 and above and a.s's GOT
have the .data relocations for the rest?

If we did have multiple .rel.dyn sections, then:

>> If there were multiple .rel.dyn tables, each tied to their own
>> .dynamic sections, how would we sort them so that all IRELATIVE
>> relocations in an object are applied after all non-IRELATIVE ones?

...this would become a concern.
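One way a tool could restore that ordering when several per-GOT relocation tables exist is to merge them with a stable partition: all ordinary relocations first, then all IRELATIVE ones, each class keeping its original order. The C sketch below shows that idea only. The struct mirrors the Elf32_Rel layout, and HYPOTHETICAL_R_IRELATIVE is a placeholder, since the thread does not assign a number to the MIPS IRELATIVE relocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as Elf32_Rel.  */
struct rel32 {
    uint32_t r_offset;
    uint32_t r_info;
};

#define REL32_TYPE(info) ((info) & 0xffu)   /* like ELF32_R_TYPE */

/* Placeholder relocation number (assumption, not from the thread).  */
#define HYPOTHETICAL_R_IRELATIVE 128u

/* Concatenate ntables relocation tables into `out` (which must hold the
   sum of counts[]), putting every non-IRELATIVE relocation before every
   IRELATIVE one and preserving relative order within each class.
   Returns the number of relocations written.  */
static size_t merge_rel_tables(const struct rel32 *const *tables,
                               const size_t *counts, size_t ntables,
                               struct rel32 *out)
{
    size_t n = 0;
    /* Pass 0 copies ordinary relocations, pass 1 the IRELATIVE ones.  */
    for (int pass = 0; pass < 2; pass++)
        for (size_t t = 0; t < ntables; t++)
            for (size_t i = 0; i < counts[t]; i++) {
                int is_irel = REL32_TYPE(tables[t][i].r_info)
                              == HYPOTHETICAL_R_IRELATIVE;
                if (is_irel == pass)
                    out[n++] = tables[t][i];
            }
    return n;
}
```

With a single .rel.dyn the static linker can do this once at link time; with one table per .dynamic, either the loader has to do the equivalent merge or the ordering guarantee is lost.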

>>> I am not proposing that we go down this route, but it may give a sense
>>> of the world I came from. I liked it because (other than that I
>>> designed a lot of it :-)) of the structure in symbol visibility and
>>> that I could dump the entries symbolically. Also, each GP region was
>>> described by its dynamic section.
>>>
>>> This is not a trivial change and goes beyond the ifunc scope, but it
>>> would resolve the fixup by relocation issues and usher in GP rel areas
>>> that go beyond the GOT.
>>>
>>> I really just want to get ifunc done without messing up future
>>> goodness in ld/ld.so.
>> 
>> OK, this scheme seems to create multiple .dynsyms as a way of avoiding
>> explicit relocations for the multigot entries.  Is that right?
>> I.e. rather than have a .rel.dyn entry for a multigot global GOT entry,
>> it has an entry in a secondary .dynsym instead?
>
> Right. No defence. It is a cost of doing business like this. It may be
> too expensive for some but not if they are sane and forgo building
> programs that require multigot. My guess is that the ones that need
> multigot are not afraid of this overhead. I like to guess and am wrong
> only about 80% of the time.
>
> Yes, every GP region had its own gp relative sections and support
> sections including .dynamic, dynsym, relocations, etc. They all shared
> the same string table though.
>> 
>> Does that really pay off though?  In ELF32, symbols are 16 bytes in size
>> but REL relocations are 8 bytes in size.  And because the global GOT
>> acts as a cache, resolving normal global relocations is very cheap.
>> We only look up the symbol once, when resolving the GOT entry.
>> 
>> (If the same global symbol appeared in two GOTs and .dynsyms, did you
>> look it up twice, or just once?  If twice then the .rel.dyn approach
>> seems to win there too, as well as on size.)
>
> This (sgi multigot) does not win on the size of the collective dynamic
> sections.  There is duplication. It is a start up hit that needs to be
> evaluated before anyone wants to emulate it.
>
> Remember, one can build the ld.so-affected part of the object very
> close to how we do today if everything falls into a single
> got. If it goes over the threshold one would start to get this
> overhead, but the duplication part will not be that big because the
> second got, and in reality it will be only a second got, will probably
> be a very small subset of the first got and thus few symbol and
> relocation dups as well as the duplications of gprel data sections.

I was really comparing the cost of this multigot scheme with the one
that was used for binutils.  (Note that I had no part in the binutils
multigot scheme, so I don't have an attachment either way.)  There the
idea was to treat the secondary GOTs as just another bit of data and
relocate them in the same way as you would relocate a data section.
This is of course how other targets handle their primary GOT too.
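For comparison, the binutils approach relies on the primary global GOT acting as a cache. The sketch below is a simplified model of how a dynamic linker could apply an R_MIPS_REL32, whether it targets a data section or a secondary GOT entry: global symbols take their already-resolved value from the primary GOT, so no extra symbol lookup is needed. It is loosely modeled on that idea rather than taken from any real ld.so; the struct and function names are hypothetical, and named local symbols are simplified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the values a dynamic linker reads from .dynamic.  */
struct mips_dyn {
    uint32_t load_bias;   /* difference between run-time and link-time address */
    size_t local_gotno;   /* DT_MIPS_LOCAL_GOTNO */
    size_t gotsym;        /* DT_MIPS_GOTSYM */
    const uint32_t *got;  /* primary GOT, global part already resolved */
};

/* Apply one R_MIPS_REL32 to *where.  symidx is ELF32_R_SYM (r_info).  */
static void apply_rel32(const struct mips_dyn *d, uint32_t *where,
                        size_t symidx)
{
    if (symidx < d->gotsym)
        /* Simplified: every symbol below DT_MIPS_GOTSYM is treated like
           index 0 and only gets the load bias.  */
        *where += d->load_bias;
    else
        /* Global symbol: reuse the value cached in its primary GOT entry.  */
        *where += d->got[d->local_gotno + (symidx - d->gotsym)];
}
```

This is why a secondary GOT costs one 8-byte relocation per entry but no extra symbol lookups: the lookup already happened when the primary GOT entry was resolved.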

It sounds like this could also be the second multigot variant from the
comment you quoted:

    MIPS has 2 different GOT table variants that are 
    pretty much the same except one depends on symbol
    table to got table symmetry for runtime fixup purposes
    and the other uses runtime relocations.

So this might not even be an SGI vs. binutils thing, but I'll call
them that below for the sake of simplicity.

I think the differences work out as:

* The SGI scheme relies on changes to the dynamic linker.
  The binutils scheme works within the original ABI (assuming that
  the primary GOT is allowed to be bigger than 64k, as above).

  I think the binutils scheme even worked on o32 IRIX, although I might
  have made that up.

* The SGI scheme uses tags to relocate the local part of the GOT.
  The binutils scheme uses 8-byte .rel.dyn entries instead.

  So for this part of the GOT the SGI scheme wins, at least for
  large numbers of local GOT entries.  But if you create .dynsyms
  for local, internal and hidden symbols -- which binutils currently
  treats as local -- then the local part is going to be very small.
  It probably just contains page entries.

* The SGI scheme uses 16-byte .dynsym entries for each GOT entry
  that's bound to a symbol.  The binutils scheme uses 8-byte .rel.dyn
  entries instead.

  Who wins here depends on how many duplicate .dynsym entries there are.
  If the GOTs have several symbols in common (which seems likely) then
  the binutils scheme should win from both a size and speed perspective,
  since only one lookup is needed per symbol, regardless of how many
  GOTs reference it.

  Using .dynsyms for local, internal and hidden symbols adds 8 bytes
  per entry over the binutils scheme, on top of the string table cost.

* The SGI scheme allows lazy binding in secondary GOTs.  The binutils
  scheme doesn't.  This is definitely the big disadvantage of the
  binutils scheme.  (One that no-one's ever been sufficiently motivated
  to fix, unfortunately.)

* The SGI scheme requires several .dynamics and several .dynsyms,
  which is likely to confuse generic ELF code.  The binutils scheme
  avoids this.

* The SGI scheme allows you to dump the secondary GOTs in the same
  way as the primary GOT.  The binutils scheme doesn't.
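A rough cost model for the .dynsym-vs-.rel.dyn bullet, under stated assumptions: ELF32 sizes (16-byte Elf32_Sym, 8-byte Elf32_Rel), string tables ignored, each symbol appearing at most once per GOT, and, for the SGI-style scheme, one lookup per .dynsym entry (the thread leaves open whether duplicates were looked up once or twice). All names are hypothetical.

```c
#include <stddef.h>

#define SYM32_SIZE 16u   /* sizeof (Elf32_Sym) */
#define REL32_SIZE 8u    /* sizeof (Elf32_Rel) */

struct cost {
    size_t bytes;
    size_t lookups;
};

/* SGI-style: each GOT has its own .dynsym with one entry per
   symbol-bound slot.  entries[0] is the primary GOT.  */
static struct cost sgi_cost(const size_t *entries, size_t ngots)
{
    struct cost c = {0, 0};
    for (size_t g = 0; g < ngots; g++) {
        c.bytes += entries[g] * SYM32_SIZE;
        c.lookups += entries[g];
    }
    return c;
}

/* binutils-style: one .dynsym entry (and one lookup) per distinct
   symbol, plus one .rel.dyn entry per secondary-GOT slot.  */
static struct cost binutils_cost(const size_t *entries, size_t ngots,
                                 size_t unique)
{
    struct cost c = {unique * SYM32_SIZE, unique};
    for (size_t g = 1; g < ngots; g++)
        c.bytes += entries[g] * REL32_SIZE;
    return c;
}
```

In this model the two schemes break even when the number of symbols shared between GOTs is half the number of symbol-bound entries in the secondary GOTs: each shared symbol saves 16 bytes of .dynsym, while each secondary entry costs 8 bytes of .rel.dyn.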

Does that sound right to you?

>> I agree that in the specific case of ifuncs it would probably work
>> to do things this way, since for ifuncs the type of GOT entry needed
>> can be determined from the symbol type (IFUNC rather than FUNC).
>> But it wouldn't extend well to other types of relocation.  E.g.
>> TLS GOT entries can't be implied from the symbol type in this way.
>> It might be that the next relocation type we add also has no associated
>> symbol type.  (The type is only a 4-bit field after all, and most are
>> already taken.)
>
> I wouldn't put them in this got. I would create another one that was
> not GP relative.  It would not be part of the multigot party. We would
> have to have DT rules (maybe) for this got as well if there was
> special handling beyond explicit relocations.

Sorry, I meant new types of GOT relocation that would be needed in future.
I.e. cases where we add a new R_MIPS_FOO relocation and also want to be
able to do something like:

        lw	$4, %got_foo($gp)

with %got_foo resolving to the offset of an R_MIPS_FOO-equivalent GOT entry.
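The objection to deriving the GOT entry kind from the symbol type can be shown with a hedged C sketch. The constants use the standard ELF values for STT_TLS (6) and STT_GNU_IFUNC (10) in the 4-bit type field of st_info; the enums and the function are hypothetical. An IFUNC symbol implies its entry kind, but a TLS symbol does not say whether a general-dynamic, local-dynamic or initial-exec entry is wanted, and neither would a symbol behind a future %got_foo.

```c
#include <stdint.h>

/* Standard ELF symbol type values (match STT_TLS and STT_GNU_IFUNC).  */
enum { TYPE_TLS = 6, TYPE_GNU_IFUNC = 10 };

#define SYM_TYPE(info) ((info) & 0xf)   /* like ELF32_ST_TYPE */

enum got_kind { GOT_KIND_NORMAL, GOT_KIND_IRELATIVE, GOT_KIND_AMBIGUOUS };

/* Hypothetical rule: choose the kind of GOT entry from st_info alone.  */
static enum got_kind got_kind_from_symbol(uint8_t st_info)
{
    switch (SYM_TYPE(st_info)) {
    case TYPE_GNU_IFUNC:
        return GOT_KIND_IRELATIVE;   /* implied by the symbol type */
    case TYPE_TLS:
        return GOT_KIND_AMBIGUOUS;   /* GD, LD or IE entry? type can't say */
    default:
        return GOT_KIND_NORMAL;
    }
}
```

Only 16 type values exist, so a future relocation-specific GOT entry kind is unlikely to get a symbol type of its own to hang off.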

Thanks,
Richard

