This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



RE: [Mips] Using DT tags for handling local ifuncs


On 12/19/2013 04:35 PM, Richard Sandiford wrote:
> Jack Carter <Jack.Carter@imgtec.com> writes:
>>>> I also have a hard time with how the GOT is used for binutils. In my
>>>> experience and world view, sections have attributes that make them gp
>>>> relative or not. All these sections get gathered in gp relative
>>>> regions that are 64k from a value that will be in their $GP. If there
>>>> are GOT elements that are not gp relative, they should be in another
>>>> .got that is not marked SHF_MIPS_GPREL. It will not get laid out and
>>>> calibrated with any of the other GOTs.  Other sections in my life that
>>>> get bundled up in the equation for multigot are .sbss, .sdata,
>>>> .lit[4,8,16], .srdata, but only if they are marked SHF_MIPS_GPREL.
>>>
>>> Just so I understand, do you think that the ABI GOT should always be 64k
>>> or smaller?  I.e. DT_MIPS_LOCAL_GOTNO + (DT_MIPS_SYMTABNO - DT_MIPS_GOTSYM)
>>> should be <= 64 * 1024 / sizeof (void *)?  If so, what should happen
>>> (under the original or IRIX n32/n64 ABIs) if the number of symbols
>>> involved in .rel.dyn relocations exceeds the 64k limit?  Is that a
>>> link error?
>>>
>> Yes, because in sgi's case you count all the SHF_MIPS_GPREL sections as
>> the GP area. .got is only one of them and sgi just put gp-relative
>> entries in it.
> 
> But why then do you think the R_MIPS_GOTHI16/R_MIPS_GOTLO16 relocs
> and R_MIPS_CALLHI16/R_MIPS_CALLLO16 relocs were defined?  (They were
> part of the original ABI.)  If the intention really was to limit the
> ABI GOT to 64k I don't think these "xgot" relocs would be needed.

I believe (and remember, this is religious) that xgot was the first attempt to solve
the large GP region problem. If we had had our multi-got working, I don't think xgot
would have seen the light of day. Multi-got was invisible to the general user and had
no runtime downsides beyond more support sections and explicit relocations at startup.
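
To put numbers on the bound Richard asks about above, here is a minimal sketch.
It assumes the whole ABI GOT has to fit in one 64k gp-addressable window and that
every entry is pointer sized. The function names are made up for illustration and
are not from any linker.

```python
GP_WINDOW_BYTES = 64 * 1024  # span assumed reachable from a single $gp value


def abi_got_entries(local_gotno, symtabno, gotsym):
    """Entries in the ABI GOT: the locals plus one per dynsym from GOTSYM up."""
    return local_gotno + (symtabno - gotsym)


def fits_single_got(local_gotno, symtabno, gotsym, pointer_size=4):
    """True if the ABI GOT stays within one 64k window."""
    limit = GP_WINDOW_BYTES // pointer_size  # 16k entries for o32/n32, 8k for n64
    return abi_got_entries(local_gotno, symtabno, gotsym) <= limit


print(fits_single_got(100, 5000, 1000))      # 4100 entries against a 16384 limit
print(fits_single_got(100, 20000, 1000))     # 19100 entries against a 16384 limit
print(fits_single_got(100, 9000, 500, 8))    # 8600 entries against an 8192 limit
```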

> 
>>>> The DT_MIPS_LOCAL_GOTNO describes local got entries. Not other
>>>> partitions that we reserve the right to put non-local got entries.
>>>
>>> I'm still not sure which part you're describing as the local GOT here.
>>> Let's go back to the original 32-bit GOT layout, without any GNU extensions:
>>>
>>>        +------------+   +    <--- DT_PLTGOT
>>>        |   entry 0  |   |
>>>        +------------+ + B
>>>        |  ........  | A |
>>>        +------------+ + +    <--- DT_PLTGOT + DT_MIPS_LOCAL_GOTNO * 4
>>>        | Global GOT |
>>>        +------------+
>>>
>>> where:
>>>
>>>    The zero entry in the global offset table is reserved to hold the
>>>    address of the entry point in the dynamic linker to call when lazy
>>>    resolving text symbols. The dynamic linker must always initialize this
>>>    entry regardless of whether lazy binding is or is not enabled.
>>>
>>> Do you see the local GOT as being A or B?  I.e. does it include
>>> the zero entry?
>>
>> It is by definition A and B,

Entry[0] is a cheat, a mistake or an act of carelessness, in my humble opinion. Not
what it does, but the fact that it was allowed to be part of the local got region.
It should have been explicitly pointed to in the DT table, with the start of the
local region pointed to separately.

It is an oversight and an exception that the lawyers can use to further encroach
on the local got region.

In my view of the object format world, dynamic areas need to be explicitly called
out. This is a dangerous area in which to rely on heuristics and exceptions. Currently
ld.so assumes the local region is DT_MIPS_LOCAL_GOTNO entries long and starts at
DT_PLTGOT. But wait, we have a special entry, so we up the loop counter, and maybe we
will discover that there is another exception for slot #2 and we up the counter again.
Then and only then do we run the loop to fix up locals.

It should not have been done this way from the beginning, but I for one can look back
on a lot of my decisions and say the same thing. It is easier to be wise in retrospect.
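
The loop I am complaining about looks roughly like the sketch below. This is a model
only, not ld.so source: the `reserved` parameter is a name I made up for the "up the
loop counter" step, and addresses are assumed to be 32 bits wide.

```python
def relocate_local_got(got, local_gotno, load_bias, reserved=1):
    """Add the load bias to the local GOT slots.

    The first `reserved` slots (entry 0, and possibly more) sit inside the
    DT_MIPS_LOCAL_GOTNO count but are not local addresses, so they are skipped.
    """
    for i in range(reserved, local_gotno):
        got[i] = (got[i] + load_bias) & 0xFFFFFFFF
    return got


got = [0x0, 0x1000, 0x2000, 0xAAAA]   # slot 3 belongs to the global part
one = relocate_local_got(list(got), local_gotno=3, load_bias=0x400000)
two = relocate_local_got(list(got), local_gotno=3, load_bias=0x400000, reserved=2)
print([hex(x) for x in one])   # slots 1 and 2 are rebased
print([hex(x) for x in two])   # only slot 2 is rebased
```

Every new exception means another bump of `reserved`, and nothing in the DT table
says so.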

> 
> But it was an either-or choice. :-)  Does it include entry 0 or not?
> If yes, it's B.  If no, it's A.
> 
>> here is the quote from the pre-sgi System V
>> Application Binary Interface Mips Processor Supplement:
>>
>> Global Offset Table (5-9, second paragraph)
>> "The global offset table is split into two logically separate subtables:
>> locals and externals. Local entries reside in the first part of the global
>> offset table. The value of the dynamic tag DT_MIPS_LOCAL_GOTNO holds the
>> number of local global offset table entries."
> 
> To me this suggests B if taken at face value.

No, the reality is that there should be a pointer to the beginning of the local
got region, and DT_MIPS_LOCAL_GOTNO should represent its size.
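
For reference, the two readings from the diagram above differ only in where the
local region starts. A small sketch, assuming 4-byte entries as in the original
32-bit layout (the function name and the dictionary keys are mine):

```python
def got_layout(pltgot, local_gotno, entry_size=4):
    """Return (start, end) address ranges for readings A and B of the local GOT."""
    global_start = pltgot + local_gotno * entry_size
    return {
        "A": (pltgot + entry_size, global_start),  # excludes entry 0
        "B": (pltgot, global_start),               # includes entry 0
        "global_start": global_start,
    }


layout = got_layout(pltgot=0x10000, local_gotno=8)
for key in ("A", "B"):
    start, end = layout[key]
    print(key, hex(start), hex(end), (end - start) // 4, "entries")
```

Under reading B the region holds DT_MIPS_LOCAL_GOTNO entries, under reading A one
fewer, and in neither case does a tag say where the real locals begin.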

> 
>> The sgi edition is essentially the same but it includes:
>> "It (the GOT) is essentially two tables. The first (with DT_MIPS_LOCAL_GOTNO
>> entries) consists of local GOT addresses, i.e. non-preemptible (protected)
>> addresses defined within the executable/DSO."
> 
> And to me this suggests A if taken at face value, since entry 0 isn't a
> "non-preemptible (protected) address defined within the executable/DSO".
> It's an address in the dynamic linker instead.
> 
> Which of A and B seems right to you?

See above.

> 
>>>> Dealing with the ifunc "local" entries implicitly will save a
>>>> relocation lookup, a tiny blip of time in relation to the other costs
>>>> of calling the resolver. So I am arguing about how many angels can
>>>> dance on a pin.
>>>
>>> Yeah, maybe this is one we'll have to agree to disagree on.  I think the
>>> benefit of having an implicitly-relocated irelative region is small at best.
>>> I like the generality of including the GOT R_MIPS_IRELATIVE GOT
>>> relocations in the general .rel.dyn pool and sorting them accordingly,
>>> because it feels more future-proof.  I also think an implicit region is
>>> harder to handle in a backward-compatible way, since if we just add new
>>> tags, older ld.sos would ignore them and not flag an error.
>>
>> Then go the way sgi did and have .dynsym indexed regions for:
>>
>>      DT_MIPS_LOCAL_GOTIDX
>>      DT_MIPS_INTERNAL_GOTIDX
>>      DT_MIPS_HIDDEN_GOTIDX
>>      DT_MIPS_PROTECTED_GOTIDX
> 
> Older linkers would ignore those too though.

That's right. We had the luxury of breaking from o32 to n32/64. This would be rather
disruptive unless we had an abi change again.

> 
>> For entertainment's sake, here is the comment from my private elf dumper, written back then:
>>
>> /**
>>      @internal
>>
>>      Function:	mips_print_got
>>
>>      MIPS has 2 different GOT table variants that are
>>      pretty much the same except one depends on symbol
>>      table to got table symmetry for runtime fixup purposes
>>      and the other uses runtime relocations.
>>      
>>      If there is multigot there will be entries in the first dynamic section
>>      of type DT_MIPS_AUX_DYNAMIC which point to the other
>>      dynamic sections which in turn point to and describe their
>>      associated gots.
>>      
>>      DT_MIPS_LOCAL_GOTNO     	Starting point for DEFAULT symbols
>>      DT_MIPS_GOTSYM  	    	Index into dsymtab matching DT_MIPS_LOCAL_GOTNO
>>      DT_MIPS_HIPAGENO		Number of page table entries.
>>      DT_MIPS_LOCALPAGE_GOTIDX	Starting point for a local got page table
>>      DT_MIPS_LOCAL_GOTIDX    	Starting point for local full addresses
>>      DT_MIPS_HIDDEN_GOTIDX   	Starting point for HIDDEN symbols
>>      DT_MIPS_PROTECTED_GOTIDX	Starting point for PROTECTED symbols
>>
>>      If DT_MIPS_LOCAL_GOTIDX == DT_MIPS_HIDDEN_GOTIDX ||
>>                                 DT_MIPS_PROTECTED_GOTIDX ||
>>                                 DT_MIPS_LOCAL_GOTNO
>>      then there are no local entries. Local in this sense
>>      means addresses that may or may not have associated
>>      entries in the symbol table or relocation table. If
>>      they are present in the symbol table they will be marked
>>      as STO_INTERNAL and must not be referenced outside of the
>>      defining dso/a.out in any form.
>>
>>      If DT_MIPS_HIDDEN_GOTIDX == DT_MIPS_PROTECTED_GOTIDX ||
>>                                  DT_MIPS_LOCAL_GOTNO
>>      then there are no hidden entries. Hidden symbols
>>      are those that are marked STO_HIDDEN in the dynamic
>>      symbol table and are accessible from outside the defining
>>      dso only non-symbolically, such as through pointers.
>>
>>
>>      If DT_MIPS_PROTECTED_GOTIDX == DT_MIPS_LOCAL_GOTNO
>>      then there are no protected entries. Protected symbols
>>      are those that are marked STO_PROTECTED in the dynamic
>>      symbol table and are accessible from the outside, but
>>      cannot be preempted during runtime loading and thus are
>>      "protected".
>>      
>>      @return  void.
>>   */
>>
>> Note, for multigot this resulted in multiple dynamic sections, dynsyms and
>> relocation fixups for the got entries.
> 
> Did it also result in multiple relocation tables, one for each .dynamic
> section?  Or was there still a single .rel.dyn table?
> 
> If just a single .rel.dyn table, did all relocations in the table use
> the primary GOT's DT_MIPS_GOTSYM as the local/global threshold?  If so,
> did that mean that there was no specific limit to the number of distinct
> global symbols that could be stored in GOT entries (thanks to multigot),
> but that there was a limit of 16k (or 8k for n64) global symbols that
> could be used in relocations?  (Sorry for the barrage of questions --
> the downside of doing this by email.)
> 

I may not understand the question, but will try to answer.
Let's pretend we had a case where the linker broke up a dso it was making
into having 3 gp-relative regions (multigot). Each region would have its own
.dynamic table pointing to its own unique dynsym, got, sdata, sbss, etc. By 
basic ELF format definition, if any of these sections need relocations they
will have their own unique relocation sections.

I know, .dynrel has sort of stretched this definition, but we keep to the current rule by
having the dynamic table for the individual got describe where its relocations are
and how they are distributed.

The limit on symbol indexes is preserved because we are only looking at
a sub-region.

I guess the key is that each got/gp-rel region has its own individual .dynsym that 
describes its microcosm independent of the others. The main .dynamic section
points to all the extra .dynamic sections through DT tags.
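
As a rough model of that structure (not the sgi implementation; the class and
field names are invented, and the null symbol at dynsym index 0 is ignored to keep
it short):

```python
from dataclasses import dataclass, field


@dataclass
class GpRegion:
    """One gp-relative region with its own .dynamic, .dynsym and got."""
    name: str
    dynsym: list        # this region's own dynamic symbols
    local_gotno: int    # DT_MIPS_LOCAL_GOTNO for this region
    gotsym: int         # DT_MIPS_GOTSYM for this region
    aux: list = field(default_factory=list)  # stands in for DT_MIPS_AUX_DYNAMIC links


def walk_regions(primary):
    """Yield the primary region, then every region its .dynamic points to."""
    yield primary
    yield from primary.aux


def got_entries(region):
    """GOT size of one region, computed only from that region's own tables."""
    return region.local_gotno + (len(region.dynsym) - region.gotsym)


primary = GpRegion("got0", ["a", "b", "c", "d"], local_gotno=3, gotsym=2)
primary.aux = [
    GpRegion("got1", ["a", "e"], local_gotno=2, gotsym=1),
    GpRegion("got2", ["f"], local_gotno=1, gotsym=0),
]
for region in walk_regions(primary):
    print(region.name, got_entries(region))
```

Symbol "a" shows up in two regions here, which is the duplication cost that comes
with each region describing itself.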

> If there were multiple .rel.dyn tables, each tied to their own
> .dynamic sections, how would we sort them so that all IRELATIVE
> relocations in an object are applied after all non-IRELATIVE ones?
> 
>> I am not proposing that we go down this route, but it may give a sense of
>> the world I came from. I liked it because (other than that I designed a lot of
>> it :-)) of the structure in symbol visibility and that I could dump the entries
>> symbolically. Also, each GP region was described by its dynamic section.
>>
>> This is not a trivial change and goes beyond the ifunc scope, but it would resolve
>> the fixup by relocation issues and usher in GP rel areas that go beyond the GOT.
>>
>> I really just want to get ifunc done without messing up future goodness in ld/ld.so.
> 
> OK, this scheme seems to create multiple .dynsyms as a way of avoiding
> explicit relocations for the multigot entries.  Is that right?
> I.e. rather than have a .rel.dyn entry for a multigot global GOT entry,
> it has an entry in a secondary .dynsym instead?

Right. No defence. It is a cost of doing business like this. It may be too expensive for some
but not if they are sane and forgo building programs that require multigot. My guess is that
the ones that need multigot are not afraid of this overhead. I like to guess and am wrong only
about 80% of the time.

Yes, every GP region had its own gp relative sections and support sections including
.dynamic, dynsym, relocations, etc. They all shared the same string table though.

> 
> Does that really pay off though?  In ELF32, symbols are 16 bytes in size
> but REL relocations are 8 bytes in size.  And because the global GOT
> acts as a cache, resolving normal global relocations is very cheap.
> We only look up the symbol once, when resolving the GOT entry.
> 
> (If the same global symbol appeared in two GOTs and .dynsyms, did you
> look it up twice, or just once?  If twice then the .rel.dyn approach
> seems to win there too, as well as on size.)

This (sgi multigot) does not win on the size of the collective dynamic sections.
There is duplication. It is a start up hit that needs to be evaluated before
anyone wants to emulate it.

Remember, one can build the part of the object that ld.so touches very much as we do
today if everything falls into a single got. If it goes over the threshold one starts
to pay this overhead, but the duplication will not be that big: the second got (and in
reality there will only be a second got) will probably be a very small subset of the
first, so there will be few duplicated symbols and relocations and little duplication
of gprel data sections.
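
Using the ELF32 sizes Richard quotes above (16 bytes per symbol, 8 bytes per REL
relocation), the size side of the trade-off for a second got can be estimated as
below. The function name and the count of 200 globals are made up for illustration.

```python
ELF32_SYM_SIZE = 16  # bytes per .dynsym entry in ELF32
ELF32_REL_SIZE = 8   # bytes per REL relocation in ELF32


def secondary_got_cost(n_globals):
    """Bytes needed to describe n_globals secondary-got entries under each scheme."""
    return {
        "extra_dynsym": n_globals * ELF32_SYM_SIZE,  # one symbol per entry
        "rel_dyn": n_globals * ELF32_REL_SIZE,       # one relocation per entry
    }


cost = secondary_got_cost(200)
print(cost["extra_dynsym"], cost["rel_dyn"])
```

The per-entry cost is double with extra dynsyms, so it only stays small while the
second got stays small.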

> 
> I agree that in the specific case of ifuncs it would probably work
> to do things this way, since for ifuncs the type of GOT entry needed
> can be determined from the symbol type (IFUNC rather than FUNC).
> But it wouldn't extend well to other types of relocation.  E.g.
> TLS GOT entries can't be implied from the symbol type in this way.
> It might be that the next relocation type we add also has no associated
> symbol type.  (The type is only a 4-bit field after all, and most are
> already taken.)

I wouldn't put them in this got. I would create another one that was not GP relative.
It would not be part of the multigot party. We would have to have DT rules (maybe) for
this got as well if there was special handling beyond explicit relocations.

> 
> It would also mean creating .dynsym entries for all ifuncs that were
> STB_LOCAL in the original .o (as well as dynsyms for internal and
> hidden global symbols).  Should those STB_LOCAL-derived dynsyms have
> names or be nameless?  If there are multiple .os with the same STB_LOCAL
> symbol name, should we try to make them unique when converting them to
> dynsyms, or keep several dynsyms with the same name?

The implementation we did was to generate unique dynsyms for them. There
may well have been a more clever way of doing this.

> 
> As for the comment about dumping entries symbolically: like I mentioned
> before, we still have local, internal and hidden symbols in .symtab.
> But the nice thing about .symtab is that it can be stripped to save space.
> If we force the names of local, internal and hidden symbols into
> .dynsym then it's harder to get rid of them later.

My kingdom for one of my contrived (gp size reduced) multigot test cases. I could 
show you the dump and it is clear what belonged to what gp region. It was really
nice.

> 
> Thanks,
> Richard
> 

It was a bear to redo the sgi/mips linker to handle multigot, partly because it was initially
implemented incorrectly and I was constantly afraid of throwing out the baby with the
bathwater. Once it was done though, a wonderful calm settled over our big iron customers.

Jack


