This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.
Re: [PATCH, arm] Thumb shared library support: Thumb PLT, etc.
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Adam Nemet <anemet at Lnxw dot COM>
- Cc: Richard dot Earnshaw at arm dot com, binutils at sources dot redhat dot com
- Date: Fri, 19 Jul 2002 15:50:13 +0100
- Subject: Re: [PATCH, arm] Thumb shared library support: Thumb PLT, etc.
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> Richard,
>
> > First of all, do you have a copyright assignment in place for binutils
> > (and gcc, for your other patch)? Until that's sorted out we can't use
> > your code.
>
> Yes, I have assignment papers on file for GCC and binutils.
OK.
>
> > 1) I don't like the idea of having some special flag (--thumb-plt) that
> > indicates that we should build a different type of PLT. The linker must
> > be able to figure this out automatically, or we will end up with major
> > problems when it comes to interworking.
>
> First of all, these were my design principles while implementing this:
>
> 1. Making Thumb shared libraries work should _not_ affect ARM shared
> libraries. People who want to make their code smaller are more
> likely to have the time to ``manually'' fine-tune their application
> than those who don't care and just want something to work on ARM.
It's true that a pure ARM-code application is most likely to be aiming for
performance and that we shouldn't do things that hinder that. However,
that's not what I was proposing.
>
> 2. Interworking issues should be kept separate. I'd like to be able
> to build pure ARM and pure Thumb shared libraries without compiling
> them with -mthumb-interwork.
The issue here is that Thumb is about improving *overall* code density.
At times best code density is achieved by using ARM code. For example, if
you know that you have a floating-point intensive function and that you
have a VFP co-processor available, best code density will come from
compiling the function to use the VFP rather than relying on softfp and
emulation.
> > We need more space for the Thumb sequence than we do for an ARM one. That
> > suggests that we should probably be looking to switch to ARM code for the
> > stub. For example, we could use
> >
> > .code 16
> > .align 2
> > _plt_stub_thumb:
> > bx pc
> > nop
> > .code 32
> > _plt_stub_arm:
> > ldr ip, [pc, #8]
> > add ip, pc, ip
> > ldr ip, [ip]
> > bx ip
> > .word offset_to_target
> >
> > which means we can share the stub with both ARM and Thumb code. So while
> > this is now 6 words long, we save on duplication, and we have interworking
> > from the start.
>
> I was playing with the same idea originally but I didn't like it
> because:
> Major issues:
>
> * It does not fit well with principle #1 above: 6 instead of 4
> words in the ARM case and 6 instead of 5 words for Thumb. The ARM
> PLT will be 50% bigger.
It wouldn't be used for #1, only when interworking was required. With
respect to #2 the issue is about best overall code density.
>
> * It won't work. Upon calling plt[0] with lazy relocation, ip has
> to hold &GOT[n+3] so that ld.so can figure out which GOT entry it
> needs to relocate. See sysdeps/arm/dl-machine.h in glibc for an
> example.
That's a much bigger issue (possibly a show-stopper). I need to think
about this one some more. The idea of having the PLT stub branch to
another stub just to achieve the mode change does not particularly appeal.
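To make the ip objection concrete, here is a small C model (a sketch, not linker code) that tracks only the value of ip through the two sequences, using the ARM rule that reading pc in ARM state yields the current instruction's address plus 8. The four-word traditional entry layout (`ldr ip, [pc, #4]`; `add ip, pc, ip`; `ldr pc, [ip]`; offset word) is assumed here, consistent with the 4-word size quoted above; the helper names, the single-slot memory model and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One GOT slot: its address (&GOT[n+3]) and its current contents. */
typedef struct {
    uint32_t got_entry_addr;
    uint32_t got_entry_value;
} got_slot;

/* Toy memory: the model only knows about the single GOT slot. */
static uint32_t load_word(uint32_t addr, got_slot g) {
    assert(addr == g.got_entry_addr);
    return g.got_entry_value;
}

/* ARM state: reading pc gives the current instruction address + 8. */
static uint32_t arm_pc(uint32_t insn_addr) { return insn_addr + 8; }

/* Assumed traditional entry at address a. Returns ip when control leaves. */
static uint32_t traditional_ip_at_branch(uint32_t a, got_slot g) {
    uint32_t word = g.got_entry_addr - arm_pc(a + 4); /* offset word at a+12 */
    uint32_t ip = word;                 /* a+0: ldr ip, [pc, #4] */
    ip = arm_pc(a + 4) + ip;            /* a+4: add ip, pc, ip   */
    uint32_t target = load_word(ip, g); /* a+8: ldr pc, [ip]; ip is not written */
    (void)target;
    return ip;
}

/* Shared stub at word-aligned t: "bx pc; nop" in Thumb, ARM code at t+4. */
static uint32_t shared_ip_at_branch(uint32_t t, got_slot g) {
    uint32_t a = t + 4;
    uint32_t word = g.got_entry_addr - arm_pc(a + 4); /* offset word at a+16 */
    uint32_t ip = word;                 /* a+0: ldr ip, [pc, #8] */
    ip = arm_pc(a + 4) + ip;            /* a+4: add ip, pc, ip   */
    ip = load_word(ip, g);              /* a+8: ldr ip, [ip] overwrites the slot address */
    return ip;                          /* a+12: bx ip */
}
```

With a GOT slot at 0x10100 that still points at plt[0] (0x8000), `traditional_ip_at_branch(0x8010, g)` leaves ip = 0x10100, the slot address ld.so needs, whereas `shared_ip_at_branch(0x8020, g)` leaves ip = 0x8000, so the slot address is already lost when control reaches the resolver.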
>
> Minor issues:
>
> * In addition, what is the cost of switching state twice on every
> call into a pure Thumb shared library?
A switch of state requires a pipeline flush, so it's the same as a
mis-predicted branch (all branches are 'mis-predicted' on earlier ARM
processors, as are all loads to the PC). In fact, if we know we are
generating for ARMv5, then we should always output the traditional ARM PLT
entry, since it gives us interworking for free and is the shortest
sequence of them all.
>
> * What if a Thumb function in the library wants to return with mov
> pc, lr? I know that GCC does not generate such a sequence, but it
> is still valid, and it contradicts principle #2.
This isn't an issue we need to consider here: the PLT stub is not used on
the return path, so it is the same as if the caller had called the
subroutine directly.
>
> Bottom line: I don't think we should favor Thumb or interworking over
> ARM, or add complexity to the linker. Pure ARM should be as fast as
> possible. In fact, the reason why we need an explicit switch to
> generate a PLT that can handle Thumb is pretty much the same reason
> why -mthumb-interwork is not the default behavior in the GCC backend:
> it is not free.
We still don't need such a switch. After all, a PLT stub is generated in
response to a relocation directive. Since we know whether that was an ARM
or a Thumb relocation, we can generate the correct PLT entry without the
need for a command-line switch. All it requires is for the linker to track
a bit more information when it pushes an internal PLT data structure onto
the list of things to generate.
R.