This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.



Re: [PATCH, arm] Thumb shared library support: Thumb PLT, etc.


> Richard,
> 
> > First of all, do you have a copyright assignment in place for binutils 
> > (and gcc, for your other patch)?  Until that's sorted out we can't use 
> > your code.
> 
> Yes, I have assignment papers on file for GCC and binutils.

OK.

> 
> > 1) I don't like the idea of having some special flag (--thumb-plt) that 
> > indicates that we should build a different type of PLT.  The linker must 
> > be able to figure this out automatically, or we will end up with major 
> > problems when it comes to interworking.
> 
> First of all, these were my design principles while implementing this:
> 
> 1. Making Thumb shared libraries work should _not_ affect ARM shared
>    libraries.  People who want to make their code smaller are more
>    likely to have the time to ``manually'' fine-tune their application
>    than those who don't care and just want something to work on ARM.

It's true that a pure ARM-code application is most likely to be aiming for 
performance and that we shouldn't do things that hinder that.  However, 
that's not what I was proposing.

> 
> 2. Interworking issues should be kept separate.  I'd like to be able
>    to build pure ARM and pure Thumb shared libraries without compiling
>    them with -mthumb-interwork.

The issue here is that Thumb is about improving *overall* code density.  
At times best code density is achieved by using ARM code.  For example, if 
you know that you have a floating-point intensive function and that you 
have a VFP co-processor available, best code density will come from 
compiling the function to use the VFP rather than relying on softfp and 
emulation.

> > We need more space for the thumb sequence than we do for an ARM one.  That 
> > suggests that we should probably be looking to switch to ARM code for the 
> > stub.  For example, we could use
> > 
> > 	.code 16
> > 	.align 2
> > _plt_stub_thumb:
> > 	bx	pc
> > 	nop
> > 	.code 32
> > _plt_stub_arm:
> > 	ldr	ip, [pc, #8]
> > 	add	ip, pc, ip
> > 	ldr	ip, [ip]
> > 	bx	ip
> > 	.word	offset_to_target
> > 
> > which means we can share the stub with both ARM and Thumb code.  So while 
> > this is now 6 words long we save on duplication, and we have interworking 
> > from the start.
> 
> I was playing with the same idea originally but I didn't like it
> because:
>   Major issues:
> 
>   * It does not fit well with principle #1 above.  6 instead of 4
>     words in the ARM case and 6 instead of 5 words for Thumb.  ARM PLT
>     will be 50% bigger.

It wouldn't be used for #1, only when interworking was required.  With 
respect to #2 the issue is about best overall code density.

> 
>   * It won't work.  Upon calling plt[0] with lazy relocation, ip has
>     to hold &GOT[n+3] so that ld.so can figure out which GOT entry it
>     needs to relocate.  See sysdeps/arm/dl-machine.h in glibc for an
>     example.

That's a much bigger issue (possibly a show-stopper).  I need to think 
about this one some more.  The idea of having the PLT stub branch to 
another stub just to achieve the mode change does not particularly appeal.
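
To make the constraint concrete, here is a minimal C sketch of the 
arithmetic implied by "ip has to hold &GOT[n+3]".  It assumes 4-byte GOT 
slots and three reserved slots ahead of the per-function entries; the 
helper name and constants are hypothetical and are not taken from glibc.  
Note that in the shared stub quoted above, the final "ldr ip, [ip]" 
replaces the slot address in ip with the slot's contents, so this 
recovery would no longer be possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: three reserved GOT slots, 4 bytes per slot. */
enum { GOT_RESERVED_SLOTS = 3, GOT_SLOT_SIZE = 4 };

/* Hypothetical helper (not glibc code): given the GOT base address and
   the value left in ip by a PLT entry, recover the PLT index n such
   that ip == &GOT[n + 3].  */
static size_t plt_index_from_ip(uint32_t got_base, uint32_t ip)
{
    /* ip must point at or beyond the first per-function slot...  */
    assert(ip >= got_base + GOT_RESERVED_SLOTS * GOT_SLOT_SIZE);
    /* ...and must be slot-aligned relative to the GOT base.  */
    assert((ip - got_base) % GOT_SLOT_SIZE == 0);

    return (ip - got_base) / GOT_SLOT_SIZE - GOT_RESERVED_SLOTS;
}
```

With a GOT at 0x10000, an ip value of 0x1000c maps to index 0 and 
0x10020 maps to index 5.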


> 
>   Minor issues:
> 
>   * In addition, what is the cost of switching mode twice for a pure
>     Thumb shared library call every time we call it?

A switch of state requires a pipeline flush, so it's the same as a 
mis-predicted branch (all branches are 'mis-predicted' on earlier ARM 
processors, as are all loads to the PC).  In fact, if we know we are 
generating for ARMv5, then we should always output the traditional ARM PLT 
entry, since it gives us interworking for free and is the shortest 
sequence of them all.

> 
>   * What if a Thumb function in the library wants to return with mov
>     pc, lr?  I know that GCC does not generate such a sequence but it
>     is still valid and contradicts principle #2.

This isn't an issue we need to consider here.  The PLT stub is not used on 
the return path, so it is the same as if the caller had called the 
subroutine directly.

> 
> Bottom line, I don't think we should favor Thumb or interwork over ARM
> or some complication in the linker.  Pure ARM should be as fast as
> possible.  In fact, the reason why we need an explicit switch to
> generate a PLT that can handle Thumb is pretty much the same reason
> -mthumb-interwork is not the default behavior in the GCC backend: it
> is not free.

We still don't need such a switch.  After all, a PLT stub is generated in 
response to a relocation directive.  Since we know whether that was an ARM 
or a Thumb type relocation we can generate the correct PLT entry without 
the need to have a command line switch.  All it requires is for the linker 
to track a bit more information when pushing an internal PLT data 
structure onto the list of things to generate.
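
A rough C sketch of that bookkeeping follows.  Every name in it is 
hypothetical (these are not existing BFD identifiers), and it only 
illustrates the decision, not the code that would emit the entries: 
record the state of each call relocation against the symbol, then pick 
the PLT form once all relocations have been seen.  It also folds in the 
ARMv5 point made earlier, that the traditional ARM entry can always be 
used there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in BFD.  */
enum call_reloc { CALL_FROM_ARM, CALL_FROM_THUMB };
enum plt_kind { PLT_ARM_ONLY, PLT_WITH_THUMB_ENTRY };

/* The extra per-symbol information the linker would need to track.  */
struct plt_request {
    const char *symbol;
    bool arm_caller;    /* saw an ARM-state call relocation */
    bool thumb_caller;  /* saw a Thumb-state call relocation */
};

/* Called once per call relocation that needs a PLT entry.  */
static void note_call_reloc(struct plt_request *req, enum call_reloc kind)
{
    assert(req != NULL);
    if (kind == CALL_FROM_THUMB)
        req->thumb_caller = true;
    else
        req->arm_caller = true;
}

/* Decide which PLT form to generate, with no command line switch.  */
static enum plt_kind choose_plt_kind(const struct plt_request *req,
                                     bool target_is_armv5)
{
    assert(req != NULL);
    /* On ARMv5 the traditional ARM entry already interworks.  */
    if (target_is_armv5)
        return PLT_ARM_ONLY;
    /* Otherwise a Thumb entry point is needed only if Thumb code calls.  */
    return req->thumb_caller ? PLT_WITH_THUMB_ENTRY : PLT_ARM_ONLY;
}
```

A symbol referenced only from ARM code keeps the existing ARM entry 
unchanged, so pure ARM libraries pay nothing for this.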

R.


