This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.

Re: [ARM] architecture specific subdirectories, optimised memchr and some questions

From: David Gilbert <david dot gilbert at linaro dot org>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: libc-ports at sourceware dot org, patches at linaro dot org
Date: Thu, 4 Aug 2011 18:11:39 +0100
Subject: Re: [ARM] architecture specific subdirectories, optimised memchr and some questions
References: <20110715181101.GA20980@davesworkthinkpad> <Pine.LNX.4.64.1108021437200.16898@digraph.polyomino.org.uk>

Hi Joseph,
  Thanks for the reply (and I've fixed the cc I thinko'd in my original mail)

On 2 August 2011 16:06, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Fri, 15 Jul 2011, Dr. David Alan Gilbert wrote:
>
>>   * Is the preconfigure the right place to check for the current architecture
>
> Yes.
>
>>   * and is it right to set $submachine there?
>
> Well, setting base_machine and machine is what's meant to be done
> automatically, with submachine being determined by --with-cpu and causing
> a -mcpu= or -march= option to be passed.
>
>>   * Why did the preconfigure previously append /arm to the end of $machine?
>
> It puts it at the beginning, not the end, and this fits in with using
> target triplets (the idea being you might configure glibc for
> armv5-linux-gnueabi, for example, and get arm/eabi/armv5).  Though the
> list of ARM versions in config.sub (in config.git upstream) is rather old,
> one reason among others this use of target triplets isn't ideal and
> logically I think your approach of testing the compiler is better.
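
For illustration only (this is not the preconfigure fragment from the patch):
testing the compiler amounts to asking which architecture macros it predefines
for the chosen flags. The sketch below does the same thing in C. The macros
__ARM_ARCH_7A__ and __ARM_ARCH_6T2__ are predefined by GCC for those
architectures; the directory names and both function names are assumptions
made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: map the compiler's predefined architecture macros
   to an assumed sysdeps subdirectory name.  Returns "" when the compiler
   is not targeting one of the architectures listed here. */
const char *sketch_arm_subdir(void)
{
#if defined(__ARM_ARCH_7A__)
    return "armv7-a";   /* assumed directory name */
#elif defined(__ARM_ARCH_6T2__)
    return "armv6t2";   /* assumed directory name */
#else
    return "";
#endif
}

/* Hypothetical helper: build a search path such as "arm/eabi/armv7-a",
   or just "arm/eabi" when no subdirectory applies.  Returns 1 if the
   path fitted in buf, 0 if it was truncated. */
int sketch_sysdeps_path(char *buf, size_t len)
{
    const char *sub = sketch_arm_subdir();
    int n;

    if (sub[0] != '\0')
        n = snprintf(buf, len, "arm/eabi/%s", sub);
    else
        n = snprintf(buf, len, "arm/eabi");

    assert(n >= 0);     /* snprintf only fails on an encoding error */
    return (size_t)n < len;
}
```

Built with an arm-linux-gnueabi compiler and -march=armv7-a this would yield
"arm/eabi/armv7-a"; with any other compiler it falls back to "arm/eabi". A
real preconfigure would run the preprocessor from shell rather than compile
and run a program, since the target may not be the build machine.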

OK, I think both of these questions come from me not really understanding
what should be in $machine and what should be in $submachine.

The 'previously append /arm' question comes from $machine having been
'arm' by default, since the triplets I've been using are arm-linux-gnueabi.

To try and straighten those out, here are some more questions:
   a) For a build without specifying the CPU, should machine=arm and
      submachine=""?
   b) If the user wants to specify the CPU architecture, should they be
      using the triplet, i.e. should they be using armv7-linux-gnueabi
      (or is that armv7a-linux-gnueabi?), and would that have
      machine=armv7a and submachine=""?

   c) Is submachine for CPU variants, e.g. if I wanted to compile for
      a cortex-a8 would I be compiling with --with-cpu=cortex-a8 and
      somehow expect machine=armv7a submachine=cortex-a8?

   d) What happens for big endian? Doesn't that have a triplet like
      armeb-linux-gnueabi, and then what is $machine?

>>   * Ideally I don't think the architecture specifics should be in the eabi
>>     subdir; they should be at the top (they aren't eabi specific) - but I
>>     can't see a sensible way to rework the search order to do that
>>     - suggestions?
>
> The simple approach is for each arm/eabi/$submachine directory to have an
> Implies file pointing to the directory outside eabi/.

Yes, that's probably the best way, although I was getting worried by the
proliferation of directories containing nothing but a single Implies file.

> Otherwise, while ports has historically been used as a dumping ground for
> random code removed from libc, I don't think that's the right approach; we
> have version control to preserve old versions of code and ports should
> have those ports of glibc that aren't in libc but are reasonably close to
> a working state, not code that's been broken or obsolete for a long time.
> By now I think it would be reasonable to remove all the old-ABI ARM code
> from ports, so moving the eabi code up a directory level and eliminating
> the complexities of claiming to support two different ABIs.  (The only
> ports that I think are in some semblance of a maintained state are alpha,
> arm/eabi, hppa, m68k, mips, powerpc - all of them for Linux only at
> present.)

Would you want that clean up as a prerequisite for this patch?

>>   * Does the memchr boiler plate look OK? (It seems to work!)  The code is
>>     thumb-2 only which is a little unusual, but the 6T2 and 7-a that it
>>     supports can both do that.
>
> Has it been tested (with the glibc testsuite) for both big and little
> endian?

Certainly not for big endian; for that I did a self-contained test using
a binutils+eglibc+gcc set. I do not have a full armeb chroot to test
against; what's the best way of doing that?
In little endian it did run the testsuite.
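
To show why endianness is worth testing for this kind of routine, here is a
minimal C sketch of a word-at-a-time memchr. It is not the Thumb-2 assembly
from the patch, and the name sketch_memchr is made up for this example. The
word loop only detects that some byte in the word matches; locating the byte
is left to a plain byte loop, so this version avoids the byte-order-dependent
step that an optimised assembly version has to get right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative word-at-a-time search, same contract as memchr(). */
void *sketch_memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;

    assert(s != NULL || n == 0);

    /* Byte loop until p is 4-byte aligned (or the buffer ends). */
    while (n > 0 && ((uintptr_t)p % sizeof(uint32_t)) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Replicate the target byte into every lane of a 32-bit word. */
    const uint32_t pattern = 0x01010101u * ch;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, p, sizeof w);        /* aligned load without aliasing issues */
        uint32_t x = w ^ pattern;       /* matching bytes become zero */

        /* Non-zero exactly when some byte of x is zero. */
        if ((x - 0x01010101u) & ~x & 0x80808080u)
            break;                      /* match is in this word; find it below */
        p += sizeof(uint32_t);
        n -= sizeof(uint32_t);
    }

    /* Byte loop: pins down a match flagged above, or scans the tail. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}
```

An optimised version would instead work out the position of the first
matching byte directly from the flag word, and "first" then means the lowest
address, which is the least significant lane on little endian and the most
significant on big endian.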

>
> The code certainly needs CFI directives for when it adjusts the stack
> frame and saves/restores call-preserved registers, so that the debugger
> can backtrace properly when stopping things anywhere in this code.  See
> sysdeps/arm/memcpy.S for example.

OK, will do.

>>   * Given this directory structure - where would I put some code that
>>     was Neon specific? It's a feature that's available in 7-a variants
>>     (and later?)  arch/arm/eabi/armv7-a/neon?
>
> That's a plausible location for such code selected at configure time - but
> also consider the use of STT_GNU_IFUNC when multi-arch is enabled.

Yes, IFUNC looks like it's going to become important, especially for routines
that need optimising differently for different implementations.

> Things get more complicated when you consider features not even available
> for all processors with NEON - fused multiply-add in VFPv4, for example.
> (The GCC side of that - built-in function support for fma - hasn't been
> done either, although other targets have it in 4.6.)  Here are my notes on
> how ideally the fma functions should be implemented for ARM (largely
> independent of your changes except that they may provide the framework for
> e.g. VFPv4-specific code in glibc):

Yes the interaction of different features/CPUs gets really complex.
I've not had a chance to play with any VFPv4 stuff yet.

> 1. Suppose glibc is being built for a VFPv4 multilib.  (There is no
> predefined preprocessor macro to say that VFPv4 is in use.  There should
> be one.)  Then it should use the VFPv4 fused instructions, whether through
> .S files, inline assembly or GCC built-in functions (once added).  The
> existing libc code is correct but not optimal.
>
> 2. Suppose glibc is being built for a VFP multilib, not v4.  The existing
> libc code is correct, but if at runtime the processor is v4 then it can do
> better via IFUNC.  This is essentially what x86 and x86_64 do in IFUNC
> configurations.
>
> 3. Suppose glibc is being built for a non-VFP multilib.  Then it is
> optimal to use the first of (VFPv4 implementation, plain VFP, soft-fp[*])
> that will work on the processor used at runtime.  (Even if the __aeabi_* helper
> functions are made to use IFUNC in future so they use VFP operations on
> VFP processors - which may well be desirable, though it has some tricky
> aspects - and even if they also get optional support for exceptions and
> rounding modes in the soft-float case, a straight soft-fp implementation
> of FMA is still going to be faster than the generic one layered on other
> soft-fp operations.)  So in the absence of IFUNC, or in the presence of
> IFUNC but when glibc is being built with options incompatible with
> enabling VFP (such as iWMMXt), a soft-fp version should be used.  In the
> presence of IFUNC, it could be used to select between a VFPv4 version, the
> generic version built with -mfpu=vfp -mfloat-abi=softfp, and the soft-fp
> version - though simply using the soft-fp version always in the
> non-VFP-multilib case may also be reasonable.
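
Read as a decision procedure, the three cases above can be sketched in C as
follows. This is only an illustration of the selection order described in the
notes, not glibc code; every name here (the enums, the CAP_* bits and
sketch_choose_fma) is invented for the example, and how the capability bits
would really be obtained at runtime is left open.

```c
#include <assert.h>

/* How glibc itself was built (the "multilib"). */
enum build_kind { BUILD_VFPV4, BUILD_VFP, BUILD_NON_VFP };

/* Which fma implementation ends up being used. */
enum fma_impl {
    FMA_VFPV4,          /* VFPv4 fused instructions */
    FMA_GENERIC_VFP,    /* existing generic code, compiled to use VFP */
    FMA_SOFT_FP         /* soft-fp version (noted above as not yet existing) */
};

/* Hypothetical runtime capability bits for the processor in use. */
enum { CAP_VFP = 1 << 0, CAP_VFPV4 = 1 << 1 };

enum fma_impl sketch_choose_fma(enum build_kind build, unsigned caps,
                                int have_ifunc, int vfp_allowed)
{
    /* Case 1: VFPv4 multilib, so the fused instructions can always be used. */
    if (build == BUILD_VFPV4)
        return FMA_VFPV4;

    /* Case 2: VFP multilib; the generic code is correct, and IFUNC can
       upgrade to VFPv4 when the processor has it. */
    if (build == BUILD_VFP) {
        if (have_ifunc && (caps & CAP_VFPV4))
            return FMA_VFPV4;
        return FMA_GENERIC_VFP;
    }

    /* Case 3: non-VFP multilib. */
    assert(build == BUILD_NON_VFP);

    /* Without IFUNC, or with build options incompatible with enabling VFP
       (iWMMXt is the example given), only soft-fp is available. */
    if (!have_ifunc || !vfp_allowed)
        return FMA_SOFT_FP;

    /* Otherwise take the first of (VFPv4, plain VFP, soft-fp) that works. */
    if (caps & CAP_VFPV4)
        return FMA_VFPV4;
    if (caps & CAP_VFP)
        return FMA_GENERIC_VFP;
    return FMA_SOFT_FP;
}
```

For example, sketch_choose_fma(BUILD_NON_VFP, CAP_VFP, 1, 1) picks the generic
VFP build, while the same call with vfp_allowed set to 0 falls back to soft-fp.
The simpler alternative mentioned at the end of case 3 (always soft-fp for a
non-VFP multilib) would replace everything after the assert with a single
return.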

Are these 3 on the basis that some optimised routine somewhere in libc
will explicitly want a fused multiply-add, or that this should happen
for any libc routine that happens to be doing a multiply/add?  I'm not sure how
you would do the latter with ifunc.

> [*] This soft-fp version of fma doesn't actually exist.  Steve Munroe had
> a version in PR 3268, but it's not the right way of implementing fma using
> soft-fp.  Doing it properly means splitting up the multiplication macros
> to expose widening multiply, and implementing _FP_FMA.  The result could
> then be used in GCC as well (to replace fmsub in
> config/rs6000/darwin-ldouble.c which is currently used in implementing IBM
> long double for soft-float Power GNU/Linux).  I expect this could end up
> being a few days' work to get it working properly everywhere (including
> testing with Jakub's random test generator at
> <http://sourceware.org/ml/libc-hacker/2010-10/msg00005.html>).  It's
> relevant for getting fma working properly on any target without runtime
> support for exceptions and rounding modes, including but not limited to
> older ARM processors.

Also relevant for emulating the FMA in QEMU.

Dave

> --
> Joseph S. Myers
> joseph@codesourcery.com
>
