This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Re: [PATCHv2] ARM: NEON optimized implementation of memcpy.

From: ext Daniel Jacobowitz <drow at false dot org>
To: Siarhei Siamashka <siarhei dot siamashka at nokia dot com>
Cc: "ext Joseph S. Myers" <joseph at codesourcery dot com>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>
Date: Wed, 15 Jul 2009 12:08:33 -0400
Subject: Re: [PATCHv2] ARM: NEON optimized implementation of memcpy.
References: <200907051821.04030.siarhei.siamashka@nokia.com> <200907142017.22566.siarhei.siamashka@nokia.com> <20090714173927.GA21499@caradoc.them.org> <200907151708.21506.siarhei.siamashka@nokia.com>

On Wed, Jul 15, 2009 at 05:08:21PM +0300, Siarhei Siamashka wrote:
> The memcpy implementation from that package is done in C, probably with the
> hope that the compiler can generate some good code for it. I highly doubt that
> this is going to happen any time soon, so normal assembly code will be always
> better.

I'd rather have numbers than generalizations; the code generated is
not too bad, and having the compiler able to schedule for each
specific processor is a lot more maintainable.
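
One way to get such numbers is a small timing harness. The following is only a rough sketch: the function name copy_throughput_mb_s, the buffer sizes and the use of clock() are assumptions made for illustration and are not taken from the patch or from glibc. A serious comparison would also vary alignment and size and would run on the target processor.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Written after every copy so the result of the copy is observably used. */
static volatile unsigned char copy_sink;

/* Hypothetical helper: returns approximate memcpy throughput in MB/s.
   Returns -1.0 if size is 0, if allocation fails, or if the elapsed
   CPU time was too small for clock() to measure. */
double copy_throughput_mb_s(size_t size, int iterations)
{
    if (size == 0 || iterations <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Reading the last byte reduces the chance that the compiler
           optimizes the copy away. */
        copy_sink = dst[size - 1];
    }
    clock_t end = clock();

    free(src);
    free(dst);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;

    double megabytes = (double)size * (double)iterations / (1024.0 * 1024.0);
    return megabytes / seconds;
}
```

Calling it with, say, a 64 KB buffer and a few thousand iterations for each candidate implementation gives a first-order comparison. clock() is coarse, so short runs can come back as unmeasurable.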

> It's good to know, just because the way they are now, performance would be
> only lost. Is there anything else that may be using these __aeabi_memcpy*
> functions at the moment?

Third-party compilers (like RealView).

> There must be some reason why these __aeabi_memcpy* functions exist in the
> first place. Probably somebody thought that handling very small copies is
> performance critical. Don't know if this is actually justified in practice.

I think this and your later comments misunderstood what I was talking
about.  __aeabi_memcpy* are supposed to be optimized for large copies;
that's in the ABI documentation.  The expectation is that small copies
will be inlined at the call site.  Thus, having it handle small copies
efficiently is not worth even a few cycles (as long as it's correct).
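
To illustrate the split between inlined small copies and out-of-line large ones, here is a minimal sketch in C. The struct header type and both function names are hypothetical, and whether the first copy is really expanded inline depends on the compiler and the optimization level; the code only shows the two kinds of call site.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-byte type, used only for illustration. */
struct header {
    unsigned char bytes[8];
};

/* Small copy with a size known at compile time. Optimizing compilers
   typically expand this into a few load/store instructions at the call
   site instead of emitting a call to a copy routine. */
void copy_header(struct header *dst, const struct header *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Copy with a size known only at run time. This is the kind of call
   that actually reaches an out-of-line copy routine, so it is where
   throughput on large blocks matters. */
void copy_buffer(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Looking at the compiler's assembly output for both functions (for example with gcc -O2 -S) shows whether the first one was expanded inline on a given toolchain.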

-- 
Daniel Jacobowitz
CodeSourcery
