This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy using NEON/VFP.


On 12/04/13 12:52, Schwarz, Konrad wrote:
-----Original Message-----
From: Richard Earnshaw [mailto:rearnsha@arm.com]
Sent: Friday, April 12, 2013 11:41 AM
Subject: Re: [PATCH, v3] ARM: Integrate Cortex-A15 optimized memcpy
using NEON/VFP.

I would have thought these days, with hardware floating-point support
required by the Linux HF ABI, that this wasn't likely to be a major
issue.  The compiler will use FP insns freely as well whenever they are
available, even for data moves.  If you're using memcpy enough for
performance to be an issue, then you'd want to use the fastest sequence
possible.  If you're not, then why would you care?

A thread's context switch time is increased if it uses
floating point registers.  Once a thread has obtained a
floating point context, there is no way of getting rid of
it again.


Lazy context switching would essentially do that.
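
A minimal sketch of the lazy scheme, written as a plain C simulation rather than real kernel code. The names (`struct thread`, `context_switch`, `fp_trap`, `thread_uses_fp`, `thread_reads_fp`) are hypothetical, and the `cpu_fp_enabled` flag stands in for a hardware enable bit such as FPEXC.EN on ARM. The idea it illustrates: a context switch never copies the FP bank; it only disables FP access unless the incoming thread already owns the bank, and the save/restore happens in the trap taken on the first FP instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_FP_REGS 32

/* Hypothetical per-thread state; a real kernel keeps this in its TCB. */
struct thread {
    double fp_save[NUM_FP_REGS]; /* saved FP bank, valid while not the owner */
    unsigned fp_saves;           /* how often this thread's bank was saved */
};

/* Simulated CPU: one FP register bank plus an "FP enabled" bit. */
static double cpu_fp_regs[NUM_FP_REGS];
static bool cpu_fp_enabled;
static struct thread *fp_owner; /* thread whose values are live in the bank */
static struct thread *current;  /* thread currently running */

/* Context switch: only integer state would be swapped here. The FP bank
   is left alone; FP access stays enabled only if 'next' already owns it. */
void context_switch(struct thread *next)
{
    current = next;
    cpu_fp_enabled = (fp_owner == next);
}

/* Trap taken when an FP instruction runs while FP access is disabled. */
static void fp_trap(void)
{
    if (fp_owner != NULL) {
        /* Save the previous owner's registers only now, on demand. */
        memcpy(fp_owner->fp_save, cpu_fp_regs, sizeof cpu_fp_regs);
        fp_owner->fp_saves++;
    }
    memcpy(cpu_fp_regs, current->fp_save, sizeof cpu_fp_regs);
    fp_owner = current;
    cpu_fp_enabled = true;
}

/* Stand-in for the current thread writing an FP register. */
void thread_uses_fp(int reg, double value)
{
    if (!cpu_fp_enabled)
        fp_trap();
    cpu_fp_regs[reg] = value;
}

/* Stand-in for the current thread reading an FP register. */
double thread_reads_fp(int reg)
{
    if (!cpu_fp_enabled)
        fp_trap();
    return cpu_fp_regs[reg];
}
```

In this model a thread that used FP once and then stops using it costs nothing extra on later switches, and switching to a thread that never touches FP leaves the previous owner's registers in place.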

Note that Section B1.8.4 of DDI0406B, an edition of the
ARM V7 architecture manual, describes exactly the optimization
I mentioned in my original post.


I'm well aware of it. Indeed, I've written such context switching code myself in the past.

So at least up to V7, the architects behind the ARM ISA cared
about this.

Also, I've never seen a compiler use FP instructions freely, as I expect
compiler writers are aware of this issue.


Well I've certainly seen GCC do that rather than spill registers.

I see build options in the code for three variants: with Neon (and
VFP), with VFP only, and without either.  That means that bare metal
systems have the option of using an integer-only variant (as does
anyone else if they are really worried about using FP registers within
memcpy).

How would an application select which variant to use?


Everything you're talking about here is for full OS-based systems with context switching. Since Newlib is primarily (outside of Cygwin) a library for bare metal systems, surely we should provide developers with the ability to choose. Given that the code provides three variants for different configurations, I don't really see what you are arguing about, unless it is that we shouldn't give users a choice.
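
As a rough illustration of how a build-time choice between three such variants can look, here is a sketch using the ACLE feature macros `__ARM_NEON` and `__ARM_FP` (plus GCC's older `__ARM_NEON__` spelling). The override macro `MEMCPY_FORCE_INTEGER` and the function `memcpy_variant` are hypothetical names made up for this example; the conditionals in the actual patch may well differ.

```c
#include <string.h>

/* Pick one variant at compile time from the target's feature macros.
   MEMCPY_FORCE_INTEGER is a hypothetical opt-out for developers who do
   not want memcpy to touch FP registers at all. */
#if defined(MEMCPY_FORCE_INTEGER)
# define MEMCPY_VARIANT "integer"
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
# define MEMCPY_VARIANT "neon"
#elif defined(__ARM_FP)
# define MEMCPY_VARIANT "vfp"
#else
# define MEMCPY_VARIANT "integer"
#endif

/* Report which variant this translation unit was built for. */
const char *memcpy_variant(void)
{
    return MEMCPY_VARIANT;
}
```

Building with `-DMEMCPY_FORCE_INTEGER` would select the integer-only path regardless of what the target supports.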

On Linux the code can be bound at run time using the IFUNC feature. Normally that would be done as a platform choice based on the hardware features available, but I see no reason why it couldn't take some input from the application developer if that was really felt to be necessary.
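
With IFUNC, the dynamic linker calls a resolver function once and binds the symbol to whichever implementation it returns. The sketch below emulates that idea portably with an ordinary function pointer, so it does not use the real IFUNC mechanism. Everything in it is an assumption for illustration: the feature bits `FEAT_VFP` and `FEAT_NEON` (on ARM Linux the real bits would come from `getauxval(AT_HWCAP)`), the resolver `resolve_memcpy`, its `app_wants_integer_only` argument, and the three stand-in routines, which simply forward to `memcpy`.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical feature bits describing the hardware. */
#define FEAT_VFP  (1u << 0)
#define FEAT_NEON (1u << 1)

/* Records which stand-in ran last, so the choice is observable. */
const char *last_variant = "none";

/* Stand-ins for the three variants; each just forwards to memcpy. */
void *memcpy_integer(void *dst, const void *src, size_t n)
{
    last_variant = "integer";
    return memcpy(dst, src, n);
}

void *memcpy_vfp(void *dst, const void *src, size_t n)
{
    last_variant = "vfp";
    return memcpy(dst, src, n);
}

void *memcpy_neon(void *dst, const void *src, size_t n)
{
    last_variant = "neon";
    return memcpy(dst, src, n);
}

/* Resolver: the platform picks the fastest variant the hardware allows,
   unless the application asked to keep FP registers untouched. */
memcpy_fn resolve_memcpy(unsigned features, int app_wants_integer_only)
{
    if (app_wants_integer_only)
        return memcpy_integer;
    if (features & FEAT_NEON)
        return memcpy_neon;
    if (features & FEAT_VFP)
        return memcpy_vfp;
    return memcpy_integer;
}
```

A program would call the resolver once at start-up, store the returned pointer, and route every copy through it; with real IFUNC that binding step is done by the dynamic linker instead.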

R.

