This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC Power PC G3 optimized sqrtf function.


Steve Munroe writes:


libc-alpha-owner@sourceware.org wrote on 12/14/2006 01:05:57 PM:


Hi everybody,

This is my 1st post and attempt at contributing to glibc

Thanks, Conn. To get started, submissions to libc are normally in the form
of a patch with a changelog header. Please review
http://www.gnu.org/prep/standards/standards.html section 6.8.


I have written a sqrtf function that is much faster on a PowerPC G3 than
the original one used. It uses the frsqrte instruction and Newton

<SNIP>


will not work on a 601 processor.

Next, I assume you intend to add this to the powerpc-cpu add-on using
--with-cpu=g3 configuration?



Correct


In this case we need to place your e_sqrtf.S file in an appropriate directory
so that it does not impact PowerPCs that do have fsqrt. For example:


./powerpc-cpu/sysdeps/powerpc/powerpc32/g3/fpu/e_sqrtf.S


Okay, I'll do that.


You will also need an Implies file in the sysdeps/unix/sysv/linux tree to
make sure your new directory is early enough in the search order to
override the e_sqrtf in libc trunk.


For example:

./powerpc-cpu/sysdeps/unix/sysv/linux/powerpc/powerpc32/g3/fpu/Implies

would contain:

powerpc/powerpc32/g3/fpu

If you want g4 to default to the g3 implementation, create
powerpc/powerpc32/g4/fpu directories with Implies files referencing the
powerpc/powerpc32/g3/fpu directories. Similarly for 603, 604, ...


See the powerpc-cpu README for more details.

Your patch should reflect this directory detail.

The limiting factor on IEEE conformance is that the frsqrte instruction must
produce a result that is within 1/59th of the correct value. A timing test
on all valid values using the current glibc function takes about 26 minutes
on an iMac G3 400MHz machine. With my implementation it takes about 21
minutes.



Not sure what you are getting at here. The PowerPC Arch 2.0x (V1.x also) states that frsqrte is "correct to one part in 32". Does your algorithm require better precision than the Arch provides?

Yes


The Arch does say that
results may vary between implementations. So does G3/G4 frsqrte provide
better than 1/32 precision?

Yes they do. All implementations
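
For context on why the accuracy of the estimate matters: a Newton-Raphson
step for 1/sqrt(x) turns a relative error e into roughly 1.5*e^2, so an
estimate good to one part in 32 needs about three steps before the error
drops below single-precision rounding. The sketch below is only an
illustration in C, not the e_sqrtf.S under discussion. The function names
are hypothetical, and the estimate y0 is passed in by the caller as a
stand-in for the frsqrte result.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~= 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y). */
double rsqrt_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Hypothetical helper: refine a caller-supplied estimate y0 of 1/sqrt(x)
   (assumed here to play the role of the frsqrte result) with `steps`
   Newton-Raphson steps, then return x * y, which approximates sqrt(x).
   Only positive, finite, normal inputs are considered. */
float sqrtf_from_estimate(float x, double y0, int steps)
{
    assert(x > 0.0f && steps >= 0);

    double y = y0;
    for (int i = 0; i < steps; i++)
        y = rsqrt_step(x, y);

    /* sqrt(x) = x * (1/sqrt(x)) */
    return (float)(x * y);
}
```

For example, sqrtf_from_estimate(4.0f, 0.5 * (1.0 + 1.0 / 32.0), 3) starts
from an estimate that is off by one part in 32. A real implementation also
has to handle zero, negative values, infinities, NaNs, denormals and the
final rounding, all of which this sketch ignores.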



Please read the header for more details and give me some feedback.

P.S. Do I need to file copyright assignment papers for this?


Yes you do.

Is there a link on how to do this?


Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center



Thank you,


Conn

---------------------------------------
Conn Clark


Electronic Systems Technology
415 N. Quay Street Building B1 (509)-735-9092 ext 117
Kennewick, WA. 99336


Gentoo Linux RU13$!!!

