This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
libc-alpha-owner@sourceware.org wrote on 12/14/2006 01:05:57 PM:
> Hi everybody,
> This is my 1st post and attempt at contributing to glibc

Thanks Conn. To get started, submissions to libc are normally in the form
of a patch with a changelog header. Please review
http://www.gnu.org/prep/standards/standards.html section 6.8.
> I have written a sqrtf function that is much faster on a PowerPC G3 than
> the original one used. It uses the frsqrte instruction and Newton [...]
> will not work on a 601 processor.

Next, I assume you intend to add this to the powerpc-cpu add-on using the
--with-cpu=g3 configuration?
In this case we need to place your e_sqrtf.S file in an appropriate directory
so that it does not impact PowerPCs that do have fsqrt. For example:
./powerpc-cpu/sysdeps/powerpc/powerpc32/g3/fpu/e_sqrtf.S
You will also need an Implies file in the sysdeps/unix/sysv/linux tree to
make sure your new directory is early enough in the search order to
override the e_sqrtf in libc trunk.
For example:
./powerpc-cpu/sysdeps/unix/sysv/linux/powerpc/powerpc32/g3/fpu/Implies
would contain:
powerpc/powerpc32/g3/fpu
If you want g4 to default to the g3 implementation, create
powerpc/powerpc32/g4/fpu directories with Implies files referencing the
powerpc/powerpc32/g3/fpu directories. Similarly for 603, 604, ...
See the powerpc-cpu README for more details.
Your patch should reflect this directory layout.
> The limiting factor on ieee conformance is the frsqrte instruction must
> produce a result that is within 1/59th of the correct value. A timing test
> on all valid values using the current glibc function takes about 26 minutes
> on a iMac g3 400MHz machine. With my implementation it takes about 21
> minutes.
Not sure what you are getting at here. The PowerPC Arch 2.0x (V1.x also) states that frsqrte is "correct to one part in 32". Does your algorithm require better precision than the Arch provides?
The Arch does say that results may vary between implementations. So does G3/G4 frsqrte provide better than 1/32 precision?
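To make the precision question concrete, here is a minimal C sketch of refining a reciprocal square root estimate with Newton-Raphson steps. It is not the submitted e_sqrtf.S. The function names are hypothetical, and the seed is passed in by the caller as a stand-in for whatever frsqrte returns. It assumes x is positive and finite, and ignores zero, infinities, NaNs, denormals and correct rounding, all of which a real sqrtf must handle.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~ 1/sqrt(x), from f(y) = 1/y^2 - x:
   y' = y * (1.5 - 0.5 * x * y * y).
   If y has relative error e, y' has relative error of about 1.5 * e^2. */
static double rsqrt_newton_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Refine a seed y0 ~ 1/sqrt(x) with `steps` iterations.
   y0 stands in for the hardware estimate (an assumption of this sketch). */
static double rsqrt_refine(double x, double y0, int steps)
{
    assert(x > 0.0 && steps >= 0);
    double y = y0;
    for (int i = 0; i < steps; i++)
        y = rsqrt_newton_step(x, y);
    return y;
}

/* Residual |x*y*y - 1|; for small errors this is about twice the
   relative error of y as an approximation to 1/sqrt(x). */
static double rsqrt_residual(double x, double y)
{
    double r = x * y * y - 1.0;
    return r < 0.0 ? -r : r;
}

/* sqrt(x) = x * (1/sqrt(x)), rounded to single precision at the end. */
static float sqrtf_from_seed(float x, double y0, int steps)
{
    return (float)((double)x * rsqrt_refine((double)x, y0, steps));
}
```

For example, with x = 4.0 the exact reciprocal root is 0.5, so a seed that is off by one part in 32 is 0.515625. Because each step roughly squares the relative error, a seed good to 1/32 needs about three steps to get below single-precision resolution. A more accurate seed may need one fewer, which is why the actual accuracy of frsqrte on a given implementation matters.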
> Please read the header for more details and give me some feedback.
> P.S. Do I need to file copyright assignment papers for this?
Yes you do.
Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center