This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] BZ #14649: Add multiarch FMA support to x86-64 libm


On 10/02/2012 05:15 PM, H.J. Lu wrote:
On Tue, Oct 2, 2012 at 7:50 AM, Andreas Jaeger <aj@suse.com> wrote:
On 10/02/2012 04:34 PM, H.J. Lu wrote:

On Tue, Oct 2, 2012 at 12:03 AM, Andreas Jaeger <aj@suse.com> wrote:

On 10/01/2012 08:49 PM, H.J. Lu wrote:


On Mon, Oct 1, 2012 at 10:56 AM, Andreas Jaeger <aj@suse.com> wrote:


On 10/01/2012 05:14 PM, H.J. Lu wrote:



Hi,


This patch adds multiarch FMA support to x86-64 libm.  Tested on
an FMA machine.  OK for master?




What kind of performance benefits does it bring us? Are you sure that
all



I don't have any performance numbers. My patch just enables FMA optimization, similar to FMA4 optimization.



Could you test at least one of these functions to see whether it makes a difference at all, please?


It works correctly on an FMA machine.  I will send a separate patch
to update the x86-64 ULPs due to the FMA instructions.  The FMA functions
are a little smaller than the SSE/AVX versions.


What about performance? For such a change I don't think it's unreasonable to
ask for some numbers...

Liubov, Kirill, can you get performance numbers on Haswell for the hjl/fma/master branch vs. the master branch, for those libm functions optimized for FMA?


the functions you enhance are really using fma and thus benefit from
the
change?



Not all of the FMA/FMA4 function variants actually contain FMA/FMA4 instructions. We should take a look and use the AVX versions for those instead.



So, let's only add those functions that really benefit from this.



Since functions in libm are implemented by calling each other, all functions called from a libm function compiled for FMA must also be compiled for FMA, with _fma as the suffix in their symbol names. Otherwise, the wrong functions may be called. One way


Really?

If func a calls b, then a can be fma optimized but b does not need to be.
Why does a_fma need to call b_fma instead of b?


Take e_pow for example: when we optimize it for FMA, we must also optimize __slowpow for FMA, since e_pow calls __slowpow. Although __slowpow itself doesn't use any FMA instructions, it calls other functions which use FMA:

[hjl@gnu-tools-1 math]$ nm slowpow-fma4.o
                  U __add_fma4
                  U __dbl_mp_fma4
0000000000000000 r eps.3048
                  U __halfulp_fma4
0000000000000000 r .LC0
                  U __mp_dbl
                  U __mpexp_fma4
                  U __mplog_fma4
                  U __mul_fma4
0000000000000000 T __slowpow_fma4
                  U __sub_fma4
[hjl@gnu-tools-1 math]$

So even if __slowpow doesn't use FMA, we must compile __slowpow
with FMA so that it can call other functions with FMA.  One way to
fix it is to make all those internal functions IFUNC.  Their references will
be resolved to the proper versions at run-time.  Instead of calling
__slowpow_fma4, we just call __slowpow, which is an IFUNC function
optimized for SSE2 and AVX.  Other internal functions can be
optimized for SSE2, AVX, FMA and FMA4.
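
As a rough illustration of the run-time resolution idea above (this is
not glibc code), the sketch below imitates an IFUNC resolver with a
plain function pointer in portable C.  All names in it (mul_add_generic,
mul_add_fma, resolve_mul_add, poly2) are hypothetical, and the
cpu_has_fma flag stands in for a real CPU feature check.  The caller is
compiled once and reaches whichever helper the resolver picked, so it
needs no _fma-suffixed copy of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, generic build. */
static double mul_add_generic(double x, double y, double z)
{
    return x * y + z;
}

/* Hypothetical helper, stand-in for the FMA build.  In a real scheme this
   copy would be compiled with FMA code generation enabled; here the body
   is identical so the sketch stays portable. */
static double mul_add_fma(double x, double y, double z)
{
    return x * y + z;
}

typedef double (*mul_add_fn)(double, double, double);

/* Plays the role of an IFUNC resolver: pick one implementation once,
   based on an (assumed) CPU feature flag. */
static mul_add_fn resolve_mul_add(bool cpu_has_fma)
{
    return cpu_has_fma ? mul_add_fma : mul_add_generic;
}

/* Caller compiled only once.  It evaluates (c2*x + c1)*x + c0 through
   whichever helper was resolved. */
static double poly2(mul_add_fn mul_add, double x,
                    double c2, double c1, double c0)
{
    return mul_add(mul_add(c2, x, c1), x, c0);
}

/* Both choices give the same answer for inputs whose intermediate
   results are exactly representable. */
static void self_check(void)
{
    double generic = poly2(resolve_mul_add(false), 2.0, 1.0, 2.0, 3.0);
    double fma_like = poly2(resolve_mul_add(true), 2.0, 1.0, 2.0, 3.0);
    assert(generic == 11.0);
    assert(fma_like == 11.0);
}
```

With real FMA instructions the two helpers can differ in the last bits
of the result, because a fused multiply-add rounds only once.  The
inputs in self_check are chosen so that every intermediate value is
exact.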

I see the advantage of doing so if it brings us speed benefits, but not the necessity. In other words: for me this is an optimization issue, not one of correctness.


slowpow could call a non-fma (generic) __mpexp function instead of an optimized one.

Andreas
--
 Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
  SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
   GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
    GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

