This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] BZ #14649: Add multiarch FMA support to x86-64 libm


On Tue, Oct 2, 2012 at 8:25 AM, Andreas Jaeger <aj@suse.com> wrote:
>>>>
>>>> Since functions in libm are implemented by calling each other,
>>>> all functions called from a libm function compiled for FMA must
>>>> also be compiled for FMA with _fma as the suffix in their symbol
>>>> names.  Otherwise, wrong functions may be called.   One way
>>>
>>> Really?
>>>
>>> If func a calls b, then a can be fma optimized but b does not need to be.
>>> Why does a_fma need to call b_fma instead of b?
>>>
>>
>> Take e_pow for example: when we optimize it for FMA, we must also optimize
>> __slowpow for FMA since e_pow calls __slowpow.  Although __slowpow itself
>> doesn't use any FMA instructions, it calls other functions which use FMA:
>>
>> [hjl@gnu-tools-1 math]$ nm slowpow-fma4.o
>>                   U __add_fma4
>>                   U __dbl_mp_fma4
>> 0000000000000000 r eps.3048
>>                   U __halfulp_fma4
>> 0000000000000000 r .LC0
>>                   U __mp_dbl
>>                   U __mpexp_fma4
>>                   U __mplog_fma4
>>                   U __mul_fma4
>> 0000000000000000 T __slowpow_fma4
>>                   U __sub_fma4
>> [hjl@gnu-tools-1 math]$
>>
>> So even if __slowpow doesn't use FMA, we must compile __slowpow
>> with FMA so that it can call other functions with FMA.   One way to
>> fix it is to make all those internal functions IFUNC.  Their references
>> will be resolved to the proper versions at run-time. Instead of calling
>> __slowpow_fma4, we just call __slowpow, which is an IFUNC function
>> optimized for SSE2 and AVX.  Other internal functions can be
>> optimized for SSE2, AVX, FMA and FMA4.
>
>
> I see the advantage of doing so if it brings us speed benefits - but not the
> necessity. In other words: This is for me an optimization issue not one of
> correctness.

IFUNC is designed for speed, not for correctness.

> slowpow could call a non-fma (generic) __mpexp function instead of an
> optimized one.
>

There should be no __slowpow_fma4, just __slowpow.
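
To illustrate the idea, here is a minimal sketch of an IFUNC-dispatched
internal function. It is not code from the patch: the names scale_add,
scale_add_generic, scale_add_fma, resolve_scale_add and horner2 are
hypothetical, and it assumes GCC or Clang on x86-64 GNU/Linux, where the
ifunc attribute and __builtin_cpu_supports are available. The caller is
compiled once, without any suffix, and the variant is picked when the
symbol is resolved at load time.

```c
/* Sketch only; assumes GCC/Clang on x86-64 GNU/Linux (ELF IFUNC).
   All function names are hypothetical, not glibc symbols.  */

/* Baseline variant: compiled with the default ISA.  */
static double
scale_add_generic (double a, double b, double c)
{
  return a * b + c;
}

/* Same source, but the compiler is allowed to use FMA instructions.  */
__attribute__ ((target ("fma")))
static double
scale_add_fma (double a, double b, double c)
{
  return a * b + c;
}

/* Resolver: runs once while the symbol is being resolved and returns
   the variant to bind.  */
static double (*resolve_scale_add (void)) (double, double, double)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("fma") ? scale_add_fma : scale_add_generic;
}

/* The only public name.  Callers reference scale_add, never
   scale_add_fma.  */
double scale_add (double a, double b, double c)
  __attribute__ ((ifunc ("resolve_scale_add")));

/* A caller in the role of __slowpow: it needs no per-ISA copy and no
   suffix, because the call below is bound by the resolver.
   Computes x*x + 2*x + 3.  */
double
horner2 (double x)
{
  return scale_add (scale_add (1.0, x, 2.0), x, 3.0);
}
```

With this layout horner2 exists in one copy only, which is the point of
dropping __slowpow_fma4 in favor of a plain __slowpow.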

-- 
H.J.

