This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Consolidate and inline calls to __acr
On Mon, Dec 31, 2012 at 05:22:40PM +0100, Andreas Schwab wrote:
> else {i1=1; i2=k; }
> - for (i=i1,j=i2-1; i<i2; i++,j--) Z[k] += X[i]*Y[j];
> +#if 1
> + /* rearrange this inner loop to allow the fmadd instructions to be
> + independent and execute in parallel on processors that have
> + dual symmetrical FP pipelines. */
> + if (i1 < (i2-1))
> + {
> + /* make sure we have at least 2 iterations */
> + if (((i2 - i1) & 1L) == 1L)
> + {
> + /* Handle the odd iterations case. */
> + zk2 = x->d[i2-1]*y->d[i1];
> + }
> + else
> + zk2 = zero.d;
> + /* Do two multiply/adds per loop iteration, using independent
> + accumulators; zk and zk2. */
> + for (i=i1,j=i2-1; i<i2-1; i+=2,j-=2)
> + {
> + zk += x->d[i]*y->d[j];
> + zk2 += x->d[i+1]*y->d[j-1];
> + }
> + zk += zk2; /* final sum. */
> + }
> + else
> + {
> + /* Special case when iterations is 1. */
> + zk += x->d[i1]*y->d[i1];
> + }
> +#else
> + /* The original code. */
> + for (i=i1,j=i2-1; i<i2; i++,j--) zk += X[i]*Y[j];
> +#endif
>
... and this seems to be the only relevant change. If the conversion
of mp_no.d[] to int[] is approved, this rearrangement should not even be
necessary, since floating point would not be needed at all.
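
For reference, the rearrangement in the quoted hunk can be sketched in
isolation roughly as below. This is only an illustration, not the code
from the patch: the function names dot_rev_single and dot_rev_dual are
made up here, plain double arrays stand in for x->d[] and y->d[], and
the guard for an empty range is an addition that the hunk does not have.

```c
/* Single accumulator: every multiply-add depends on the previous one.
   Computes the sum of x[i] * y[j] for i = i1 .. i2-1, j = i2-1 .. i1.  */
static double
dot_rev_single (const double *x, const double *y, long i1, long i2)
{
  double zk = 0.0;
  for (long i = i1, j = i2 - 1; i < i2; i++, j--)
    zk += x[i] * y[j];
  return zk;
}

/* Two independent accumulators (zk and zk2), following the shape of the
   quoted hunk, so that two multiply-adds can be in flight at once.  */
static double
dot_rev_dual (const double *x, const double *y, long i1, long i2)
{
  double zk = 0.0, zk2 = 0.0;

  if (i1 < i2 - 1)
    {
      /* At least two iterations.  With an odd count, the last term
         (i = i2-1, j = i1) is peeled off into zk2 up front.  */
      if (((i2 - i1) & 1L) == 1L)
        zk2 = x[i2 - 1] * y[i1];

      /* Two multiply-adds per pass, one per accumulator.  */
      for (long i = i1, j = i2 - 1; i < i2 - 1; i += 2, j -= 2)
        {
          zk += x[i] * y[j];
          zk2 += x[i + 1] * y[j - 1];
        }
      zk += zk2;  /* Final sum.  */
    }
  else if (i1 < i2)
    {
      /* Exactly one iteration: i = i1 and j = i2-1 = i1.  */
      zk += x[i1] * y[i1];
    }
  /* Empty range (i1 >= i2): nothing to add.  This guard is an
     assumption of the sketch, not part of the quoted hunk.  */
  return zk;
}
```

Both functions add up the same set of products, but in a different
order, so for arbitrary doubles the results may differ in the last bits
because floating-point addition is not associative.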
Siddhesh