This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Patch] Faster __mpexp polynomial evaluation

From: Siddhesh Poyarekar <siddhesh at redhat dot com>
To: libc-alpha at sourceware dot org
Date: Mon, 11 Feb 2013 19:46:19 +0530
Subject: [Patch] Faster __mpexp polynomial evaluation

Hi,

The mpexp implementation does the following to evaluate exp:

* Scale down the input to close to or less than 1
* Evaluate a fixed number of iterations of the Taylor series expansion
  for exp
* Scale the output back up

In this, the number of iterations of Taylor series is based on the
precision required and is evaluated as:

e^x = 1 + x (1 + x / 2 (1 + x / 3 (1 + x/4 ...)))

This is slow because it involves:

1. multiplication of the intermediate result with X
2. division of the result with K (the integer values above)

The division involves another multiplication, so we're effectively
doing two multiplications in multiple precision.

Attached patch is an attempt to reduce the number of multiplications.
The mathematical basis for the approach is in a comment in the patch,
but I'll mention it briefly here.  The idea is to do away with the
division within loop and divide the result with n! (that's factorial).
For each term I then simply multiply the resultant n!/k! with x and
I'm done.  That makes it just the one multiplication in loop.

I have verified that there are no regressions from this on x86_64 and
powerpc64 in the testsuite.

Performance:
-----------

I used the same pow program[1] that triggers slowpow with the highest
precision (768 bits) since slowpow uses exp extensively.  The input
used is also the same:

./powtest 10000 1.0000000000000020 1.5000000000000000

x86_64 without patch:

Total:4211665134, Fastest:406628, Slowest:1329234, Avg:421166.513400

x86_64 with patch:

Total:2159516328, Fastest:206774, Slowest:753000, Avg:215951.632800

That makes it a 49.15% improvement in the best case and 48.73%
improvement in the average case.

Powerpc64 without patch:

Total:7998537390, Fastest:797754, Slowest:928981, Avg:799853.739000

Powerpc64 with patch:

Total:3999533211, Fastest:398721, Slowest:508918, Avg:399953.321100

That makes it a 50.02% improvement in the best case and 50%
improvement in the average case.  In other words, pow runs twice as
fast with this patch! (that's not a factorial ;)

OK to commit?

Siddhesh

[1] http://sourceware.org/ml/libc-alpha/2013-01/msg00617.html


	* sysdeps/ieee754/dbl-64/mpexp.c (__mpexp): Faster polynomial
	evaluation.


diff --git a/sysdeps/ieee754/dbl-64/mpexp.c b/sysdeps/ieee754/dbl-64/mpexp.c
index 8d288ff..35c4e80 100644
--- a/sysdeps/ieee754/dbl-64/mpexp.c
+++ b/sysdeps/ieee754/dbl-64/mpexp.c
@@ -49,6 +49,15 @@ __mpexp (mp_no *x, mp_no *y, int p)
       0, 0, 0, 0, 3, 3, 4, 4, 5, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6,
       6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8
     };
+
+  /* Factorials for the values of np above.  */
+  static const double nfa[33] =
+    {
+      1.0, 1.0, 1.0, 1.0, 6.0, 6.0, 24.0, 24.0, 120.0, 24.0, 24.0,
+      120.0, 120.0, 120.0, 720.0, 720.0, 720.0, 720.0, 720.0, 720.0,
+      720.0, 720.0, 720.0, 720.0, 5040.0, 5040.0, 5040.0, 5040.0,
+      40320.0, 40320.0, 40320.0, 40320.0, 40320.0
+    };
   static const int m1p[33] =
     {
       0, 0, 0, 0,
@@ -71,16 +80,7 @@ __mpexp (mp_no *x, mp_no *y, int p)
       {0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 39, 43, 47, 51, 55, 59, 63},
       {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 47, 50, 54}
     };
-  mp_no mpk =
-    {
-      0,
-      {
-	0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
-	0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
-	0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
-      }
-    };
-  mp_no mps, mpak, mpt1, mpt2;
+  mp_no mps, mpk, mpt1, mpt2;
 
   /* Choose m,n and compute a=2**(-m).  */
   n = np[p];
@@ -115,24 +115,38 @@ __mpexp (mp_no *x, mp_no *y, int p)
 	  break;
     }
 
-  /* Compute s=x*2**(-m). Put result in mps.  */
+  /* Compute s=x*2**(-m). Put result in mps.  This is the range-reduced input
+     that we will use to compute e^s.  For the final result, simply raise it
+     to 2^m.  */
   __pow_mp (-m, &mpt1, p);
   __mul (x, &mpt1, &mps, p);
 
-  /* Evaluate the polynomial. Put result in mpt2.  */
-  mpk.e = 1;
-  mpk.d[0] = ONE;
-  mpk.d[1] = n;
-  __dvd (&mps, &mpk, &mpt1, p);
-  __add (&mpone, &mpt1, &mpak, p);
-  for (k = n - 1; k > 1; k--)
+  /* Compute the Taylor series for e^s:
+
+         1 + x/1! + x^2/2! + x^3/3! ...
+
+     for N iterations.  We compute this as:
+
+         e^x = 1 + (x * n!/1! + x^2 * n!/2! + x^3 * n!/3!) / n!
+             = 1 + (x * (n!/1! + x * (n!/2! + x * (n!/3! + x ...)))) / n!
+
+     n! is pre-computed and saved while k! is computed on the fly.  */
+  __cpy (&mps, &mpt2, p);
+
+  double kf = 1.0;
+
+  /* Evaluate the rest.  The result will be in mpt2.  */
+  for (k = n - 1; k > 0; k--)
     {
-      __mul (&mps, &mpak, &mpt1, p);
-      mpk.d[1] = k;
-      __dvd (&mpt1, &mpk, &mpt2, p);
-      __add (&mpone, &mpt2, &mpak, p);
+      /* n! / k! = n * (n - 1) ... * (n - k + 1) */
+      kf *= k + 1;
+
+      __dbl_mp (kf, &mpk, p);
+      __add (&mpt2, &mpk, &mpt1, p);
+      __mul (&mps, &mpt1, &mpt2, p);
     }
-  __mul (&mps, &mpak, &mpt1, p);
+  __dbl_mp (nfa[p], &mpk, p);
+  __dvd (&mpt2, &mpk, &mpt1, p);
   __add (&mpone, &mpt1, &mpt2, p);
 
   /* Raise polynomial value to the power of 2**m. Put result in y.  */

Follow-Ups:
- Re: [Patch] Faster __mpexp polynomial evaluation
  - From: Andreas Jaeger

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]