This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] "Towers of Hanoi" Mergesort Implementation




This patch replaces the existing version of top-down mergesort
in stdlib/msort.c (by Mike Haertel) with an improved implementation.
This algorithm is used in glibc to implement "qsort" routine instead
of a quicksort in the typical case when there is enough memory
available to provide a temporary array.  This modified algorithm,
"Towers of Hanoi" mergesort, performs exactly the same number of
comparisons, but significantly fewer copies of the array elements.
Togther with the other improvements in the patch below this leads
to a 15-25% improvement in run-time on my questionable benchmarks
under i686.


The underlying principle is that "classical" recursive top-down mergesort
(and as implemented in glibc), recursively sorts both halves of the input
array in-place, merges the these two sorted subarrays into a temporary
buffer and then copies the resulting merged array back to the original.
It is this copying of the entire array, from the temporary array back to
the input array at each level of the recursion that is the source of its
inefficiency compared to a bottom-up mergesort.

The attached "Towers of Hanoi" implementation instead avoids these
O(N.logN) copies, by alternately sorting from the input buffer to the
temporary buffer, and from the temporary buffer to the input buffer
at each level in the recursion.  Each recursive call returns a boolean
value indicating whether the given array was sorted in-place or was
returned in the temporary array.  At the very top-level the original
behavior may be preserved by simply calling "memcpy" to copy the array
once (O(N) copies) from the temporary array to the input array if
required.  This alternation of which buffers are the source and
destination of the merge is reminiscent of the "temporary post" in
the recursive solutions to the classical Towers of Hanoi problem, hence
this variant's name.

Given the algorithm framework above, Towers of Hanoi mergesort can
make use of several further optimizations.  Arrays of length one
are already sorted in-place.  Arrays of length two, may be left
in-place if the two elements are already ordered, and only need be
copied to the temporary array if they are out of order.

If the two sub-arrays were both sorted in-place, or both copied to
their corresponding temporaries, the two halves are merged into the
free array.  If the locations of the sorted sub-arrays are split,
Towers of Hanoi takes advantage of the fact that these halves may
be merged into the buffer containing the second (upper) half.

These techniques allow Towers of Hanoi mergesort to achieve both
the "number of comparisons" performance of top-down mergesort with
the "number of copies" performance of bottom-up mergesort.


One possibly controversial point of the implementation below, is that
it provides three duplicated implementations of hanoi_sort, the first
completely generic using memcpy to move each array element, the remaining
two specialized for the cases where the element size is identical to
that of an int or a long.  These two "template instantiations" can avoid
passing the size of the element upon each recursive call and can use
simple assignment of C primitives to perform the copying of elements.
This specialization is designed to handle the common case usage of
qsort for sorting arrays of pointers and/or integers.  The original
glibc implementation performed a similar optimization, but by testing
it within a single recursive function, it ended up checking the size
of each element and the array's alignment O(N) times during each sort.


I'm more than happy to provide more details of the implementation, the
mathematical analyses and benchmark results to the patch reviewers, if
that'd help the approval process.


Tested within glibc by "make" and "make check" on i686-pc-linux-gnu,
and by extensive testing and benchmarking on IRIX, Solaris, Windows,
Compaq Tru64 and AIX.

Is this patch suitable for glibc?



Roger Sayle [Ph.D. Computer Science, University of Edinburgh, 1994].


2002-01-16  Roger Sayle <roger@eyesopen.com>
	* stdlib/msort.c (msort_with_tmp): Replace implementation with
	more efficient "Towers of Hanoi" mergesort.
	(hanoi_sort, hanoi_sort_int, hanoi_sort_long): New functions,
	for generic, sizeof(int) and sizeof(long) variants respectively.


*** libc/stdlib/msort.c	Thu Jul  5 22:55:41 2001
--- patch/stdlib/msort.c	Wed Jan 16 16:46:09 2002
***************
*** 1,7 ****
  /* An alternative to qsort, with an identical interface.
     This file is part of the GNU C Library.
!    Copyright (C) 1992, 1995-1997, 1999, 2000, 2001 Free Software Foundation, Inc.
!    Written by Mike Haertel, September 1988.

     The GNU C Library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Lesser General Public
--- 1,9 ----
  /* An alternative to qsort, with an identical interface.
     This file is part of the GNU C Library.
!    Copyright (C) 1992, 1995-1997, 1999, 2000, 2001, 2002
!    Free Software Foundation, Inc.
!    Original Implementation by Mike Haertel, September 1988.
!    Towers of Hanoi Mergesort by Roger Sayle, January 2002.

     The GNU C Library is free software; you can redistribute it and/or
     modify it under the terms of the GNU Lesser General Public
***************
*** 25,88 ****
  #include <memcopy.h>
  #include <errno.h>

  static void msort_with_tmp (void *b, size_t n, size_t s,
! 			    __compar_fn_t cmp, char *t);

! static void
! msort_with_tmp (void *b, size_t n, size_t s, __compar_fn_t cmp,
! 		char *t)
  {
!   char *tmp;
!   char *b1, *b2;
!   size_t n1, n2;
!
!   if (n <= 1)
!     return;
!
!   n1 = n / 2;
!   n2 = n - n1;
!   b1 = b;
!   b2 = (char *) b + (n1 * s);
!
!   msort_with_tmp (b1, n1, s, cmp, t);
!   msort_with_tmp (b2, n2, s, cmp, t);
!
!   tmp = t;
!
!   if (s == OPSIZ && (b1 - (char *) 0) % OPSIZ == 0)
!     /* We are operating on aligned words.  Use direct word stores.  */
!     while (n1 > 0 && n2 > 0)
!       {
! 	if ((*cmp) (b1, b2) <= 0)
! 	  {
! 	    --n1;
! 	    *((op_t *) tmp)++ = *((op_t *) b1)++;
! 	  }
! 	else
! 	  {
! 	    --n2;
! 	    *((op_t *) tmp)++ = *((op_t *) b2)++;
! 	  }
        }
!   else
!     while (n1 > 0 && n2 > 0)
        {
! 	if ((*cmp) (b1, b2) <= 0)
! 	  {
! 	    tmp = (char *) __mempcpy (tmp, b1, s);
! 	    b1 += s;
! 	    --n1;
! 	  }
! 	else
! 	  {
! 	    tmp = (char *) __mempcpy (tmp, b2, s);
! 	    b2 += s;
! 	    --n2;
! 	  }
!       }
!   if (n1 > 0)
!     memcpy (tmp, b1, n1 * s);
!   memcpy (b, t, (n - n2) * s);
  }

  void
--- 27,387 ----
  #include <memcopy.h>
  #include <errno.h>

+
+ /* Check whether pointer P is aligned for access by type T. */
+ #define TYPE_ALIGNED(P,T)  !(((char*)(P) - (char*)0) % sizeof (T))
+
+
+ static int hanoi_sort (char *b, size_t n, size_t s,
+                         __compar_fn_t cmp, char *t);
+ static int hanoi_sort_int (int *b, size_t n,
+                            __compar_fn_t cmp, int *t);
+ static int hanoi_sort_long (long *b, size_t n,
+                             __compar_fn_t cmp, long *t);
  static void msort_with_tmp (void *b, size_t n, size_t s,
! 			    __compar_fn_t cmp, void *t);

!
! /* This routine implements "Towers of Hanoi Mergesort".  The algorithm
!    sorts the n elements of size s pointed to by array b using comparison
!    function cmp.  The argument t points to a suitable temporary buffer.
!    If the return value is zero, the sorted array is returned in b, and
!    for non-zero return values the sorted array is returned in t.  */
!
! static int
! hanoi_sort (char *b, size_t n, size_t s, __compar_fn_t cmp, char *t)
  {
!     register size_t n1, n2;
!     register char *b1,*b2;
!     register char *t1,*t2;
!     register char *s1,*s2;
!     register size_t size;
!     register int result;
!     register char *ptr;
!
!     if (n <= 1)
!       return 0;
!
!     if (n == 2)
!       {
!         b2 = b + s;
!         if ((*cmp) (b, b2) <= 0)
!           return 0;
!         ptr = (char *) __mempcpy (t, b2, s);
!         memcpy (ptr, b, s);
!         return 1;
!      }
!
!     n1 = n/2;
!     n2 = n - n1;
!     /* n1 < n2!  */
!
!     size = n1 * s;
!     b1 = b;
!     b2 = b + size;
!
!     t1 = t;
!     t2 = t + size;
!
!     /* Recursively call hanoi_sort to sort the two halves of the array.
!        Depending upon the return values, determine the values s1 and s2
!        the locations of the two sorted subarrays, ptr, the location to
!        contain the sorted array and result, the return value for this
!        function.  Note that "ptr = result? t : b".  */
!
!     if (hanoi_sort (b1, n1, s, cmp, t1))
!       {
!         if (hanoi_sort (b2, n2, s, cmp, t2))
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = b2;
!           }
        }
!     else
!       {
!         if (hanoi_sort (b2, n2, s, cmp, t2))
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = b2;
!           }
!       }
!
!     /*  Merge the two sorted arrays s1 and s2 of n1 and n2 elements
!         respectively, placing the result in ptr.  On entry, n1 > 0
!         && n2 > 0, and with each iteration either n1 or n2 is decreased
!         until either reaches zero, and the loop terminates via return.  */
!
!     for (;;)
!       {
!         if ((*cmp) (s1, s2) <= 0)
!           {
!             ptr = (char *) __mempcpy (ptr, s1, s);
!             s1 += s;
!             --n1;
!             if (n1 == 0)
!             {
!               if (ptr != s2)
!                 memcpy (ptr, s2, n2 * s);
!               return result;
!             }
!           }
!         else
!           {
!             ptr = (char *) __mempcpy (ptr, s2, s);
!             s2 += s;
!             --n2;
!             if (n2 == 0)
!               {
!                 memcpy (ptr, s1, n1 * s);
!                 return result;
!               }
!         }
!     }
! }
!
! /* This routine is a variant of hanoi_sort that is optimized for the
!    case where items to be sorted are the size of ints, and both b and
!    t are suitably aligned.  The parameter s in not needed as it is
!    known to be sizeof(int).  */
!
! static int
! hanoi_sort_int (int *b, size_t n, __compar_fn_t cmp, int *t)
! {
!     register size_t n1, n2;
!     register int *b1,*b2;
!     register int *t1,*t2;
!     register int *s1,*s2;
!     register int result;
!     register int *ptr;
!
!     if (n <= 1)
!       return 0;
!
!     if (n == 2)
!       {
!         if ((*cmp) (b, b + 1) <= 0)
!           return 0;
!         t[0] = b[1];
!         t[1] = b[0];
!         return 1;
!      }
!
!     n1 = n/2;
!     n2 = n - n1;
!     /* n1 < n2!  */
!
!     b1 = b;
!     b2 = b + n1;
!
!     t1 = t;
!     t2 = t + n1;
!
!     /* Recursively call hanoi_sort_int to sort the two halves.  */
!     if (hanoi_sort_int (b1, n1, cmp, t1))
!       {
!         if (hanoi_sort_int (b2, n2, cmp, t2))
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = b2;
!           }
!       }
!     else
!       {
!         if (hanoi_sort_int (b2, n2, cmp, t2))
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = b2;
!           }
!       }
!
!     /*  Merge n1 elements from s1 and n2 elements from s2 into ptr.  */
!     for (;;)
        {
!         if ((*cmp) (s1, s2) <= 0)
!           {
!             *ptr++ = *s1++;
!             --n1;
!             if (n1 == 0)
!             {
!               if (ptr != s2)
!                 memcpy (ptr, s2, n2 * sizeof (int));
!               return result;
!             }
!           }
!         else
!           {
!             *ptr++ = *s2++;
!             --n2;
!             if (n2 == 0)
!               {
!                 memcpy (ptr, s1, n1 * sizeof (int));
!                 return result;
!               }
!         }
!     }
! }
!
! /* This routine is a variant of hanoi_sort that is optimized for the
!    case where items to be sorted are the size of longs, and both b and
!    t are suitably aligned.  The parameter s in not needed as it is
!    known to be sizeof(long).  */
!
! static int
! hanoi_sort_long (long *b, size_t n, __compar_fn_t cmp, long *t)
! {
!     register size_t n1, n2;
!     register long *b1,*b2;
!     register long *t1,*t2;
!     register long *s1,*s2;
!     register int result;
!     register long *ptr;
!
!     if (n <= 1)
!       return 0;
!
!     if (n == 2)
!       {
!         if ((*cmp) (b, b + 1) <= 0)
!           return 0;
!         t[0] = b[1];
!         t[1] = b[0];
!         return 1;
!      }
!
!     n1 = n/2;
!     n2 = n - n1;
!     /* n1 < n2!  */
!
!     b1 = b;
!     b2 = b + n1;
!
!     t1 = t;
!     t2 = t + n1;
!
!     /* Recursively call hanoi_sort_long to sort the two halves.  */
!     if (hanoi_sort_long (b1, n1, cmp, t1))
!       {
!         if (hanoi_sort_long (b2, n2, cmp, t2))
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 0;
!             ptr = b;
!             s1 = t1;
!             s2 = b2;
!           }
!       }
!     else
!       {
!         if (hanoi_sort_long (b2, n2, cmp, t2))
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = t2;
!           }
!         else
!           {
!             result = 1;
!             ptr = t;
!             s1 = b1;
!             s2 = b2;
!           }
!       }
!
!     /*  Merge n1 elements from s1 and n2 elements from s2 into ptr.  */
!     for (;;)
!       {
!         if ((*cmp) (s1, s2) <= 0)
!           {
!             *ptr++ = *s1++;
!             --n1;
!             if (n1 == 0)
!             {
!               if (ptr != s2)
!                 memcpy (ptr, s2, n2 * sizeof (long));
!               return result;
!             }
!           }
!         else
!           {
!             *ptr++ = *s2++;
!             --n2;
!             if (n2 == 0)
!               {
!                 memcpy (ptr, s1, n1 * sizeof (long));
!                 return result;
!               }
!         }
!     }
! }
!
! /* This routine preserves the original interface to msort_with_tmp and
!    determines which variant of hanoi_sort to call, based upon item size
!    and alignment.  */
!
! static void
! msort_with_tmp (void *b, size_t n, size_t s, __compar_fn_t cmp, void *t)
! {
!   const size_t size = n * s;
!
!   if (s == sizeof(int) && TYPE_ALIGNED (b, int))
!     {
!       if (hanoi_sort_int (b, n, cmp, t))
!         memcpy (b, t, size);
!     }
!   else if (s == sizeof(long) && TYPE_ALIGNED (b, long))
!     {
!       if (hanoi_sort_long (b, n, cmp, t))
!         memcpy (b, t, size);
!     }
!   else
!     {
!       /* Call the generic implementation.  */
!       if (hanoi_sort (b, n, s, cmp, t))
!         memcpy (b, t, size);
!     }
  }

  void
*************** qsort (void *b, size_t n, size_t s, __co
*** 130,136 ****
  	   measured in bytes.  */

        /* If the memory requirements are too high don't allocate memory.  */
!       if (size / pagesize > phys_pages)
  	_quicksort (b, n, s, cmp);
        else
  	{
--- 429,435 ----
  	   measured in bytes.  */

        /* If the memory requirements are too high don't allocate memory.  */
!       if ((long int) (size / pagesize) > phys_pages)
  	_quicksort (b, n, s, cmp);
        else
  	{


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-438-3470


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]