This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



strtoul(-1) discussion


Hello,

I may have found a defect in glibc's strtoul with regard to POSIX.
I found it long ago, but hesitated to post it until now.
My findings are below, and a test program is attached.

Results of the experiment on glibc:
===================================

strtol: reports overflow in a sensible manner, i.e. exactly when the
value is outside its output range [-2^31 ; 2^31-1].

strtoul: overflows when outside the range [-(2^32-1) ; (2^32-1)] even
though its output range is [0 ; (2^32-1)].

When N is inside strtoul's no-overflow range, we have:

  strtoul (-N) = - strtoul (N)

which seems correct according to ISO C [8].  However, the result is
reduced to its 32 rightmost bits, since it is a ulong.

The algorithm behind GNU libc's strtoul seems to be:

  * store the sign independently from the digits.
  * perform a strtoul on the digits without the sign, producing ulong
    value N.
  * if the sign is negative, perform N := -N.
  * return N.
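
The steps above can be modelled with a short sketch.  This is not the
glibc source: sketch_strtoul10 is a hypothetical name, only base 10 is
handled, and there is no endptr argument.  It is only meant to show
where the sign is set aside and where the negation happens.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical base-10 model of the behaviour described above;
   not the actual glibc implementation.  */
unsigned long
sketch_strtoul10 (const char *s)
{
  int negative = 0;
  int overflow = 0;
  unsigned long n = 0;

  while (isspace ((unsigned char) *s))
    s++;

  /* Store the sign independently from the digits.  */
  if (*s == '-' || *s == '+')
    negative = (*s++ == '-');

  /* Convert the digits alone; only the magnitude can overflow.  */
  for (; isdigit ((unsigned char) *s); s++)
    {
      unsigned long d = (unsigned long) (*s - '0');
      if (n > (ULONG_MAX - d) / 10)
        overflow = 1;
      else
        n = n * 10 + d;
    }

  if (overflow)
    {
      errno = ERANGE;
      return ULONG_MAX;
    }

  /* Unsigned negation wraps modulo ULONG_MAX + 1.  */
  return negative ? 0UL - n : n;
}
```

With a 32-bit long, sketch_strtoul10 ("-479") yields 4294966817, which
is (unsigned long) -479, and no error is reported.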

The problem is that [1] says:

  "If the correct value is outside the range of representable values,
  {ULONG_MAX} or {ULLONG_MAX} shall be returned and errno set to
  [ERANGE]."

If my interpretation is correct, this means that strtoul should not
fail when given "-0" but should raise ERANGE for all strictly negative
values ("-1" and below), which it currently does not in the GNU
libc.
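
Under that interpretation, the expected behaviour can be sketched as a
wrapper around the existing strtoul.  The name strict_strtoul is
hypothetical, and the wrapper assumes the underlying strtoul negates
in-range negative inputs as described above.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: like strtoul, but any strictly negative
   input is treated as out of range.  */
unsigned long
strict_strtoul (const char *nptr, char **endptr, int base)
{
  /* Look for a minus sign after the leading white space.  */
  const char *p = nptr;
  while (isspace ((unsigned char) *p))
    p++;
  int negative = (*p == '-');

  unsigned long value = strtoul (nptr, endptr, base);

  /* "-0" converts to 0 and stays valid; any other negative input
     has a nonzero result and is rejected.  */
  if (negative && value != 0)
    {
      errno = ERANGE;
      return ULONG_MAX;
    }
  return value;
}
```

With this sketch, "-0" returns 0 with errno untouched, while "-1"
returns ULONG_MAX with errno set to ERANGE.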

[As a side note, the specs in [1] mean that strtol and strtoul differ
in spirit on the following point:

  When the string represents an integer below strtol's range, strtol
  returns the lowest representable integer, i.e. LONG_MIN.

  When the string represents an integer below strtoul's range, strtoul
  returns ULONG_MAX, instead of the lowest representable integer
  (which is of course 0 for ulong).

  For that matter, we might want to add a function to the GNU libc
  similar to ISO C strtoul but which has a more sensible behaviour
  (return 0 when provided negative numbers).]

The Solaris 9 man page [2] says the same as [1]: strtoul should raise
ERANGE and return ULONG_MAX "If the correct value is outside the range
of representable values".

The HP-UX man page [3] says that strtoul should raise ERANGE and
return ULONG_MAX "If the correct value would cause overflow".

The AIX man page [4] says that strtoul should return ULONG_MAX "If the
correct value is outside the range of representable values", but seems
a bit unclear to me concerning the return value in this case.

The FreeBSD [5], NetBSD [6] and OpenBSD [7] man pages describe a
behaviour similar to that of GNU libc: strtoul raises ERANGE (and
returns ULONG_MAX) only if "the original (non-negated) value would
overflow".

The IRIX man page [9] is not quite clear about this; it says that the
ERANGE error is raised "If the value represented by STR would cause
overflow".

The eCos test suite [10] requires that when given string "-479",
strtoul should return (unsigned long) -479.

In short, there are two main specifications of the behaviour of
strtoul: the GNU/*BSD side and the standard/Solaris/HPUX side, although
the GNU specification seems less clear than the *BSD ones.

I have made some experiments on Solaris, and the Solaris
implementation does not follow its specification.  The behaviour of
Solaris's strtoul seems to be the same as GNU libc's strtoul!

Can somebody check the AIX behaviour?

If you decide to change the GNU libc so that it conforms more to ISO C
([1] and [8]), then I believe something along the lines of the
following code should be added to strtol_l.c (untested though):

#if UNSIGNED
  /* Check cases where the value falls outside the range of unsigned
     LONG int even though i is within the range of unsigned LONG
     int  */
  if (i > 0 && negative)
    overflow = 1;
#endif

Of course, there is the possibility that this breaks some programs,
even though it is quite a special case.

The set of programs where this change would make any difference is
probably only a subset of the programs that have never been ported
from GNU to Solaris/AIX/HPUX/etc.

In any case, the texinfo manual for strtoul is currently at least
ambiguous to me.  It should also mention the fact that strtoul is not
portable unless the input string is known to represent a nonnegative
number.
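
Until the behaviour is settled, a caller can avoid the ambiguity by
refusing a minus sign before strtoul ever sees it.  The helper below is
a minimal sketch with a hypothetical name (parse_nonneg_ulong); for
simplicity it also rejects "-0", trailing garbage and empty input.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 and stores the value in *out on
   success, -1 on any input strtoul might treat differently across
   implementations.  */
int
parse_nonneg_ulong (const char *s, unsigned long *out)
{
  while (isspace ((unsigned char) *s))
    s++;

  /* Refuse negative input up front, so the result never depends on
     how the C library handles the negation.  */
  if (*s == '-')
    return -1;

  char *end;
  errno = 0;
  unsigned long value = strtoul (s, &end, 10);

  /* No digits, trailing characters, or magnitude overflow.  */
  if (end == s || *end != '\0' || errno == ERANGE)
    return -1;

  *out = value;
  return 0;
}
```

Returning a status separately from the value also sidesteps the fact
that ULONG_MAX is both a valid result and the error return of strtoul.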

[1] http://www.opengroup.org/onlinepubs/000095399/functions/strtoul.html
[2] http://docs.sun.com/app/docs/doc/816-0213/6m6ne38d3?a=view
[3] http://www.informatik.uni-frankfurt.de/doc/man/hpux/strtoul.3c.html
[4] http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf2/strtol.htm
[5] http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=strtoul
[6] http://www.daemon-systems.org/man/strtoul.3.html
[7] http://www.openbsd.org/cgi-bin/man.cgi?query=strtoul&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
[8] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf (section
7.20.1.4 page ~310)
[9] http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?cmd=getdoc&coll=0650&db=man&fname=3%20strtoul
[10] http://opencores.org/ocsvn/openrisc/openrisc/trunk/rtos/ecos-2.0/packages/language/c/libc/stdlib/v2_0/tests/strtoul.c

Thanks!

Best regards
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>             /* strtol().  */
#include <string.h>             /* strerror().  */
#include <ctype.h>              /* isdigit().  */

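static void
test_strtoul (const char *str)
{
  /* Copy str to pure, keeping only the digits and the minus sign
     (this drops the ' thousands separators).  */
  char *pure = strdup (str);
  if (pure == NULL)
    abort ();

  {
    const char *src = str;
    char *dst = pure;
    while (*src)
      {
        /* isdigit() needs a value representable as unsigned char.  */
        if (isdigit ((unsigned char) *src) || *src == '-')
          *dst++ = *src;
        src++;
      }
    *dst = 0;
  }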

  errno = 0;
  unsigned long u = strtoul (pure, NULL, 10);
  int errno_ul = errno;

  errno = 0;
  long l = strtol (pure, NULL, 10);
  int errno_l = errno;

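  /* Cast so that each argument matches its conversion specifier.  */
  printf ("str={%14s} => strtoul = %11lu = %11ld = %#10lx (errno: %2d)\n", str, u, (long) u, u, errno_ul);
  printf ("                        strtol  = %11lu = %11ld = %#10lx (errno: %2d)\n", (unsigned long) l, l, (unsigned long) l, errno_l);
  printf ("\n");

  free (pure);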
}

int
main (void)
{
  if (sizeof (long) != 4)
    {
      printf ("error: this test applies only to architectures where long is 32-bit\n");
      printf ("error: on the current architecture, long is %zu bytes long\n", sizeof (long));
      abort ();
    }

  /*
   * Reminder:
   * 2^32 = 4'294'967'296
   * 2^31 = 2'147'483'648
   */

  printf ("Reminder: ERANGE = %d\n", ERANGE);
  printf ("     string format                ulong format  long format   hexa format\n");

  /* ulong overflows by one.  */
  test_strtoul ("4'294'967'296");

  /* ulong does not overflow, long does.  */
  test_strtoul ("4'294'967'295");

  /* -1 is 0xffffffff.  strtoul alone cannot distinguish
     4'294'967'295 from -1: additional work would be necessary.  */
  test_strtoul ("-1");

  /* This would fit in a long, not in a ulong, but strtoul silently
     converts it to ulong (to 2'147'483'648, i.e. 0x80000000) as per
     its specs.  */
  test_strtoul ("-2'147'483'648");

  /* This does not fit in a long.  */
  test_strtoul ("-2'147'483'649");

  test_strtoul ("-4'294'967'296");

  test_strtoul ("-4'294'967'295");

  test_strtoul ("-4'294'967'294");

  return 0;
}
