This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: New 64 bit wcscmp implementation


On Mon, 29 Aug 2011, Dmitrieva Liubov wrote:

> Hello,
> 
> I've optimized wcscmp function (new implementation is written with
> assembly language coding), it looks better than current version for
> all machines with SSE2 supporting I have.
> It improves performance by up to 3.6X.

Do your versions avoid the bug in the present version that the comparisons 
are done in an unsigned type even though when wchar_t is signed they must 
be done in the signed wchar_t type?

A testcase for this bug for wcscmp is:

#include <stdlib.h>
#include <wchar.h>

wchar_t w0[] = { WCHAR_MIN, 0 };
wchar_t w1[] = { WCHAR_MAX, 0 };

int
main (void)
{
  if (wcscmp (w0, w1) < 0)
    return 0;
  else
    abort ();
}

and for wmemcmp:

#include <stdlib.h>
#include <wchar.h>

wchar_t w0 = WCHAR_MIN;
wchar_t w1 = WCHAR_MAX;

int
main (void)
{
  if (wmemcmp (&w0, &w1, 1) < 0)
    return 0;
  else
    abort ();
}

Both tests fail at present.  The relevant standard wording is "Unless 
explicitly stated otherwise, the functions described in this subclause 
order two wide characters the same way as two integers of the underlying 
integer type designated by wchar_t.".  And, since there might be some 
question over validity of the negative values of wchar_t, C1X adds an 
explicit statement "Arguments to the functions in this subclause may point 
to arrays containing wchar_t values that do not correspond to members of 
the extended character set. Such values shall be processed according to 
the specified semantics, except that it is unspecified whether an encoding 
error occurs if such a value appears in the format string for a function 
in 7.29.2 or 7.29.5 and the specified semantics do not require that value 
to be processed by wcrtomb." to make it clear that like the narrow string 
and memory functions all wchar_t values are valid without regard to their 
semantics in any character set.

-- 
Joseph S. Myers
joseph@codesourcery.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]