This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



RE: [PATCH/RFA] Internationalize ctype functionality


Corinna Vinschen wrote:
...
> On Mar 26 21:55, Howland Craig D (Craig) wrote:
>- Use it.  The wide char conversion is lossless.  If the conversion
>  works and the result is a singlebyte char, it's used, otherwise c is
>  returned.  I don't see a problem with this approach.

The potential problem with the approach is inconsistency.  But that has
been solved by adding the MB_CUR_MAX == 1 check.

>>   That is, should it be gated by a check that MB_CUR_MAX == 1?
>
>I'm not quite sure.  While POSIX states that the incoming int must be
>representable as an unsigned char, it doesn't explicitly state that
>this unsigned char must be from a singlebyte charset.
>
>OTOH, all the other isalpha/isprint/etc functions only work for
>singlebyte chars anyway.  And if we start using transition tables
>at one point...

If the character is not from a single-byte charset, then the user should
be calling towupper() and towlower(), not the plain toupper() and
tolower().  As you said, the only reason for calling the wide functions
from the plain ones is a quick band-aid to make them work for
[non-ASCII] single-byte charsets.

OTOH, does it make sense to only do tolower and toupper but not the
rest of the others at the same time?  (Should these tolower and toupper
changes be tabled until later?)

Reflecting some more on the subject, I had an additional thought.  Since
the stated intent is "Reimplement in _MB_CAPABLE case to support any
singlebyte charset," the code is not really multibyte.  (It covers more
than the basic character set, but it cannot work with a multibyte
sequence.)  This raises some questions.  Is it necessary--or even
appropriate--to call mbtowc() and wctomb()?  That is, will either of
them change the 1-byte value into a different 1-byte value?  (toupper()
and tolower() can only ever be called with a single character from a
single-byte charset; they cannot work with multibyte characters.)
If not, couldn't the value just be passed straight to towupper() or
towlower() and the return value used directly?  (It would be much more
efficient.)  For example,

   else if (c != EOF && MB_CUR_MAX == 1)
-    {
-      char s[MB_LEN_MAX] = { c, '\0' };
-      wchar_t wc;
-      if (mbtowc (&wc, s, 1) >= 0
-	  && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
-       c = s[0];
+      c = (unsigned char) towupper ((wint_t) (unsigned char) c);
-    }
   return c;
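
To make the idea concrete, here is a minimal standalone sketch of the
direct call.  It is not the newlib implementation: sketch_toupper() is a
hypothetical name, and the check against UCHAR_MAX is an added
assumption so that a result that does not fit in one byte leaves c
unchanged instead of being truncated by the cast.  It also assumes that
the byte value and its wchar_t value coincide, which holds for ASCII
(and for ISO-8859-1 when wchar_t holds Unicode) but is exactly the open
question for other single-byte charsets.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* MB_CUR_MAX */
#include <wctype.h>  /* towupper, wint_t */

/* Hypothetical helper for illustration only; not newlib's toupper().  */
int
sketch_toupper (int c)
{
  /* Only attempt the wide-character path for single-byte locales.  */
  if (c != EOF && MB_CUR_MAX == 1)
    {
      /* Assumption: the byte value equals its wide-character value.  */
      wint_t wc = towupper ((wint_t) (unsigned char) c);

      /* Added guard: keep the result only if it still fits in one byte.  */
      if (wc <= UCHAR_MAX)
        c = (int) wc;
    }
  return c;
}
```

In the default "C" locale, sketch_toupper ('a') returns 'A', while EOF
and non-letters pass through unchanged.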

I had also thought of the EOF problem, so I'm glad that you caught it.

Craig

