This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Re: Bug in mbsrtowcs?


Corinna Vinschen wrote:
Hi,

while I'm looking into implementing the new SUSv4 functions wcsnrtombs
and mbsnrtowcs, I started puzzling over a strange piece of code in
mbsrtowcs:

  while (n > 0)
    {
      bytes = _mbrtowc_r (r, ptr, *src, nms, ps);
      [...]
      else if (bytes == -2)
        *src += MB_CUR_MAX;
      else [...]
    }

So, if the byte sequence starting at *src is an incomplete multibyte
char, *src is skipped by MB_CUR_MAX and the loop continues.

Hang on.  If _mbrtowc_r encounters an incomplete MB char then it does
not form an invalid character so there's no reason to return with -1 and
set errno to EILSEQ.  However, it also doesn't form a *valid* character,
it's just incomplete.  Thus it must be the start of the last character
at the end of the input string.

This code is there because a return value of -2 at this point means that the input is a sequence of redundant shift sequences. From the mbrtowc specification:

(*size_t*)-2
   If the next /n/ bytes contribute to an incomplete but potentially
   valid character, and all /n/ bytes have been processed (no value is
   stored). When /n/ has at least the value of the {MB_CUR_MAX} macro,
   this case can only occur if /s/ points at a sequence of redundant
   shift sequences (for implementations with state-dependent encodings).

In our case, n is MB_CUR_MAX, so it must be a redundant shift sequence. The state is stored, so if we advance the src pointer, the conversion should continue where it left off.
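
To make the "state is stored" part concrete, here is a minimal sketch using the standard mbrtowc rather than newlib's internal _mbrtowc_r. It assumes a UTF-8 locale can be selected under one of a few common names, and both helper names (select_utf8_locale, split_char_resumes) are made up for this illustration. UTF-8 has no shift sequences, so this only shows the ordinary incomplete-character case where n is smaller than the character length; the redundant-shift-sequence case needs a state-dependent encoding and is not exercised here.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names.
   Returns 1 if one of them could be selected, 0 otherwise. */
static int select_utf8_locale(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "C.utf8", "en_US.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: feed one multibyte character in two chunks.
   'first' (n1 bytes) must be an incomplete prefix and 'rest' (n2 bytes)
   exactly the remaining bytes. Returns 1 if the first call reports
   (size_t)-2 and the second call, using the same mbstate_t, completes
   the character by consuming all n2 bytes; returns 0 otherwise. */
static int split_char_resumes(const char *first, size_t n1,
                              const char *rest, size_t n2,
                              wchar_t *out)
{
    mbstate_t ps;
    size_t r;

    assert(first != NULL && rest != NULL && out != NULL);

    memset(&ps, 0, sizeof ps);          /* initial conversion state */

    r = mbrtowc(out, first, n1, &ps);   /* partial character */
    if (r != (size_t)-2)
        return 0;

    /* The bytes seen so far are remembered in ps, so the caller
       continues with the bytes that follow, not with 'first' again. */
    r = mbrtowc(out, rest, n2, &ps);
    return r == n2;
}
```

For example, after select_utf8_locale() succeeds, split_char_resumes("\xe2\x82", 2, "\xac", 1, &wc) feeds the three-byte encoding of U+20AC as two bytes plus one: the first call returns (size_t)-2 without storing a value, and the second call finishes the character from the saved state.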

-- Jeff J.


