This is the mail archive of the cygwin mailing list for the Cygwin project.



bug in mbrtowc?


I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
Here's an example:

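#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t wc = 0;
  const char *loc = setlocale(LC_CTYPE, "en_GB.UTF-8");
  puts(loc ? loc : "(setlocale failed)");
  /* A null mbstate_t pointer makes mbrtowc use its own internal state. */
  /* The size_t results are cast so (size_t)-2 and (size_t)-1 print as -2 and -1. */
  printf("%d\n", (int)mbrtowc(&wc, "\xe2", 1, NULL));
  printf("%d\n", (int)mbrtowc(&wc, "\x94", 1, NULL));
  printf("%d\n", (int)mbrtowc(&wc, "\x84", 1, NULL));
  printf("%x\n", (unsigned)wc);
  return 0;
}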

The sequence E2 94 84 should translate to U+2504, with the first two
calls returning -2 (incomplete character) and the third returning 1.
Instead, the second and third calls to mbrtowc() report encoding
errors. It does work correctly if the three bytes are passed to
mbrtowc() in one go:

  printf("%d\n", (int)mbrtowc(&wc, "\xe2\x94\x84", 3, NULL));

Andy

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

