This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: bug in mbrtowc?
On Jul 27 22:56, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
>
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> #include <wchar.h>
>
> int main(void) {
>   wchar_t wc;
>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>   /* Feed the three bytes one at a time. A null state pointer makes
>      mbrtowc use its own internal conversion state. */
>   printf("%i\n", (int) mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", (int) mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", (int) mbrtowc(&wc, "\x84", 1, 0));
>   /* Print the decoded wide character in hex. */
>   printf("%x\n", (unsigned) wc);
>   return 0;
> }
>
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
>
> printf("%i\n", (int) mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
That's a bug in the newlib function __utf8_mbtowc. I'm really surprised
that this bug has never been reported before, since it has been in the
code for years, probably since the function was introduced in 2002.
I'll follow up on the newlib list.
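
For anyone following along, here is a rough sketch of what a restartable
UTF-8 decoder has to do when the bytes arrive one at a time: the partial
code point and the number of continuation bytes still expected must
survive between calls. This is not the newlib code. The names utf8_state
and utf8_feed are made up for illustration, and checks for overlong
forms and surrogates are left out to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state (not the newlib structures). */
typedef struct {
    uint32_t cp;        /* code point bits accumulated so far */
    int      remaining; /* continuation bytes still expected */
} utf8_state;

/* Feed one byte to the decoder.
   Returns 1 when a code point is complete (stored in *out),
   0 when more bytes are needed, and -1 on an invalid byte.
   Overlong forms and surrogates are NOT rejected in this sketch. */
static int utf8_feed(utf8_state *st, unsigned char b, uint32_t *out)
{
    if (st->remaining == 0) {
        /* Start of a new character: classify the lead byte. */
        if (b < 0x80) { *out = b; return 1; }
        if ((b & 0xE0) == 0xC0)      { st->cp = b & 0x1F; st->remaining = 1; }
        else if ((b & 0xF0) == 0xE0) { st->cp = b & 0x0F; st->remaining = 2; }
        else if ((b & 0xF8) == 0xF0) { st->cp = b & 0x07; st->remaining = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */
        return 0;
    }

    /* In the middle of a character: only 10xxxxxx bytes are valid. */
    if ((b & 0xC0) != 0x80) { st->remaining = 0; return -1; }
    st->cp = (st->cp << 6) | (b & 0x3F);
    if (--st->remaining > 0)
        return 0;
    *out = st->cp;
    return 1;
}
```

With a zero-initialised utf8_state, feeding E2 and then 94 should each
return 0, and feeding 84 should return 1 with the code point 0x2504,
which is the byte-at-a-time behaviour the testcase expects from mbrtowc.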
Thanks for the report and especially thanks for the testcase,
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat