This is the mail archive of the cygwin mailing list for the Cygwin project.
On Apr 1 10:01, Warren Young wrote:
> On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> >
> > As you probably know, Unicode values beyond the base plane (that is,
> > everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> > are represented as so-called surrogate pairs in UTF-16, two UTF-16
> > values in the 0xd800 - 0xdfff range.
>
> I happened to have run across a similar strangeness in Unicode earlier
> today.  Does Cygwin cope with/care about Unicode normalization forms?

Not at all.  UTF-8 string in, equivalent UTF-16 string out and vice versa,
on the bit level.  Additionally, there's a replacement for UTF-16 values
which can't be handled by the current (non-UTF-8) codeset, e.g. ISO8859-1:
ASCII CAN followed by the UTF-8 representation of the UTF-16 character.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
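A minimal Python sketch (not Cygwin's actual code) of the two points discussed above: how a code point beyond U+FFFF is split into a UTF-16 surrogate pair, and what a "bit-level" UTF-8 to UTF-16 conversion without normalization implies. The helper names `to_surrogate_pair` and `utf8_to_utf16_units` are hypothetical, and Python's built-in codecs stand in for Cygwin's internal conversion routines.

```python
from __future__ import annotations

import unicodedata


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a pair")
    v = cp - 0x10000            # remaining 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Convert UTF-8 bytes to UTF-16 code units; no normalization is applied."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    # U+1F600 lies beyond the base plane: 4 bytes in UTF-8, 2 units in UTF-16.
    print(to_surrogate_pair(0x1F600))
    print(utf8_to_utf16_units("\U0001F600".encode("utf-8")))

    # The same visible character in NFC and NFD form yields different
    # UTF-16 sequences, because the conversion never normalizes.
    nfc = unicodedata.normalize("NFC", "\u00e9")  # precomposed e-acute
    nfd = unicodedata.normalize("NFD", "\u00e9")  # 'e' + combining acute
    print(utf8_to_utf16_units(nfc.encode("utf-8")))
    print(utf8_to_utf16_units(nfd.encode("utf-8")))
```

Under this model, two file names that look identical but differ in normalization form map to different UTF-16 names, which is the practical consequence of "UTF-8 string in, equivalent UTF-16 string out" with no normalization step.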