This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: With bad UTF-8, cygwin can create files it can't read


On Apr  1 10:01, Warren Young wrote:
> On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> > 
> > As you probably know, Unicode values beyond the base plane (that is,
> > everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> > are represented as so-called surrogate pairs in UTF-16, two UTF-16
> > values in the 0xd800 - 0xdfff range.
> 
> I happened to have run across a similar strangeness in Unicode earlier
> today.  Does Cygwin cope with/care about Unicode normalization forms?

Not at all.  UTF-8 string in, equivalent UTF-16 string out and vice versa,
as a straight conversion with no normalization step.  Additionally, there's
a replacement for UTF-16 values which can't be represented in the current
(non-UTF-8) codeset, e.g. ISO8859-1: ASCII CAN (0x18) followed by the UTF-8
representation of the UTF-16 character.
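
To illustrate, here is a rough Python sketch of the behaviour described
above.  It is not Cygwin's actual code: the function names
utf8_to_utf16_units and encode_for_codeset are made up for this example,
and the sketch works per code point, which is an assumption about how
values beyond the base plane would be treated.

```python
import unicodedata

CAN = b"\x18"  # ASCII CAN, used here as the escape marker


def utf8_to_utf16_units(data: bytes) -> list:
    """Convert UTF-8 bytes to a list of UTF-16 code units, no normalization."""
    raw = data.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def encode_for_codeset(text: str, codeset: str) -> bytes:
    """Hypothetical sketch of the CAN replacement.

    Characters the codeset can represent are encoded normally; anything
    else becomes CAN followed by the character's UTF-8 bytes.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codeset)
        except UnicodeEncodeError:
            out += CAN + ch.encode("utf-8")
    return bytes(out)


# U+1F600 lies beyond the base plane, so UTF-16 needs a surrogate pair.
pair = utf8_to_utf16_units("\U0001F600".encode("utf-8"))
print([hex(u) for u in pair])  # ['0xd83d', '0xde00']

# Precomposed and decomposed "e with acute" stay different after conversion.
nfc = unicodedata.normalize("NFC", "\u00e9")
nfd = unicodedata.normalize("NFD", "\u00e9")
print(utf8_to_utf16_units(nfc.encode("utf-8")))  # [233]
print(utf8_to_utf16_units(nfd.encode("utf-8")))  # [101, 769]

# The euro sign (U+20AC) has no ISO-8859-1 encoding, so it gets escaped.
print(encode_for_codeset("a\u20ac", "iso-8859-1"))  # b'a\x18\xe2\x82\xac'
```

The NFC and NFD results differ because nothing composes or decomposes the
input, so two file names that look identical on screen can still be
distinct names.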


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


