This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)


2009/9/27 Corinna Vinschen:
>> > What about this: The private use area U+f0xx is already used for ASCII
>> > chars invalid in Windows filenames. The same range can be used for
>> > invalid chars > 0x80. This could happen unconditionally.
>>
>> That's a great idea, allowing both lone surrogate support and Unix
>> filename transparency.
>>
>> [time passes]
>>
>> Nope, can't think of anything wrong with it. :)
>
> Did we get it? Did we actually get it?

Not quite. :(

If the Unix filename contains the UTF-8 representation of U+F0xx, that
will now roundtrip to just the xx byte. U+F000 is particularly
problematic, as that roundtrips to a null byte.

Solution: if f_mbtowc comes back with a U+F0xx, scratch that, and
instead turn each of the original bytes into a U+F0xx, i.e.:

\xEF\x80\x80 -> U+F0EF U+F080 U+F080

One for later?
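
To make the proposal concrete, here is a minimal sketch in C of the conversion in both directions. It is not the Cygwin code: decode_one, to_wide and to_bytes are hypothetical names, decode_one stands in for f_mbtowc, code points are kept as 32-bit values rather than UTF-16, and the existing mapping of ASCII chars invalid in Windows filenames is left out. Surrogates get no special treatment here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for f_mbtowc: decode one UTF-8 sequence.
   Returns the number of bytes consumed, or 0 if the sequence is invalid. */
int decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    int len;
    uint32_t min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; min = 0x80;    *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; min = 0x800;   *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; min = 0x10000; *cp = s[0] & 0x07; }
    else
        return 0;                       /* stray continuation or bad lead byte */
    if ((size_t)len > n)
        return 0;                       /* truncated sequence */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (*cp < min || *cp > 0x10FFFF)
        return 0;                       /* overlong or out of range */
    return len;
}

/* Bytes -> code points. 'out' must have room for n entries. */
size_t to_wide(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t o = 0;

    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; ) {
        uint32_t cp = 0;
        int len = decode_one(s + i, n - i, &cp);

        if (len == 0 || (cp >= 0xF000 && cp <= 0xF0FF)) {
            /* Invalid byte, or a valid sequence that lands in the reserved
               range: scratch the result and escape each original byte. */
            if (len == 0)
                len = 1;
            for (int k = 0; k < len; k++)
                out[o++] = 0xF000u | s[i + k];
        } else {
            out[o++] = cp;
        }
        i += (size_t)len;
    }
    return o;
}

/* Code points -> bytes. 'out' must have room for 4 * n bytes. */
size_t to_bytes(const uint32_t *w, size_t n, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = w[i];

        if (cp >= 0xF000 && cp <= 0xF0FF) {
            out[o++] = (unsigned char)(cp & 0xFF);   /* unescape U+F0xx */
        } else if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

With this, the bytes \xEF\x80\x80 go through to_wide as three escaped code points, and to_bytes gives back the same three bytes instead of a single null byte. Since every U+F0xx that reaches to_bytes was produced by escaping, the reverse direction can stay unconditional.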


> I have a local implementation for the entire thing:
>
> - Ctrl-X instead of Ctrl-N
> - invalid \xXX bytes -> U+ffXX
> - Allow CESU-8 sequences for lone surrogate halves
> - Change documentation accordingly.

Wow, that was quick!


> If you want to play with it, the entire patch is here (missing a ChangeLog
> for now):
>
>   http://cygwin.de/hopefully-last-big-cygwin-locale-patch.diff

Compile problem:

cc1plus: warnings being treated as errors
../../.././winsup/cygwin/syscalls.cc: In function ‘char* setlocale(int, const char*)’:
../../.././winsup/cygwin/syscalls.cc:4186: error: ‘w_cwd’ may be used uninitialized in this function
../../.././winsup/cygwin/syscalls.cc:4186: error: ‘w_path’ may be used uninitialized in this function

Looks like a false alarm though, and a pair of "=0"s made it compile.

Andy

