This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



Re: using iconv for conversion from/to Unicode


Markus Kuhn writes:

> If the implementation can handle surrogates, then better use "UTF-16BE",
> "UTF-16LE", "UTF-16" instead.

This is not useful for the programs I am talking about. The internal
representation I would like to see better supported is the one where each
Unicode character occupies exactly one element of an array: uint16_t[] and
uint32_t[].
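A minimal sketch of the uint32_t[] representation described above, assuming a glibc whose iconv accepts the explicit-endian names "UCS-4LE" and "UCS-4BE" (the function names here are hypothetical). The host byte order is probed at run time so the converted bytes can be read back directly as uint32_t values, one element per character.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pick the explicit-endian UCS-4 name that matches this host's byte
   order, so iconv's output bytes are valid uint32_t values in memory. */
static const char *native_ucs4_name(void)
{
    const uint32_t probe = 1;
    return (*(const unsigned char *)&probe == 1) ? "UCS-4LE" : "UCS-4BE";
}

/* Convert a NUL-terminated UTF-8 string into out[], one code point per
   element. Returns the number of elements written, or (size_t)-1 on error. */
size_t utf8_to_ucs4(const char *utf8, uint32_t *out, size_t out_len)
{
    iconv_t cd = iconv_open(native_ucs4_name(), "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)utf8;            /* iconv() takes char **, not const */
    size_t in_left = strlen(utf8);
    char *dst = (char *)out;
    size_t dst_left = out_len * sizeof(uint32_t);

    size_t rc = iconv(cd, &in, &in_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    /* Each character filled exactly one 4-byte element. */
    return out_len - dst_left / sizeof(uint32_t);
}
```

For example, converting "A\xc3\xa9\xe2\x82\xac" (A, U+00E9, U+20AC) fills three elements with 0x41, 0xE9 and 0x20AC.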

>   "UCS-2-INTERNAL" ->  "UCS-2"

"UCS-2" has ambiguous endianness and sometimes also a BOM. Both of these
misfeatures make it unsuitable as a name for uint16_t[].
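To illustrate why a stream-level name like "UCS-2" does not describe an in-memory uint16_t[]: a decoder for such a stream has to look for a byte order mark and fall back to a guess when there is none. The helper below is hypothetical, and treating BOM-less input as big-endian is an assumption of this sketch, not something the name "UCS-2" itself pins down.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a UCS-2 byte stream into uint16_t code units. A leading BOM
   (FE FF or FF FE) selects the byte order and is not copied to out[];
   without a BOM the bytes are assumed to be big-endian.
   Returns the number of code units written. */
size_t decode_ucs2(const unsigned char *bytes, size_t nbytes,
                   uint16_t *out, size_t out_len)
{
    int big_endian = 1;                 /* assumed default when no BOM */
    size_t i = 0, n = 0;

    if (nbytes >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            big_endian = 1;
            i = 2;
        } else if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            big_endian = 0;
            i = 2;
        }
    }
    for (; i + 1 < nbytes && n < out_len; i += 2)
        out[n++] = big_endian ? (uint16_t)(bytes[i] << 8 | bytes[i + 1])
                              : (uint16_t)(bytes[i + 1] << 8 | bytes[i]);
    return n;
}
```

The same character can thus arrive as FF FE 41 00 or as 00 41; a plain uint16_t[] has neither a BOM nor a byte order to guess.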

>   "UNICODEBIG"     ->  "UCS-2BE"
>   "UNICODELITTLE"  ->  "UCS-2LE"

That, and the same for UCS-4, would be better than nothing. Ulrich, can you
add the aliases "UCS-2BE", "UCS-2LE" and "UCS-4BE" (= "UCS-4"), and implement
"UCS-4LE"?
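Until such names are available everywhere, a program could check for them at run time before relying on them. A small hypothetical helper, using only the standard iconv_open() and iconv_close() calls:

```c
#include <iconv.h>

/* Return 1 if this iconv implementation accepts `name` as a conversion
   target from UTF-8, 0 otherwise. */
int iconv_supports(const char *name)
{
    iconv_t cd = iconv_open(name, "UTF-8");
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}
```

A caller might, for instance, try "UCS-4LE" first and fall back to another strategy when iconv_supports() returns 0.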

I'm not in favour of "UTF-32BE" and "UTF-32LE", because unicode.org wants
them to reject characters > 0x10FFFF, and some day even 0x110000 characters
may not be enough.
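The difference between the two ranges can be written as a pair of checks. This is a sketch with hypothetical function names, assuming the Unicode definition of UTF-32 (Unicode scalar values only, so the surrogate range is excluded as well) and the original 31-bit UCS-4 code space of ISO/IEC 10646.

```c
#include <stdint.h>

/* UTF-32 per Unicode: scalar values 0..0x10FFFF, minus the surrogates
   0xD800..0xDFFF. */
int valid_utf32(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* UCS-4 as originally defined in ISO/IEC 10646: a 31-bit code space. */
int valid_ucs4(uint32_t cp)
{
    return cp <= 0x7FFFFFFF;
}
```

A value such as 0x110000 passes the UCS-4 check but fails the UTF-32 one, which is the restriction objected to above.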

Bruno
