This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.

Re: using iconv for conversion from/to Unicode

To: linux-utf8 at nl dot linux dot org
Subject: Re: using iconv for conversion from/to Unicode
From: Bruno Haible <haible at ilog dot fr>
Date: Wed, 15 Mar 2000 14:53:57 +0100 (MET)
Cc: libc-alpha at sourceware dot cygnus dot com
References: <200003141351.OAA00571@jaures.ilog.fr> <E12VAlL-0001bp-00@wisbech.cl.cam.ac.uk>

Markus Kuhn writes:

> If the implementation can handle surrogates, then better use "UTF-16BE",
> "UTF-16LE", "UTF-16" instead.

This is not useful for the programs I am talking about. The internal
representation I would like to see better supported is the one where each
Unicode character occupies exactly one element of an array: uint16_t[] and
uint32_t[].
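
To illustrate the difference, here is a minimal C sketch (the function
names utf16_encode and utf16_units are hypothetical, not part of glibc).
In a uint32_t[] string every character is one element, whereas UTF-16
needs two 16-bit units for any character above 0xFFFF, so array indices
no longer correspond to characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2),
   or 0 if the code point cannot be represented in UTF-16. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogate values are not characters */
    if (cp > 0x10FFFF)
        return 0;                       /* outside the range UTF-16 can express */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* one unit, same as uint16_t[] */
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Count the UTF-16 units needed for a string held as uint32_t[].
   The input has exactly n characters in n elements; the result can
   be larger than n. Unencodable elements contribute 0 units. */
size_t utf16_units(const uint32_t *s, size_t n)
{
    uint16_t tmp[2];
    size_t units = 0;
    for (size_t i = 0; i < n; i++)
        units += utf16_encode(s[i], tmp);
    return units;
}
```

For a two-character string consisting of U+0041 and U+1F600, the
uint32_t[] form has two elements, while utf16_units reports three.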

>   "UCS-2-INTERNAL" ->  "UCS-2"

"UCS-2" has ambiguous endianness and sometimes also a BOM. Both of these
misfeatures make it unsuitable as a name for uint16_t[].
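
The ambiguity can be made concrete with a small C sketch. All names here
(ucs2_order, ucs2_read, ucs2_decode) are hypothetical and only serve the
illustration. The same two bytes yield different characters depending on
the assumed byte order, and a decoder for a stream labelled just "UCS-2"
has to look for a BOM and otherwise guess.

```c
#include <stddef.h>
#include <stdint.h>

enum ucs2_order { UCS2_BIG, UCS2_LITTLE };

/* Read one 16-bit element from two bytes in the given byte order. */
uint16_t ucs2_read(const unsigned char *p, enum ucs2_order order)
{
    return order == UCS2_BIG
        ? (uint16_t)((p[0] << 8) | p[1])
        : (uint16_t)((p[1] << 8) | p[0]);
}

/* Decode a byte stream labelled only "UCS-2" into uint16_t[].
   A leading BOM (FE FF or FF FE) selects the byte order and is
   skipped; without one, the caller-supplied fallback is a guess.
   Returns the number of elements written to out, which must have
   room for nbytes / 2 elements. A trailing odd byte is ignored. */
size_t ucs2_decode(const unsigned char *in, size_t nbytes,
                   enum ucs2_order fallback, uint16_t *out)
{
    enum ucs2_order order = fallback;
    size_t i = 0, n = 0;

    if (nbytes >= 2) {
        if (in[0] == 0xFE && in[1] == 0xFF) {
            order = UCS2_BIG;
            i = 2;
        } else if (in[0] == 0xFF && in[1] == 0xFE) {
            order = UCS2_LITTLE;
            i = 2;
        }
    }
    for (; i + 1 < nbytes; i += 2)
        out[n++] = ucs2_read(in + i, order);
    return n;
}
```

The bytes 00 41 decode to 0x0041 as big endian and to 0x4100 as little
endian. A uint16_t[] in memory has neither problem: its byte order is
that of the host, and it carries no BOM.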

>   "UNICODEBIG"     ->  "UCS-2BE"
>   "UNICODELITTLE"  ->  "UCS-2LE"

That and the same for UCS-4 would be better than nothing. Ulrich, can you
add aliases "UCS-2BE", "UCS-2LE", "UCS-4BE" (= "UCS-4"), and implement
"UCS-4LE"?
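
As a sketch of how a program might use such a name, the following C
function fills a uint32_t[] from UTF-8 through the standard iconv_open,
iconv and iconv_close calls. It assumes the iconv implementation accepts
the encoding name "UCS-4LE", which is exactly what is being requested
here and is not guaranteed everywhere; if the name is unknown the
function reports failure. The function name utf8_to_codepoints is
hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string into an array with one
   element per character. Returns the number of elements stored, or
   (size_t)-1 if "UCS-4LE" is not known to iconv, the input is
   invalid, or out is too small. */
size_t utf8_to_codepoints(const char *utf8, uint32_t *out, size_t out_cap)
{
    /* Assumption: the implementation provides the name "UCS-4LE". */
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)utf8;          /* iconv takes a non-const char ** */
    size_t inleft = strlen(utf8);
    char *outp = (char *)out;          /* let iconv write bytes into out */
    size_t outleft = out_cap * sizeof(uint32_t);

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    size_t n = (out_cap * sizeof(uint32_t) - outleft) / sizeof(uint32_t);

    /* The bytes are little endian by request; assemble each element
       explicitly so the result is correct on any host byte order. */
    for (size_t i = 0; i < n; i++) {
        unsigned char b[4];
        memcpy(b, &out[i], sizeof b);
        out[i] = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    }
    return n;
}
```

With an explicit byte order in the name, the caller knows how to
interpret the output bytes and never has to skip a BOM.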

I'm not in favour of "UTF32-BE" and "UTF32-LE", because unicode.org wants
them to reject characters > 0x10FFFF, and some day even 0x110000 characters
may not be enough.
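
The difference in accepted range can be written down as two predicates.
This is a sketch with hypothetical function names; it assumes the 31-bit
code space that ISO/IEC 10646 defined for UCS-4 at the time of writing.

```c
#include <stdbool.h>
#include <stdint.h>

/* UTF-32 as unicode.org specifies it: code points up to 0x10FFFF,
   with the surrogate range excluded. */
bool utf32_accepts(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* UCS-4 with a 31-bit code space (assumption stated above). */
bool ucs4_accepts(uint32_t cp)
{
    return cp <= 0x7FFFFFFF;
}
```

A value such as 0x110000 passes ucs4_accepts but fails utf32_accepts,
which is the restriction objected to above.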

Bruno

