This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: Unicode 3.2 support (4)


Anthony Fok writes:
>   1. GB18030 is intended to be an UTF: Just as UTF-8 is ASCII-compatible
>      and can map to all unassigned-yet-legal codepoints in Unicode, so is
>      GB18030 GB2312/GBK-compatible and can map to all unassigned-yet-legal
>      codepoints in Unicode.

Currently (i.e. with or without yesterday's proposed patch), the
GB18030 converter has the following problems:

a) It doesn't handle unassigned codepoints < 0x10000, thus violating
   Anthony's request 1.

b) It treats all non-ASCII characters as having width 2, i.e. not
   only the characters that have "ambiguous width" in Unicode 3.1/3.2,
   but even the zero-width characters! This should be enough to make
   GB18030 unusable in all terminal emulators.

   Ulrich, can you explain the rationale for the patch to the width
   table, from Yu Shao, that you accepted in January? I cannot find it
   in the public archives.

c) Problem (a) creates an artificial distinction between Unicode
   characters < 0x10000 and those >= 0x10000.
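
To illustrate the distinction in (c): in GB18030, codepoints >= 0x10000
map to four-byte sequences by plain arithmetic, starting at
0x90 0x30 0x81 0x30 for U+10000, so unassigned codepoints in that range
need no table entry at all. The following is only a minimal sketch of
that arithmetic, not the glibc converter code; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc): encode a codepoint in the
   range U+10000..U+10FFFF as a four-byte GB18030 sequence.
   Returns 4 on success, 0 if UCS lies outside that range.  */
int
gb18030_from_supplementary (uint32_t ucs, unsigned char out[4])
{
  if (ucs < 0x10000 || ucs > 0x10FFFF)
    return 0;

  uint32_t idx = ucs - 0x10000;     /* linear index, 0 .. 0xFFFFF */
  assert (idx <= 0xFFFFF);

  /* Byte ranges: 0x90.., 0x30..0x39, 0x81..0xFE, 0x30..0x39.  */
  out[3] = 0x30 + idx % 10;   idx /= 10;
  out[2] = 0x81 + idx % 126;  idx /= 126;
  out[1] = 0x30 + idx % 10;   idx /= 10;
  out[0] = 0x90 + idx;
  return 4;
}

/* Hypothetical inverse: decode a four-byte GB18030 sequence that lies
   in the supplementary-plane range.  Returns the codepoint, or 0 if
   the bytes are outside that range.  */
uint32_t
gb18030_to_supplementary (const unsigned char in[4])
{
  if (in[0] < 0x90 || in[0] > 0xE3
      || in[1] < 0x30 || in[1] > 0x39
      || in[2] < 0x81 || in[2] > 0xFE
      || in[3] < 0x30 || in[3] > 0x39)
    return 0;

  uint32_t idx = (((uint32_t) (in[0] - 0x90) * 10
                   + (in[1] - 0x30)) * 126
                  + (in[2] - 0x81)) * 10
                 + (in[3] - 0x30);

  /* Sequences past 0xE3 0x32 0x9A 0x35 would exceed U+10FFFF.  */
  if (idx > 0xFFFFF)
    return 0;
  return idx + 0x10000;
}
```

Every codepoint from U+10000 to U+10FFFF round-trips through these two
functions whether or not it is assigned, which is the behaviour that
request 1 asks for below 0x10000 as well.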

Bruno

