This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [Fwd: [1.7] wcwidth failing configure tests]


Corinna Vinschen wrote:
On May 12 19:31, Corinna Vinschen wrote:
On May 12 17:56, Andy Koppe wrote:
And here's another question. The utf8*.h files claim they have been
generated from the unicode.txt file of the Unicode 3.2 standard. Do we
have the script which generated the utf8*.h files? Can we regenerate
the files to match the current Unicode 5.1 standard?
There's Markus Kuhn's wcwidth implementation, which says it's based on
Unicode 5.0:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
This looks nice.

Trouble is, there's the thorny issue of the "CJK Ambiguous Width"
category of characters, which consists of things like Greek and
Cyrillic letters as well as line drawing symbols. Those have a width
of 1 in Western use, yet with CJK fonts they have a width of 2. That's
why Markus Kuhn's code includes the mk_wcswidth_cjk() variant.
We should use the standard variant alone, imho.
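
To make the ambiguous-width point concrete, here is a minimal sketch in C. It is not Markus Kuhn's code and not newlib's: the names `cp_range`, `ambiguous_subset`, `in_ranges` and `sketch_width` are hypothetical, the table is only a small illustrative subset of the East Asian Ambiguous ranges, and everything outside that subset is simply treated as narrow.

```c
#include <stddef.h>
#include <stdint.h>

struct cp_range { uint32_t first, last; };

/* Illustrative subset of "East Asian Ambiguous" ranges (assumption:
   a real table would be generated from the Unicode data files). */
static const struct cp_range ambiguous_subset[] = {
    { 0x0391, 0x03A1 },  /* Greek capital letters Alpha..Rho */
    { 0x03B1, 0x03C1 },  /* Greek small letters alpha..rho */
    { 0x0410, 0x044F },  /* Cyrillic letters */
    { 0x2500, 0x254B },  /* box (line) drawing symbols */
};

/* Linear search; returns 1 if cp falls inside any range of the table. */
int in_ranges(uint32_t cp, const struct cp_range *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= t[i].first && cp <= t[i].last)
            return 1;
    return 0;
}

/* Hypothetical width function: ambiguous characters take 2 columns
   only when cjk_mode is nonzero. All other characters are treated
   as width 1 here, which a real implementation would not do. */
int sketch_width(uint32_t cp, int cjk_mode)
{
    size_t n = sizeof ambiguous_subset / sizeof ambiguous_subset[0];
    if (in_ranges(cp, ambiguous_subset, n))
        return cjk_mode ? 2 : 1;
    return 1;
}
```

With this sketch, `sketch_width(0x03B1, 0)` (Greek alpha, Western use) gives 1 and `sketch_width(0x03B1, 1)` gives 2. Using "the standard variant alone" corresponds to always passing `cjk_mode = 0`.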

And we need some workaround for UTF-16 systems like Cygwin.
Unfortunately, surrogate pairs only work well as part of a string, not
as standalone chars. So wcwidth would return -1 for each single char,
but wcswidth could be tweaked to handle them gracefully.
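
A rough sketch of what such a wcswidth tweak could look like follows. It is an assumption about one possible approach, not the code that went into newlib. To stay independent of the host's wchar_t size it works on `uint16_t` units; `toy_cp_width` is a hypothetical stand-in for a real per-code-point width table (a few wide ranges only, no combining characters), and `utf16_wcswidth` is likewise a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a full width table. Returns -1 for
   control characters and for surrogates seen on their own. */
int toy_cp_width(uint32_t cp)
{
    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0))
        return -1;                          /* control characters */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return -1;                          /* lone surrogate */
    if ((cp >= 0x1100 && cp <= 0x115f) ||   /* Hangul Jamo initials */
        (cp >= 0xac00 && cp <= 0xd7a3) ||   /* Hangul syllables */
        (cp >= 0x20000 && cp <= 0x2fffd))   /* CJK, plane 2 */
        return 2;
    return 1;
}

/* Width of at most n UTF-16 units of s, stopping at a 0 terminator.
   A high surrogate followed by a low surrogate is combined into one
   code point before the width lookup; anything unprintable gives -1. */
int utf16_wcswidth(const uint16_t *s, size_t n)
{
    int total = 0;

    assert(s != NULL);
    for (size_t i = 0; i < n && s[i] != 0; i++) {
        uint32_t cp = s[i];

        if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < n
            && s[i + 1] >= 0xdc00 && s[i + 1] <= 0xdfff) {
            cp = 0x10000 + ((cp - 0xd800) << 10) + (s[i + 1] - 0xdc00);
            i++;                            /* consume the low surrogate */
        }

        int w = toy_cp_width(cp);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}
```

For example, the units 0x0041 0xD840 0xDC00 ('A' followed by U+20000) give a width of 3 in this sketch, while 0xD840 on its own gives -1, matching the per-character behaviour described above.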

Jeff, is that wcwidth something for newlib? I'd be willing to tweak it
for newlib and to add the surrogate pair handling to wcswidth.


Sure.

-- Jeff J.
Corinna

