This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
Re: [PATCH/RFA] Add CJK ambiguous character handling dependent on language
- From: IWAMURO Motonori <deenheart at gmail dot com>
- To: newlib at sourceware dot org
- Date: Thu, 4 Jun 2009 02:56:29 +0900
- Subject: Re: [PATCH/RFA] Add CJK ambiguous character handling dependent on language
- References: <20090603105715.GA20210@calimero.vinschen.de>
2009/6/3 Corinna Vinschen <vinschen@redhat.com>:
> Hi,
>
> as discussed three weeks ago, I'll now propose the following patch.  It
> changes __wcwidth along the lines of Markus Kuhn's code and Iwamuro
> Motonori's proposal to use the language set via setlocale(1) to return
> different character widths for the "CJK Ambiguous Width" category of
> characters.  Tested on Cygwin.
>
> Ok to apply?
It looks good.
But I don't think the order of the tests is good for performance: the
ambiguous-character lookup comes before the cheap range checks.
How about the following code?
--- libc/string/wcwidth.c.ORIG 2009-06-04 02:00:48.015625000 +0900
+++ libc/string/wcwidth.c 2009-06-04 02:32:36.234375000 +0900
@@ -278,21 +278,28 @@
     { 0xE0100, 0xE01EF }
   };
-  /* binary search in table of ambiguous characters */
-  if (__locale_cjk_lang ()
-      && bisearch(ucs, ambiguous,
-                  sizeof(ambiguous) / sizeof(struct interval) - 1))
-    return 2;
-
-  /* test for 8-bit control characters */
+  /* Test for NUL character */
   if (ucs == 0)
     return 0;
-  if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0))
+
+  /* Test for printable ASCII characters */
+  if (ucs >= 0x20 && ucs < 0x7f)
+    return 1;
+
+  /* Test for control characters */
+  if (ucs < 0xa0)
     return -1;
+
   /* Test for surrogate pair values. */
   if (ucs >= 0xd800 && ucs <= 0xdfff)
     return -1;
+  /* binary search in table of ambiguous characters */
+  if (__locale_cjk_lang ()
+      && bisearch(ucs, ambiguous,
+                  sizeof(ambiguous) / sizeof(struct interval) - 1))
+    return 2;
+
   /* binary search in table of non-spacing characters */
   if (bisearch(ucs, combining,
                sizeof(combining) / sizeof(struct interval) - 1))
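
To make the intended order easier to try outside of newlib, here is a
minimal standalone sketch of the same idea: the cheap range tests run
first, and the table searches only run for characters that pass them.
Everything named here is an assumption for illustration only:
sketch_wcwidth is a hypothetical function, the cjk_lang argument stands
in for __locale_cjk_lang (), and the two tables hold just a few sample
ranges rather than the real data. The wide-character ranges are left
out completely.

```c
#include <stddef.h>

struct interval { unsigned long first, last; };

/* Binary search in a sorted table of non-overlapping intervals.
   max is the index of the last table entry. */
static int bisearch(unsigned long ucs, const struct interval *table, int max)
{
  int min = 0;

  if (ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min) {
    int mid = (min + max) / 2;
    if (ucs > table[mid].last)
      min = mid + 1;
    else if (ucs < table[mid].first)
      max = mid - 1;
    else
      return 1;
  }
  return 0;
}

/* Hypothetical sample tables: a few ranges only, NOT the full data. */
static const struct interval ambiguous_demo[] = {
  { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 }
};
static const struct interval combining_demo[] = {
  { 0x0300, 0x036F }, { 0x0483, 0x0486 }
};

/* cjk_lang is an assumed flag standing in for __locale_cjk_lang (). */
int sketch_wcwidth(unsigned long ucs, int cjk_lang)
{
  /* NUL character */
  if (ucs == 0)
    return 0;

  /* Printable ASCII: the common case, answered without any table search */
  if (ucs >= 0x20 && ucs < 0x7f)
    return 1;

  /* Remaining values below 0xa0 are C0 controls, DEL, or C1 controls */
  if (ucs < 0xa0)
    return -1;

  /* Surrogate pair values */
  if (ucs >= 0xd800 && ucs <= 0xdfff)
    return -1;

  /* Only now pay for the table searches */
  if (cjk_lang
      && bisearch(ucs, ambiguous_demo,
                  sizeof(ambiguous_demo) / sizeof(struct interval) - 1))
    return 2;
  if (bisearch(ucs, combining_demo,
               sizeof(combining_demo) / sizeof(struct interval) - 1))
    return 0;

  /* Wide (East Asian) ranges are omitted from this sketch */
  return 1;
}
```

With these sample tables, sketch_wcwidth(0x00A1, 1) gives 2 and
sketch_wcwidth(0x00A1, 0) gives 1, while an ASCII letter returns 1
before either table is touched.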
--
IWAMURO Motonori <http://vmi.jp/>