This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
The attached small patch affects character widths as reported by wcwidth(). It addresses an obscure issue.

The CJK ambiguous width category contains characters that are one character cell wide in some contexts and two cells in others. That category doesn't actually contain CJK characters as such, but things like the Greek and Cyrillic alphabets, accented Latin characters, and also line drawing characters. These are usually one cell wide, but in CJK legacy encodings such as SJIS or GBK, they were encoded as two bytes, and the usual practice was to have the display width correspond to the number of bytes. Accordingly, CJK terminal fonts usually have double-width glyphs for the affected characters. See also http://unicode.org/reports/tr11/#Ambiguous.

Newlib currently decides which width to use based on the language of the selected LC_CTYPE locale, i.e. it will use double width for "zh", "ja", and "ko" locales, and single width for everything else, independent of the selected character set.

The attached patch changes this so that single width will always be used for single-byte encodings such as the ISO-8859 ones, and that double width will always be used for the CJK legacy encodings. For UTF-8, the decision will still be made based on the locale. The @cjknarrow modifier can still be used to force single width, independent of locale and encoding.

The point of this is to fit in with the historical use of those legacy encodings, since the ambiguity only arose once the different charsets were combined into Unicode. I doubt anyone is using nonsensical locale/encoding combinations such as de_DE.GBK or ja_JP.ISO-8859-1, so this is primarily about the likes of C.GBK and C.SJIS. Those are currently ambiguous-narrow, but vim for example treats them as ambiguous-wide, which makes for "interesting" effects when editing files containing affected characters. The patch here fixes that.

Tested in Cygwin. I assume this will need to wait for Corinna's return.
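For illustration only, the width rule described above could be sketched roughly as follows. This is not the code from ambiwidth.patch: the enum charset_kind and the functions is_cjk_lang() and ambiguous_width() are hypothetical names made up for this sketch, and it assumes the charset has already been classified and the @cjknarrow modifier already parsed out of the locale string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the selected charset (not a newlib type). */
enum charset_kind {
    CS_SINGLE_BYTE,  /* e.g. the ISO-8859 family */
    CS_CJK_LEGACY,   /* e.g. SJIS, GBK */
    CS_UTF8
};

/* True if the locale name starts with a CJK language code. */
static bool is_cjk_lang(const char *locale)
{
    if (locale == NULL)
        return false;
    return strncmp(locale, "zh", 2) == 0
        || strncmp(locale, "ja", 2) == 0
        || strncmp(locale, "ko", 2) == 0;
}

/* Cell width (1 or 2) to report for CJK ambiguous width characters. */
int ambiguous_width(const char *locale, enum charset_kind cs, bool cjknarrow)
{
    if (cjknarrow)
        return 1;                 /* modifier forces single width */

    switch (cs) {
    case CS_SINGLE_BYTE:
        return 1;                 /* always narrow, whatever the language */
    case CS_CJK_LEGACY:
        return 2;                 /* always wide, matching the two-byte encoding */
    case CS_UTF8:
    default:
        return is_cjk_lang(locale) ? 2 : 1;  /* language decides */
    }
}
```

Under these assumptions, C.GBK and C.SJIS come out ambiguous-wide, ja_JP.ISO-8859-1 comes out narrow, and UTF-8 locales keep the current language-based behaviour.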
* libc/locale/locale.c: Fix ambiguous width to one for single-byte charsets and two for non-Unicode multibyte charsets.

Regards,
Andy
Attachment:
ambiwidth.patch
Description: Binary data