This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: iconv + normalization + upper/lower-case


Stefan Hoffmeister wrote:

> http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html
>
> specifies a normalization
>
>   0x03BC --> 0x00B5

The other way around: it specifies a normalization 0x00B5 -> 0x03BC.
When you are at the "genuine" Greek mu 0x03BC, no kind of Unicode
mapping will ever get you back to the micro sign 0x00B5.

> cu = towupper(0x00B5);  // = 0x039C
> cl = towlower(cu);      // = 0x03BC

Similarly, German "ß", when uppercased and then lowercased, becomes
"ss". Forget about the assumption that towlower (towupper (x)) == x.
It doesn't hold in general.
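
A minimal sketch of how one might check the round trip for a given
character. The helper name case_roundtrips is hypothetical, not part of
glibc; it only wraps the two standard <wctype.h> calls quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical helper: reports whether uppercasing and then lowercasing
   a wide character yields the original character again.  The result
   depends on the LC_CTYPE locale in effect at the time of the call. */
bool case_roundtrips(wint_t c)
{
    return towlower(towupper(c)) == c;
}
```

For a plain lowercase ASCII letter such as L'a' this returns true. For
0x00B5 under a UTF-8 locale, the values quoted above (0x039C, then
0x03BC) would make it return false, but that depends on the locale data
installed, so treat it as something to verify rather than rely on.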

> I am unable to find any text reference to character normalization in the
> sources - does glibc implement this somehow?

No, glibc doesn't implement general Unicode normalization.

Your only chance to get back from 0x03BC to 0x00B5 in glibc is by
using an iconv converter to ISO-8859-1//TRANSLIT. Transliteration is
off by default in iconv() and wcstombs().
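
A minimal sketch of requesting transliteration through iconv, assuming
glibc's iconv and a UTF-8 encoded input string. The function name
utf8_to_latin1_translit and its buffer convention are made up for this
illustration; only iconv_open(), iconv() and iconv_close() are real
library calls.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to
   ISO-8859-1, asking iconv to transliterate characters that have no
   direct equivalent in the target charset.  Writes a NUL-terminated
   result into out and returns its length in bytes, or (size_t)-1 if
   the converter is unavailable or the conversion fails. */
size_t utf8_to_latin1_translit(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return (size_t)-1;

    /* The //TRANSLIT suffix on the target name turns transliteration on. */
    iconv_t cd = iconv_open("ISO-8859-1//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inptr = (char *)in;       /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;   /* reserve one byte for the terminating NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);  /* flush any pending state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;

    *outptr = '\0';
    return (size_t)(outptr - out);
}
```

Calling it with "\xCE\xBC" (the UTF-8 encoding of 0x03BC) exercises the
case discussed here. What byte comes out depends on the transliteration
tables of the glibc version and on the current locale, so check the
result on your system instead of assuming it is 0xB5.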

Bruno

