This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Re: iconv + normalization + upper/lower-case
- To: libc-alpha at sources dot redhat dot com
- Subject: Re: iconv + normalization + upper/lower-case
- From: Bruno Haible <haible at ilog dot fr>
- Date: Sat, 22 Sep 2001 23:00:18 +0200 (CEST)
- References: <52qpqtse28mf3pgpnrsunarknufi2mi7rq@4ax.com>
Stefan Hoffmeister wrote:
> http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html
>
> specifies a normalization
>
> 0x03BC --> 0x00B5
The other way around: it specifies a normalization 0x00B5 -> 0x03BC
(the micro sign has a compatibility decomposition to the Greek small
letter mu). Once you are at the "genuine" Greek mu 0x03BC, neither
normalization nor case mapping will ever get you back to the micro
sign 0x00B5.
> cu = towupper(0x00B5); // = 0x039C
> cl = towlower(cu); // = 0x03BC
Similarly, German "ß", when uppercased to "SS" and then lowercased,
becomes "ss". Forget about the assumption that
towlower (towupper (x)) == x. It doesn't hold.
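A minimal C sketch of the round trip quoted above, assuming some UTF-8 LC_CTYPE locale is installed. The helper names use_utf8_ctype() and case_round_trip(), and the list of locale names tried, are hypothetical. Note that towupper() and towlower() are one-to-one character mappings, so they cannot turn "ß" into "SS"; the micro sign, however, breaks the round trip even for a single character.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/* Switch LC_CTYPE to a UTF-8 locale.  The names tried here are an
   assumption: which locales exist depends on the system.
   Returns 1 on success, 0 if none of them is available. */
int use_utf8_ctype(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8",
                                         "de_DE.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Uppercase, then lowercase, one wide character using the simple
   (one-to-one) case mappings of the current LC_CTYPE locale. */
wint_t case_round_trip(wint_t c)
{
    return towlower(towupper(c));
}
```

With a UTF-8 LC_CTYPE, towupper (0x00B5) gives 0x039C and case_round_trip (0x00B5) gives 0x03BC, not 0x00B5, matching the values in the quoted message.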
> I am unable to find any text reference to character normalization in the
> sources - does glibc implement this somehow?
No, glibc doesn't implement general Unicode normalization.
Your only chance to get back from 0x03BC to 0x00B5 in glibc is by
using an iconv converter to ISO-8859-1//TRANSLIT. Transliteration is
off by default in iconv() (it has to be requested with the //TRANSLIT
suffix in iconv_open()) and in wcstombs().
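A sketch of requesting transliteration explicitly through iconv_open(), assuming glibc's iconv(). The helper convert_from_utf8() is hypothetical. Without the //TRANSLIT suffix, U+03BC has no ISO-8859-1 equivalent and iconv() fails with EILSEQ.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated UTF-8 string IN to TOCODE, e.g.
   "ISO-8859-1" or "ISO-8859-1//TRANSLIT".  At most OUTSIZE - 1 bytes
   plus a terminating NUL are written to OUT.  Returns 0 on success,
   -1 on failure with errno set (for example EILSEQ: a character has
   no representation in TOCODE; E2BIG: OUT is too small). */
int convert_from_utf8(const char *tocode, const char *in,
                      char *out, size_t outsize)
{
    assert(outsize > 0);

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;                      /* unsupported conversion */

    char *inptr = (char *) in;          /* iconv() takes char ** */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;       /* keep room for the NUL */

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t) -1)              /* flush any shift state */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);

    int saved_errno = errno;            /* iconv_close() may clobber it */
    iconv_close(cd);
    *outptr = '\0';
    errno = saved_errno;
    return rc == (size_t) -1 ? -1 : 0;
}
```

glibc takes its transliteration rules from the LC_CTYPE locale, so a program would normally call setlocale (LC_CTYPE, "") before using "ISO-8859-1//TRANSLIT". Whether U+03BC actually comes out as 0xB5 depends on those locale tables, so no particular output is claimed here.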
Bruno