This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Collation of underline U0005F correct?
- From: Andreas Jaeger <aj at suse dot de>
- To: libc-alpha at sources dot redhat dot com
- Cc: Lars Marowsky-Bree <lmb at suse dot de>
- Date: Wed, 12 Dec 2001 13:23:22 +0100
- Subject: Collation of underline U0005F correct?
I'm currently looking into a string sorting example with underlines,
like:
$ LC_ALL=POSIX sort data
_ablah
_blah
blah
eblah
$ LC_ALL=de_DE sort data
_ablah
blah
_blah
eblah
I'm trying to understand whether this is correct and how it works.
glibc has this line in localedata/locales/iso14651_t1:
<U005F> IGNORE;IGNORE;IGNORE;<U005F> # 33 _
So the underline is ignored at the first three levels and only counts at
the fourth level, which gives the de_DE output above.
Is that right so far?
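To make that reading concrete, here is a minimal sketch in Python of a multi-level comparison in which "_" has no weight at the first levels and only contributes at the last one. It is a simplified model, not glibc's implementation: the helper name sort_key is hypothetical, the three IGNORE levels are collapsed into one, and letters are assumed to carry no weight at the last level.

```python
# Simplified model (assumption): "_" is ignored at the early levels and
# only carries a weight at the last level, mirroring
# IGNORE;IGNORE;IGNORE;<U005F>. Letters are assumed to weigh nothing there.
IGNORED_UNTIL_LAST_LEVEL = {"_"}

def sort_key(s):
    # Early levels: compare the string as if the ignored characters were absent.
    early = tuple(ch for ch in s if ch not in IGNORED_UNTIL_LAST_LEVEL)
    # Last level: only the otherwise ignored characters contribute.
    last = tuple(ch for ch in s if ch in IGNORED_UNTIL_LAST_LEVEL)
    return (early, last)

data = ["_ablah", "_blah", "blah", "eblah"]

# Plain code point order, comparable to LC_ALL=POSIX.
posix_like = sorted(data)

# Multi-level order with "_" ignored until the last level.
multi_level = sorted(data, key=sort_key)

print(posix_like)
print(multi_level)
```

Under these assumptions "blah" and "_blah" tie on the early levels, and the empty last-level key of "blah" sorts before the one containing "_", which reproduces the order seen with de_DE.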
But looking at Unicode, I noticed the Unicode collation algorithm
(http://www.unicode.org/unicode/reports/tr10/) and found this line in
http://www.unicode.org/unicode/reports/tr10/allkeys.txt:
005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ
In Unicode the underline does not seem to be ignored; it gets special
treatment.
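For reference, the allkeys.txt line can be taken apart with a few lines of Python. This is only a sketch for reading that one entry format: the function name parse_allkeys_line is hypothetical, and it assumes the layout shown above, where each bracketed collation element starts with "*" (variable) or "." (not variable) followed by dot-separated hexadecimal weights.

```python
import re

LINE = "005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ"

def parse_allkeys_line(line):
    # Split off the trailing comment, then the code point field.
    entry, _, comment = line.partition("#")
    code_points, _, elements = entry.partition(";")
    parsed = []
    # Each collation element looks like [*wwww.wwww.wwww.wwww] or [.wwww....].
    for marker, weights in re.findall(r"\[([.*])([0-9A-Fa-f.]+)\]", elements):
        parsed.append({
            "variable": marker == "*",
            "weights": [int(w, 16) for w in weights.split(".")],
        })
    return {
        "code_points": [int(cp, 16) for cp in code_points.split()],
        "elements": parsed,
        "comment": comment.strip(),
    }

entry = parse_allkeys_line(LINE)
print(entry)
```

The entry has a non-zero first weight (0x0209) and the "*" marker, so in this table the character is not simply dropped; how a variable element is finally weighted depends on the option chosen when the table is applied.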
So, where does this leave us in glibc?
Does glibc implement the Unicode collation algorithm? If so, why is
there a difference?
Can anybody share some insights?
thanks,
Andreas
--
Andreas Jaeger
SuSE Labs aj@suse.de
private aj@arthur.inka.de
http://www.suse.de/~aj