This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Collation of underline U005F correct?



I'm currently looking into a string-sorting example with underlines,
like the following:

$ LC_ALL=POSIX sort data
_ablah
_blah
blah
eblah
$ LC_ALL=de_DE sort data
_ablah
blah
_blah
eblah

I'm trying to understand whether this is correct and how it works.

glibc has this in localedata/locales/iso14651_t1:
<U005F> IGNORE;IGNORE;IGNORE;<U005F> # 33 _

Therefore the underline is ignored at the first three collation levels
and only counts at the fourth level, as a tie-breaker; that gives the
output above.
Right so far?
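The two orderings above can be reproduced with a small model of multi-level sort keys. This is a simplified sketch, not glibc's strcoll implementation: it assumes (hypothetically) that lowercase letters carry weights on levels 1-3 and none on level 4, while the underline follows the IGNORE;IGNORE;IGNORE;<U005F> entry. The names weights and sort_key are illustrative only.

```python
# Simplified model (an assumption, NOT glibc's real strcoll) of how the entry
#   <U005F> IGNORE;IGNORE;IGNORE;<U005F>
# affects sorting when collation compares level 1, then 2, then 3, then 4.

IGNORE = None

def weights(ch):
    """Return hypothetical (level1, level2, level3, level4) weights for ch."""
    if ch == "_":
        # Mirrors the iso14651_t1 entry: no weight on levels 1-3.
        return (IGNORE, IGNORE, IGNORE, ord(ch))
    # Assumption: lowercase letters get a primary weight from their code
    # point, flat secondary/tertiary weights, and no fourth-level weight.
    return (ord(ch), 0, 0, IGNORE)

def sort_key(s):
    """Build one tuple of weights per level, dropping IGNOREd weights."""
    per_char = [weights(ch) for ch in s]
    return tuple(
        tuple(w[level] for w in per_char if w[level] is not IGNORE)
        for level in range(4)
    )

data = ["_ablah", "_blah", "blah", "eblah"]

posix_order = sorted(data)                 # plain code point order
locale_model = sorted(data, key=sort_key)  # simplified four-level model

print(posix_order)   # ['_ablah', '_blah', 'blah', 'eblah']
print(locale_model)  # ['_ablah', 'blah', '_blah', 'eblah']
```

In this model "blah" and "_blah" tie on the first three levels, so the fourth level decides, and the string without any level-4 weight sorts first.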

But looking at Unicode, I came across the Unicode Collation Algorithm
(http://www.unicode.org/unicode/reports/tr10/), and in its table
http://www.unicode.org/unicode/reports/tr10/allkeys.txt I found this line:

005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ

In Unicode the underline does not seem to be ignored; instead it is
treated specially (the leading '*' marks it as a variable collation
element).
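The allkeys.txt line can be read field by field. The sketch below parses that format under the assumption that each entry is one or more hex code points, a ';', and one or more bracketed collation elements, each starting with '*' (variable) or '.' (non-variable) followed by '.'-separated hex weights. parse_allkeys_line is a hypothetical helper, not part of any Unicode or glibc tool.

```python
import re

# Hypothetical parser for one allkeys.txt entry, e.g.
#   005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ
# Assumed format: code points ';' then bracketed elements [*w.w.w.w] or [.w.w.w.w].
ENTRY = re.compile(r"^\s*([0-9A-Fa-f ]+?)\s*;\s*((?:\[[.*][0-9A-Fa-f.]+\])+)")
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")

def parse_allkeys_line(line):
    """Return (code_points, elements); each element is (is_variable, weights)."""
    m = ENTRY.match(line)
    if m is None:
        raise ValueError(f"not an allkeys entry: {line!r}")
    code_points = [int(cp, 16) for cp in m.group(1).split()]
    elements = [
        (marker == "*", [int(w, 16) for w in weights.split(".")])
        for marker, weights in ELEMENT.findall(m.group(2))
    ]
    return code_points, elements

cps, elems = parse_allkeys_line(
    "005F ; [*0209.0021.0002.005F] # LOW LINE; COMPATSEQ")
print(cps)    # [95]
print(elems)  # [(True, [521, 33, 2, 95])]
```

For LOW LINE this yields a single variable element with weights 0209, 0021, 0002 and 005F; whether the first three are used directly or moved to the fourth level depends on the variable-weighting option described in TR10.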

So, where does this leave us in glibc?  

Does glibc implement the Unicode Collation Algorithm? If so, why is
there a difference?

Can anybody share some insights?

thanks,
Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

