This is the mail archive of the libc-hacker@sourceware.cygnus.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



Re: merge done


> Cc: schwab@suse.de, libc-hacker@sourceware.cygnus.com
> Reply-To: drepper@cygnus.com (Ulrich Drepper)
> From: Ulrich Drepper <drepper@cygnus.com>
> Date: 31 Aug 1999 08:39:57 -0700
> 
> Geoff Keating <geoffk@ozemail.com.au> writes:
> 
> > On my system I think I have 35 copies of the en_DK LC_CTYPE, for
> > instance, which come up to nearly 400k; and 22 copies of its
> > LC_COLLATE, which is nearly 700k.
> 
> Well, if you look at the files you find that many are different. 

Yes, but many are the same.  For instance:

[geoffk@geoffk locale]$ md5sum */LC_CTYPE
2ddcbc6c352b4734e7d4ab020ec1c07f  cs_CZ/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  da_DK/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_AT/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_BE/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_CH/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_DE/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  de_LU/LC_CTYPE
86ab85d84ef4e7cf25c0f2f4a8bb227c  el_GR.ISO8859-7/LC_CTYPE
86ab85d84ef4e7cf25c0f2f4a8bb227c  el_GR/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_AU/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_CA/LC_CTYPE
5eb82a102e61d7c32335591e9be24661  en_DK/LC_CTYPE
...

[geoffk@geoffk locale]$ md5sum */LC_COLLATE
0b12c2bf93730c1911a6adb66da13cf6  cs_CZ/LC_COLLATE
af476468a5848376d2baa4fe24e0dc31  da_DK/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_AT/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_BE/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_CH/LC_COLLATE
fd032782430ad9fcf67b6aa91c251a2f  de_DE/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  de_LU/LC_COLLATE
53fce17204ea0403e236acfb47c537e9  el_GR.ISO8859-7/LC_COLLATE
89a594a0a067b957f8c6a8469856d2ca  el_GR/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  en_AU/LC_COLLATE
088130adb3c3dae4aa9e7f778fe7e019  en_CA/LC_COLLATE
768a94567dedb6476d179715e5ae5d85  en_DK/LC_COLLATE
...

> And with the new possibilities most of the LC_COLLATE definitions
> will differ in the one or the other form.  This hasn't happened so
> far because it is so difficult to describe without a terrible amount
> of duplication.
> 
> For the LC_CTYPE stuff.  I could see a way how to avoid it but it is
> not very clean.  The biggest part of the data is the table for the
> isw*() and tow*() functions.  But since the encoding is almost always
> UCS4 (there will be a few differences in future) it means the tables
> are ideally the same, all filled with the information from the Unicode
> tables.
...

Yes, that would help.

The point I'm trying to make is that many of the collating orders
etc. are common between different countries and languages.  For
instance, I'd be very surprised to find that en_AU, en_US, en_GB, en_IE
and en_CA have different definitions for LC_CTYPE, and the first four
should be the same for LC_COLLATE (fr_CA and en_CA are the same and
different from en_AU, which sort of makes sense).

I don't mean that _every_ LC_COLLATE will be the same; that would be
silly.  Just that there are a relatively small number of alternatives
compared to the total number of locales.
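
A minimal sketch of how the number of distinct alternatives could be
counted, assuming a locale directory laid out like the one in the md5sum
listings above (one subdirectory per locale, one file per category).  The
function names are hypothetical and only for illustration; the digest is
MD5 simply to match the md5sum output.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_digest(locale_root, category):
    """Map MD5 digest -> list of locale names whose `category` file has that digest."""
    groups = defaultdict(list)
    # Equivalent in spirit to `md5sum */LC_CTYPE`, but grouped by digest.
    for path in sorted(Path(locale_root).glob(f"*/{category}")):
        if path.is_file():  # skip anything that is not a regular file
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            groups[digest].append(path.parent.name)
    return dict(groups)


def summarize(groups):
    """Return (number of locales, number of distinct file contents)."""
    total = sum(len(names) for names in groups.values())
    return total, len(groups)


def report(locale_root, categories=("LC_CTYPE", "LC_COLLATE")):
    """Build one summary line per category for the given locale directory."""
    lines = []
    for category in categories:
        total, distinct = summarize(group_by_digest(locale_root, category))
        lines.append(f"{category}: {total} locales, {distinct} distinct files")
    return lines
```

Calling report(".") from inside the locale directory and printing the
returned lines gives one line per category.  The gap between the two
numbers on each line is the duplication in question.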

-- 
Geoffrey Keating <geoffk@cygnus.com>
