This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] Read locale settings from environment


On Feb 20 13:09, Corinna Vinschen wrote:
> Ok, here's my new setlocale implementation.  It fixes the following
> problems:
> [...]
> - Per POSIX allow the required "POSIX" locale.  Map it to the "C" locale
>   as on Linux.
> 
> - If locale is "", honor the environment in the order required by POSIX
>   for all supported categories.

Apart from that, would it be OK to change setlocale() and the functions
that consult __lc_ctype (e.g. mbtowc_r, wctomb_r, iswXXX) so that every
POSIX-compliant LC_XXX environment variable setting is accepted?  The
currently accepted locales

  C[-codeset]

are non-POSIX.  The POSIX variant is

  [language[_territory][.codeset][@modifier]]

Of course we should keep recognizing C[-codeset] for backward
compatibility, but I don't think we should restrict ourselves to it.

Actually, all the related functions rely only on the charset part of the
setting, not on the language.  So we could split off the charset part,
along the lines of what is already done in the LC_MESSAGES part of the
code, and check only that in the functions mentioned above.  Instead of
checking __lc_ctype, these functions could check, say, __lc_charset.
LC_CTYPE could then reflect the real setting from the environment.  For
instance:

LC_ALL=POSIX

  ==>  __lc_ctype   = C
       __lc_charset = ISO-8859  (!)

LC_ALL=en_US.UTF-8

  ==>  __lc_ctype   = en_US.UTF-8
       __lc_charset = UTF-8

LC_ALL=ja_JP.EUCJP

  ==>  __lc_ctype   = ja_JP.EUCJP
       __lc_charset = EUCJP

LC_ALL=de

  ==>  __lc_ctype   = de
       __lc_charset = ISO-8859  (!)

LC_ALL=fr_FR.ISO-8859-15

  ==>  __lc_ctype   = fr_FR.ISO-8859-15
       __lc_charset = ISO-8859  (!)
  
Actually, __lc_charset could even be a single character, e.g. I for
ISO-8859, U for UTF-8, E for EUCJP, etc., to simplify the checks in
mbtowc_r and the other functions.


What do you say?


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

