This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Default locale for Russian/Russia should be ru_RU.CP1251


On Dec 24 18:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
> 
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
> 
> > CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
> 
> Windows regional settings are set to Russian/Russia.
> 
> In the absence of any settings in bashrc/bash_profile, the `locale`
> command outputs the following:
> 
> > LANG=ru_RU
> > LC_CTYPE="ru_RU"
> > LC_NUMERIC="ru_RU"
> > LC_TIME="ru_RU"
> > LC_COLLATE="ru_RU"
> > LC_MONETARY="ru_RU"
> > LC_MESSAGES="ru_RU"
> > LC_ALL=
> 
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used (historically, DOS used CP866, Windows used the CP1251
> ANSI codepage, and various Unices stuck to KOI8-R before the rise of
> the Unicode era).

Well, not quite.  Cygwin is following Linux here:

  linux$ locale -av
  [...]
  locale: ru_RU           archive: /usr/lib/locale/locale-archive
  ----------------------------------------------------------------------
      title | Russian locale for Russia
     source | RAP
    address | Sankt Jorgens Alle 8, DK-1615 Kobenhavn V, Danmark
      email | bug-glibc-locales@gnu.org
   language | Russian
  territory | Russia
   revision | 1.0
       date | 2000-06-29
    codeset | ISO-8859-5

  cygwin$ locale -av
  [...]
  locale: ru_RU           archive: /mnt/c/WINDOWS/system32/KERNEL32.DLL
  ----------------------------------------------------------------------
   language | Russian
  territory | Russia
    codeset | ISO-8859-5

> Cygwin docs state that
> 
> > Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.

You stopped reading too early:

  Cygwin uses a character set which is the typical Unix-equivalent to
  the Windows ANSI codepage.  For instance: [...]

> which is not true in my case (Windows ANSI codepage for Cyrillic is
> CP1251, not ISO-8859-5!).

To rephrase: Cygwin only uses the ANSI codepage to find the matching
default Linux codeset.  Maybe the documentation is a bit fuzzy, but it
doesn't say the charset is set *to* the Windows ANSI codepage; it just
*uses* that information to compute the equivalent Linux codeset and
sets the locale's charset to that.

> Surprisingly, for Belarusian (a.k.a. Belorussian, an East Slavic
> language very close to Russian), the default charset of the "be_BY"
> locale is indeed CP1251, which is in accordance with both the
> documentation and common sense.

See the docs:

  The default charset of the "be_BY" locale (Belarusian/Belarus) is CP1251.
  With the "@latin" modifier it's UTF-8.

Just as on Linux.
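A minimal sketch of the point above, assuming only the cases named in this thread: Russian and Belarusian share the same Windows ANSI codepage (1251), yet their bare locale names default to different charsets, so the charset has to come from a per-locale default rather than from the codepage itself.  The function default_charset is hypothetical and is not Cygwin's actual code; an explicit charset in the locale name always takes precedence.

```shell
#!/bin/sh
# Illustrative sketch only: hypothetical lookup of the charset implied by
# a locale name, covering just the cases discussed in this thread.
# This is NOT Cygwin's actual implementation.
default_charset() {
  case $1 in
    *.*)         # explicit charset, e.g. ru_RU.UTF-8 or be_BY.UTF-8@latin
                 cs=${1#*.}; echo "${cs%%@*}" ;;
    be_BY@latin) echo UTF-8 ;;
    be_BY)       echo CP1251 ;;
    ru_RU)       echo ISO-8859-5 ;;
    *)           echo unknown ;;
  esac
}

# Both ru_RU and be_BY map to ANSI codepage 1251 on Windows, but their
# default charsets differ.
for l in ru_RU be_BY be_BY@latin ru_RU.UTF-8; do
  printf '%-12s %s\n' "$l" "$(default_charset "$l")"
done
```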

> Moreover, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct
> either, and needs to be fixed.

The locale is taken directly from the Windows system function
GetUserDefaultUILanguage() for the -u option(*), and from
GetUserDefaultLCID() for the -f option(**).  This value is then passed
to the Windows function GetLocaleInfo()(***) to fetch the language and
territory codes, and that's what locale -u/-f prints.

So it looks like you're using a UK English system with only the regional
settings changed to Russia.
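For illustration, a sketch of why -u and -f can disagree, assuming a UK English install with Russian regional settings: the UI language and the user default locale are separate Windows values.  A Windows LANGID packs the primary language into the low 10 bits and the sublanguage into the high 6 bits (the PRIMARYLANGID()/SUBLANGID() macros from winnt.h), and an LCID carries the LANGID in its low 16 bits.  The values 0x0809 and 0x0419 are assumed, not read from the poster's machine, and langid_to_locale is a hypothetical stand-in for the GetLocaleInfo() lookup.

```shell
#!/bin/sh
# Decode a LANGID the way PRIMARYLANGID()/SUBLANGID() do:
# low 10 bits = primary language, high 6 bits = sublanguage.
primary_langid() { printf '0x%02x\n' $(( $1 & 0x3ff )); }
sub_langid()     { printf '0x%02x\n' $(( $1 >> 10 )); }

# Hypothetical stand-in for GetLocaleInfo(): only knows the LANGIDs
# that come up in this thread.
langid_to_locale() {
  case $(printf '0x%04x' $(( $1 ))) in
    0x0419) echo ru_RU ;;  # LANG_RUSSIAN (0x19), SUBLANG_RUSSIAN_RUSSIA (0x01)
    0x0423) echo be_BY ;;  # LANG_BELARUSIAN (0x23), SUBLANG_BELARUSIAN_BELARUS (0x01)
    0x0809) echo en_GB ;;  # LANG_ENGLISH (0x09), SUBLANG_ENGLISH_UK (0x02)
    *)      echo unknown ;;
  esac
}

# Assumed values for a UK English install with Russian regional settings.
ui_langid=0x0809   # what GetUserDefaultUILanguage() would return (-u)
user_lcid=0x0419   # what GetUserDefaultLCID() would return (-f)

echo "locale -u -> $(langid_to_locale $ui_langid)"
# The LANGID is the low 16 bits of the LCID.
echo "locale -f -> $(langid_to_locale $(( $user_lcid & 0xffff )))"
```

Run as-is, it prints en_GB for -u and ru_RU for -f, mirroring the behaviour described above.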

In general, UTF-8 is the preferred codeset, so setting LANG to ru_RU.utf8
(locale -fU should work for you) is the better choice.
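A possible ~/.bashrc fragment along those lines, assuming Cygwin's locale(1), where -f reports the user's default locale and -U appends ".UTF-8" to the result.  The fallback to ru_RU.UTF-8 is an added assumption for cases where that call fails or prints nothing (e.g. outside Cygwin).

```shell
# Hypothetical ~/.bashrc fragment: take the user's default locale from
# Cygwin's locale(1) and request the UTF-8 variant via -U (assumption).
cyg_lang=$(locale -fU 2>/dev/null)
# Fall back to an explicit UTF-8 Russian locale if the call failed or
# printed nothing.
export LANG="${cyg_lang:-ru_RU.UTF-8}"
unset cyg_lang
```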


Corinna

(*) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l805

(**) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l812

(***) https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/utils/locale.cc;h=fadf3f3dacedad6474c92aabe826620b2677e494;hb=HEAD#l114

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


