This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Default locale for Russian/Russia should be ru_RU.CP1251


On 24/12/2015 16:40, Andrey ``Bass'' Shcheglov wrote:
Hi,

I'm running Cygwin 2.2.0 on an English Windows 8.1 box:

CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin

Windows regional settings are set to Russian/Russia.

In the absence of any settings in .bashrc/.bash_profile, the `locale`
command outputs the following:

LANG=ru_RU
LC_CTYPE="ru_RU"
LC_NUMERIC="ru_RU"
LC_TIME="ru_RU"
LC_COLLATE="ru_RU"
LC_MONETARY="ru_RU"
LC_MESSAGES="ru_RU"
LC_ALL=

This is perfectly fine, except that "no charset" in the locale output
means "ISO charset", which for Russian/Russia is ISO-8859-5, a charset
that has hardly ever been used in practice (historically, DOS used CP866,
Windows used the CP1251 ANSI codepage, and various Unices stuck to KOI8-R
before the Unicode era).

The above is consistent with the output of `locale charmap`, which again
reports ISO-8859-5.


A short C example also confirms that ISO-8859-5 is used:

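#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void) {
     /* Initialise all locale categories from the environment (LANG, LC_*). */
     const char *locale = setlocale(LC_ALL, "");
     if (locale == NULL) {
          fprintf(stderr, "setlocale failed\n");
          return 1;
     }
     /* Character set of the current LC_CTYPE locale. */
     const char *codeset = nl_langinfo(CODESET);
     printf("locale: %s\n", locale);
     printf("codeset: %s\n", codeset);

     return 0;
}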

outputs

locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
codeset: ISO-8859-5


Cygwin docs state that

Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.

which is not true in my case (the Windows ANSI codepage for Cyrillic is
CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
Belorussian, an Eastern Slavic language very close to Russian) "be_BY"
locale, the default charset is indeed CP1251, which is in accordance with
both the documentation and common sense.


Additionally, the `strace locale -u` output contains multiple lines of
the form

__get_lcid_from_locale: LCID=0x0419

"0x0419" corresponds to Russian/Russia (see
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396>).

Despite that, `locale -u` returns "en_GB", even though all regional
settings are set to Russian/Russia. I believe this is not correct
either and needs to be fixed.

The current code in
  winsup/cygwin/nlsfuncs.cc

is responsible for the ISO-8859-5 default:
--------------------------------------------------------------
    case 1251:
      if (lcid == 0x0c1a                /* sr_CS (Serbian Language/Former
                                                  Serbia and Montenegro) */
          || lcid == 0x1c1a             /* sr_BA (Serbian Language/Bosnia
                                                  and Herzegovina) */
          || lcid == 0x281a             /* sr_RS (Serbian Language/Serbia) */
          || lcid == 0x301a             /* sr_ME (Serbian Language/Montenegro) */
          || lcid == 0x0440             /* ky_KG (Kyrgyz/Kyrgyzstan) */
          || lcid == 0x0843             /* uz_UZ (Uzbek/Uzbekistan) */
                                        /* tt_RU (Tatar/Russia),
                                                 IQTElif alphabet */
          || (lcid == 0x0444 && has_modifier ("@iqtelif"))
          || lcid == 0x0450)            /* mn_MN (Mongolian/Mongolia) */
        cs = "UTF-8";
      else if (lcid == 0x0423)          /* be_BY (Belarusian/Belarus) */
        cs = has_modifier ("@latin") ? "UTF-8" : "CP1251";
      else if (lcid == 0x0402)          /* bg_BG (Bulgarian/Bulgaria) */
        cs = "CP1251";
      else if (lcid == 0x0422)          /* uk_UA (Ukrainian/Ukraine) */
        cs = "KOI8-U";
      else
        cs = "ISO-8859-5";
--------------------------------------------------------------

Regards,
Andrey.

As a temporary workaround, can you use UTF-8?

export LANG=ru_RU.UTF-8

Regards
Marco





--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

