This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Console codepage setting via chcp?


On Sep 25 19:42, Andy Koppe wrote:
> 2009/9/25 Corinna Vinschen:
> >> - System objects will always be translated using UTF-8. This includes
> >> file names, user names, and initial environment variables (and
> >> probably more I'm not aware of).
> >[...]
> The downside, of course, is that non-ASCII filenames created in a
> non-UTF8 locale won't show up correctly in Windows, and vice versa.
> But that's the same on Linux if the global setting is UTF-8 while the
> terminal is set to something else. And the stock answer to any
> complaints will be: Use UTF-8!
> 
> In any case, the DCxx scheme will ensure that things work correctly
> within any particular locale.
> 
> And I guess the ^N scheme can go (or be disabled)?

Probably not.  I spent some more time thinking about the various
scenarios (partly instead of sleeping) and it occurred to me that using
UTF-8 exclusively is a nice dream.

Still, what about your tar example given in
http://cygwin.com/ml/cygwin-developers/2009-09/msg00043.html?

If we stick to UTF-8 exclusively we *have* to create the convmv-like
tool which allows "broken" filenames to be converted from the
\016\377\x notation to the UTF-8 \c2\x or \c3\x notation.

Or would it be better to allow switching the charset using the locale
environment variables, regardless, as you proposed:
  
    $ LANG=C.KOI8-R tar xzf bla.tgz
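To make the problem concrete, here is a minimal sketch in Python (not Cygwin code) of what either approach has to do with a filename stored in the archive as KOI8-R bytes. The helper names is_valid_utf8 and reencode_name are made up for illustration, and the sketch does not deal with the \016\377\x notation at all. It only shows that the raw bytes are not valid UTF-8 and need a known source charset to be translated.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reencode_name(raw: bytes, source_charset: str) -> bytes:
    """Hypothetical convmv-like step: reinterpret a filename given in
    source_charset and return the same name encoded as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# A Cyrillic filename ("fajl.txt" in Russian letters) as a KOI8-R archive
# would store it.
name = "\u0444\u0430\u0439\u043b.txt"
raw_koi8 = name.encode("koi8_r")

print(is_valid_utf8(raw_koi8))                         # not valid as UTF-8
print(reencode_name(raw_koi8, "koi8_r").decode("utf-8") == name)
```

With the locale switch, this translation would happen implicitly while tar runs. With UTF-8 only, something like reencode_name would have to be applied to the extracted names afterwards.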

What's the right thing to do?  I'm still unsure.  With your proposal,
it's at least the user's choice, and if some interoperability issue occurs
and the user complains, we can point to the FAQ: "Use UTF-8, dumbass!"

> > So, utilizing the initial setting of LC_ALL/ff. is as good
> > as defaulting to UTF-8 and allowing to switch via a setcons tool.
> 
> 'setcons' requires a wrapper script, whereas the variables don't
> necessarily, as they can be set in the Windows environment. This would
> allow programs to be invoked directly from a shortcut and still
> pick up the user's setting.
> 
> Also, one of the locale variables needs to be set anyway if one wants
> to use something other than the default locale.
> 
> > I have
> > found an easy way to allow a setcons tool which only switches the charset
> > used by Cygwin.  It doesn't affect the setting in cmd, or made by chcp.
> 
> That's a good idea. I've come round to thinking that 'setcons' is
> worth having in addition to the initial setting from the environment.

Ok, let's use the environment variables for now.  Creating a setcons
tool will be possible, but is low priority then.

> >> - setlocale() will have no effects beyond what's expected in Linux.
> >
> > Well... probably.  I'm not saying yes without asking a lawyer first.
> 
> :)  I put that a bit too probingly, didn't I?

Yep :)

So, the modified list alongside your proposal looks like this:

- System objects will always be *initially* translated using UTF-8. This
  includes file names, user names, and initial environment variables.
- By setting the locale environment variables you can switch the charset
  used to translate filenames on a per-process basis.
  This would be only a stop-gap measure, to allow re-use of old archives
  or scripts.  Those should be converted to UTF-8 ASAP.  Expect complaints.
- The "C" locale's charset will be UTF-8.
- There'll be language-neutral "C.<charset>" locales.
- The user's ANSI codepage will remain the default charset for
  "language_TERRITORY" locales.
- The console charset will be set according to LC_ALL/LC_CTYPE/LANG
  at the time the application starts.
- setlocale() will (probably) have no effects beyond what's expected in Linux.
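
The charset rules in this list can be sketched roughly as follows. This is only an illustration in Python, not the Cygwin implementation. The function name resolve_charset and the ansi_codepage parameter (with "CP1252" as a placeholder default) are assumptions. The variable precedence LC_ALL, then LC_CTYPE, then LANG is the usual POSIX order, and treating an unset locale like "C" is an assumption as well.

```python
def resolve_charset(env, ansi_codepage="CP1252"):
    """Pick the charset for a process from its locale variables.

    env           -- mapping of environment variables (e.g. os.environ)
    ansi_codepage -- stand-in for the user's ANSI codepage (assumed value)
    """
    # POSIX precedence: the first variable that is set and non-empty wins.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        # Nothing set: assumed to behave like the "C" locale, i.e. UTF-8.
        return "UTF-8"

    # Strip an optional "@modifier", then split "name.charset".
    locale_name = value.split("@", 1)[0]
    base, dot, charset = locale_name.partition(".")

    if dot and charset:
        # An explicit charset, as in "C.KOI8-R" or "ru_RU.UTF-8".
        return charset
    if base in ("C", "POSIX"):
        # The "C" locale's charset is UTF-8.
        return "UTF-8"
    # "language_TERRITORY" without a charset: the ANSI codepage.
    return ansi_codepage


print(resolve_charset({"LANG": "C.KOI8-R"}))
print(resolve_charset({"LANG": "de_DE"}))
```

So LANG=C.KOI8-R selects KOI8-R, a plain "C" or an empty environment gives UTF-8, and "de_DE" without a charset falls back to whatever is passed as the ANSI codepage.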

So which approach do we take, the one from
http://cygwin.com/ml/cygwin-developers/2009-09/msg00050.html
or the one above?  The implementation differs only marginally
in complexity, since most of it is already there.

Please vote.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

