This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Console codepage setting via chcp?


On Sep 24 19:46, Andy Koppe wrote:
> 2009/9/24 Corinna Vinschen:
> >> > Last but not least, the alternative would be to store the Console
> >> > character set as an environment variable, just as the
> >> > "CYGWIN=codepage:[ansi:oem]" setting back in 1.5 times.  Sigh.
> >>
> >> Hmm. How about simply using the standard locale variables again?
> >> Except that, unlike now, the console charset would be set at startup
> >> rather than by setlocale.
> >
> > Yeah, it's just... what bugs me with this approach is
> >
> > - If you want to switch to a certain charset, you must do this before
> >   the first Cygwin process in the console starts (via batch file or
> >   system-wide setting of LC_ALL/LC_CTYPE/LANG).
> 
> I'd actually assumed that LC_ALL/LC_CTYPE/LANG would be read at
> program startup rather than Cygwin DLL startup. Having pondered this a
> bit more, I'm now quite convinced that this is the right approach
> to take, for the following reasons:

No, they are only read at DLL startup, right now only to convert
the Windows environment to the Cygwin POSIX environment using the
initial locale setting.
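For illustration only, here is a minimal C sketch of such a startup-time lookup. It assumes the POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG, and empty variables count as unset). It also assumes locale names of the form language_TERRITORY.codeset@modifier. Both function names are hypothetical, and this is not the code in the Cygwin DLL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* lets callers use setenv/unsetenv */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the locale name that governs LC_CTYPE,
   checking LC_ALL, then LC_CTYPE, then LANG. Unset or empty variables
   are skipped; if none is set, the result is "C". */
const char *ctype_locale_from_env(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *val = getenv(vars[i]);
        if (val != NULL && *val != '\0')
            return val;
    }
    return "C";
}

/* Hypothetical helper: copy the codeset part of a locale name such as
   "de_DE.ISO-8859-1@euro" (here "ISO-8859-1") into buf. If the name has
   no ".codeset" part, buf is left empty and the caller must choose its
   own default charset. */
void charset_from_locale(const char *locale, char *buf, size_t len)
{
    if (len == 0)
        return;
    buf[0] = '\0';
    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return;
    size_t n = strcspn(dot + 1, "@");   /* stop before an @modifier */
    if (n >= len)
        n = len - 1;                    /* truncate to fit buf */
    memcpy(buf, dot + 1, n);
    buf[n] = '\0';
}
```

A process would call these once, before anything else touches the locale, and keep the result for the rest of its lifetime.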

> - It addresses the issue with the console encoding depending on
> whether or not a program happens to call setlocale(LC_CTYPE, "").
> - Yet it still allows the charset to be changed, by invoking a program
> with a changed variable setting.
> - It fulfils programs' assumption that the terminal charset is the
> same as what's set in their initial environment.
> - Users (and documentation) only have to worry about one setting.
> - No non-standard tools are needed.
> 
> Additionally, it occurred to me that the issue you described regarding
> ssh output also applies to filenames. For example, for a file archiver
> like tar, the charset doesn't matter, because it can assume that
> filenames are simply sequences of bytes. Therefore, it may or may not
> call setlocale(LC_CTYPE, "").
> 
> With Cygwin's filename charset currently being set by setlocale,
> however, this does make a big difference: if tar does call setlocale,
> the filenames will be translated according to the user's preferences,
> but if not, it'll use the C locale.
> 
> Therefore, I think the same approach as for the console should be
> applied to filenames: the charset is set according to the environment
> variable settings at program startup, and setlocale calls do not
> change it.
> 
> Advantages over the current approach:
> - setlocale would have no effects beyond what's expected on Linux.
> - Filenames do not change across setlocale calls.
> - It adheres to Linux programs' assumption that filenames are encoded
> in the charset set in the initial environment.
> - It reduces the importance of the C locale.
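This is why the result depends on whether tar calls setlocale. The C standard requires every program to start in the "C" locale. Only an explicit setlocale(LC_CTYPE, "") adopts what LC_ALL/LC_CTYPE/LANG say. The small C sketch below records the LC_CTYPE locale before and after that call. The function name is hypothetical.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: store the LC_CTYPE locale name in effect before
   and after the conventional setlocale(LC_CTYPE, "") call. Because every
   C program starts in the "C" locale, `before` is "C" in a fresh process;
   `after` reflects the environment if that locale is available. */
void ctype_before_and_after(char *before, char *after, size_t n)
{
    const char *cur = setlocale(LC_CTYPE, NULL);   /* query only */
    snprintf(before, n, "%s", cur ? cur : "(null)");

    /* Adopt the environment's setting, as a program may or may not do. */
    cur = setlocale(LC_CTYPE, "");
    snprintf(after, n, "%s", cur ? cur : "(locale not available)");
}
```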

That's an important objection.  It turns everything I'm currently working
on upside down (again!).

Wouldn't this consequentially mean we should stick to UTF-8 for
filenames entirely in the long run?  Because, *if* tar uses setlocale(),
the files would have potentially surprisingly ugly names when unpacked
on a Unix machine.

Note that this affects all strings used in Cygwin internally, not only
filenames.  User and group names, environment strings, ...
If an application switches to another locale, none of the internally
stored names are switched along with it.  So they are potentially wrong
after a setlocale.  And that doesn't change with what you propose.
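The following minimal C sketch shows what "potentially wrong" looks like in practice. It assumes a name was stored as UTF-8 but is later read as ISO-8859-1, for example after a setlocale switch. The helper name is hypothetical. It treats every byte as an ISO-8859-1 code point and re-encodes that code point as UTF-8. As a result, "ä" (bytes C3 A4) comes back as "Ã¤".

```c
/* Hypothetical helper: reinterpret each input byte as an ISO-8859-1
   character (code point == byte value) and encode it as UTF-8.
   Applied to text that really was UTF-8, this produces mojibake.
   out must have room for at least 2 * strlen(in) + 1 bytes. */
void latin1_to_utf8(const unsigned char *in, unsigned char *out)
{
    while (*in) {
        if (*in < 0x80) {
            *out++ = *in;                                  /* ASCII unchanged */
        } else {
            *out++ = (unsigned char)(0xC0 | (*in >> 6));   /* lead byte */
            *out++ = (unsigned char)(0x80 | (*in & 0x3F)); /* continuation */
        }
        in++;
    }
    *out = '\0';
}
```

The same mismatch shows up when a terminal decodes with a charset other than the one the application writes in.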

> > - If you want to switch the console to another charset you can't do that
> >   on the fly in Cygwin.
> 
> You can't in xterm or rxvt either, at least not without the likes of luit.

My xterms have a UTF-8 entry in the Ctrl-<right mouse key> menu which
can be switched on and off, unless xterm has already been started in
UTF-8 mode, in which case the entry is disabled.

> You can in mintty, and also in gnome-terminal and KDE Konsole, but to
> be honest it's a rather questionable feature, because applications
> don't get to know about such an on-the-fly character set change, hence
> things won't work correctly.

Yeah, but it's the user's choice.  I might want an ISO-8859-1 terminal,
regardless of what the application prints.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

