This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: cygwin + GetConsoleOutputCP
On 21 March 2011 11:17, Corinna Vinschen wrote:
> On Mar 21 07:53, Andy Koppe wrote:
>> On 20 March 2011 19:13, Charles Wilson wrote:
>> > So basically if you specify -iso (or --conv iso) without any of the
>> > "input encoding specification" options like -437 etc, then dos2unix will
>> > attempt to autodetect the *console* encoding. If it succeeds,
>> > then it will "convert" character codes from that encoding to their
>> > equivalent in ISO-8859-1 ("Latin 1") [unconvertible codes are replaced
>> > with an ascii dot]
>> >
>> > Note that this autodetect, if it works, assumes that the console's CP is
>> > the input file's CP. Fair enough -- and it's an overridable default
>> > anyway. However, I wonder if, in cygwin-1.7, we actually can/should use
>> > the "console codepage" in ANY way. Here's the code:
>> >
>> > querycp.c:
>> > #elif defined (WIN32) || defined(__CYGWIN__)
>> >
>> > /* Erwin Waterlander */
>> >
>> > #include <windows.h>
>> > unsigned short query_con_codepage(void) {
>> >   return((unsigned short)GetConsoleOutputCP());
>> > }
>> > #else
>> >
>> > Or if instead, on cygwin, we should use some other mechanism (locale
>> > settings?) to determine the correct default "input" codepage.
>>
>> I think defaulting to the console codepage makes sense for the DOS
>> side of the conversion. Having said that, Windows files that aren't
>> "Unicode", i.e. UTF-16, are usually encoded in the so-called ANSI
>> codepage, e.g. CP1252, so it would make more sense to default to that.
>
> I agree with Andy here. I don't think there are really a lot of files
> left today which are encoded using the old DOS codepages.
>
>> However, the real problem with this feature is that the Unix side of
>> the conversion is fixed to ISO-8859-1, which makes it near-useless
>> when Cygwin defaults to UTF-8. And it's no use for non-Western
>> European languages in any case.
>
> Right again. And not only Cygwin, almost all modern UNIX systems are
> using UTF-8 now. The -iso option just doesn't make sense.
>
>> A worthwhile conversion feature would use
>> MultiByteToWideChar()/WideCharToMultiByte() defaulting to the system's
>> ANSI codepage on the DOS side, and mbstowcs()/wcstombs() defaulting to
>
> Well, I'm not sure about that. The complexity of codepage settings on a
> Windows system makes the whole affair guesswork which will always tend
> to do the wrong thing anyway. There are the following codepages available:
>
> - The current input console codepage, GetConsoleCP().
>
> - The current output console codepage, GetConsoleOutputCP().
>
> - The current OEM codepage, GetOEMCP().
>
> - The current ANSI codepage, GetACP().
>
> - The default OEM codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> Which is the right one?
GetACP(), which "retrieves the current Windows ANSI code page
identifier for the operating system". That's what programs using the
non-Unicode APIs get. It's also the default in Notepad and other
editors.

Other code pages would need to be specified explicitly by the user.
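
A minimal sketch of that default, in case it helps. The helper name
pick_dos_codepage() and its "0 means not specified" convention are made up
for illustration and are not part of dos2unix; only GetACP() is a real
Windows API call. Outside Windows and Cygwin there is no ANSI codepage to
query, so the sketch returns 0 there and leaves the decision to the caller.

```c
#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>
#endif

/*
 * Hypothetical helper: choose the codepage for the DOS side of the
 * conversion.  user_cp is whatever the user asked for explicitly
 * (e.g. 437 for -437); 0 means "nothing specified".
 */
unsigned int pick_dos_codepage(unsigned int user_cp)
{
    /* An explicit choice by the user always wins. */
    if (user_cp != 0)
        return user_cp;

#if defined(_WIN32) || defined(__CYGWIN__)
    /* Default to the system's ANSI codepage, e.g. 1252. */
    return GetACP();
#else
    /* No Windows codepage available here; the caller has to decide. */
    return 0;
#endif
}
```

With this, pick_dos_codepage(437) gives back 437 on any platform, while
pick_dos_codepage(0) asks Windows for the ANSI codepage.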
> In theory the option is not useful and should just go away. If you
> have to keep it for backward compatibility, stick to the current
> behaviour and outlaw its use, perhaps by printing a nagging warning
> to stderr.
... and pointing them at iconv (which, to be fair, the -iso
description already does).
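
For anyone who wants the same thing from C rather than the iconv command
line, here is a rough sketch using the POSIX iconv(3) interface. The
function name convert_charset() and its buffer-based signature are
assumptions of mine for illustration; which charset names are accepted
(e.g. "CP1252", "ISO-8859-1", "UTF-8") depends on the iconv implementation
in use.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes of `in` from charset `from`
 * to charset `to`, writing at most outcap bytes to `out`.
 * Returns the number of bytes written, or (size_t)-1 on any error
 * (unknown charset, unconvertible input, output buffer too small).
 */
size_t convert_charset(const char *from, const char *to,
                       const char *in, size_t inlen,
                       char *out, size_t outcap)
{
    /* Note the argument order: target charset first, then source. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char ** for the input but does not modify the bytes. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

Converting the single ISO-8859-1 byte 0xE9 (e with acute accent) to UTF-8
this way yields the two bytes 0xC3 0xA9. On systems where iconv lives in a
separate library, linking with -liconv may be needed.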
Andy
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple