This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: cygwin + GetConsoleOutputCP


On 21 March 2011 11:17, Corinna Vinschen wrote:
> On Mar 21 07:53, Andy Koppe wrote:
>> On 20 March 2011 19:13, Charles Wilson wrote:
>> > So basically if you specify -iso (or --conv iso) without any of the
>> > "input encoding specification" options like -437 etc, then dos2unix will
>> > attempt to autodetect the *console* encoding.  If it succeeds,
>> > then it will "convert" character codes from that encoding to their
>> > equivalent in ISO-8859-1 ("Latin 1") [unconvertible codes are replaced
>> > with an ASCII dot].
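A minimal sketch of the byte-for-byte mapping that -iso performs, assuming CP437 is the detected console codepage. The table is deliberately partial (only the accented letters at 0x80-0x9A), so in this sketch every byte above 0x9A becomes '.', including some that a complete table would convert. The function names are hypothetical and not taken from dos2unix's source.

```c
#include <stddef.h>

/* ISO-8859-1 equivalents of CP437 bytes 0x80..0x9A (accented letters).
   Partial table for illustration only. */
static const unsigned char cp437_to_latin1_80_9a[] = {
    0xC7, 0xFC, 0xE9, 0xE2, 0xE4, 0xE0, 0xE5, 0xE7, /* 0x80-0x87: Ç ü é â ä à å ç */
    0xEA, 0xEB, 0xE8, 0xEF, 0xEE, 0xEC, 0xC4, 0xC5, /* 0x88-0x8F: ê ë è ï î ì Ä Å */
    0xC9, 0xE6, 0xC6, 0xF4, 0xF6, 0xF2, 0xFB, 0xF9, /* 0x90-0x97: É æ Æ ô ö ò û ù */
    0xFF, 0xD6, 0xDC                                /* 0x98-0x9A: ÿ Ö Ü */
};

/* Map one CP437 byte to ISO-8859-1; bytes outside the partial table
   become '.', mirroring the "ASCII dot" replacement described above. */
unsigned char cp437_byte_to_latin1(unsigned char c)
{
    if (c < 0x80)
        return c;                          /* ASCII is identical in both */
    if (c <= 0x9A)
        return cp437_to_latin1_80_9a[c - 0x80];
    return '.';                            /* box drawing, Greek, etc. */
}

/* Convert a buffer in place, one byte at a time. */
void cp437_to_latin1(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = cp437_byte_to_latin1(buf[i]);
}
```

Because both encodings are single-byte, the conversion never changes the length of the text, which is why an in-place loop suffices.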
>> >
>> > Note that this autodetect, if it works, assumes that the console's CP is
>> > the input file's CP.  Fair enough -- and it's an overridable default
>> > anyway.  However, I wonder if, in cygwin-1.7, we actually can/should use
>> > the "console codepage" in ANY way.  Here's the code:
>> >
>> > querycp.c:
>> > #elif defined (WIN32) || defined(__CYGWIN__)
>> >
>> > /* Erwin Waterlander */
>> >
>> > #include <windows.h>
>> > unsigned short query_con_codepage(void) {
>> >   return((unsigned short)GetConsoleOutputCP());
>> > }
>> > #else
>> >
>> > Or if instead, on cygwin, we should use some other mechanism (locale
>> > settings?) to determine the correct default "input" codepage.
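If the locale settings are used instead, the POSIX way to ask which character set the user's locale uses is setlocale() followed by nl_langinfo(CODESET). The following is a sketch with hypothetical helper names. What it reports depends on LANG, LC_CTYPE and LC_ALL. Note that the getter changes the process's LC_CTYPE as a side effect.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Adopt LC_CTYPE from the environment, then return the name of its
   character set, e.g. "UTF-8". nl_langinfo() never returns NULL; if the
   environment names an invalid locale, the "C" locale stays in effect. */
const char *unix_side_charset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Nonzero if the charset name is UTF-8 (the name glibc and Cygwin report). */
int charset_is_utf8(const char *cs)
{
    return strcmp(cs, "UTF-8") == 0;
}
```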
>>
>> I think defaulting to the console codepage makes sense for the DOS
>> side of the conversion. Having said that, Windows files that aren't
>> "Unicode", i.e. UTF-16, are usually encoded in the so-called ANSI
>> codepage, e.g. CP1252, so it would make more sense to default to that.
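Following the distinction drawn here, a converter could check for a byte-order mark before falling back to the ANSI codepage. This is a heuristic sketch with hypothetical names. BOM-less UTF-16 and UTF-8 are not detected, and a UTF-32LE BOM (FF FE 00 00) would be misread as UTF-16LE.

```c
#include <stddef.h>

/* Rough classification of a Windows text file by its first bytes. */
enum win_text_kind { TEXT_UTF16LE, TEXT_UTF16BE, TEXT_UTF8_BOM, TEXT_ANSI };

enum win_text_kind classify_windows_text(const unsigned char *p, size_t n)
{
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return TEXT_UTF16LE;               /* what Notepad calls "Unicode" */
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return TEXT_UTF16BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return TEXT_UTF8_BOM;
    return TEXT_ANSI;                      /* no BOM: assume the ANSI codepage */
}
```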
>
> I agree with Andy here.  I don't think there are many files
> left today that are encoded using the old DOS codepages.
>
>> However, the real problem with this feature is that the Unix side of
>> the conversion is fixed to ISO-8859-1, which makes it near-useless
>> when Cygwin defaults to UTF-8. And it's no use for non-Western
>> European languages in any case.
>
> Right again.  And it's not only Cygwin; almost all modern UNIX systems
> use UTF-8 now.  The -iso option just doesn't make sense.
>
>> A worthwhile conversion feature would use
>> MultiByteToWideChar()/WideCharToMultiByte() defaulting to the system's
>> ANSI codepage on the DOS side, and mbstowcs()/wcstombs() defaulting to
>
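The conversion path proposed here decodes with MultiByteToWideChar() on the DOS side and converts with mbstowcs()/wcstombs() on the Unix side. It can be sketched as follows. The quoted text is cut off before it says what the Unix side defaults to, so following the current LC_CTYPE locale is an assumption here, as are the helper names. A real program would call setlocale(LC_CTYPE, "") first; otherwise these functions run in the "C" locale.

```c
#include <stdlib.h>
#include <wchar.h>
#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>

/* DOS side: decode `inlen` bytes in Windows codepage `cp` (e.g. GetACP())
   into UTF-16. Returns the number of wide chars written, 0 on failure.
   wchar_t is 16 bits on Windows and Cygwin, so it matches WCHAR here. */
int dos_to_wide(unsigned cp, const char *in, int inlen, wchar_t *out, int outlen)
{
    return MultiByteToWideChar(cp, 0, in, inlen, out, outlen);
}
#endif

/* Unix side, wide -> bytes: encode in the current LC_CTYPE charset.
   Returns bytes written (excluding NUL) or (size_t)-1 if a character
   cannot be represented. If the text needs all `outsz` bytes, no NUL
   terminator is written. */
size_t wide_to_unix(const wchar_t *ws, char *out, size_t outsz)
{
    return wcstombs(out, ws, outsz);
}

/* Unix side, bytes -> wide: the reverse step, e.g. for unix2dos. */
size_t unix_to_wide(const char *s, wchar_t *out, size_t outlen)
{
    return mbstowcs(out, s, outlen);
}
```

Going through wide characters means neither side needs to know about the other's encoding, and the (size_t)-1 return gives a natural place to report unconvertible text instead of silently substituting a dot.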
> Well, I'm not sure about that.  The complexity of codepage settings on a
> Windows system makes the whole affair guesswork, which will always tend
> to do the wrong thing anyway.  The following codepages are available:
>
> - The current input console codepage, GetConsoleCP().
>
> - The current output console codepage, GetConsoleOutputCP().
>
> - The current OEM codepage, GetOEMCP().
>
> - The current ANSI codepage, GetACP().
>
> - The default OEM codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the default system locale,
>   GetLocaleInfo (LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage of the current user or process,
>   GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> - The default OEM codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTCODEPAGE, ...).
>
> - The default ANSI codepage used for system invariant operations,
>   GetLocaleInfo (LOCALE_INVARIANT, LOCALE_IDEFAULTANSICODEPAGE, ...).
>
> Which is the right one?
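To compare the candidates on a given machine, they can be printed side by side. This sketch uses hypothetical helper names and assumes the Win32 headers are available (directly, or via w32api on Cygwin) and define LOCALE_INVARIANT. GetLocaleInfoA() reports the LOCALE_IDEFAULT*CODEPAGE values as decimal strings, so the sketch passes them through strtoul(). On other platforms the function only prints a note.

```c
#include <stdio.h>
#if defined(_WIN32) || defined(__CYGWIN__)
#include <stdlib.h>
#include <windows.h>

/* Read a codepage number that GetLocaleInfoA() returns as a string. */
static unsigned locale_cp(LCID lcid, LCTYPE type)
{
    char buf[16];
    if (GetLocaleInfoA(lcid, type, buf, (int)sizeof buf) == 0)
        return 0;                          /* query failed */
    return (unsigned)strtoul(buf, NULL, 10);
}
#endif

/* Print every candidate from the list above; returns how many were
   printed (0 when not built for Windows or Cygwin). */
int print_codepage_candidates(FILE *out)
{
#if defined(_WIN32) || defined(__CYGWIN__)
    struct { const char *name; unsigned cp; } c[] = {
        { "GetConsoleCP()",       GetConsoleCP() },
        { "GetConsoleOutputCP()", GetConsoleOutputCP() },
        { "GetOEMCP()",           GetOEMCP() },
        { "GetACP()",             GetACP() },
        { "system OEM",     locale_cp(LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTCODEPAGE) },
        { "system ANSI",    locale_cp(LOCALE_SYSTEM_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE) },
        { "user OEM",       locale_cp(LOCALE_USER_DEFAULT, LOCALE_IDEFAULTCODEPAGE) },
        { "user ANSI",      locale_cp(LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE) },
        { "invariant OEM",  locale_cp(LOCALE_INVARIANT, LOCALE_IDEFAULTCODEPAGE) },
        { "invariant ANSI", locale_cp(LOCALE_INVARIANT, LOCALE_IDEFAULTANSICODEPAGE) },
    };
    int n = (int)(sizeof c / sizeof c[0]);
    for (int i = 0; i < n; i++)
        fprintf(out, "%-22s %u\n", c[i].name, c[i].cp);
    return n;
#else
    fputs("codepage queries need the Win32 API\n", out);
    return 0;
#endif
}
```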

GetACP(), which "retrieves the current Windows ANSI code page
identifier for the operating system". That's what programs using the
non-Unicode APIs get. It's also the default in Notepad and other
editors.

Other code pages would need to be specified explicitly by the user.
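The rule above (GetACP() by default, any other codepage only when the user asks for it) fits in one function. This is a sketch with a hypothetical name. Two assumptions keep the example compilable everywhere: 0 means "not specified", and non-Windows builds fall back to a hypothetical default of 1252.

```c
#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>
#endif

/* Hypothetical fallback when the Win32 API is unavailable. */
#define FALLBACK_ANSI_CP 1252u

/* Pick the DOS-side codepage: an explicit user choice wins, otherwise
   the current Windows ANSI codepage. user_cp == 0 means "not given". */
unsigned choose_dos_codepage(unsigned user_cp)
{
    if (user_cp != 0)
        return user_cp;
#if defined(_WIN32) || defined(__CYGWIN__)
    return GetACP();
#else
    return FALLBACK_ANSI_CP;
#endif
}
```

Option parsing would pass, say, 437 when the user selects that codepage explicitly, and 0 otherwise.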


> In theory the option is not useful and should just go away.  If you
> have to keep it for backward compatibility, stick to the current
> behaviour and outlaw its use, perhaps by printing a nagging warning
> to stderr.

... and pointing them at iconv (which, to be fair, the -iso
description already does).
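The deprecation route suggested here (keep -iso working, warn on stderr, point at iconv) might look like the following sketch. The message wording and function name are hypothetical.

```c
#include <stdio.h>

/* Hypothetical wording for the nagging warning. */
static const char ISO_DEPRECATION_MSG[] =
    "dos2unix: warning: -iso converts to ISO-8859-1 only and is deprecated;\n"
    "dos2unix: use iconv for other encodings, e.g. "
    "iconv -f CP1252 -t UTF-8 in.txt > out.txt\n";

/* Called whenever -iso is given; the old conversion still runs afterwards.
   Returns the fputs() result (EOF on a write error). */
int warn_iso_deprecated(FILE *err)
{
    return fputs(ISO_DEPRECATION_MSG, err);
}
```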

Andy

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

