This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" character set (again)


Corinna Vinschen wrote:
On Jan 8 12:12, Thomas Wolff wrote:
Andy Koppe wrote:
There's an important distinction here between the C locale and the
default locale. The C locale is what you get if you don't call
setlocale at all, whereas the default locale is what you get if you
call setlocale(LC_FOO, "") and the relevant environment variables are
all unset or empty.
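
To make the distinction concrete, here is a minimal C sketch. It is not part of the original message, and the helper names ctype_is_c and select_default_ctype are made up for illustration. It only relies on standard setlocale behaviour: passing NULL queries the current locale without changing it, and passing "" selects the locale named by the environment.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if LC_CTYPE is currently the C locale
   (reported as "C" or "POSIX"), which is the state a program is in
   before it has called setlocale at all. */
int ctype_is_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL); /* query only, no change */

    assert(name != NULL); /* a pure query on a valid category should succeed */
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Hypothetical helper: switches LC_CTYPE to the default locale, i.e. the
   one chosen from the environment (LC_ALL, LC_CTYPE, LANG) or the
   implementation's default if those are unset or empty.
   Returns the resulting locale name, or NULL if the request failed. */
const char *select_default_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Calling ctype_is_c() first thing in main should report the C locale regardless of the environment; what select_default_ctype() returns afterwards depends on the system and is exactly the part this thread is not about.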

The default locale uses UTF-8, and I most certainly agree that this
should stay as is. The charset of the filesystem and the console are
both controlled by the default locale (unless overridden in the
environment). They are independent of the C locale's charset or
whether an application calls setlocale.

No, this is about the C locale only. Lots of people and programs make
assumptions about the C locale which may not be valid according to
POSIX, but which nevertheless hold true for Linux and most (if not
all) other Unices, including Cygwin 1.5. The most important assumption
is that the C locale is 8-bit clean.
And byte-transparent, right?
Which gets me back to this printf issue; actually your point here
seems to support my arguments there, if only I had explicitly
restricted them to the C locale.
Could you agree that functions like sprintf should handle their char
* arguments byte-transparently if acting in the C locale?
It does! ...
I could not reproduce this for an hour until I noticed why, and suddenly all the arguments fit together:
Neither my sample program (attached) nor the sample from the other thread works, even if Cygwin runs in an 8-bit locale.
This is surprising: a user cannot fix the problem through the locale mechanism, although that mechanism is supposed to provide exactly this kind of adjustment.
The program can only be made to do what is expected if setlocale is called to explicitly set an 8-bit locale
(the call is included as a comment in my program).
The reason is probably that programs always start in the "C" locale (I think that is something required by POSIX?). If that locale is UTF-8, however, locale-agnostic programs do not behave as expected. This actively breaks legacy compatibility.
So, actually, reconsidering your response above: no, it does not. When running in the C locale, whether explicitly or implicitly,
sprintf is not byte-transparent in 1 of the 3 cases of my sample program, and printf is not byte-transparent in 2 of 3 cases (another surprising inconsistency, this time between printf and sprintf).
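
For readers without the attachment, the kind of check meant by "byte-transparent" can be sketched as follows. This is not the attached sample program and does not reproduce its three cases; the function names are hypothetical. It probes two paths separately: a byte passed through a %s argument, and a byte that appears literally in the format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical probe: returns 1 if byte b comes out unchanged when it is
   passed to sprintf as part of a %s argument. */
int arg_is_byte_transparent(unsigned char b)
{
    char in[2] = { (char)b, '\0' };
    char out[8] = { 0 };
    int n;

    assert(b != 0); /* a NUL byte would terminate the string */
    n = sprintf(out, "%s", in);
    return n == 1 && (unsigned char)out[0] == b;
}

/* Hypothetical probe: returns 1 if byte b comes out unchanged when it
   appears literally in the format string. The format is built at run
   time, and ends in a %s conversion with an empty argument, so that the
   call is a genuine formatted print handled by the library. */
int format_is_byte_transparent(unsigned char b)
{
    char fmt[6] = { 'a', (char)b, 'z', '%', 's', '\0' };
    char out[8] = { 0 };
    int n;

    assert(b != 0 && b != '%'); /* both have special meaning in a format */
    n = sprintf(out, fmt, "");
    /* Expect exactly the three literal bytes 'a', b, 'z'. */
    return n == 3 && memcmp(out, fmt, 3) == 0;
}
```

Running both probes with a byte such as 0xE4, once without any setlocale call and once after explicitly selecting an 8-bit locale, should show whether the library treats the C locale as byte-transparent; the result for the format-string probe is the point under dispute here, so no expected value is given.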


Some of these details have been noted before (sorry), but for me this summary gives a clearer picture now,
and the best and easiest solution IMHO would indeed be to change the C locale back to 8-bit, byte-transparent, and not plan to change it again later.
(That's why I'm discussing it here, not in the sprintf thread.)


The problem occurs in the *format* string. ...
[Maybe this should be discussed in the other thread but let's keep it together for now.]
Yes, and I doubted (in the other thread) that it should occur. To put it more precisely now: in
http://www.opengroup.org/onlinepubs/9699919799/functions/sprintf.html
the condition that "a wide-character code that does not correspond to a valid character has been detected" is mentioned only as a condition for the EILSEQ error.
While Andy had a valid point in noting that *format* is described as a "character string" and relating that to the generic POSIX definition of a character,
this certainly does not justify the current behaviour of silently dropping output and reporting partial success, because that is not one of the options in the "RETURN VALUE" section;
also, I don't see what Andy's claim "Including invalid bytes in the format string is undefined behaviour." is based on.
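
As an illustration of the outcomes a caller can distinguish from the return value and errno, here is a small wrapper sketch. The names checked_snprintf and fmt_status are invented for this example and are not part of any library; the sketch only assumes the documented contract that vsnprintf returns the number of bytes that would have been written, or a negative value on error.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical classification of a formatting call's outcome. */
enum fmt_status { FMT_OK, FMT_TRUNCATED, FMT_EILSEQ, FMT_ERROR };

/* Hypothetical wrapper around vsnprintf. Stores the raw return value in
   *written (if non-NULL) and classifies the result. */
enum fmt_status checked_snprintf(char *buf, size_t size, int *written,
                                 const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    assert(buf != NULL || size == 0);

    errno = 0; /* so that a stale EILSEQ is not misattributed */
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (written != NULL)
        *written = n;
    if (n < 0)
        return errno == EILSEQ ? FMT_EILSEQ : FMT_ERROR;
    if ((size_t)n >= size)
        return FMT_TRUNCATED; /* output did not fit into buf */
    return FMT_OK;
}
```

Note that a library which drops bytes and still returns a non-negative count would be reported as FMT_OK by this wrapper; the caller could only notice by comparing the result with the expected output, which is the reason that behaviour is so awkward.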


So I'd like to encourage you to apply your patch to vprintf (I don't see a need to feel uneasy about it) in any case - whether or not the C locale gets changed;
there is an additional consideration in favour of it:
The printf functions, especially fprintf and sprintf, do not necessarily prepare text output, let alone output for a terminal. They can also be used to prepare binary data for output to a file, which is entirely locale-agnostic and should not be broken.


------
Thomas

