This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: CR-LF handling behavior of SED changed recently - this breaks a lot of MinGW cross build scripts


On 2017-06-14 10:07, cyg Simple wrote:
> On 6/13/2017 1:34 PM, Brian Inglis wrote:
>> On 2017-06-13 08:11, cyg Simple wrote:
>>> On 6/10/2017 10:30 PM, Eric Blake wrote:
>>>> On 06/10/2017 08:48 AM, cyg Simple wrote:
>>>>> Uhm, 'wt' and 'wb' came from MS itself.
>>>> Not quite. fopen(,"wb") comes from POSIX.  "wt" is probably a microsoft
>>>> extension, but it is certainly not in POSIX nor in glibc.
>>> I think it's a C standard so it should be in glibc.  It may be mentioned
>>> in the POSIX standard as in support of the C standard.
>>>>>  GNU GCC was adapted to allow it
>>>> Huh? It's not whether the compiler allows it, but whether libc allows
>>>> it.  ALL libc that are remotely close to POSIX compliant support
>>>> fopen(,"wb"), but only Windows platforms (and NOT glibc) support
>>>> fopen(,"wt").
>>> Looking at http://www.cplusplus.com/reference/cstdio/fopen/ I see:
>>> "If additional characters follow the sequence, the behavior depends on
>>> the library implementation: some implementations may ignore additional
>>> characters so that for example an additional "t" (sometimes used to
>>> explicitly state a text file) is accepted."
>>> There is also a lot of discussion about the topic at:
>>> https://stackoverflow.com/questions/229924/difference-between-files-writen-in-binary-and-text-mode
>>> As for glibc, it will just ignore the extra character but it allows the
>>> use of "wt"; it just means nothing to that C runtime library. It does
>>> aid in portable code though.
>>> As for me conflating GCC with a C runtime - please forgive my lapse in
>>> memory.
>>
>> There's no need for open mode "t", as text is the default mode unless
>> "b" is specified, and assuming you use "cooked" line I/O functions like
>> fgets/fputs, not "raw" binary I/O like fread/fwrite; fscanf ignores all
>> line terminators unless you use formats like "%c" which could see them.
>>
> 
> That isn't exactly true: according to MSDN[1], "t" manages the CTRL-Z
> EOF marker.  However, I agree that it is worthless.  Regardless, the C
> standard states that "t" or any other extra character can be added and
> left to the implementing library to interpret or ignore.  If the C
> runtime library neither uses nor ignores it, then it isn't complying
> with the C standard.

The Standard supports only the modes matching
/[ra](b|\+|b\+|\+b)?|w(b|\+|b\+|\+b)?x?/. Implementations may choose to
ignore some of the allowed trailing characters (presumably "b", "+", or
"x"; the footnote is unclear), or the file so opened may not conform to
the stream properties in 7.21.2; anything else invokes UB.

"7.21.5.3 The fopen function
Synopsis
1 #include <stdio.h>
FILE *fopen(const char * restrict filename,
const char * restrict mode);
Description
...
3 The argument mode points to a string. If the string is one of the
following, the file is open in the indicated mode. Otherwise, the
behavior is undefined.[271]

r		open text file for reading
w		truncate to zero length or create text file for writing
wx		create text file for writing
a		append; open or create text file for writing at
		end-of-file
rb		open binary file for reading
wb		truncate to zero length or create binary file for
		writing
wbx		create binary file for writing
ab		append; open or create binary file for writing at
		end-of-file
r+		open text file for update (reading and writing)
w+		truncate to zero length or create text file for update
w+x		create text file for update
a+		append; open or create text file for update, writing at
		end-of-file
r+b or rb+	open binary file for update (reading and writing)
w+b or wb+	truncate to zero length or create binary file for update
w+bx or wb+x	create binary file for update
a+b or ab+	append; open or create binary file for update, writing
		at end-of-file
...
[271] If the string begins with one of the above sequences, the
implementation might choose to ignore the remaining characters, or it
might use them to select different kinds of a file (some of which might
not conform to the properties in 7.21.2)."
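
For anyone who wants to check a mode string against that list mechanically,
here is a minimal sketch. The helper name is_standard_fopen_mode is my own
invention for illustration, not anything from the Standard or from any libc.
It only tests for an exact match against the strings listed in 7.21.5.3 and
says nothing about what a given runtime does with other strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mode strings listed in C11 7.21.5.3 paragraph 3. */
static const char *const standard_modes[] = {
    "r",   "w",   "wx",   "a",
    "rb",  "wb",  "wbx",  "ab",
    "r+",  "w+",  "w+x",  "a+",
    "r+b", "rb+", "w+b",  "wb+",
    "w+bx", "wb+x", "a+b", "ab+",
};

/* Hypothetical helper: returns true only if mode is exactly one of the
   strings listed above; trailing extras such as "t" are rejected. */
bool is_standard_fopen_mode(const char *mode)
{
    size_t n = sizeof standard_modes / sizeof standard_modes[0];

    for (size_t i = 0; i < n; i++) {
        if (strcmp(mode, standard_modes[i]) == 0)
            return true;
    }
    return false;
}
```

With this, is_standard_fopen_mode("wb") is true and
is_standard_fopen_mode("wt") is false. That only shows "wt" is outside the
listed set; whether a particular library ignores or interprets the "t" is a
separate question.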

> [1] https://msdn.microsoft.com/en-us/library/yeby3zcb(v=vs.140).aspx
> 
> "t
> Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
> an EOF character on input. In files that are opened for reading/writing
> by using "a+", fopen checks for a CTRL+Z at the end of the file and
> removes it, if it is possible. This is done because using fseek and
> ftell to move within a file that ends with CTRL+Z may cause fseek to
> behave incorrectly near the end of the file."

I wonder whether "t" is also required in order to have <ctrl-Z> recognized
as EOF on console input?
That page also documents a number of other mode characters and encoding
arguments that take that implementation far from the Standard.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada


