This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting


2011/5/24 Reini Urban:
> 2011/5/18 Sven Severus:
>> let me report a strange behaviour with Cygwin Perl (I'm using cygwin1.dll
>> 1.7.9-1, full installation 2 weeks ago).
>>
>> File foo.h is an ordinary text file, all lines are terminated with DOS
>> style line endings <cr> <lf> (hex: 0d 0a).
>> It is located in a directory with textmode mounting in cygwin.
>> One <cr> <lf> sequence of foo.h is split by a 4096 byte boundary within
>> the file: "od -c -Ax foo.h" shows a <cr> (='\r') at byte offset 4095
>> (0xfff) and a <lf> (='\n') at offset 4096 (0x1000):
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \n   /   /  \r  \n   /   /  \r  \n
>> 001009
>>
>> Now I issued the command "perl -pe 's/12345/54321/' foo.h >foomod.h"
>> to produce foomod.h, located in the same directory as foo.h, thus with
>> textmode mounting too.
>> When I examined the result, I noticed that foomod.h was one byte bigger
>> than foo.h. I expected identical size, and "od -c -Ax foomod.h" reports:
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \r  \n   /   /  \r  \n   /   /  \r  \n
>> 00100a
>>
>> Oops! The original <cr> <lf> sequence starting at offset 4095 (0xfff)
>> became a three-character sequence <cr> <cr> <lf>! The <cr> is duplicated!
>>
>> In other files created by Perl with output redirection I observed this
>> behaviour with every <cr> <lf> line ending that is split by a 4096 byte
>> boundary (even multiple times in one output file). Line endings not
>> split by a 4096 byte boundary do not show this behaviour.
>>
>> The behaviour does not occur when the destination file is located
>> in a directory with binmode mounting. It does not occur either when
>> I use sed instead of Perl ("sed -e 's/12345/54321/' foo.h >foomod.h"),
>> so I think the problem is specific to Cygwin Perl, not to Cygwin in
>> general.
>>
>> Is this a bug of the output buffering mechanism of Cygwin Perl?
>> Or am I doing anything wrong?
>> Any answer is highly appreciated. Thanks in advance.
>
> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
> The last char of the previous buffer is not remembered when the first
> char of the new buffer is checked.
> I think first we have to provide a sample test case to perl core.

I could not reproduce it in perl core with the PerlIO :crlf layer, see
attached test.
I'm investigating cygwin buffer edge-case handling now.

-- 
Reini Urban

Attachment: crlf-bufedge.patch
Description: Binary data
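
The hypothesis discussed in this thread (the last character of one buffer is forgotten when the first character of the next buffer is checked) can be illustrated with a small model. This is only a sketch under stated assumptions: it is not PerlIO or Cygwin code, the names translate and write_chunks are hypothetical, the 4096-byte buffer size is taken from the boundary mentioned in the report, and the translation rule (insert <cr> before any <lf> not already preceded by <cr>) is inferred from the observed output rather than from any source code.

```python
BUF = 4096  # assumed buffer size, taken from the 4096-byte boundary in the report


def translate(chunk, prev=None):
    """Insert CR before each LF not already preceded by CR.

    Returns the translated bytes and the last input byte seen (hypothetical helper).
    """
    out = bytearray()
    for b in chunk:
        if b == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(b)
        prev = b
    return bytes(out), prev


def write_chunks(data, remember):
    """Translate data in BUF-sized chunks.

    With remember=False the last byte of the previous chunk is forgotten,
    which models the suspected defect; with remember=True it is carried over.
    """
    out = bytearray()
    prev = None
    for i in range(0, len(data), BUF):
        piece, last = translate(data[i:i + BUF], prev if remember else None)
        out += piece
        prev = last
    return bytes(out)


# Input with CR at offset 4095 and LF at offset 4096, as in foo.h.
data = b"x" * (BUF - 1) + b"\r\n" + b"//\r\n"

buggy = write_chunks(data, remember=False)
fixed = write_chunks(data, remember=True)

print(len(data), len(buggy), len(fixed))  # 4101 4102 4101
print(buggy[BUF - 1:BUF + 2])             # b'\r\r\n'
```

In this model the stateless variant grows the output by exactly one byte per split line ending, matching the symptom in the report, while carrying one byte of state across chunks leaves the data unchanged. Whether the real defect sits in PerlIO or in Cygwin's textmode handling is exactly what the thread leaves open.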

