This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: igncr vs text mode mounts, performance vs compatibility


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Lewis Hyatt on 10/25/2006 1:48 PM:
> Below is a patch to shell.c which implements my suggestion (It is just the
> output of diff -u, is that OK? The patch is relative to the 3.2-4 version.) I
> just modified the open_shell_script() function; now after it reads in 80
> characters to make sure the script isn't binary, it also scans that buffer for
> the first \n it finds and if it is preceded by \r, then it enables igncr. If
> necessary it reads more characters, 80 bytes at a time. In most cases the
> performance decrease for this check will be entirely negligible, particularly
> since the first line of the script is almost always going to just be the
> #!/bin/bash line anyway.

Wow - someone besides me actually wrote a patch!  What a welcome relief in
this thread.

> 
> The only downside I can see to this is that bash will behave differently
> depending whether it is reading from a file or from a pipe. But that is already
> true in other cases as well, for instance the check for binary files is not
> performed when reading from a pipe. It would be possible to extend the \r\n
> check to piped input as well, although it would be somewhat more annoying to
> implement.

When reading from a pipe, the presence or absence of \r is determined by
CYGWIN=binmode (default) or CYGWIN=nobinmode (which is less tested, and
has been documented as breaking things like 'tar xjf foo.tar.bz2').  In
bash-3.2-4, if igncr is set, then even using CYGWIN=binmode and piping in
\r will ignore the \r, whereas you are correct that your patch would not
autodetect the \r line endings.  For what I can think of, to check whether
the first line of piped input ends in \r\n, you would have to add a flag
that you are reading from a pipe and have not yet seen a line ending, then
check that flag for every read in input.c; which adds overhead to every
byte read, rather than just the first line.  You can't afford to rewind,
or read in advance, because you don't know what is feeding the other end
of the pipe.

> 
> If you think this overall concept is a good idea, I am happy to take everyone's
> suggestions and implement them. It's a bit of a struggle for me to write C code
> instead of C++, but I can give it a shot.

There are a few other things to think about with your initial attempt.
Right now, igncr is an all or nothing setting, can be inherited into child
processes via SHELLOPTS, but can only be changed by explicit user action.
 Your patch only ever turns it on without user intervention, not off.  You
are changing the global igncr variable even though you described your goal
as a per-file setting; so if bash reads two files in a row, first with
\r\n, and the second with \n only, the second inherits the fact that the
first decided to turn on igncr.  At which point, why not just make igncr
the default to begin with?  So it sounds like the effort to make a
per-file decision work correctly is getting bigger and bigger, with less
and less perceivable benefits.

> 
> -Lewis
> 
> ---------
> 
> --- shell.c	2006-10-25 15:40:53.625000000 -0400
> +++ shell_patched.c	2006-10-25 15:41:02.093750000 -0400
> @@ -1335,6 +1335,7 @@
>  open_shell_script (script_name)
>       char *script_name;
>  {
> +  extern int igncr;

My uses of igncr in 3.2-4 are protected by #if __CYGWIN__, so that my
patches will still compile on other platforms; you would need to do
likewise to avoid a link error.  Actually, when I propose an upstream
patch, I will probably do it in the form of ./configure --enable-igncr,
then protect all uses of igncr by IGNCR, so that the presence of igncr is
no longer cygwin-dependent, but can still be compiled out when desiring
strict POSIX conformance.

>    int fd, e, fd_is_tty;
>    char *filename, *path_filename, *t;
>    char sample[80];
> @@ -1426,8 +1427,32 @@
>  	  internal_error ("%s: cannot execute binary file", filename);
>  	  exit (EX_BINARY_FILE);
>  	}
> +	
> +	/* Now scan the first line of the file, if it ends with \r\n, then
> +	   enable the igncr shell option. */
> +	   	
> +	while(sample_len>1)
> +	{
> +		int i;				
> +		for(i=0; i!=sample_len; ++i) if(sample[i]=='\n')

It seems a bit redundant to loop over the sample once in
check_binary_file, then again in your patch; I wonder if it is worth
patching check_binary_file instead?

Your code formatting is not consistent with nearby code; while this is not
a serious drawback, it makes it harder for me to propose your code to the
upstream maintainer, should I choose to accept something like it.

> +		{
> +			if(i>0 && sample[i-1]=='\r') igncr=1;
> +			break;
> +		}		
> +		if(i<sample_len) break;
> +				
> +		/* else need to read more data */
> +		sample_len = read(fd, sample, sizeof(sample));
> +		if(sample_len<0)
> +		{
> +			file_error(filename);
> +			exit(EX_NOEXEC);

I'm not sure I like this - before your patch, bash can execute the first
part of a file, even if a read error would occur before encountering a
newline but after 80 characters; with your patch, you keep on plowing
through the file, looking for a newline, so you are possibly rejecting a
file that used to be partially executable.  On the other hand, read errors
are generally rare, so this corner case is probably not worth me fretting
about.

> +		}					
> +	}
> +	
>        /* Now rewind the file back to the beginning. */
>        lseek (fd, 0L, 0);

Just because the bash maintainer writes (or more likely, inherited) bad
code doesn't give you the excuse to use a magic number 0 instead of the
symbolic name SEEK_SET as the third argument to lseek.

One last point - your patch has no way to bypass automatic filtering.  I
would feel more comfortable if there were a way for users to avoid
automatic filtering (yes, maybe only masochists would use that option, but
at least it would no longer be silently forced on users).

I wonder if a better approach would be to propose a patch to the cygwin
base-files maintainer to set igncr in the default /etc/profile.  I tend to
place a high value on standardized behavior, because following standards
increases portability.  I am more than willing to make scripts on text
mounts work smoothly regardless of line ending, because text mounts are
already outside the realm of POSIX (POSIX requires fopen("r") and
fopen("rb") to be identical, which is not the case on text mounts), so
there are no standards preventing me from ignoring \r.  But on binary
mounts, the cygwin philosophy is to emulate Linux, which in turn is slowly
converging on POSIX compliance.  The fact that bash is used as /bin/sh
makes it very hard for me to turn on a non-standard behavior unless it is
explicitly requested by a configuration file such as /etc/profile, because
I write shell scripts that I expect to behave identically across multiple
platforms that claim some degree of POSIX conformance.

There is also the possibility of making bash turn igncr on by default, but
/bin/sh leaving it off, since only /bin/sh is specified by POSIX; but that
also gives me the willies thinking about people who will complain why
their script doesn't work when they change from #!/bin/bash to #!/bin/sh.

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFQBxn84KuGfSFAYARArOCAKCjQMJbii2pMSKaz9k78ootxnVM/gCghNfR
whq++azHUYHn0crwPQAZlfs=
=9Ga1
-----END PGP SIGNATURE-----

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]