This is the mail archive of the cygwin mailing list for the Cygwin project.
On 03/21/2016 07:40 AM, Gordon Grimes wrote:
> Hi,
>
> I had generated a FILE by simply doing a 'find' on a directory and used grep to cull the results. I wasn't working so I repeated and tried the following trivial 'grep':
>
> % wc -l FILE
> 48786
> % grep . FILE
> 2240
>
> Very wrong.

Umm, grep doesn't output counts unless you use 'grep -c'.

Also, one of the big changes in recent grep is more efficient handling of encoding errors; remember, the regular expression '.' is only supposed to match valid characters, and that an encoding error can cause grep to quit checking; so in all likelihood, your problem stems from the fact that the contents of FILE contain an encoding error in your current locale.

But, as others have already pointed out, you didn't post a simple reproducible example for us to confirm, nor tell us what locale you are using, nor tell us whether you have tried LC_ALL=C to see if forcing a single-byte locale with no encoding errors cleans up the problem. So, as the grep maintainer, I'm awaiting proof that there is a problem (or confirmation that the bug is on your end, and not in grep) before I worry about putting out another build of grep.

Something like this is repeatable:

$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 wc -l
3
$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 src/grep -c .
2
$ printf 'a\n\x80\nc\n' | LC_ALL=C src/grep -c .
3

Note how wc counts \n characters, regardless of encoding errors elsewhere, while grep -c skips the \x80 line because it contains nothing but encoding errors in UTF-8.

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
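
To see which lines of a file are affected by the behavior described above, one option is to test each line for UTF-8 validity. The following is only a minimal sketch and is not part of the original message: it assumes `iconv` is installed and that the data is meant to be UTF-8, and `invalid_utf8_lines` is a hypothetical helper name. It writes the test byte as the octal escape `\200` rather than `\x80`, because not every shell's `printf` accepts `\x` escapes.

```shell
# Hypothetical helper: read lines on stdin and print the line number of
# every line that is not valid UTF-8. Assumes iconv is available.
invalid_utf8_lines() {
    n=0
    # The "|| [ -n "$line" ]" part also handles a final line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # iconv exits non-zero when the input is not valid in the source encoding.
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done
}

# Same three-line input as in the message; line 2 is a lone 0x80 byte.
printf 'a\n\200\nc\n' | invalid_utf8_lines
```

Running `invalid_utf8_lines < FILE` would list the candidate lines to inspect. Note that this flags any line containing an invalid sequence, whereas `grep -c .` only drops lines with no valid character at all, so the two counts need not agree exactly.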