This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: sed doesn't like LANG= anymore


On 20.05.2010 18:05, Andy Koppe wrote:
On Thursday, May 20, 2010, Jurriaan wrote:
A very long sed script that's been working for ages (back from the 1.5
age) here has stopped working.

It turned out sed doesn't like some strings anymore when environment
variable LANG is empty. With LANG=ASCII, there are no problems.

The actual text in the SED command is shown below as spaces, but it's a
Swedish a with a small o on top of it, like this:

sed -e"s/@a/ a/g;"

where a is character 0xe5.

Running with LANG=ASCII works, with LANG empty I get 'unterminated `s'
command' from sed (which confused me for a while).

With empty LANG you're using the default UTF-8 encoding, where that
0xe5 byte constitutes an incomplete character. You need to either run
with a LANG setting that fits your script, e.g. C.ISO-8859-1, or
convert your script to UTF-8. I'm puzzled as to why LANG=ASCII would
have worked, since that's not a valid setting.
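
A minimal Python sketch of the point above, for illustration only. It assumes the replacement text in the script is the single byte 0xe5 (the post's rendering of the command is garbled, so the exact script bytes are a guess), and `is_valid_utf8` is a hypothetical helper name. It shows that the lone 0xe5 byte is not valid UTF-8, and that re-encoding the script from ISO-8859-1 turns it into the two-byte sequence 0xc3 0xa5.

```python
# Assumed script bytes: 0xe5 is "a with ring" in ISO-8859-1 (per the post).
script_latin1 = b"s/@a/\xe5/g;"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Reinterpret the bytes as ISO-8859-1 text, then store that text as UTF-8.
script_utf8 = script_latin1.decode("iso-8859-1").encode("utf-8")

print(is_valid_utf8(script_latin1))  # the lone 0xe5 lead byte is rejected
print(script_utf8)                   # 0xe5 has become 0xc3 0xa5
print(is_valid_utf8(script_utf8))
```

In UTF-8, 0xe5 announces a three-byte character, so the `/` that follows it cannot complete the sequence. How a given tool reacts to that is up to its decoder.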

With LANG set to anything unknown, the charmap is set to ASCII, so it
works (as there are at least no multibyte characters then).
Considering the described effect, I doubt that a UTF-8 decoder should
swallow an ASCII byte that follows an incomplete UTF-8 sequence; it
should rather stop at the last byte that belongs to the sequence and
treat any subsequent UTF-8 lead byte or ASCII byte as the start of a
new character.
I guess the script would still work on Linux (can't try right now,
sorry) even in a "wrong" locale, so I think something should be fixed
in the newlib conversion functions here.
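
To illustrate the decoder behaviour suggested above, here is a rough Python sketch. It is not newlib code; `expected_length` and `split_units` are hypothetical names, and the splitter is deliberately simplified (it does not reject overlong forms or surrogates). It groups bytes into character units and ends an incomplete sequence at the first byte that is not a continuation byte, so the following ASCII byte starts a new unit.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it is not one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # stray continuation byte or invalid lead byte


def split_units(data: bytes) -> list:
    """Split bytes into character units without swallowing later bytes."""
    units = []
    i = 0
    while i < len(data):
        # Invalid lead bytes are treated as one-byte units.
        n = expected_length(data[i]) or 1
        j = i + 1
        # Consume continuation bytes only; stop at the first other byte.
        while j < i + n and j < len(data) and 0x80 <= data[j] <= 0xBF:
            j += 1
        units.append(data[i:j])
        i = j
    return units


# The incomplete 0xe5 sequence stays on its own; "/" remains a delimiter.
print(split_units(b"/\xe5/g;"))
```

Python's built-in decoder resynchronises the same way: `b"/\xe5/g;".decode("utf-8", errors="replace")` replaces only the 0xe5 byte and keeps the `/` after it.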
------
Thomas

