This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line


Corinna Vinschen wrote:
On May 29 17:21, Edward Lam wrote:

I think the problem I'm running into is: - I give cygwin 1.7's bash
a string that is in my system default code page. - cygwin 1.7
thinks the string is actually UTF-8 and tries to convert it as
UTF-8 into UTF-16, resulting in a truncated command line that is passed to the child process.

The question is, what do you expect? I know, you expect that it "just works", but that's not as easy as you might assume, unfortunately.

Yes, Alexey and I had a lengthy argument on this thread already. Disagreements on the default LANG behaviour notwithstanding, I still think that it should NOT truncate; it should substitute something else for the invalid character instead.

Here's a quote from Alexey previously on this thread:

"In my opinion: truncation is a bug (should use replacement character,
or fail exec altogether), expecting utf-8 is not"

Wikipedia has several suggestions on how to handle invalid UTF-8 byte sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the rule that uses the replacement character.
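To make the difference between the two behaviours concrete, here is a minimal sketch in Python. It is not Cygwin's code; the function names and the sample argument are made up for illustration. One function stops at the first invalid byte (the truncation I am complaining about), the other substitutes U+FFFD, the Unicode replacement character.

```python
def decode_truncating(raw: bytes) -> str:
    """Decode as UTF-8, silently dropping everything from the first invalid byte on."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8.
        return raw[:err.start].decode("utf-8")


def decode_replacing(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each invalid byte sequence."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical command-line argument containing the lone byte 0xA9.
arg = b"copy\xa9right"

truncated = decode_truncating(arg)  # everything after "copy" is lost
replaced = decode_replacing(arg)    # the rest of the argument survives
```

With replacement, the child process at least sees the full length of the argument and can tell that something was not convertible.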

You get the idea.  The character 0xa9 has no meaning in itself.  It
only has a meaning when you consider the character set or codepage in
which you use this character.
...
> How is anybody supposed to know that the file which consists
> of the single byte 0xa9 has *any* meaning at all?  Why should it be
> the copyright sign, of all things?

What I was attempting to do was to have NO conversion. In the
real case where I ran into this, "bug.exe" was the one that properly
interpreted what the byte 0xA9 meant on the command line. Yes, I know
there are several workarounds.
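As an analogy only, and not something Cygwin does as far as I know, Python's "surrogateescape" error handler shows one way a byte string can pass through a Unicode layer and come out unchanged. The helper names below are hypothetical.

```python
def to_text_lossless(raw: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte to a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes_lossless(text: str) -> bytes:
    """Reverse to_text_lossless, restoring the original undecodable bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical argument: the receiving program decides what 0xA9 means.
original = b"bug.exe \xa9"
as_text = to_text_lossless(original)
round_tripped = to_bytes_lossless(as_text)
```

The byte 0xA9 is carried as the placeholder U+DCA9 and restored on the way out, so the receiving program gets exactly the bytes that were typed.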

If we default to the ANSI codepage, you will have the same problem,
just upside down.  In both cases you will have even more problems if
you start using characters not available in your default codepage.

This is where I disagreed with Alexey. What we're really arguing about here is which default will run into the fewest problems for the most common usage. This is subjective, of course.


-Edward


