This is the mail archive of the cygwin mailing list for the Cygwin project.



[Fwd: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line]


Repost for mailing list.

On Sat, May 30, 2009 at 6:03 PM, Edward Lam <edward@sidefx.com> wrote:
>> Here, when I use Russian Windows and I don't have LANG set (or when I
>> have LANG=en_US.UTF-8), the filename will be a UTF-8 multibyte string.
>> So both Russian and European/Chinese/Japanese filenames will be valid.
>> Now there are three possibilities:
> How does the filename get to be a UTF-8 multibyte string if you created
> the filename from an ANSI application? It sounds like Russian
> Windows uses a code page different from UTF-8.

When you create a file from an ANSI application, Windows converts the
filename to Unicode using your system code page. Cygwin 1.7 uses
Unicode internally: it converts filenames to multibyte when it passes
them to Cygwin applications, and converts back to Unicode when it
accepts data from Cygwin applications. When LANG is not set, the
multibyte encoding is currently UTF-8 (it could be anything arbitrary;
I'm just glad that it's UTF-8, because it converts data back and forth
without losing characters and there are no problems with SO-UTF8). So
Cygwin applications work with UTF-8 filenames, the console is UTF-8,
and Cygwin communicates with Windows via Unicode. The multibyte
encoding can still be overridden via LC_ALL/LANG.
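
To make the chain above concrete, here is a minimal Python sketch of the
conversions. It is only an illustration: the function names are made up,
cp1251 is assumed as the system code page of a Russian Windows, and none
of this is Cygwin's actual code.

```python
# Hypothetical illustration; these names are not Cygwin APIs.
SYSTEM_CODE_PAGE = "cp1251"  # assumed ANSI code page on Russian Windows
CYGWIN_MULTIBYTE = "utf-8"   # the default described above when LANG is unset


def ansi_name_to_unicode(raw: bytes) -> str:
    """What Windows does when an ANSI application creates a file."""
    return raw.decode(SYSTEM_CODE_PAGE)


def unicode_to_cygwin(name: str) -> bytes:
    """Unicode -> multibyte, as handed to a Cygwin application."""
    return name.encode(CYGWIN_MULTIBYTE)


def cygwin_to_unicode(raw: bytes) -> str:
    """Multibyte -> Unicode, as accepted from a Cygwin application."""
    return raw.decode(CYGWIN_MULTIBYTE)


# A Russian name created by an ANSI application survives the round trip.
ansi_bytes = "файл.txt".encode(SYSTEM_CODE_PAGE)
stored = ansi_name_to_unicode(ansi_bytes)
seen_by_cygwin = unicode_to_cygwin(stored)
assert cygwin_to_unicode(seen_by_cygwin) == stored

# A Japanese name also round-trips through UTF-8 ...
japanese = "日本語.txt"
assert cygwin_to_unicode(unicode_to_cygwin(japanese)) == japanese

# ... but it could not be represented in the Russian ANSI code page.
try:
    japanese.encode(SYSTEM_CODE_PAGE)
    representable = True
except UnicodeEncodeError:
    representable = False
print("Japanese name fits in cp1251:", representable)
```

The point of the last check is why UTF-8 is a convenient default: every
Unicode filename has a multibyte form, which is not true of cp1251.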

When you execute Windows applications, you naturally pass them either
filenames or some text. Since the Cygwin multibyte encoding is UTF-8
when LANG is not set, it is only natural to use UTF-8 to convert the
arguments to Unicode when executing Windows applications. After all,
if you have a UTF-8 filename with Japanese characters, you would
expect "cmd.exe /c del /y $filename" and "cmd.exe /c echo $sometext"
to succeed for any text that uses the current Cygwin encoding.
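
A rough Python sketch of what I am asking for, under the assumption that
arguments reach the Windows side as UTF-16 text. The helper name
args_for_windows is hypothetical; this is not how Cygwin implements
process creation, only an illustration of why the decoding step must use
the current multibyte encoding.

```python
def args_for_windows(argv_bytes, encoding="utf-8"):
    """Decode each multibyte argument with the current Cygwin encoding,
    then re-encode it as UTF-16-LE, the form a wide-character Windows
    API would receive. Hypothetical helper, for illustration only."""
    return [arg.decode(encoding).encode("utf-16-le") for arg in argv_bytes]


# Arguments as a Cygwin program with UTF-8 as its encoding would hold them.
argv = ["日本語.txt".encode("utf-8"), "café".encode("utf-8")]

# Decoding with the matching encoding preserves the text.
good = [a.decode("utf-16-le") for a in args_for_windows(argv, "utf-8")]
assert good == ["日本語.txt", "café"]

# Decoding the same bytes with a different code page produces mojibake.
bad = args_for_windows(["é".encode("utf-8")], "cp1252")[0].decode("utf-16-le")
assert bad == "Ã©"
print(good, bad)
```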

Think of it like this: in your first email the file was being read by
Cygwin, so your copyright.txt was in the wrong encoding for it. You
need to either use iconv to convert it (I hope that
`iconv -c -f cp1251 ...` will do the right thing here without
specifying a target encoding), or set LANG to match the encoding you
are actually working with.
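
For comparison, the same conversion can be sketched in Python. The
function name convert is made up, the target encoding is given
explicitly rather than taken from the locale, and errors="ignore" only
roughly mirrors what `iconv -c` does with characters it cannot convert.

```python
def convert(data: bytes, source: str = "cp1251", target: str = "utf-8") -> bytes:
    """Re-encode bytes from `source` to `target`, silently dropping
    anything that cannot be converted (an approximation of iconv -c)."""
    text = data.decode(source, errors="ignore")
    return text.encode(target, errors="ignore")


# Sample content standing in for a cp1251-encoded copyright.txt.
original = "Авторское право".encode("cp1251")
converted = convert(original)
assert converted.decode("utf-8") == "Авторское право"

# Going the other way drops characters the target cannot represent.
lossy = convert("日本語 ok".encode("utf-8"), source="utf-8", target="cp1251")
assert lossy == b" ok"
print(converted, lossy)
```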

And if you are using English Windows with English regional settings,
then your LANG should be en_US.CP1252, not en_US.ISO-8859-1 (CP1252 is
what your Windows applications are using!).
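
The two encodings are easy to confuse because they agree on most bytes.
A short Python check, using Python's own codec tables, shows where they
differ:

```python
# CP1252 and ISO-8859-1 disagree only in the range 0x80-0x9F.
euro = b"\x80"
assert euro.decode("cp1252") == "\u20ac"    # EURO SIGN
assert euro.decode("iso-8859-1") == "\x80"  # a C1 control character

# Curly quotes written by Windows applications live in that range too.
assert b"\x93hi\x94".decode("cp1252") == "\u201chi\u201d"

# Collect every byte the two codecs decode differently. A few bytes are
# undefined in Python's cp1252 table, so errors="replace" is used to
# keep the comparison from raising.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("iso-8859-1")
]
print(hex(min(differing)), hex(max(differing)), len(differing))
```

So text containing the euro sign or curly quotes will come out wrong if
LANG claims ISO-8859-1 while the data is really CP1252.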

I really don't know how to better explain all this, since in my head
it's so clear and obvious. :-/

> Ok, so where's the bug tracker so I can log a bug?

Isn't this mailing list serving as the bug tracker? I just hope that
whoever can fix this is reading our emails and will come up with the
right solution.




