This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: rsync no longer preserves extended ASCII characters after 1.7 upgrade


> On 2009/12/27 7:56 PM, Adam Rosi-Kessel wrote:
>> But when I
>> view them from the linux box, they have scrambled accents -- either just
>> ?'s if I use ls (must be a terminal issue)

You probably haven't got a charset configured on the Linux box. In
that case, ASCII is assumed, and 'ls' prints anything outside that
charset as a '?'. You can force 'ls' to print all characters anyway
using the '--show-control-chars' option.
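
As a rough illustration of that '?' substitution, here is a small Python
sketch. It is not the actual coreutils code, and the function name
ls_style_ascii is made up for this example; it only assumes that every
byte outside printable ASCII is shown as '?'.

```python
def ls_style_ascii(name: bytes) -> str:
    """Mimic how a tool in an ASCII locale might display a filename:
    printable ASCII bytes are shown as-is, everything else becomes '?'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in name)


# 'caf\u00e9.txt' is "cafe.txt" with an acute accent on the e.
utf8_name = "caf\u00e9.txt".encode("utf-8")         # accent stored as 2 bytes
latin1_name = "caf\u00e9.txt".encode("iso-8859-1")  # accent stored as 1 byte

print(ls_style_ascii(utf8_name))    # caf??.txt
print(ls_style_ascii(latin1_name))  # caf?.txt
```

Two question marks per accented letter would hint that the name on disk
is UTF-8, since each such letter takes two bytes there.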

>> or different nonstandard
>> characters that aren't the right extended characters if I redirect the
>> output to a file and then view that.

That means you've got a character set mismatch between Cygwin (UTF-8)
and Linux (presumably ISO-8859-1). Please note that the ext3
filesystem and Unix filesystems in general have no concept of
character sets: filenames are just bytes. The interpretation of those
bytes is entirely up to applications, and you tell them what character
set to assume using the LANG or LC_CTYPE variables. So one way to fix
your mismatch is to specify e.g. LANG=en_US.UTF-8 on the Linux system.
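
To make the "filenames are just bytes" point concrete, here is a minimal
Python sketch. The filename is an invented example; the only assumption
is that one side wrote the name as UTF-8 and the other side reads it as
ISO-8859-1.

```python
name = "caf\u00e9.txt"                 # "cafe.txt" with an accented e

# Bytes as written to disk by the UTF-8 side.
raw = name.encode("utf-8")
print(raw)                             # b'caf\xc3\xa9.txt'

# The same bytes interpreted with the wrong charset: the two-byte
# sequence for the accented e turns into two unrelated characters
# (U+00C3 and U+00A9).
wrong = raw.decode("iso-8859-1")
print(ascii(wrong))                    # 'caf\xc3\xa9.txt'

# Interpreted with the matching charset, the name round-trips intact.
right = raw.decode("utf-8")
print(right == name)                   # True
```

The bytes never change; only the charset used to interpret them does,
which is what LANG or LC_CTYPE controls.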

>> I'm just trying to get back the behavior from before the upgrade. Thanks
>> for any suggestions.
>
> Assuming you upgraded from 1.5.x to 1.7.1, Cygwin has new default
> locale/charset settings that affect filename handling. Have a look at
> the Cygwin User Guide, specifically the page on Internationalization, here:
>
> http://cygwin.com/cygwin-ug-net/setup-locale.html
>
> I'm not sure what the default locale/charset was for 1.5.x, but for
> 1.7.1, it is "C.UTF-8".

1.5's default charset was the Windows default "ANSI" codepage (as
returned by the GetACP() function). On English systems, that's
codepage 1252, which is mostly identical to ISO-8859-1, except that it
has additional printable characters in the 0x80..0x9F range.
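
The difference between the two charsets can be checked with a short
Python sketch. It relies on Python's own "cp1252" and "iso-8859-1"
codecs, which may differ in small details from what Windows does with
the few byte values that codepage 1252 leaves unassigned; the helper
name decode_or_none is made up for this example.

```python
def decode_or_none(byte_value: int, charset: str):
    """Decode a single byte in the given charset, or return None if the
    charset does not assign a character to that byte."""
    try:
        return bytes([byte_value]).decode(charset)
    except UnicodeDecodeError:
        return None


# Collect every byte value that the two charsets treat differently.
differing = [
    b for b in range(256)
    if decode_or_none(b, "cp1252") != decode_or_none(b, "iso-8859-1")
]

print(hex(differing[0]), hex(differing[-1]), len(differing))  # 0x80 0x9f 32

# Example: 0x80 is the euro sign in cp1252 but a control character in
# ISO-8859-1.
print(ascii(decode_or_none(0x80, "cp1252")))      # '\u20ac'
print(ascii(decode_or_none(0x80, "iso-8859-1")))  # '\x80'
```

Accented Western European letters sit in 0xC0..0xFF, where the two
charsets agree.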

> You may be able to get the old behaviour back by
> setting LANG (or LC_ALL or LC_CTYPE) in Cygwin to match the
> locale/charset of your Linux system.

Yes, that's one way. Specifying e.g. 'LC_CTYPE=en_US rsync ...' (i.e.
a language without an explicit character set) will give you the ANSI
codepage.

But I think the --iconv option is the better way. Assuming you want to
stick with ISO-8859-1 on the Linux side, '--iconv=utf8,iso88591'
(local charset first, then the remote one) should do the job.
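
Conceptually, that conversion re-encodes each filename from one charset
to the other. The Python sketch below only illustrates the idea; it is
not how rsync implements --iconv, and the function name convert_name is
hypothetical.

```python
def convert_name(name: bytes, src: str = "utf-8",
                 dst: str = "iso-8859-1") -> bytes:
    """Re-encode a filename from the source charset to the destination
    charset. Raises UnicodeEncodeError if a character has no
    representation in the destination charset."""
    return name.decode(src).encode(dst)


# "cafe.txt" with an accented e, as UTF-8 bytes on the Cygwin side.
cygwin_name = b"caf\xc3\xa9.txt"

# The same name as it would be stored for an ISO-8859-1 Linux system.
print(convert_name(cygwin_name))   # b'caf\xe9.txt'
```

A name containing characters that ISO-8859-1 lacks cannot be converted
this way, which is one reason to consider switching the Linux side to
UTF-8 instead.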

Andy


