This is the mail archive of the
mailing list for the Cygwin project.
Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
- From: Brian Inglis <Brian dot Inglis at SystematicSw dot ab dot ca>
- To: cygwin at cygwin dot com
- Date: Fri, 30 Sep 2016 23:15:02 -0600
- Subject: Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
- References: <email@example.com> <f4712f19-ef37-2040-1cda-3e352f09c8cd@SystematicSw.ab.ca>
- Reply-to: Brian dot Inglis at SystematicSw dot ab dot ca
On 2016-09-30 22:34, Brian Inglis wrote:
On 2016-09-30 20:13, Ivan Vanyushkin wrote:
Something changed in version 2.6.0, and now UTF-8 text can't be displayed in the Windows console (cmd).
1. Create a file "test.txt" with non-ASCII text in UTF-8 encoding.
2. Run "cmd".
▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒ ▒▒▒▒▒▒ 8000 ▒▒. ▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒.
Non-ASCII text is not readable. Older Cygwin 2.5.2 has no such issue.
CYGWIN_NT-10.0 PCName 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
Same issue with any other commands like "grep", or with utilities built and run under Cygwin 2.6.0.
Same issue in other Windows consoles, like ConEmu or FAR Manager.
If I change the Windows console encoding to UTF-8 (run: "chcp 65001"), the file is displayed correctly by the native command
(run: "type test.txt"), but Cygwin "cat" still has the same issue.
How should I display UTF-8 now?
No problems here - same setup.
I don't have files containing UTF-8 specials handy, but do have some with Latin1 (ISO-8859-1) specials,
convertible to UTF-8.
I stripped common ASCII-only lines from the output below.
My default email encoding is Unicode (hopefully UTF-8), not Western (presumably Latin1), so the characters should render accurately.
$ uname -srvmo
CYGWIN_NT-10.0 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
$ egrep -a 'Deg|LF' latin1.txt # -a needed to override binary assumption - garbled characters
Y2LF='%s▒%s %s %s'
$ iconv -f iso-8859-1 -t utf-8 latin1.txt | egrep 'Deg|LF' # good utf-8 characters
Y2LF='%s±%s %s %s'
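The difference between the two outputs above comes down to bytes. As a minimal sketch, assuming a POSIX shell with od, tr and iconv available (the variable names are made up for illustration): in ISO-8859-1 the character ± is the single byte 0xB1, which is not a valid UTF-8 sequence on its own, while iconv turns it into the two-byte UTF-8 sequence 0xC2 0xB1.

```shell
# Hypothetical variable names; octal \261 is byte 0xB1, i.e. "±" in ISO-8859-1.
latin1_hex=$(printf '\261' | od -An -tx1 | tr -d ' \n')

# Convert that single byte to UTF-8 and dump the result as hex.
utf8_hex=$(printf '\261' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# One byte before conversion, two bytes after.
echo "latin1=$latin1_hex utf8=$utf8_hex"
```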
Sorry - this was mintty - you used cmd!
I saw problems similar to yours until I set LC_ALL=C.UTF-8 (and LANG for consistency, though that doesn't really matter) and ran chcp 65001.
Then type and Cygwin commands produce the same output.
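The settings mentioned above could be applied from a Cygwin shell running inside the console window roughly as follows. This is a sketch rather than a verified recipe: it assumes the Windows code page tool is reachable as chcp.com on PATH, and it skips that step wherever the tool is not found.

```shell
# Ask Cygwin programs to use UTF-8 (LANG is set only for consistency).
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

# Switch the Windows console to code page 65001 (UTF-8).
# Assumption: chcp.com from the Windows system directory is on PATH;
# the guard skips the call anywhere it is not available.
if command -v chcp.com >/dev/null 2>&1; then
    chcp.com 65001
fi

# Show the values that child processes will inherit.
echo "LC_ALL=$LC_ALL LANG=$LANG"
```

From cmd itself, the equivalent would presumably be `set LC_ALL=C.UTF-8` followed by `chcp 65001` before starting any Cygwin command.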
Without CP65001 (and a Unicode console font that maps most characters; I use DejaVu Sans Mono everywhere I can), there may be no valid encoding for UTF-8 special characters in your default console code page (437 for US, 850 for non-US, others for localized versions).
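As a general illustration of such a code page mismatch (not a claim about the exact path Cygwin's console output takes), the sketch below decodes the UTF-8 bytes of ± as if they were code page 437 text. It assumes an iconv build that recognizes the charset name CP437; the variable names are hypothetical.

```shell
# UTF-8 encoding of "±" is the two bytes 0xC2 0xB1 (octal 302 261).
# Decode those bytes as CP437, as a reader assuming code page 437 would,
# and re-encode the result as UTF-8 so a UTF-8 terminal can show it.
misread=$(printf '\302\261' | iconv -f CP437 -t UTF-8)

# Hex dump of the misread text: two 3-byte characters instead of one "±".
misread_hex=$(printf '%s' "$misread" | od -An -tx1 | tr -d ' \n')

echo "misread=$misread hex=$misread_hex"
```

In code page 437, byte 0xC2 is a box-drawing character and 0xB1 is a shading block, so one intended character turns into two unrelated glyphs.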
Unfortunately, less then displays spaces as squares, so you may have to set PAGER=more for readability.
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada