This is the mail archive of the cygwin mailing list for the Cygwin project.
On 05/14/2015 10:32 AM, Vince Rice wrote:
> locale run from a cmd.exe session says that everything is "C.UTF-8", while locale run from mintty says that everything is en_US.UTF-8. A "which" in both cases shows that the locale being run is cygwin's, so I assume mintty does something slightly differently than the normal console? I don't even know if there's a difference. (Have I mentioned I don't know anything about all of this?)
>
> From cmd.exe:
> LANG=
> LC_CTYPE="C.UTF-8"
> LC_NUMERIC="C.UTF-8"
> LC_TIME="C.UTF-8"
> LC_COLLATE="C.UTF-8"
> LC_MONETARY="C.UTF-8"
> LC_MESSAGES="C.UTF-8"
> LC_ALL=

That's because all programs default to C unless told otherwise; from cmd, there is nothing stating otherwise, as each cygwin command is the first process in its own tree of processes.

>
> From mintty
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=

mintty is a cygwin process, AND it sets your locale variables to match your Windows locale; all other processes are then children of mintty and get the preferred locale settings by default. Of course, if you don't like mintty's defaults, you can set up your shell initialization scripts to change them to your preference.

>
> Now, pardon my continued ignorance, but which of those variables needs to be set to UTF16 in order for grep to work? And I assume it (they?) should be set to en_US.UTF-16?

None. UTF16 is not a valid locale. It is a valid encoding (wide character), but locales must operate on multi-byte sequences, not wide characters. So you HAVE to convert from wide character to multi-byte before you can do anything that requires a locale to work correctly.

>
> Thanks to everyone for your help. I think you've all confirmed this isn't cygwin-specific, but I couldn't find anything even searching generically ("grep unicode" and now "grep utf16").
> I did finally find an external reference to iconv, but if grep is supposed to handle this natively, I haven't been able to find much on how to do it.

grep cannot handle UTF16 natively. iconv exists to do encoding transformations, so that the rest of the system can live in a multi-byte world instead of worrying about wide-character encodings.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
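
The "convert first, then match" step described in the reply can be illustrated with a minimal Python sketch. It is not part of the original message: the helper name grep_utf16 and the sample text are hypothetical, and it assumes the input really is UTF-16 (Python's "utf-16" codec honours a leading byte-order mark).

```python
import re


def grep_utf16(data, pattern):
    """Decode UTF-16 bytes to text, then return the lines matching pattern.

    Hypothetical helper for illustration only; assumes `data` is UTF-16.
    """
    text = data.decode("utf-16")  # convert out of the wide encoding first
    return [line for line in text.splitlines() if re.search(pattern, line)]


# Sample data standing in for a UTF-16 file (encode() prepends a BOM).
sample = "first line\nneedle here\nlast line\n".encode("utf-16")

# A byte-level search on the raw UTF-16 data misses the word, because each
# ASCII character is stored as two bytes (the letter plus a zero byte).
print(b"needle" in sample)

# After decoding, an ordinary pattern match works.
print(grep_utf16(sample, "needle"))
```

On the command line, the same idea would usually be something along the lines of `iconv -f UTF-16 -t UTF-8 file | grep pattern`, assuming the file carries a byte-order mark or is in the byte order iconv expects.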