This is the mail archive of the
mailing list for the Cygwin project.
Re: Accessing filenames with different charsets
> > Sorry if this has already been discussed, but I couldn't find it in the
> > archive nor in the FAQ...
> > If I have a file name with Russian characters in it, cygwin is unable to
> > access it:
> > > ls
> > ????.TEST
> > (Russian characters are shown as '?' in the directory listing, but ls does
> > list the file).
> > If I try to access it, however, open fails:
> > > touch *
> > touch: '????.TEST': no such file or directory
> > Same deal with less, cp, rm, rsync, etc.
> Okay, it seems cygwin readdir() returns the filenames as "????.TEST" (the
> ?:s are really ?:s (ascii 0x3f)). Looking at fhandler_disk_file.cc, this
> can't be caused by much else than by FindFirstFileA() returning them that way.
> And indeed, I made a little non-Unicode test program that called
> FindFirstFile, and it returned "????.TEST" ("\x3f\x3f\x3f\x3f.TEST").
> To access the file, the wide char versions of the Find*File() functions would
> probably have to be used (or is there another way?). I have no idea how
> this could be integrated into the cygwin framework...
> Any ideas?
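To illustrate why the returned name cannot be opened again, here is a small portable sketch of a lossy wide-to-narrow conversion. It only imitates the substitution described above; it is not what Windows or Cygwin actually run. The helper name narrow_lossy is hypothetical, and treating everything above 0x7F as unmappable is a simplifying assumption (a real ANSI code page maps more characters than plain ASCII).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: imitates a lossy wide-to-single-byte conversion.
 * Assumption: every UTF-16 code unit above 0x7F is "unmappable" and is
 * replaced by '?' (0x3f). Returns the number of characters written,
 * not counting the terminating NUL. */
size_t narrow_lossy(const uint16_t *wide, char *out, size_t out_size)
{
    size_t i = 0;

    if (out_size == 0)
        return 0;

    /* Stop at the end of the input or when only room for the NUL is left. */
    while (wide[i] != 0 && i + 1 < out_size) {
        out[i] = (wide[i] <= 0x7F) ? (char)wide[i] : '?';
        i++;
    }
    out[i] = '\0';
    return i;
}
```

Once the Cyrillic characters have been replaced, the original name is gone: any four-letter Cyrillic prefix followed by ".TEST" produces the same narrow string, so passing that string back to an open call cannot identify the file on disk.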
Qt (from Trolltech) encodes Unicode filenames before they are used. In
Cygwin we could do the reverse, i.e. use Find*FileW and then encode the
Unicode as a local ANSI string. If we do the encoding manually in Cygwin,
rather than let Windows do it for us, this would overcome the problem. I
will try to put together a patch for this that you can test. One possibility
is to encode Unicode strings as UTF-8.
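As a rough illustration of the UTF-8 option (this is not the patch itself), the sketch below encodes a NUL-terminated UTF-16 name, such as one returned by the wide-char Find*File functions, as UTF-8 by hand. The function name utf16_to_utf8 and its error convention are assumptions made for this example; how such a conversion would actually hook into Cygwin's path handling is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
 * Returns the number of bytes written (not counting the NUL), or
 * (size_t)-1 if the buffer is too small or a surrogate is unpaired. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;

    while (*in != 0) {
        uint32_t cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the low surrogate that must follow. */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* low surrogate without a high one */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= out_size)        /* always keep room for the NUL */
            return (size_t)-1;

        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Because UTF-8 is reversible, the narrow name handed to applications could be decoded back to the exact wide name before calling the W functions, so nothing is lost the way it is with '?' substitution. ASCII names come out byte-for-byte unchanged.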