This is the mail archive of the mailing list for the Cygwin project.

Re: Accessing filenames with different charsets

> > Sorry if this has already been discussed, but I couldn't find it in the
> > archive nor in the FAQ...
> >
> > If I have a file name with Russian characters in it, cygwin is unable to
> > access it:
> >
> > > ls
> > ????.TEST
> >
> > (Russian characters are shown as '?' in the directory listing, but ls does
> > list the file).
> >
> > If I try to access it, however, open fails:
> >
> > > touch *
> > touch: '????.TEST': no such file or directory
> >
> > same deal with less, cp, rm, rsync etc.
> Okay, it seems cygwin readdir() returns the filename as "????.TEST"
> (the ?:s really are ?:s, ASCII 0x3f). Looking into it, this can't be
> caused by much else than FindFirstFileA() returning them that way.
> And indeed, I made a little non-Unicode test program that called
> FindFirstFile, and it returned "????.TEST" ("\x3f\x3f\x3f\x3f.TEST").
> To access the file, the wide-char versions of the Find*File() functions
> would probably have to be used (or is there another way?). I have no idea
> how that could be integrated into the cygwin framework...
> Any ideas?
Qt (from Trolltech) encodes Unicode filenames before they are used. In
Cygwin we could do the reverse, i.e. use Find*FileW and then encode the
Unicode as a local ANSI string. If we do the encoding manually in Cygwin,
rather than let Windows do it for us, this would overcome the problem. I
will try to put together a patch for this that you can test. One possibility
is to encode Unicode strings as UTF-8.
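
To illustrate the "encode manually" step, here is a rough, portable sketch
of a UTF-16 to UTF-8 encoder. It is not the promised patch and not Cygwin
code: the function name utf16_to_utf8, its signature, and the choice to
replace unpaired surrogates with U+FFFD are all assumptions made for the
example. The input is assumed to be a NUL-terminated UTF-16 name such as the
one Find*FileW returns in cFileName.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or (size_t)-1 if dst is too small. Unpaired surrogates are replaced
 * with U+FFFD (an assumption of this sketch).
 */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0)
        return (size_t)-1;

    while (*src != 0) {
        uint32_t cp = *src++;
        unsigned char buf[4];
        size_t len, i;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the following low surrogate. */
            if (*src >= 0xDC00 && *src <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src - 0xDC00);
                src++;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* stray low surrogate */
        }

        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }

        /* Always keep one byte free for the terminating NUL. */
        if (n + len >= dst_size)
            return (size_t)-1;
        for (i = 0; i < len; i++)
            dst[n++] = (char)buf[i];
    }

    dst[n] = '\0';
    return n;
}
```

With something like this, a four-letter Cyrillic name followed by ".TEST"
would reach the application as 13 bytes of UTF-8 instead of "????.TEST".
To open the file, the bytes would have to be decoded back to UTF-16 and
passed to the wide-char API.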

