This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Extra spaces in text files in cygwin


gmarsha11 wrote:

I'm not sure about the file's encoding. How do I tell?

If you have "file" installed, it's easy:


$ file Document.txt
Document.txt: Unicode text, UTF-16, little-endian
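
If "file" is not installed, the byte-order mark (BOM) at the start of the file gives a similar hint. The following is a minimal Python sketch of that idea, not how file itself is implemented. The function name guess_encoding_from_bom and the labels it returns are made up for illustration, and a file without a BOM simply falls through.

```python
import codecs

# Known BOMs and a label for each. UTF-32 is checked before UTF-16
# because the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF8, "UTF-8 (with BOM)"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
]


def guess_encoding_from_bom(head: bytes) -> str:
    """Return a label for the BOM found at the start of `head`, if any."""
    for bom, label in _BOMS:
        if head.startswith(bom):
            return label
    return "no BOM found"
```

Passing the first four bytes of a file, read in binary mode, is enough for every BOM in the table.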

When I create a new file with vi, I can read the file with no problem.  The
output is normal.

Look at the bottom line: vi tells you what kind of "text" it is... sort of:


"Document.txt" [converted][dos] 1L, 20C

The "converted" means it wasn't regular text; the "dos" means it has CR-LF line endings.

If you'd like to see what it really is, try:

$ od -tx2z Document.txt
0000000 feff 0054 0068 0069 0073 0020 0069 0073  >..T.h.i.s. .i.s.<
0000020 0020 0061 0062 0063 0020 0066 0069 006c  > .a.b.c. .f.i.l.<
0000040 0065 000d 000a                           >e.....<
0000046

So your spaces are really null bytes (some fonts show them as little smileys). And vi was right about "dos": the dump ends with 000d 000a, a CR-LF pair.
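
To double-check the dump, the 16-bit words can be reassembled into bytes and decoded. This is only a sketch in Python, which is not a tool mentioned in this thread. The word list is copied by hand from the od output above, and it assumes od printed the words in little-endian host order.

```python
# 16-bit words copied from the od -tx2z output above.
words = [
    0xfeff, 0x0054, 0x0068, 0x0069, 0x0073, 0x0020, 0x0069, 0x0073,
    0x0020, 0x0061, 0x0062, 0x0063, 0x0020, 0x0066, 0x0069, 0x006c,
    0x0065, 0x000d, 0x000a,
]

# Rebuild the file contents: each word becomes two bytes, low byte first.
raw = b"".join(w.to_bytes(2, "little") for w in words)

# The "utf-16" codec uses the leading BOM to pick the byte order and drops it.
text = raw.decode("utf-16")

print(len(raw))         # total size in bytes
print(raw.count(0))     # how many of those bytes are NUL
print(repr(text))       # the decoded text, line ending included
```

The byte count matches the final od offset (0000046 octal is 38), and every character after the BOM carries one null byte as its high half.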

These particular text files that I am working with were created by HP Data
Protector.  I can easily parse and manipulate these files on HPUX servers,
but the Windows servers lack that functionality.  I thought Cygwin would
help with this.

How do I tell what the file's encoding is?

As pointed out by Gary Johnson, `cat Document.txt` doesn't produce spaced-out text here; it just shows "ÿþThis is abc file" (this is using mrxvt and the Bitstream Vera Sans Mono font).
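
The "ÿþ" prefix is what the two BOM bytes look like when each byte is shown as a Latin-1 character. A small Python sketch of that, under two assumptions that are guesses about the terminal rather than anything confirmed in this thread: bytes are rendered as Latin-1, and null bytes are not drawn at all.

```python
import codecs

text = "This is abc file\r\n"
# Build the same bytes as the file: a little-endian BOM, then UTF-16 LE text.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Assumption: the terminal shows each byte as one Latin-1 character.
as_latin1 = raw.decode("latin-1")

# Assumption: the terminal draws nothing for NUL bytes.
visible = as_latin1.replace("\x00", "")

# ascii() keeps the output printable; \xff and \xfe are "ÿ" and "þ" in Latin-1.
print(ascii(as_latin1[:8]))
print(ascii(visible))
```

A terminal or font that does draw something for a null byte would show a gap or a glyph between the letters instead, which would look like the extra spaces.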


Better to use the file command to see what it is. And no, there is no conversion software that I know of; Cygwin 1.5.x just doesn't support wide characters.
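
Outside of Cygwin's own tools, a short script can re-encode such a file so that byte-oriented utilities can read it. The following is a hedged Python sketch, assuming a Python interpreter is available and the input starts with a BOM as shown in this thread. The names utf16_to_utf8 and convert_file are invented for illustration.

```python
from pathlib import Path


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode BOM-prefixed UTF-16 bytes and return them as UTF-8.

    The "utf-16" codec reads the BOM to pick the byte order and drops it.
    Line endings are passed through unchanged.
    """
    return data.decode("utf-16").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read `src` as UTF-16 and write the same text to `dst` as UTF-8."""
    Path(dst).write_bytes(utf16_to_utf8(Path(src).read_bytes()))
```

Working on bytes rather than text-mode files keeps the CR-LF line endings intact, so they can be dealt with separately if needed.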
--
René Berber


