This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: ANSI encoding




At 14:41 22/5/02, Joel Konkle-Parker wrote:
>What's the <?xml version="1.0" encoding=""?> encoding="" string for
>ANSI?

You've already had a few answers addressing various facets of this; I hope 
this will also be useful.

ANSI, as Mike Kay pointed out, is a standards body.  Its best-known 
encoding is ASCII, whose identifier is "US-ASCII".  (The canonical charset 
name is "ANSI_X3.4-1968"; aliases include "ASCII" and "US-ASCII", which is 
preferred for MIME usage.)  ASCII is a 7-bit encoding, covering values from 
0 to 127; if you have any accented characters or other "weird" letters, you 
are not using ASCII.  Since ASCII text is byte-for-byte identical to UTF-8 
for values 0 to 127, and ASCII doesn't cover any other characters, you might 
as well leave the identifier out, since UTF-8 is the default.
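To make the ASCII/UTF-8 relationship concrete, here is a minimal Python sketch (an illustration, not something from the original thread). It assumes Python 3's standard `codecs` registry, which resolves all three ASCII names above to one codec, and uses a hypothetical helper, `is_pure_ascii`, to check whether a file's bytes stay within 0 to 127; if they do, reading them as ASCII or as UTF-8 gives the same text.

```python
import codecs

# All three names resolve to the same codec in Python's registry.
for label in ("ANSI_X3.4-1968", "ASCII", "US-ASCII"):
    print(label, "->", codecs.lookup(label).name)  # e.g. "ANSI_X3.4-1968 -> ascii"

def is_pure_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the 7-bit range 0-127."""
    return all(b < 128 for b in data)

sample = b'<?xml version="1.0"?>\n<p>plain text</p>\n'
# Pure ASCII bytes decode to the same text whether read as ASCII or UTF-8,
# so the declaration can omit encoding="" and rely on the UTF-8 default.
print(is_pure_ascii(sample))                             # True
print(sample.decode("ascii") == sample.decode("utf-8"))  # True

# A single accented letter (e-acute, here encoded as UTF-8) is not ASCII.
print(is_pure_ascii("caf\u00e9".encode("utf-8")))         # False
```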

As others have mentioned, Windows sometimes calls its encoding 
"ANSI".  This is nonsensical, yet true.  If you are using a US or western 
European system, you are using Windows codepage 1252.  This is identical 
with the ISO western European encoding, ISO 8859-1, except for characters 
128-159 (which are control codes in ISO 8859-1 and are punctuation like the 
euro, ellipses, dagger, em dash, curly quotes in Windows CP 1252).  If you 
aren't using that middle range, use the label "ISO-8859-1"; if you are 
using that range, use the "windows-1252" label.  All of this assumes you 
are sure that you actually have an 8-bit encoding, and that the text hasn't 
been stored in UTF-8.  The easiest way to check is to open the 
document in a very stupid editor, or use "type" at the DOS prompt.  If 
your fancy schmancy euro-characters show up as single characters, it's an 
8-bit encoding; if they show up as sequences of multiple characters, 
usually starting with an accented A of some sort, then you're in UTF-8 and 
don't need a label.  If every character shows up as two, one of which is a 
null, then it's UTF-16 and you still shouldn't need a label.
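The checks described above can be sketched as a rough heuristic in Python. `guess_encoding_label` is a hypothetical helper, not a standard function. It assumes UTF-16 files start with a byte-order mark, treats anything that decodes cleanly as UTF-8 as needing no label, and otherwise picks between "windows-1252" and "ISO-8859-1" by looking for bytes in the 128-159 range. Short 8-bit files can occasionally decode as valid UTF-8 by accident, so treat the result as a hint rather than a verdict.

```python
import codecs
from typing import Optional

def guess_encoding_label(data: bytes) -> Optional[str]:
    """Hypothetical helper: suggest an encoding="" label for raw XML bytes.

    Returns None when no label should be needed. A heuristic only.
    """
    # UTF-16 with a byte-order mark (either byte order): parsers detect it.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return None
    # Valid UTF-8 (which includes pure ASCII) is the XML default.
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError:
        pass
    # Otherwise assume a single-byte encoding. Bytes 128-159 are control
    # codes in ISO 8859-1 but punctuation in Windows code page 1252.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return "ISO-8859-1"

# The same byte means different things in the two 8-bit encodings.
print(ascii(b"\x80".decode("cp1252")))   # '\u20ac' (the euro sign)
print(ascii(b"\x80".decode("latin-1")))  # '\x80' (a C1 control code)

# UTF-8 viewed as CP 1252 turns one accented letter into two characters,
# the first an accented A: the "stupid editor" symptom described above.
print(ascii("\u00e9".encode("utf-8").decode("cp1252")))  # '\xc3\xa9', i.e. "Ã©"

print(guess_encoding_label("caf\u00e9 \u20ac".encode("cp1252")))  # windows-1252
print(guess_encoding_label("caf\u00e9".encode("latin-1")))        # ISO-8859-1
print(guess_encoding_label("caf\u00e9".encode("utf-8")))          # None
```

The byte-order-mark check comes first because a UTF-16 mark (FF FE or FE FF) is not valid UTF-8; without that check, a UTF-16 file would fall through and be misreported with one of the 8-bit labels.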

A complete list of IANA-registered identifiers can be found at <URL: 
http://www.iana.org/assignments/character-sets >.

[This is what happens when charset nerds drink too much espresso.]

~Chris

>-----BEGIN PGP SIGNATURE-----

P.S. Signing your message doesn't help when your public key isn't available 
from any of the usual places.
-- 
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

