This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: ANSI encoding
- From: "Christopher R. Maden" <crism at maden dot org>
- To: xsl-list at lists dot mulberrytech dot com
- Date: Thu, 23 May 2002 02:32:21 -0700
- Subject: Re: [xsl] ANSI encoding
- Reply-to: xsl-list at lists dot mulberrytech dot com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
At 14:41 22/5/02, Joel Konkle-Parker wrote:
>What's the <?xml version="1.0" encoding=""?> encoding="" string for
>ANSI?
You've already had a few answers addressing various facets of this; I hope
this will also be useful.
ANSI, as Mike Kay pointed out, is a standards body. Their best-known
encoding is ASCII, whose identifier is "US-ASCII". (The canonical charset
name is "ANSI_X3.4-1968"; aliases include "ASCII" and "US-ASCII", which is
preferred for MIME usage.) ASCII is a 7-bit encoding, covering values from
0 to 127; if you have any accented characters or other "weird" letters, you
are not using ASCII. ASCII is identical with UTF-8 for characters 127
and below, and doesn't cover any other characters, so you might as well
leave the identifier out: UTF-8 is the default.
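To make the 7-bit point concrete, here is a minimal Python sketch (Python is
my choice for illustration, and the helper name is_pure_ascii is made up).
It checks whether a byte string stays within 0 to 127 and confirms that, for
such text, the ASCII and UTF-8 byte sequences are the same.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range 0..127."""
    return all(b < 128 for b in data)


# Hypothetical sample document containing only characters below 128.
sample = '<?xml version="1.0"?><greeting>Hello</greeting>'

# For characters 127 and below, ASCII and UTF-8 produce identical bytes.
same_bytes = sample.encode("ascii") == sample.encode("utf-8")

# An accented letter (e with acute) pushes the text outside ASCII.
accented = "caf\u00e9".encode("utf-8")

print(same_bytes)
print(is_pure_ascii(sample.encode("utf-8")))
print(is_pure_ascii(accented))
```

If is_pure_ascii returns True for your whole file, the declaration can omit
the encoding and a parser will read it as UTF-8 without trouble.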
As others have mentioned, Windows sometimes calls its encoding
"ANSI". This is nonsensical, yet true. If you are using a US or western
European system, you are using Windows codepage 1252. This is identical
with the ISO western European encoding, ISO 8859-1, except for characters
128-159 (which are control codes in ISO 8859-1 and are printable characters
such as the euro sign, ellipsis, dagger, em dash, and curly quotes in
Windows CP 1252). If you
aren't using that middle range, use the label "ISO-8859-1"; if you are
using that range, use the "windows-1252" label. That's all if you're sure
that you actually have an 8-bit encoding, and that the information hasn't
been stored in UTF-8. The easiest way to determine this is to open the
document in a very stupid editor, or to use "type" at the DOS prompt. If
your fancy schmancy euro-characters show up as single characters, it's an
8-bit encoding; if they show up as sequences of multiple characters,
usually starting with an accented A of some sort, then you're in UTF-8 and
don't need a label. If every character shows up as two characters, one of
which is null (which one depends on the byte order), then it's UTF-16 and
you still shouldn't need a label, provided the file begins with a byte
order mark.
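The checks above can also be sketched in code. This is a rough heuristic in
Python, not a definitive detector: the function name
guess_xml_encoding_label is hypothetical, it assumes the file is one of
UTF-8, UTF-16 with a byte order mark, windows-1252, or ISO-8859-1, and it
does not handle UTF-16 without a byte order mark or any other encoding.

```python
import codecs


def guess_xml_encoding_label(data: bytes) -> str:
    """Guess an encoding label for raw file bytes (heuristic sketch only)."""
    # A byte order mark settles the question for UTF-8 and UTF-16.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Pure ASCII and well-formed multi-byte sequences both decode as UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # Otherwise assume an 8-bit encoding. Bytes 128-159 are control codes
    # in ISO 8859-1 but printable characters in Windows codepage 1252.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return "ISO-8859-1"


# What a "stupid editor" shows when UTF-8 bytes are read as codepage 1252:
# e with acute becomes two characters, the first an A with a tilde.
mojibake = "\u00e9".encode("utf-8").decode("cp1252")

print(guess_xml_encoding_label("\u20ac".encode("cp1252")))
print(guess_xml_encoding_label("caf\u00e9".encode("latin-1")))
print(guess_xml_encoding_label("caf\u00e9".encode("utf-8")))
print(repr(mojibake))
```

A file that really is windows-1252 but happens to form valid UTF-8 byte
sequences would be misjudged here, so treat the result as a hint and look at
the text yourself before trusting it.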
A complete list of IANA-registered identifiers can be found at <URL:
http://www.iana.org/assignments/character-sets >.
[This is what happens when charset nerds drink too much espresso.]
~Chris
>-----BEGIN PGP SIGNATURE-----
P.S. Signing your message doesn't help when your public key isn't available
from any of the usual places.
- --
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.5.8
iQA/AwUBPOy3JaxS+CWv7FjaEQJy1QCbB1RoZtUWzQXVwDqBkopJ5jycg8YAmwdH
1NgVgikf5WevBGwg5AQmbnZn
=/+JM
-----END PGP SIGNATURE-----
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list