This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Certain chars break transformation process. Guru wanted!
- To: xsl-list at mulberrytech dot com
- Subject: Re: Certain chars break transformation process. Guru wanted!
- From: Mike Brown <mike at skew dot org>
- Date: Wed, 23 Aug 2000 15:45:14 -0600 (MDT)
- CC: braunonline at hotmail dot com
- Reply-To: xsl-list at mulberrytech dot com
Braun Online wrote:
> Where have I gone wrong? Whenever I try to perform an XSLT transformation
> on an XML file which has the registered symbol (®), the XSLT programs read
> the symbol as two characters (? and ?)
Your XML file is using an encoding that is different from what the XML
parser (which feeds info about the document to the XSLT processor) thinks
it has. Is there an encoding="..." specification in the <?xml ...?> line
at the beginning of the file? What does it say? What created the file? Was
it a simple text editor that didn't give you the option of selecting an
encoding/character set to use?
Most likely what you see as the circle-R in your editor is stored on disk
as a single byte, 0xAE, which is how that character is represented in
iso-8859-1 and cp1252. The XML parser is following rules outlined in the
XML spec for deciding what character set was used to encode that XML, and
is decoding the bytes accordingly. It is probably deciding utf-8 is the
character set that was used.
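
To make the byte-level picture concrete, here is a small Python sketch (Python and the variable names are my choice for illustration, not something from the original question). It shows that the lone byte 0xAE is the registered sign in iso-8859-1 and cp1252 but is not valid UTF-8 on its own, and it shows one way two characters can appear: the two-byte UTF-8 form of the same character, read back as iso-8859-1.

```python
# Sketch: how the registered sign behaves under different decodings.
raw = b"\xae"  # one byte, as an iso-8859-1 / cp1252 editor would store it

as_latin1 = raw.decode("iso-8859-1")  # the registered sign
as_cp1252 = raw.decode("cp1252")      # same character in cp1252

# A lone 0xAE is a UTF-8 continuation byte, so a UTF-8 decoder rejects it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# In UTF-8 the same character takes two bytes: 0xC2 0xAE.
utf8_bytes = "\u00ae".encode("utf-8")

# Reading those two bytes as iso-8859-1 yields two characters.
misread = utf8_bytes.decode("iso-8859-1")

print(repr(as_latin1), valid_utf8, utf8_bytes, repr(misread))
```

Whichever direction the mismatch runs, the cause is the same: the bytes were written in one encoding and read in another.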
You must change the XML file to declare the correct encoding, as it is an
error for a document to declare the wrong encoding. However, you should
note that XML parsers are only required to support utf-8 and utf-16, so it
may be necessary to check your parser's documentation to see whether it
supports the actual encoding of the document.
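
As a rough illustration of why the declaration matters, the sketch below feeds the same bytes to a parser twice, once with an encoding declaration and once without. It uses Python's xml.etree.ElementTree, which is only a stand-in for whatever parser sits in front of your XSLT processor; the element name and sample text are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; both contain the single byte 0xAE.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Acme\xae</name>'
undeclared = b"<name>Acme\xae</name>"

# With the declaration, the parser decodes 0xAE as the registered sign.
text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8 and rejects the stray byte.
try:
    ET.fromstring(undeclared)
    rejected = False
except ET.ParseError:
    rejected = True

print(repr(text), rejected)
```

Other parsers may report the problem differently, or support a different set of encodings, so treat this as a demonstration of the principle rather than of your tool's exact behaviour.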
Aside from that, choose one:
- Change the encoding of the XML file, using a tool like Free Recode,
so that it matches what the file declares its encoding to be
- Change the XML file to use character references instead of literal
characters, for characters that are outside the printable ASCII range
(0x20-0x7E) ... for example, &#174; (note that such references are for
the ISO/IEC 10646-1 universal character set, commonly though not
accurately thought of as Unicode) ... this way your XML file will be
entirely ASCII bytes, and since ASCII is a subset of UTF-8, it will be
fine if the parser interprets it as UTF-8.
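
Both options above can be sketched in a few lines of Python, for anyone without a recoding tool at hand. The sample document is hypothetical, and the sketch assumes the file's real encoding is iso-8859-1 while its declaration says utf-8; adjust the source encoding to whatever your editor actually wrote.

```python
import xml.etree.ElementTree as ET

# Hypothetical mislabeled file: declares utf-8 but holds an iso-8859-1 byte.
mislabeled = b'<?xml version="1.0" encoding="utf-8"?>\n<name>Acme\xae</name>\n'

# Decode using the encoding the bytes are really in (an assumption here).
text = mislabeled.decode("iso-8859-1")

# Option 1: re-encode so the bytes match the declared utf-8.
recoded = text.encode("utf-8")

# Option 2: keep only ASCII bytes; anything else becomes a numeric
# character reference such as &#174;.
ascii_only = text.encode("ascii", "xmlcharrefreplace")

# Both results should now parse to the same content.
same = ET.fromstring(recoded).text == ET.fromstring(ascii_only).text
print(recoded, ascii_only, same)
```

The second form is the more portable of the two, since it parses correctly no matter which ASCII-compatible encoding the parser assumes.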
- Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://www.skew.org/xml/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list