This is the mail archive of the
docbook@lists.oasis-open.org
mailing list for the DocBook project.
Re: XML to XML and entities
- To: docbook at lists dot oasis-open dot org
- Subject: Re: DOCBOOK: XML to XML and entities
- From: Norman Walsh <ndw at nwalsh dot com>
- Date: Sat, 13 Jan 2001 08:44:09 -0500
- References: <51.60bca2e.27919fca@aol.com>
/ FredaAnces@aol.com was heard to say:
| I am doing an XML to XML transformation.
[With my moderator hat on, I want to remind everyone that the appropriate
DocBook list for stylesheet and other application-related questions is
docbook-apps@lists.oasis-open.org.]
| If I put the following into my XSL file, entities such as &mdash; create
| gibberish characters in the new XML file:
|
| <xsl:output method="xml" />
I think with closer inspection you'll find that they aren't gibberish,
they're UTF-8 representations of the Unicode characters that those
entities represent. For example, an mdash is Unicode character 8212
(U+2014). The only way to represent that in UTF-8 is with a multi-byte
sequence of octets, in this case the three octets E2 80 94. That
sequence, when viewed in a tool that does not understand UTF-8,
appears as three upper-ASCII characters.
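To see this for yourself, here is a minimal Python sketch (not part of
the stylesheet; the helper names utf8_octets and misread_as are made up
for illustration). It encodes the mdash as UTF-8 and then decodes the
same octets the way a tool that assumes Windows-1252 would. The choice
of Windows-1252 is an assumption; other single-byte code pages show
different but equally odd characters.

```python
# U+2014 EM DASH (decimal 8212), the character behind the mdash entity.
MDASH = "\u2014"


def utf8_octets(text: str) -> bytes:
    """Return the UTF-8 octets for text."""
    return text.encode("utf-8")


def misread_as(octets: bytes, codec: str = "cp1252") -> str:
    """Decode octets with a single-byte codec, as a non-UTF-8 tool would."""
    return octets.decode(codec)


octets = utf8_octets(MDASH)
print(ord(MDASH))                # code point as a decimal number
print(octets.hex(" "))           # the three UTF-8 octets in hex
print(len(misread_as(octets)))   # how many characters the misreading shows
```

The octets themselves are fine; only the decoding step differs between
a correct reading and the "gibberish" one.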
| If I replace the output method with "html" the entities work fine but the
| processing instructions no longer have the correct format. For example - I
| get the following line without the ? for the closing tag:
|
| <?xml:stylesheet type="text/xsl" href="ae_toc.xsl">
Right. When you asked for HTML, you told the processor to output HTML,
which is in ISO-Latin1, so entities have to be used for special
characters, and PIs have the SGML form (closed with ">" rather than
"?>").
| Any ideas? Thanks very much, Freda
I'm not sure what you want. The first form, in UTF-8, should be
understandable to any XML processor. The second form isn't XML.
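As a rough check of the first point, the sketch below feeds a small
UTF-8 document to the XML parser in Python's standard library and gets
the mdash back intact. The sample document and the variable names are
invented for illustration. It also shows one possible alternative, on
the assumption that what you are after is output that stays readable in
ASCII-only tools: serializing as US-ASCII makes the serializer write
the character as a numeric reference instead of raw octets.

```python
import xml.etree.ElementTree as ET

# Hypothetical transformation result: UTF-8 octets with a literal em dash.
utf8_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<para>wait\u2014what</para>"
).encode("utf-8")

# An XML parser decodes the multi-byte sequence back to U+2014.
root = ET.fromstring(utf8_xml)
print(root.text == "wait\u2014what")

# Serializing as US-ASCII forces a numeric character reference.
ascii_xml = ET.tostring(root, encoding="us-ascii")
print(ascii_xml)

# Both forms carry the same character as far as a parser is concerned.
print(ET.fromstring(ascii_xml).text == root.text)
```

Either serialization is well-formed XML; they differ only in how the
same character is written down.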
Be seeing you,
norm
--
Norman Walsh <ndw@nwalsh.com> | Any idiot can face a crisis; it's
http://www.oasis-open.org/docbook/ | this day-to-day living that wears
Chair, DocBook Technical Committee | you out.--Anton Chekhov