This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: [docbook-apps] Different encoding in XML and XSL.
- From: "Bob Stayton" <bobs at sagehill dot net>
- To: "Rajal Shah" <rajal at meshsoftware dot com>, "Docbook-Apps" <docbook-apps at lists dot oasis-open dot org>
- Date: Tue, 9 Mar 2004 18:53:18 -0800
- Subject: Re: [docbook-apps] Different encoding in XML and XSL.
- References: <OBELIPNDODLCFINONIHPCEJPCOAA.rajal@meshsoftware.com>
In general, you can mix and match encodings without problems, because XML
processors convert whatever the original encoding was to Unicode internally.
That's why it is critically important that every XML document declare its
encoding in its XML declaration (otherwise it is taken to be UTF-8 by
default). Once the document is loaded as Unicode in memory, the processor
can write it out in whichever encoding you request, for example through the
encoding attribute of xsl:output.
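
As a rough illustration of that round trip, here is a minimal sketch in
Python using the standard library's xml.etree.ElementTree. It is not Saxon
or the DocBook stylesheets, and the sample element and text are made up for
the example. Three inputs declare three different encodings, yet all of them
yield the same Unicode string once parsed, and the tree can then be written
out in a different encoding.

```python
import xml.etree.ElementTree as ET

TEXT = "caf\u00e9"  # "café", a hypothetical sample string with one non-ASCII character

# The same content stored in three encodings, each declared in the XML declaration.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?><para>%s</para>' % TEXT).encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?><para>%s</para>' % TEXT).encode("utf-8")
ascii_doc = b'<?xml version="1.0" encoding="US-ASCII"?><para>caf&#233;</para>'

# After parsing, every document holds the same Unicode text in memory.
texts = [ET.fromstring(doc).text for doc in (latin1_doc, utf8_doc, ascii_doc)]

# Write the ISO-8859-1 input back out in two other encodings.
root = ET.fromstring(latin1_doc)
utf8_out = ET.tostring(root, encoding="utf-8")
ascii_out = ET.tostring(root, encoding="us-ascii")  # non-ASCII becomes a character reference
```

In the US-ASCII output the accented letter is written as the numeric
character reference &#233;, so no information is lost even though the target
encoding cannot represent the character directly.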
The big caveat is that not all processors support conversion of all
encodings. For example, according to its documentation, the built-in
AElfred parser in Saxon 6.5 supports these incoming encodings:
ISO-8859-1, 8859_1, ISO8859_1, US-ASCII, ASCII, UTF-8, UTF8,
ISO-10646-UCS-2, UTF-16, UTF-16BE, UTF-16LE
and it supports these outgoing encodings:
ascii, us-ascii, utf-8, utf8, utf-16, utf16, iso-8859-1, iso-8859-2,
koi8-r, cp852, cp1250, windows-1250, cp1251, windows-1251
However, if you substitute the Xerces parser in Saxon, you get a much
longer list of encodings:
http://xml.apache.org/xerces2-j/faq-general.html#faq-8
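
The same kind of check can be made programmatically before you commit to an
encoding. The sketch below is only an analogy in Python, not a way to query
Saxon or Xerces: it asks the Python runtime which encoding names it can
convert, using the standard codecs.lookup call. The helper name supported
and the deliberately bogus name "no-such-encoding" are made up for the
example.

```python
import codecs

def supported(names):
    """Map each encoding name to True if this Python runtime has a codec for it."""
    result = {}
    for name in names:
        try:
            codecs.lookup(name)  # raises LookupError for unknown encodings
            result[name] = True
        except LookupError:
            result[name] = False
    return result

# A few names from the lists above, plus one that does not exist.
requested = ["utf-8", "iso-8859-2", "cp852", "windows-1251", "no-such-encoding"]
support = supported(requested)
```

Every processor has its own list, so a name that passes here may still be
rejected by a particular XML parser or XSLT serializer.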
The following link will give you some general background on encodings with
regard to DocBook:
http://www.sagehill.net/docbookxsl/CharEncoding.html
Bob Stayton
Sagehill Enterprises
DocBook Consulting
bobs@sagehill.net
----- Original Message -----
From: "Rajal Shah" <rajal@meshsoftware.com>
To: "Docbook-Apps" <docbook-apps@lists.oasis-open.org>
Sent: Tuesday, March 09, 2004 3:10 PM
Subject: [docbook-apps] Different encoding in XML and XSL.
> This may be a generic XSL question.. But I've hit upon it when evaluating
> docbook xsls.. So I'm posting it here..
>
> I'm evaluating if docbook can fit our needs here.. We probably will have
> our custom XSL which would include/import docbook xsls. The input XML to
> my xsl can have varying encodings (charset).. So the question is:
>
> 1. How does the docbook xsl behave if the XML encoding is different from
> the XSL..
>
> 2. I also see the localization xml files (en.xml) in the docbook-xsl
> distribution.. The encoding is set to US-ASCII.. So in effect, I could
> have my XML document coming in as "windows-1252", the en.xml file would
> have encoding set to "US-ASCII" and my xsl will most likely be "UTF-8".
> How is the behavior determined in this case..
>
> The general question is, if someone could point me to a resource for
> understanding the XML/XSL processor behavior in handling various
> encodings, that would be immensely appreciated.
>
> Regards.
> --
> Rajal
>
> To unsubscribe from this list, send a post to
> docbook-apps-unsubscribe@lists.oasis-open.org, or visit
> http://www.oasis-open.org/mlmanage/.