This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



Re: [docbook-apps] Different encoding in XML and XSL.


In general, you can mix and match encodings without problems, because XML
processors convert whatever the original encoding was to Unicode internally.
That's why it is critically important that every XML document declare its
encoding; a document with no encoding declaration is taken to be UTF-8 (or
UTF-16 if it starts with a byte order mark). Once loaded as Unicode in memory,
the processor can write it out in whichever encoding you ask for.
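
As a rough illustration of that point (a sketch using Python's
standard-library xml.etree.ElementTree rather than Saxon or any XSLT
processor, with made-up sample documents), three inputs that declare
different encodings come out as the same Unicode text once parsed, and the
tree can then be written out in whatever encoding is requested:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same content, three different encodings.
# "\u00e9" is e-acute; the US-ASCII version has to use a character reference.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><para>caf\u00e9</para>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><para>caf\u00e9</para>'.encode("utf-8")
ascii_doc = b'<?xml version="1.0" encoding="US-ASCII"?><para>caf&#233;</para>'

# The parser reads each encoding declaration and decodes the bytes to Unicode.
texts = [ET.fromstring(doc).text for doc in (latin1_doc, utf8_doc, ascii_doc)]
all_equal = texts[0] == texts[1] == texts[2]

# Serialize the in-memory tree to two different output encodings.
root = ET.fromstring(latin1_doc)
as_utf8 = ET.tostring(root, encoding="utf-8")
as_ascii = ET.tostring(root, encoding="us-ascii")
```

In this sketch, a character that the target encoding cannot represent is
written as a numeric character reference (&#233; in the US-ASCII output),
which is also how a US-ASCII file can carry non-ASCII text.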

The big caveat is that not all processors support conversion of all
encodings. For example, according to its documentation, the built-in AElfred
parser in Saxon 6.5 supports these incoming encodings:

ISO-8859-1, 8859_1, ISO8859_1, US-ASCII, ASCII, UTF-8, UTF8,
ISO-10646-UCS-2, UTF-16, UTF-16BE, UTF-16LE

and it supports these outgoing encodings:

ascii, us-ascii, utf-8, utf8, utf-16, utf16, iso-8859-1, iso-8859-2,
koi8-r, cp852, cp1250, windows-1250, cp1251, windows-1251

However, if you substitute the Xerces parser in Saxon, you get a much
longer list of incoming encodings:
http://xml.apache.org/xerces2-j/faq-general.html#faq-8
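
If you need to know in advance whether a given toolchain can handle an
encoding, it is usually possible to probe for it. The sketch below does this
for Python's codec registry, purely as an analogy; it says nothing about what
Saxon, AElfred, or Xerces accept. The helper name supported_encodings and the
made-up name x-no-such-encoding are illustrative only.

```python
import codecs

def supported_encodings(names):
    """Map each encoding name to True if this Python runtime has a codec for it."""
    result = {}
    for name in names:
        try:
            codecs.lookup(name)  # raises LookupError for unknown encodings
            result[name] = True
        except LookupError:
            result[name] = False
    return result

# Probe a few names from the lists above, plus one deliberately bogus name.
report = supported_encodings(["iso-8859-2", "koi8-r", "x-no-such-encoding"])
```

A check like this, run before a transformation, gives a clear error up front
instead of a failure partway through parsing or serialization.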

The following link will give you some general background on encodings with
regard to DocBook:
http://www.sagehill.net/docbookxsl/CharEncoding.html

Bob Stayton
Sagehill Enterprises
DocBook Consulting
bobs@sagehill.net


----- Original Message ----- 
From: "Rajal Shah" <rajal@meshsoftware.com>
To: "Docbook-Apps" <docbook-apps@lists.oasis-open.org>
Sent: Tuesday, March 09, 2004 3:10 PM
Subject: [docbook-apps] Different encoding in XML and XSL.


> This may be a generic XSL question, but I've hit upon it when evaluating
> the DocBook XSLs, so I'm posting it here.
>
> I'm evaluating whether DocBook can fit our needs here. We will probably
> have our own custom XSL, which would include/import the DocBook XSLs. The
> input XML to my XSL can have varying encodings (charsets). So the
> question is:
>
> 1. How does the DocBook XSL behave if the XML encoding is different from
> that of the XSL?
>
> 2. I also see the localization XML files (en.xml) in the docbook-xsl
> distribution. Their encoding is set to US-ASCII. So, in effect, I could
> have my XML document coming in as "windows-1252", the en.xml file would
> have its encoding set to "US-ASCII", and my XSL will most likely be
> "UTF-8". How is the behavior determined in this case?
>
> The general question is: if someone could point me to something that
> explains how XML/XSL processors handle various encodings, that would be
> immensely appreciated.
>
> Regards.
> --
> Rajal
