This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Character encoding problem
- To: xsl-list at lists dot mulberrytech dot com
- Subject: [xsl] Character encoding problem
- From: Matt Gushee <mgushee at havenrock dot com>
- Date: Mon, 21 May 2001 11:26:50 -0600 (MDT)
- Reply-To: xsl-list at lists dot mulberrytech dot com
Hi, folks--
I'm developing a simple XSLT transformation for selecting languages
(English or Japanese) on a bilingual website. It takes a source XHTML
document with paired English and Japanese elements (headings,
paragraphs, etc.), e.g.:
<p xml:lang="en">
[ some stuff in English ]
</p>
<p xml:lang="ja">
[ same content in Japanese ]
</p>
... and outputs everything in the selected language plus any content
that has no language specified. At least that's the theory. I've tried
processing it with the command-line interfaces of (full) Saxon and
4XSLT, but keep getting errors:
Saxon:
$ saxon main.html i18n.xsl currentLanguage=en
Transform failed: =US-ASCII
The 'saxon' above is a small shell script I wrote to save typing;
it simply invokes 'java com.icl.saxon.Whatever
[<args>]'.
4XSLT:
$ 4xslt -DcurrentLanguage=en main.html i18n.xsl
[ long stack trace ]
TypeError: argument(2) to filter() must be a sequence type
The 4XSLT error looks like a possible bug, but the Saxon output is
just plain puzzling. Where is 'US-ASCII' coming from? I edit the
source in EUC-JP, then convert it to UTF-8 or UTF-16 (same results
either way) using iconv.
So, can anybody give me a clue? Any leads would be much appreciated.
Matt Gushee
---- i18n.xsl ---------------------------------------------
<?xml version="1.0"?>
<!-- None of the commented-out alternatives below made any difference -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="currentLanguage" select="'en'"/>
  <xsl:variable name="charEncoding">
    <xsl:choose>
      <xsl:when test="$currentLanguage='en'">iso-8859-1</xsl:when>
      <xsl:when test="$currentLanguage='ja'">euc-jp</xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:output method="html" encoding="$charEncoding"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- <xsl:template match="*[lang($currentLanguage) or not(@xml:lang)]"> -->
  <xsl:template match="*[lang($currentLanguage)]">
    <xsl:copy>
      <!-- <xsl:for-each select="@*[name() != 'id']"> -->
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
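The charEncoding variable above is meant to pick the output encoding at run time. In XSLT 1.0, however, the attributes of xsl:output are not attribute value templates, so encoding="$charEncoding" names the literal string "$charEncoding" rather than the variable's value. One workaround, sketched here under stated assumptions, is a thin entry stylesheet per language that imports the shared templates and sets a literal encoding. The file names i18n-ja.xsl and i18n-common.xsl are hypothetical, and the sketch assumes the shared file contains the language filter but no xsl:output of its own, and that the processor can write EUC-JP.

```xml
<?xml version="1.0"?>
<!-- i18n-ja.xsl (hypothetical name): Japanese entry point. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Shared templates; i18n-common.xsl is a hypothetical file that
       holds the language filter and no xsl:output element. -->
  <xsl:import href="i18n-common.xsl"/>
  <!-- This declaration has higher import precedence, so its default
       replaces any currentLanguage default in the imported file. -->
  <xsl:param name="currentLanguage" select="'ja'"/>
  <!-- Literal encoding, fixed per entry stylesheet; assumes the
       processor supports EUC-JP output. -->
  <xsl:output method="html" encoding="EUC-JP"/>
</xsl:stylesheet>
```

An English entry point would differ only in the two literal values ('en' and iso-8859-1). The design relies on XSLT 1.0 merging xsl:output elements by import precedence, which keeps the encoding choice outside the shared templates.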
--- main.html [pre-conversion: euc-jp encoding] --------------
<?xml version="1.0" encoding="UTF-16"?>
<!--
<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1//EN"
    "/usr/local/share/xml/xhtml/xhtml11.dtd"
>
-->
<html xmlns="http://www.w3.org/1999/xhtml"
      version="-//W3C//DTD XHTML 1.1//EN"
      xml:lang="en">
  <head>
    <title>Welcome</title>
  </head>
  <body xml:lang="en">
    <h1 xml:lang="en">Welcome</h1>
    <h1 xml:lang="ja">ようこそ</h1>
    <hr xmlns="http://www.w3.org/1999/xhtml"/>
    <p xml:lang="en">
      The Kaiwa Club is an informal group for people who want to practice
      Japanese conversation. We welcome members at all levels of
      proficiency.
    </p>
    <p xml:lang="ja">
      会話倶楽部は日本語の会話を練習したい人のためのインフォーマルなグループで
      ございます。レベルはかかわらず、新しい会員を大歓迎しております。
    </p>
  </body>
</html>
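For comparison, here is a minimal sketch of the same language filter written as an identity transform. It assumes, as in main.html above, that the language-tagged elements are direct children of the XHTML body. XSLT 1.0 does not allow variable references in template match patterns, so the sketch moves the lang() test into a select expression instead. UTF-8 output is an assumption, chosen because it can carry both English and Japanese.

```xml
<?xml version="1.0"?>
<!-- Sketch only: an identity transform that drops body children
     tagged with a language other than $currentLanguage. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">
  <xsl:param name="currentLanguage" select="'en'"/>
  <!-- Literal value: xsl:output attributes are not attribute value
       templates, so a variable reference cannot be used here. -->
  <xsl:output method="html" encoding="UTF-8"/>
  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- The language test lives in a select expression, not a match
       pattern. Children of body are kept if they carry no xml:lang
       of their own, or if lang() says they are in the selected
       language. -->
  <xsl:template match="xhtml:body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates
          select="node()[not(@xml:lang) or lang($currentLanguage)]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

One limitation: the sketch copies the xml:lang="en" attributes on html and body unchanged, even when Japanese content is selected.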
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list