This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: XSL and international characters
- From: Mike Brown <mike at skew dot org>
- To: xsl-list at lists dot mulberrytech dot com
- Date: Tue, 4 Dec 2001 13:33:40 -0700 (MST)
- Subject: Re: [xsl] XSL and international characters
- Reply-to: xsl-list at lists dot mulberrytech dot com
Marcin K_os wrote:
> Well, I agree that those characters are in UTF-8 and that I wanted
> characters in UTF, the problem is that I passed as parameter one two-byte
> character and each byte of those two was transformed again into two-byte
> characters giving in result four bytes i.e., two two-byte characters.
>
> Original character was %C5%82 and the result was Å - one character and
> ‚ - second character :(
Everyone else seems to have missed your point. You are running into
an issue with an underspecified part of the URI, HTTP and HTML specs:
there is no standard mechanism for declaring what encoding is being
used when representing non-ASCII characters (0x80 and above) in the
%-escaped format used in URIs and HTML form data submissions.
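To make the ambiguity concrete, here is a minimal Java sketch. It is an
illustration only, not code from Tomcat or Cocoon: the class name
PercentDecodeDemo and its decode helper are made up for this example, and
it assumes Java 10 or later for the URLDecoder overload that takes a
Charset. The same escaped value decodes to different strings depending on
which encoding the receiver assumes.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helper are illustrative only.
final class PercentDecodeDemo {

    // Decode a %-escaped value under an explicitly assumed charset.
    // The escaped text itself carries no hint about which one is right.
    static String decode(String escaped, Charset assumed) {
        return URLDecoder.decode(escaped, assumed); // Java 10+ overload
    }

    public static void main(String[] args) {
        String escaped = "%C5%82"; // the value from the original question

        // Each byte becomes its own char: U+00C5 followed by U+0082.
        String asLatin1 = decode(escaped, StandardCharsets.ISO_8859_1);

        // Both bytes together form one char: U+0142.
        String asUtf8 = decode(escaped, StandardCharsets.UTF_8);

        System.out.println(asLatin1.length()); // 2
        System.out.println(asUtf8.length());   // 1
    }
}
```

Nothing in the escaped text says which of the two readings the sender
intended, so the receiver has to guess or agree on an encoding out of band.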
Tomcat interprets %C5%82 in the HTTP request as bytes C5 82, and
exposes them through the Java/JSP API as 2 chars in a String
according to an assumed (and probably wrong) iso-8859-1 encoding.
On the receiving end, you must convert these chars back into bytes,
assuming iso-8859-1, and then convert them to a String again, this
time assuming UTF-8. I did this in JSPs with WebLogic a while back,
and it was pretty straightforward. I'm not sure how it works with
your particular Tomcat/Cocoon setup, though.
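The round trip described above can be sketched in plain Java roughly as
follows. This is a sketch under assumptions, not the original JSP code:
the class name ParamRecoder and the method latin1ToUtf8 are hypothetical,
the input string stands in for whatever the servlet API hands back for
the parameter, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not part of any servlet API.
final class ParamRecoder {

    // Undo a wrong iso-8859-1 decode of bytes that were really UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        // Step 1: recover the original bytes. iso-8859-1 maps each char
        // U+0000..U+00FF to exactly one byte, so this step is lossless.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those same bytes again, this time as UTF-8.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the container would expose for %C5%82 under iso-8859-1:
        // two chars, U+00C5 and U+0082.
        String misdecoded = "\u00C5\u0082";

        String fixed = latin1ToUtf8(misdecoded);

        System.out.println(fixed.length());          // 1
        System.out.println(fixed.equals("\u0142"));  // true
    }
}
```

This only works if the container really did use iso-8859-1 and the
sender really did use UTF-8. Applied to a value that was already decoded
correctly, the same conversion would corrupt it.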
- Mike
____________________________________________________________________________
mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | personal: http://hyperreal.org/~mike/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list