This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: International Characters in attributes
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] International Characters in attributes
- From: Mike Brown <mike at skew dot org>
- Date: Sun, 11 Feb 2001 17:21:45 -0700 (MST)
- Reply-To: xsl-list at lists dot mulberrytech dot com
> But once you get into the areas of the
> BMP where utf-8 starts producing the "transformations" that the "t"
> stands for, with 3 or even 5-byte sequences, none of the browsers I've
> looked at will behave 100% properly (and some XML parsers and XSLT
> engines can hiccup as well).
I would love to see a summary of your test results in this area!
> That's partly why people still use encodings other than utf-8. And
> once you do, the same numeric character references will mean different
> things in different encodings, (there aren't named entities in html
> for the 20,000+ Chinese characters) and so show differently in the
> browser.
That really *shouldn't* be the case, although I believe some of the old
pre-1996 browsers did exhibit this behavior. A numeric character reference
is by definition, at least in XML and HTML, a reference to a code point in
the ISO/IEC 10646 coded character set. It should never change with the
encoding of the document containing the reference. e.g., &#166; means the
BROKEN BAR character, always, even though code point 166 in, say, ISO
8859-2 means LATIN CAPITAL LETTER S WITH ACUTE.
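
To make the distinction concrete, here is a minimal sketch in Python (not
part of the original exchange). It assumes the standard library's
html.unescape resolves numeric character references the way a conforming
HTML processor would, and it uses Python's own "iso-8859-1" and
"iso-8859-2" codecs to stand in for a document's encoding. The variable
names are only illustrative.

```python
import html
import unicodedata

# A numeric character reference names an ISO/IEC 10646 (Unicode) code point,
# so resolving it does not involve the document's encoding at all.
ref_char = html.unescape("&#166;")
ref_name = unicodedata.name(ref_char)        # name of U+00A6

# A raw byte with value 166 is a different matter: what it means depends
# entirely on which encoding is used to decode it.
raw = bytes([166])
latin1_name = unicodedata.name(raw.decode("iso-8859-1"))
latin2_name = unicodedata.name(raw.decode("iso-8859-2"))

print("reference &#166;    ->", ref_name)
print("byte 166, 8859-1 ->", latin1_name)
print("byte 166, 8859-2 ->", latin2_name)
```

The reference comes out as BROKEN BAR in every case, while the raw byte is
BROKEN BAR only when decoded as ISO 8859-1 and LATIN CAPITAL LETTER S WITH
ACUTE when decoded as ISO 8859-2.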
- Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://skew.org/xml/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list