This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: International Characters in attributes


> But once you get into the areas of the                                
> BMP where utf-8 starts producing the "transformations" that the "t"   
> stands for, with 3 or even 5-byte sequences, none of the browsers I've
> looked at will behave 100% properly (and some XML parsers and XSLT
> engines can hiccup as well).
  
I would love to see a summary of your test results in this area!
  
> That's partly why people still use encodings other than utf-8. And
> once you do, the same numeric character references will mean different
> things in different encodings, (there aren't named entities in html   
> for the 20,000+ Chinese characters) and so show differently in the    
> browser. 
 
That really *shouldn't* be the case, although I believe some of the old
pre-1996 browsers did exhibit this behavior. A numeric character reference
is by definition, at least in XML and HTML, a reference to a code point in
the ISO/IEC 10646 coded character set. It should never change with the
encoding of the document containing the reference. For example, &#166;
always means the BROKEN BAR character, even though byte 166 in, say, ISO
8859-2 means LATIN CAPITAL LETTER S WITH ACUTE.
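
To make the distinction concrete, here is a minimal sketch in Python. It
assumes the standard library's expat-based ElementTree parser accepts an
ISO-8859-2 encoding declaration; the element name "t", the attribute names
"ref" and "raw", and the helper function are made up for illustration. The
same document carries the reference &#166; in one attribute and the raw
byte 166 (0xA6) in another:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical test document, declared as ISO-8859-2.
# "ref" holds a numeric character reference; "raw" holds the literal byte 0xA6.
DOC = b'<?xml version="1.0" encoding="ISO-8859-2"?><t ref="&#166;" raw="\xa6"/>'


def attribute_names(doc: bytes) -> dict:
    """Parse doc and return the Unicode name of each attribute's single character."""
    root = ET.fromstring(doc)
    return {key: unicodedata.name(value) for key, value in root.attrib.items()}


names = attribute_names(DOC)
print(names["ref"])  # BROKEN BAR
print(names["raw"])  # LATIN CAPITAL LETTER S WITH ACUTE
```

The reference resolves to U+00A6 regardless of the declared encoding; only
the raw byte is interpreted through ISO 8859-2.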

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at            My XML/XSL resources: 
webb.net in Denver, Colorado, USA              http://skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

