This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: [xsl] Re: [xsl] Handling of special characters like © etc


Hi Mike,
Thanks for the detailed mail.
One more favour now.
I am asking about the second part of my mail, i.e. & and < (and >, for
balance) etc.
I want < and > to pass through unchanged when they delimit a node,
e.g. <root>. Only when they appear as part of the text, like this:
<mail-sender><mike@skew.org></mail-sender>
do I want to get rid of the < and > characters.
How can I tell the two cases apart before sending the input to the
XSLTProcessor?

Thanks,
Yogesh

-----Original Message-----
From: owner-xsl-list@lists.mulberrytech.com
[mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Mike Brown
Sent: Thursday, May 03, 2001 1:55 PM
To: xsl-list@lists.mulberrytech.com
Subject: [xsl] Re: [xsl] Handling of special characters like © etc


Yogesh Dare wrote:
> <?xml version="1.0"?>

Encoding is, roughly, the mapping of a repertoire of abstract characters
(units in a script for written language) to 1 or more code units (bytes,
usually). Your XML file exists with some kind of encoding, because it is,
after all, just a bunch of bits & bytes.

The encoding declaration in an XML document (the encoding="foo" part of
the <?xml ...?> line at the top) is an XML document's way of stating what
encoding it has. When you omit the encoding declaration, either UTF-8 or
UTF-16 is assumed, usually UTF-8 (UTF-16 only if the file begins with a
byte order mark).

>       © 2000 site.com

The copyright symbol is allowed in XML, but since you have implied that
your document is probably UTF-8 encoded, that symbol must be encoded as
the pair of bytes 0xC2 0xA9.
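
As a quick check of those byte values, here is a minimal sketch in Python
(Python is an assumption here; nothing in this thread uses it). It encodes
the copyright sign both ways and shows the bytes as hex.

```python
# U+00A9 is the copyright sign.
copyright_sign = "\u00a9"

# UTF-8 needs two bytes for this character; iso-8859-1 needs one.
utf8_bytes = copyright_sign.encode("utf-8")
latin1_bytes = copyright_sign.encode("iso-8859-1")

print(utf8_bytes.hex())    # c2a9
print(latin1_bytes.hex())  # a9
```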

If this is giving you problems, then your file is not really UTF-8
encoded, and this is an error. Chances are, it is encoded as just the byte
0xA9, because your file was produced with iso-8859-1 or windows-1252
encoding. You should get a text editor that saves in different encodings,
rather than just your platform/OS default, and that has a hex mode so you
can see the actual bytes in the file. I use TextPad, from
http://www.textpad.com/
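
If no hex editor is at hand, the same check can be roughed out in a few
lines of Python. This is only a sketch: the helper name is_valid_utf8 and
the sample byte strings are made up for illustration, and in practice you
would read the bytes from your own file in binary mode.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Sample byte strings standing in for file contents (illustrative only).
saved_as_latin1 = b"\xa9 2000 site.com"    # lone 0xA9, not valid UTF-8
saved_as_utf8 = b"\xc2\xa9 2000 site.com"  # 0xC2 0xA9, valid UTF-8

print(is_valid_utf8(saved_as_latin1))  # False
print(is_valid_utf8(saved_as_utf8))    # True
```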

If you don't want to put the correct bytes in your file, you can either
correctly declare the encoding as iso-8859-1 or windows-1252, or you can
use &#169; or &#xA9; in your XML and XSLT documents, rather than the raw
characters.
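
To illustrate that these options are equivalent to the parser, here is a
small sketch using Python's standard xml.etree.ElementTree module (an
assumption; it is not the parser discussed in this thread). The <footer>
element is a made-up wrapper around the text from the question.

```python
import xml.etree.ElementTree as ET

# Option 1: keep the single byte 0xA9, but declare the real encoding.
declared_latin1 = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<footer>\xa9 2000 site.com</footer>"
)

# Option 2: use a numeric character reference, decimal or hexadecimal.
decimal_ref = b'<?xml version="1.0"?><footer>&#169; 2000 site.com</footer>'
hex_ref = b'<?xml version="1.0"?><footer>&#xA9; 2000 site.com</footer>'

# All three documents give the parser the same character data.
texts = [ET.fromstring(doc).text for doc in (declared_latin1, decimal_ref, hex_ref)]
print(texts[0] == texts[1] == texts[2])  # True
```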

> Now after parsing, the parser output is given to XSLTProcessor to apply
> xsl on it. But there again I face problem for characters like &,<,> etc.
> Well I can actually replace these known characters by their equivalents,
> like for & I can put &amp; and so on.
> But I want some generic way to handle this.

& and < (and >, for balance) are XML markup characters. If you are using
them as character data, you must either escape them, or put them in a
CDATA section, if one is allowed there. This is a requirement of all XML
documents, including your source XML and the stylesheet.
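
Both approaches can be sketched as follows, assuming Python and its
standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The
mail-sender element is borrowed from the follow-up question purely as an
example. The point is that the escaping is applied to the character data
before it is placed between the tags, never to the finished document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Character data that happens to contain markup characters.
sender_text = "<mike@skew.org>"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(sender_text)
print(escaped)  # &lt;mike@skew.org&gt;

# Only the text is escaped; the element tags are written as real markup.
escaped_doc = "<mail-sender>" + escaped + "</mail-sender>"

# Alternative: wrap the raw text in a CDATA section.
cdata_doc = "<mail-sender><![CDATA[" + sender_text + "]]></mail-sender>"

# Either way, the parser hands back the original text.
print(ET.fromstring(escaped_doc).text == sender_text)  # True
print(ET.fromstring(cdata_doc).text == sender_text)    # True
```

Note that the CDATA form only works as written if the text does not itself
contain the sequence ]]>, which would end the section early.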

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

