This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Transforming HTML to NITF
- To: Adam dot Hoven at bluezone dot net
- Subject: [xsl] RE: Transforming HTML to NITF
- From: Dimitre Novatchev <dnovatchev at yahoo dot com>
- Date: Sat, 17 Feb 2001 04:39:12 -0800 (PST)
- Cc: xsl-list at lists dot mulberrytech dot com
- Reply-To: xsl-list at lists dot mulberrytech dot com
Hi Adam,
Here's the solution to the problem:
The input xml doc:
-----------------
<body>
<p> this is some text</p>
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b><br/><br/>
<p>This is a new paragraph</p>
</body>
The stylesheet:
--------------
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
<xsl:template
match="node()[not(self::p or self::table or self::ul or self::ol or self::body)
and not(ancestor::*[self::p or self::table or self::ul or self::ol])]">
<xsl:choose>
<xsl:when test="position()=1 or not(preceding-sibling::node()[1]
[not(self::p or self::table or self::ul or self::ol)])">
<p>
<xsl:copy-of select="."/>
<xsl:variable name="endOfGroup"
select="(following-sibling::node()[self::p or self::table or self::ul or self::ol])[1]"/>
<xsl:variable name="endGroupPosition"
select="count($endOfGroup/preceding-sibling::node())"/>
<xsl:apply-templates mode="following"
select="following-sibling::node()
[count(preceding-sibling::node()) &lt; $endGroupPosition]"/>
</p>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template mode="following" match="node()">
<xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="node()">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
The result:
----------
<body><p> this is some text</p><ul>
<li>item 1</li>
</ul><p>
this is <em>emphasis</em> some more <b>text</b><br /><br /></p><p>This is a new
paragraph</p></body>
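One caveat, plus a possible variant. As far as I can tell, when a run of loose nodes is not followed by any p, table, ul or ol, $endOfGroup is empty, so $endGroupPosition is 0 and only the first node of that run gets copied. The sketch below is one way around that: it groups by the number of block-level siblings that precede each node instead of by the position of the next block. It also drops <br> elements that are direct children of body, as you said you would prefer. It assumes that body is the document element and that p, table, ul and ol are the only block-level children you care about. The xsl:strip-space line and the explicit priority are my additions. Treat it as a starting point, not a tested solution for your real data.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Assumption: whitespace-only text directly inside body is layout,
       not content, so it should not be wrapped in its own <p> -->
  <xsl:strip-space elements="body"/>

  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Block-level children are copied unchanged; the explicit priority
       lets this template win over body/node() below -->
  <xsl:template match="body/p | body/table | body/ul | body/ol" priority="1">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Every other child of body: only the first node of a run emits output -->
  <xsl:template match="body/node()">
    <xsl:variable name="prev" select="preceding-sibling::node()[1]"/>
    <xsl:if test="not($prev) or $prev[self::p or self::table or self::ul or self::ol]">
      <!-- All nodes in one run have the same number of block siblings before them -->
      <xsl:variable name="blocksBefore"
          select="count(preceding-sibling::*[self::p or self::table or self::ul or self::ol])"/>
      <!-- This node plus the rest of its run, with top-level <br> removed -->
      <xsl:variable name="group"
          select="(. | following-sibling::node()
                        [not(self::p or self::table or self::ul or self::ol)]
                        [count(preceding-sibling::*[self::p or self::table or self::ul or self::ol])
                         = $blocksBefore])
                  [not(self::br)]"/>
      <!-- Skip runs that consisted only of <br> elements -->
      <xsl:if test="$group">
        <p><xsl:copy-of select="$group"/></p>
      </xsl:if>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to run this against the input document above, for example "xsltproc variant.xsl input.xml" (both file names are made up for illustration). Comments and processing instructions that are direct children of body are treated like any other loose node and end up inside a <p>.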
Hope this helped.
Cheers,
Dimitre Novatchev.
Adam Van Den Hoven wrote:
Since the body of NITF (News Industry Text Format, a standard format for
news content) is a lot like HTML (in its simplest form), I'm allowing my
users to create NITF using an HTML parser. I then pass the HTML through HTML
Tidy to make it well-formed XML and then through an XSL stylesheet to make it NITF.
I have come across a problem that I don't know how to fix and I need the
community's help.
NITF has a <content.body> tag which is equivalent to HTML's <body> tag.
However, its children are far more rigidly defined in that it only allows
elements as children. For my purposes, I'm allowed <p>, <table>, <ul> and <ol>
tags (there are others but we don't use them yet).
After passing the HTML through HTML Tidy, I might get something like:
<body>
<p> this is some text</p>
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b><br/><br/>
<p>This is a new paragraph</p>
</body>
This would occur if I started with:
<body>
<p> this is some text
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b></p>
<p>This is a new paragraph</p>
</body>
> I need to get the line:
this is <em>emphasis</em> some more <b>text</b><br/><br/>
> to end up wrapped in <p> tags (preferably without the <br>s)
>
> For clarity, the children of the body are:
p
ul
| text()
| em
| text()
| b
| br
| br
p
> I need to work with those nodes that have the | beside them as a single
> block so that I can wrap the entire thing in a <p> tag. Since I don't know
> the placement, the order or even the frequency of such situations (there
> is no reason why I couldn't have more blocks that need to be grouped
> together), the solution needs to be general.
>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list