This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: Unwrapping trees


David wrote:
>> I can't quite parse that at 7am. Uhm, but the rule is "pull the
>> nested anchors" to the top so they aren't nested anymore without
>> otherwise changing the markup to the greatest extent possible.
>
> I've tried this before (with presumably the same intent as you, to
> ensure that the html output of a stylesheet is well formed) but you
> need to "close all open elements on the stack" which isn't so easy
> in xslt at the best of times

I came across this problem as well, dealing with XML documents where
the abstract structure of the document (sections/paragraphs etc) is
represented with element nesting while the page breaks are represented
by processing instructions or empty elements. This has to be
transformed into a structure where there are page elements, in which
the content of the page is nested (with paragraphs etc. split if they
run over several pages).

I wondered about creating a result in which elements map directly onto
SAX events, something like:

<xsl:template match="/">
  <sax:startDocument />
  <xsl:apply-templates />
  <sax:endDocument />
</xsl:template>

<xsl:template match="text()">
  <sax:characters chars="{.}" />
</xsl:template>

<xsl:template match="*">
  <xsl:for-each select="namespace::*[name() != 'xml']">
    <sax:startPrefixMapping prefix="{name()}" uri="{.}" />
  </xsl:for-each>
  <xsl:apply-templates select="." mode="startElement" />
  <xsl:apply-templates />
  <xsl:apply-templates select="." mode="endElement" />
  <xsl:for-each select="namespace::*[name() != 'xml']">
    <sax:endPrefixMapping prefix="{name()}" uri="{.}" />
  </xsl:for-each>
</xsl:template>

<xsl:template match="*" mode="startElement">
  <sax:startElement xsl:use-attribute-sets="name">
    <sax:attributes>
      <xsl:for-each select="@*">
        <sax:attribute value="{.}" xsl:use-attribute-sets="name" />
      </xsl:for-each>
    </sax:attributes>
  </sax:startElement>
</xsl:template>

<xsl:template match="*" mode="endElement">
  <sax:endElement xsl:use-attribute-sets="name" />
</xsl:template>

<xsl:attribute-set name="name">
  <xsl:attribute name="name">
    <xsl:value-of select="name()" />
  </xsl:attribute>
  <xsl:attribute name="local-name">
    <xsl:value-of select="local-name()" />
  </xsl:attribute>
  <xsl:attribute name="namespace-uri">
    <xsl:value-of select="namespace-uri()" />
  </xsl:attribute>
</xsl:attribute-set>
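
For illustration only (this example is not from the original post):
assuming the sax prefix is bound to some namespace in the stylesheet,
and the input is the single element <p class="x">Hi</p>, the templates
above should produce a result along these lines. It is indented here
for readability, the xmlns:sax declaration is left out, and attribute
order depends on the processor.

```xml
<sax:startDocument/>
<sax:startElement name="p" local-name="p" namespace-uri="">
  <sax:attributes>
    <sax:attribute name="class" local-name="class" namespace-uri=""
                   value="x"/>
  </sax:attributes>
</sax:startElement>
<sax:characters chars="Hi"/>
<sax:endElement name="p" local-name="p" namespace-uri=""/>
<sax:endDocument/>
```

No sax:startPrefixMapping elements appear because the only namespace
node on p is the implicit xml one, which the templates skip.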

Then you could deal with a problem like closing all necessary
currently open elements with:

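<xsl:template match="a//a">
  <!-- every anchor this one is nested in; $from[last()] is the nearest -->
  <xsl:variable name="from" select="ancestor::a" />
  <!-- the non-anchor ancestors that are themselves inside an anchor -->
  <xsl:variable name="wrappers"
    select="ancestor::*[not(self::a) and 
                        generate-id(ancestor::a) = generate-id($from)]" />

  <!-- close the enclosing anchor and its wrappers, innermost first;
       data-type="number" matters because xsl:sort compares as text by
       default, which would put position 10 before position 2 -->
  <xsl:apply-templates select="$from[last()] | $wrappers"
                       mode="endElement">
    <xsl:sort select="position()" order="descending"
              data-type="number" />
  </xsl:apply-templates>

  <!-- reopen the wrappers so this anchor keeps its inline markup -->
  <xsl:apply-templates select="$wrappers" mode="startElement" />
  
  <!-- this anchor, now outside any other anchor -->
  <xsl:apply-templates select="." mode="startElement" />
  <xsl:apply-templates />
  <xsl:apply-templates select="." mode="endElement" />

  <xsl:apply-templates select="$wrappers" mode="endElement">
    <xsl:sort select="position()" order="descending"
              data-type="number" />
  </xsl:apply-templates>

  <!-- reopen what was closed so the rest of the outer anchor's
       content lands in the right place -->
  <xsl:apply-templates select="$from[last()] | $wrappers"
                       mode="startElement" />
</xsl:template>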
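
As a hand-traced illustration (the sample input is made up, not taken
from the thread), here is what the event stream amounts to once a
serialiser has turned it back into tags. The inner anchor is pulled
out to the same level as the outer one, and the span is closed and
reopened around it.

```xml
<!-- input -->
<p><a href="1">x <span>y <a href="2">z</a> w</span> v</a></p>

<!-- events produced for that input, written out as tags -->
<p><a href="1">x <span>y </span></a><span><a href="2">z</a></span><a href="1"><span> w</span> v</a></p>
```

The p element is left alone because it has no a ancestor, so it is
not selected into $wrappers.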

Then you can write a serialiser that takes the elements in the SAX
namespace and generates a SAX event for each of them in order to
serialise the document. Of course this way it's possible to create
non-well-formed output, but using elements and a custom serialiser
seemed cleaner than either using disable-output-escaping or a
stylesheet generating text, on a par with the lexical representation
of document types and entities suggested in XSLT 2.0.

Norm: you can adapt the templates above to create serialised XML
rather than sax:* elements, and put them in a stylesheet with a text
output method to do the transformation that you're after. I'm afraid
that it does create empty span elements (or rather, span elements that
only have whitespace text node children), but it's not difficult to
create a filter to take those out of the result...
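
A minimal sketch of that adaptation, under some loud assumptions: the
elements are in no namespace (so no namespace declarations are
written), and text and attribute values contain no &, < or " that
would need escaping, since the text output method writes them as-is.
The a//a template is the one from above, unchanged apart from
data-type="number" on the sorts.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <!-- ordinary elements: start tag, content, end tag -->
  <xsl:template match="*">
    <xsl:apply-templates select="." mode="startElement" />
    <xsl:apply-templates />
    <xsl:apply-templates select="." mode="endElement" />
  </xsl:template>

  <!-- write the start tag as plain text (no escaping of values) -->
  <xsl:template match="*" mode="startElement">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()" />
    <xsl:for-each select="@*">
      <xsl:text> </xsl:text>
      <xsl:value-of select="name()" />
      <xsl:text>="</xsl:text>
      <xsl:value-of select="." />
      <xsl:text>"</xsl:text>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- write the end tag as plain text -->
  <xsl:template match="*" mode="endElement">
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()" />
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- nested anchors: close what is open, emit the anchor, reopen -->
  <xsl:template match="a//a">
    <xsl:variable name="from" select="ancestor::a" />
    <xsl:variable name="wrappers"
      select="ancestor::*[not(self::a) and
                          generate-id(ancestor::a) = generate-id($from)]" />
    <xsl:apply-templates select="$from[last()] | $wrappers"
                         mode="endElement">
      <xsl:sort select="position()" order="descending"
                data-type="number" />
    </xsl:apply-templates>
    <xsl:apply-templates select="$wrappers" mode="startElement" />
    <xsl:apply-templates select="." mode="startElement" />
    <xsl:apply-templates />
    <xsl:apply-templates select="." mode="endElement" />
    <xsl:apply-templates select="$wrappers" mode="endElement">
      <xsl:sort select="position()" order="descending"
                data-type="number" />
    </xsl:apply-templates>
    <xsl:apply-templates select="$from[last()] | $wrappers"
                         mode="startElement" />
  </xsl:template>

</xsl:stylesheet>
```

The filter could then be a second pass over the re-parsed output: an
identity transformation plus one template that unwraps spans holding
nothing but whitespace. This sketch assumes span is in no namespace,
and it also unwraps any span that was already empty in the source.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity: copy everything through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- drop the span itself but keep its whitespace so that word
       spacing in the surrounding text is not lost -->
  <xsl:template match="span[not(*) and normalize-space() = '']">
    <xsl:value-of select="." />
  </xsl:template>

</xsl:stylesheet>
```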

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

