This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: Speed and memory problems when transforming links


> I'd like to merge the contents of two linked XML files into one file
> but I have speed and memory problems with Saxon as well as Xalan.
>
> The first XML file contains syllables and links to the second file,
> which contains phones:
>
>  <syllable_file>
>   <syllable id="sllbl_0">
>    <link href="phone.xml#phn_0" />
>    <link href="phone.xml#phn_1" />
>   </syllable>
>   ...
>  </syllable_file>
>
>  <phone_file>
>   <phone id="phn_0" />
>   <phone id="phn_1" />
>   ...
>  </phone_file>

Ouch! I think you've found a bug. When you use the document() function, you
are guaranteed to get a distinct tree for each distinct URI that you supply.
According to the spec, this is the URI *minus any fragment identifier*. But
it seems Saxon isn't stripping the fragment identifier before doing the
comparison, so you get a distinct instance for each URI Reference (i.e.
combination of URI and fragment identifier).

I'll fix this, but meanwhile, try:

     <xsl:for-each select="link">
       <xsl:variable name="frag" select="substring-after(@href, '#')"/>
       <xsl:for-each select="document(substring-before(@href, '#'))">
          <xsl:for-each select="id($frag)">
             ...

This will also be more portable: the XSLT 1.0 spec leaves it up to the
implementation how to interpret fragment identifiers in the URI reference.
(Which is why not many people use the facility, which is why you've found a
bug that's present in both products, by the looks of it!)
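
For reference, here is a rough sketch of how the workaround might look once it is folded into the complete stylesheet from the original message. Several things in it are assumptions rather than part of the suggestion above: the key name "phone-by-id" is invented for illustration, and key() is swapped in for id() because id() only finds elements whose id attribute is declared as type ID in a DTD, which the sample files do not show. The second argument to document() is also an addition, so that the relative URI is resolved against the link element rather than the stylesheet. The sketch assumes every href contains a '#'.

```xml
<?xml version="1.0"?>
<!-- Sketch only: merge the syllable and the phone file, loading phone.xml once -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical key name; indexes phone elements by id without needing a DTD -->
  <xsl:key name="phone-by-id" match="phone" use="@id"/>

  <xsl:template match="/">
    <syllable_phone_file>
      <xsl:apply-templates select="syllable_file/syllable"/>
    </syllable_phone_file>
  </xsl:template>

  <xsl:template match="syllable">
    <!-- Copy the syllable element and its attributes -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="link">
        <!-- Split the URI reference; assumes @href always contains '#' -->
        <xsl:variable name="frag" select="substring-after(@href, '#')"/>
        <!-- Same URI on every iteration, so the processor can reuse one tree.
             The second argument supplies the base URI of the link element. -->
        <xsl:for-each select="document(substring-before(@href, '#'), .)">
          <!-- key() searches the document that contains the context node -->
          <xsl:for-each select="key('phone-by-id', $frag)">
            <!-- Copy the phone element and its attributes -->
            <xsl:copy>
              <xsl:copy-of select="@*"/>
            </xsl:copy>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If phone.xml does carry a DTD that declares id as an ID attribute, the inner loop can go back to id($frag) and the xsl:key declaration can be dropped.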

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com

>
> I use the following stylesheet to merge both files:
>
>  <?xml version="1.0" ?>
>  <!-- Merge the syllable and the phone file -->
>  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="1.0">
>
>   <xsl:output method="xml" indent="yes" />
>
>   <xsl:template match="/">
>    <xsl:element name="syllable_phone_file">
>     <xsl:apply-templates select="/syllable_file" />
>    </xsl:element>
>   </xsl:template>
>
>   <xsl:template match="syllable">
>    <!-- Copy the syllable element and its attributes -->
>    <xsl:copy>
>     <xsl:copy-of select="@*" />
>     <!-- Follow the links -->
>     <xsl:for-each select="link">
>      <xsl:for-each select="document(@href)/*">
>       <!-- Copy the phone element and its attributes -->
>       <xsl:copy>
>        <xsl:copy-of select="@*" />
>       </xsl:copy>
>      </xsl:for-each>
>     </xsl:for-each>
>    </xsl:copy>
>   </xsl:template>
>
>  </xsl:stylesheet>
>
> The result looks like this:
>
>  <syllable_phone_file>
>   <syllable id="sllbl_0">
>    <phone id="phn_0"/>
>    <phone id="phn_1"/>
>   </syllable>
>   ...
>  </syllable_phone_file>
>
> As long as there are only a few syllables the transformation works.
> But if there are 1000 syllables and 2000 phones, Saxon and Xalan
> require a lot of memory and the transformation gets very slow.
>
> In my opinion 1000 records in an XML file aren't that much.  Both XML
> files require less than 150 KBytes on disk.  It's hard to believe that
> transforming 150 KBytes of data requires more than 100 MB of RAM.
>
> Is there a better way to handle links in stylesheets?
>
> Is there anything I can do to reduce the memory usage and to speed up
> the transformation?
>
> I've put a small archive that contains the stylesheet, a README and
> all the other files required to test the stylesheet with Saxon and
> Xalan at the following place:
>
> http://www-stud.ims.uni-stuttgart.de/~voegelas/linktest.zip
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list