This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Performance of using predicates vs key function in a large scale xml problem.
- To: <xsl-list at lists dot mulberrytech dot com>
- Subject: [xsl] Performance of using predicates vs key function in a large scale xml problem.
- From: "Yang" <sfyang at unisvr dot net dot tw>
- Date: Tue, 3 Jul 2001 18:29:50 +0800
- Reply-To: xsl-list at lists dot mulberrytech dot com
Hi,
I have picked up a lot of XSLT knowledge from the xsl-list. This may be a FAQ:
a common way to get the output associated with a given $id condition is to
use a predicate, something like:
<xsl:copy>
  <xsl:copy-of select="$source[@id = $id]"/>
</xsl:copy>
This simple pattern works perfectly well for a small problem.
However, it becomes dramatically slower when it is applied to a large
problem with thousands of records.
I recorded a comparison between using a predicate and using keys and
presented it on the list (msg01066.html); the key function turned out to be
a much better solution.
Jeni replied in favor of using a key when the same document is searched
*multiple* times (msg01072.html).
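For the simple $id case above, a key-based equivalent might look like the
following minimal sketch. The element name record, the key name by-id, the
result wrapper element, and the default value of $id are all hypothetical;
they are not taken from the earlier messages.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: "record" elements that carry an "id" attribute. -->
  <xsl:key name="by-id" match="record" use="@id"/>

  <!-- Hypothetical default; normally supplied by the caller. -->
  <xsl:param name="id" select="'42'"/>

  <xsl:template match="/">
    <result>
      <!-- Predicate form, for comparison: tests every record on each lookup.
           <xsl:copy-of select="//record[@id = $id]"/> -->
      <!-- Key form: a processor can index the records once and reuse the index. -->
      <xsl:copy-of select="key('by-id', $id)"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Whether the key form is actually faster depends on the processor and on how
many lookups are made against the same document.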
Now I would like to share another real case with those of you who are
interested, and hopefully get your expert opinion.
The case involves a.xml, with about 2000 z:row records, and b.xml, which is
about the same size. The task is to:
1. apply normalize-space to every attribute of each z:row in a.xml;
2. copy attributes from b.xml and add them to a.xml, based on the common
SalesOrderNo attribute value.
First, using the predicate shown below, the processing time is very slow.
<xsl:template match="@SalesOrderNo">
  <xsl:variable name="sno" select="normalize-space(.)"/>
  <xsl:attribute name="SalesOrderNo">
    <xsl:value-of select="$sno"/>
  </xsl:attribute>
  <!-- using a predicate is unacceptably slow compared with
       payment-customer2.xsl, where the key function is used instead -->
  <xsl:apply-templates
      select="$MSource[normalize-space(@SalesOrderNo)=$sno]/@CustomerCode"
      mode="merge"/>
</xsl:template>
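So I changed to a key-based solution, using the following major steps:
1. Develop a more direct relation from b.xml: a result tree fragment that
holds only the needed attributes, with their values already normalized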
<xsl:variable name="aa">
  <xsl:for-each select="$MSource">
    <z:row>
      <xsl:apply-templates select="@SalesOrderNo|@CustomerCode|@CustomerName"
          mode="merge"/>
    </z:row>
  </xsl:for-each>
</xsl:variable>
<xsl:variable name="originalDoc" select="msxsl:node-set($aa)"/>
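2. Apply the key function, with the context switched to that node-set so
that key() searches it
<xsl:template match="@SalesOrderNo">
  <xsl:variable name="sno" select="normalize-space(.)"/>
  <xsl:attribute name="SalesOrderNo">
    <xsl:value-of select="$sno"/>
  </xsl:attribute>
  <!-- xsl:for-each makes $originalDoc the context, so key() looks there -->
  <xsl:for-each select="$originalDoc">
    <xsl:variable name="kk" select="key('salesorderno',$sno)"/>
    <xsl:attribute name="CustomerCode">
      <xsl:value-of select="$kk/@CustomerCode"/>
    </xsl:attribute>
    <xsl:attribute name="CustomerName">
      <xsl:value-of select="$kk/@CustomerName"/>
    </xsl:attribute>
  </xsl:for-each>
</xsl:template>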
The final solution runs much faster.
This convinces me that, to handle a large quantity of records, it is
worthwhile to learn more about the key function and about the data scope of
the current node, rather than to rely on a simple and easily understood
predicate pattern.
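On the point about the data scope of the current node: in XSLT 1.0, key()
searches only the document that contains the context node, which is why the
lookup sits inside an xsl:for-each. As a variation that has not been timed
here, the normalization could be moved into the key definition itself, so
that b.xml is indexed directly and no msxsl:node-set copy is needed. This is
a sketch under assumptions, not the stylesheet used above: the variable name
bDoc and the rows wrapper element are hypothetical, and for brevity it
writes only the three attributes discussed.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="#RowsetSchema">

  <!-- Normalize inside the key, so raw values in b.xml still match. -->
  <xsl:key name="salesorderno" match="z:row"
      use="normalize-space(@SalesOrderNo)"/>

  <!-- Hypothetical name: the root node of b.xml. -->
  <xsl:variable name="bDoc" select="document('b.xml')"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper element for the merged rows. -->
    <rows>
      <xsl:apply-templates select="//z:row"/>
    </rows>
  </xsl:template>

  <xsl:template match="z:row">
    <xsl:variable name="sno" select="normalize-space(@SalesOrderNo)"/>
    <z:row SalesOrderNo="{$sno}">
      <!-- Switch the context to b.xml; key() then searches that document. -->
      <xsl:for-each select="$bDoc">
        <xsl:variable name="kk" select="key('salesorderno', $sno)"/>
        <xsl:attribute name="CustomerCode">
          <xsl:value-of select="normalize-space($kk/@CustomerCode)"/>
        </xsl:attribute>
        <xsl:attribute name="CustomerName">
          <xsl:value-of select="normalize-space($kk/@CustomerName)"/>
        </xsl:attribute>
      </xsl:for-each>
    </z:row>
  </xsl:template>
</xsl:stylesheet>
```

If a SalesOrderNo has no match in b.xml, $kk is empty and both attributes
come out as empty strings.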
The complete listing is attached below.
sfyang
sfyang@unisvr.net.tw
<?xml version="1.0" encoding="big5"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
    xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:rs="urn:schemas-microsoft-com:rowset"
    xmlns:z="#RowsetSchema"
    exclude-result-prefixes="s dt msxsl rs z">
  <xsl:output method="xml" indent="yes"/>
  <xsl:key name="salesorderno" match="z:row" use="@SalesOrderNo"/>
  <xsl:variable name="MSource" select="document('b.xml')//z:row"/>
  <xsl:variable name="aa">
    <xsl:for-each select="$MSource">
      <z:row>
        <xsl:apply-templates select="@SalesOrderNo|@CustomerCode|@CustomerName"
            mode="merge"/>
      </z:row>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="originalDoc" select="msxsl:node-set($aa)"/>
  <xsl:template match="/">
    <xsl:variable name="rtf-zs">
      <xsl:apply-templates select="//z:row" mode="n"/>
    </xsl:variable>
    <xsl:variable name="zz" select="msxsl:node-set($rtf-zs)/z:row"/>
    zz:<xsl:copy-of select="$zz"/>
  </xsl:template>
  <xsl:template match="z:row" mode="n">
    <z:row>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:apply-templates select="@SalesOrderNo"/>
    </z:row>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:attribute>
  </xsl:template>
  <xsl:template match="@SalesOrderNo">
    <xsl:variable name="sno" select="normalize-space(.)"/>
    <xsl:attribute name="SalesOrderNo">
      <xsl:value-of select="$sno"/>
    </xsl:attribute>
    <xsl:for-each select="$originalDoc">
      <xsl:variable name="kk" select="key('salesorderno',$sno)"/>
      <xsl:attribute name="CustomerCode">
        <xsl:value-of select="$kk/@CustomerCode"/>
      </xsl:attribute>
      <xsl:attribute name="CustomerName">
        <xsl:value-of select="$kk/@CustomerName"/>
      </xsl:attribute>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="@*" mode="merge">
    <xsl:attribute name="{name()}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list