This is the mail archive of the
xsl-list@mulberrytech.com
mailing list .
Re: grouping (was: if or template?)
- To: <xsl-list at mulberrytech dot com>
- Subject: Re: grouping (was: if or template?)
- From: "Steve Muench" <smuench at us dot oracle dot com>
- Date: Tue, 9 May 2000 01:10:53 -0700
- References: <39B19660C174D311BB9000A0C9E01C3F18BBCE@corfu.rnib.org.uk>
- Reply-To: xsl-list at mulberrytech dot com
| ><xsl:key name="tid" use="tracker-id" select="."/>
| >
| ><xsl:for-each
| >select="//tracker-id[generate-id(.)=generate-id(key('tid',.)[1])]">
| >
| >I hope Steve will forgive me for announcing this discovery
| >before he does,
| >I'm quite excited by it because it gives much better performance.
|
| All it does to me is make me scratch my head!
| Steve/Mike, would you give us the idiots view on this please,
| whats happening? *Why* does it provide the unique tracker-id please?
When you're doing grouping, you basically want to select
exactly one of each unique thing.
//tracker-id
would select all tracker-id elements in the document.
Declaring a 'tid' key like:
<xsl:key name="tid" use="tracker-id" select="."/>
The key('tid','tidvalue') function looks up all nodes having
tracker-id = 'tidvalue'. In order to support this lookup,
the processor will be keeping a list in memory like this:
"tid" Key lookup Table
======================
tracker-id Ref to tracker-id elements
value having that value
----------- --------------------------
abc123 node(109),node(344),node(496)
def456 node(15)
hij332 node(89),node(101)
Where the notation node(nnn) means "the node whose node-id is nnn"
as defined by generated-id(). To be concrete, the processor
is likely keeping some kind of Hashtable with the tracker-id
*value* as the hash key, and a node-list as the hash value.
//tracker-id[generate-id(.)=generate-id(key('tid',.)[1])]
selects all tracker-id elements in the document having
a node-id equal to the node-id of the first node in the
"key lookup table's" list of nodes having the current
tracker-id.
Said more simply, it selects the first tracker-id element
for each unique tracker-id value.
Or even more simply, it selects a list of distinct tracker-id values.
Here's an example.
Take the "Task.xml" File below...
<Tasks>
<Task><Desc>Task1</Desc><Owner>Steve</Owner></Task>
<Task><Desc>Task2</Desc><Owner>Mike</Owner></Task>
<Task><Desc>Task3</Desc><Owner>Dave</Owner></Task>
<Task><Desc>Task4</Desc><Owner>Steve</Owner></Task>
<Task><Desc>Task5</Desc><Owner>Mike</Owner></Task>
<Task><Desc>Task9</Desc><Owner>Mike</Owner></Task>
</Tasks>
The stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:key name="xxx" match="/Tasks/Task/Owner" use="."/>
<xsl:template match="/">
<Tasks>
<xsl:for-each select="/Tasks/Task/Owner[generate-id(.)=generate-id(key('xxx',.))]">
<xsl:sort select="."/>
<Owner name="{.}">
<xsl:for-each select="key('xxx',.)/..">
<xsl:copy-of select="."/>
</xsl:for-each>
</Owner>
</xsl:for-each>
</Tasks>
</xsl:template>
</xsl:stylesheet>
Produces a sorted, grouped list of tasks by owner and
is much faster than the equivalent "scan-my-preceding"
approach...
<?xml version = '1.0' encoding = 'UTF-8'?>
<Tasks>
<Owner name="Dave">
<Task>
<Desc>Task3</Desc>
<Owner>Dave</Owner>
</Task>
</Owner>
<Owner name="Mike">
<Task>
<Desc>Task2</Desc>
<Owner>Mike</Owner>
</Task>
<Task>
<Desc>Task5</Desc>
<Owner>Mike</Owner>
</Task>
<Task>
<Desc>Task9</Desc>
<Owner>Mike</Owner>
</Task>
</Owner>
<Owner name="Steve">
<Task>
<Desc>Task1</Desc>
<Owner>Steve</Owner>
</Task>
<Task>
<Desc>Task4</Desc>
<Owner>Steve</Owner>
</Task>
</Owner>
</Tasks>
For testing, here is a slower.xsl stylesheet that
does the same job without using the key() technique:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Slower.xsl -->
<xsl:output indent="yes"/>
<xsl:template match="/">
<Tasks>
<xsl:for-each
select="/Tasks/Task[not(preceding-sibling::Task/Owner=./Owner)]/Owner">
<xsl:sort select="."/>
<xsl:variable name="owner" select="."/>
<Owner for="{.}">
<xsl:for-each select="/Tasks/Task[Owner = $owner ]">
<xsl:copy-of select="."/>
</xsl:for-each>
</Owner>
</xsl:for-each>
</Tasks>
</xsl:template>
</xsl:stylesheet>
as you scale up the size of the Task.xml input file, the performance
difference can be dramatic. Try copy/pasting the elements
in the Task.xml above to creates a couple thousand <Task>
elements to give it a spin...
______________________________________________________________
Steve Muench, Lead XML Evangelist & Consulting Product Manager
Business Components for Java & XSQL Servlet Development Teams
Oracle Rep to the W3C XSL Working Group
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list