This is the mail archive of the xsl-list@mulberrytech.com mailing list .


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Simple problem - complicated solution


The solutions that Joerg presented work correctly, but they are both
O(n^2) solutions. That is computer-science-speak for the processing time
is proportional to the square of number of data elements. No big deal
for small values of n, but definitely a performance problem as the data
set gets big. The second one (with the following-sibling::datum in the
XPath predicates) takes about half the time of the first one, but it is
still proportional to n^2.

For large sets of data, it would be worth looking at an O(n log n)
solution, that is, one that is proportional to the number of elements
times the log (base 2) of the number of elements. This involves making a
sorted copy of the <datum> elements and then analyzing the results.
Sorting algorithms should perform in O(n log n) time, which is where
that value comes from.

If you are not familiar with the math, consider 1024*1024 = 1,048,576
versus 1024*(log(1024))=10,240. That's three orders of magnitude
difference.

Drawbacks to this solution include making a copy of the entire data set
and use of the vendor-specific node-set function (I've used the
Microsoft msxsl:node-set() function).

OTOH, this approach means we can sort the data once and gather various
statistics, whereas the other solutions are computing each statistic
(min and max) separately. So this approach is able to compute the median
(the value with as many values greater than it as less than it) with
almost no additional effort. I've included the mean (average) for
completeness, but you can see that it is no different than the ones in
the previous solutions, it does not utilize the sorted data and it
performs in linear time.


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
    version="1.0"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" 
    exclude-result-prefixes="msxsl">

  <xsl:template match="data">

    <xsl:variable name="sorted-data">
      <xsl:apply-templates select="datum" mode="copy">
        <xsl:sort select="@value" data-type="number"/>
      </xsl:apply-templates>
    </xsl:variable>

    <min>
      <xsl:value-of
        select="msxsl:node-set($sorted-data)/datum[1]/@value"/>
    </min>
    <max>
      <xsl:value-of
        select="msxsl:node-set($sorted-data)/datum[last()]/@value"/>
    </max>
    <median>
      <xsl:value-of
        select="msxsl:node-set($sorted-data)/datum[ceiling(last() div
2)]/@value"/>
    </median>
    <mean>
      <xsl:value-of select="sum(datum/@value) div count(datum/@value)"/>
    </mean> 

  </xsl:template>

  <xsl:template match="datum" mode="copy">
    <xsl:copy-of select="."/>
  </xsl:template>
	
</xsl:stylesheet>


Enjoy!

Cheers,
Stuart



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]