This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Spelling checking templates (Was: RE: Re: attribute closest match)


--- "Matthew L. Avizinis" <mla at gleim dot com> wrote:

> I'm happy knowing that there are widely varying differences of
> opinion  on this matter.
> So, Dimitre, how precise is precise?  If I were to define closest
> match to be words that contain all of the letters with one 
> transposition, e.g. hte for the, or the spelling is correct except 
> for one letter, e.g. mofe for mode, would that, iyo, be precise 
> enough?
> Of course a spelling checker might prevent many of these kinds of
> errors in data entry, but it would still be, I believe, an 
> interesting exercise to be able to catch these kinds of errors if 
> data was entered without a spellchecker abvailable (this would be 
> another type of error I would include later because it seems like it
> would be more difficult to check for).
> Any more help, suggestions, (or even code)?
> thanks,
> > >
> > >    Matthew L. Avizinis <mailto:mla@gleim.com>
> > > Gleim Publications, Inc.
> > >    4201 NW 95th Blvd.
> > >  Gainesville, FL 32606
> > > (352)-375-0772
> > >       www.gleim.com <http://www.gleim.com>
> >
> >
> > Can be done in XSLT, but first you have to define precisely
> > "_closest match_".


Hi Matthew,

This is quite straightforward to do using FXSL. Please find below the
code that solves your particular problem and that may be used as part
of a spelling checker implemented in XSLT.

Should I mention that I'm using FXSL here? :o)

Suppose you have the following source xml:

words2.xml:
----------
<elements>
  <element cana="1"/>
  <element cna="2"/>
  <element an="3"/>
  <element con="4"/>
  <element cbb="5"/>
</elements>

The result of applying the transformation presented below will be:

<elements>
  <element can="1" />
  <element can="2" />
  <element can="3" />
  <element can="4" />
  <element />
</elements>

As you can see, deletion, replacement and addition of a single
character, as well as transposition of two adjacent characters, are
corrected. Two replacements (as in "cbb") are not handled.

You may play with any other combinations of attribute names.
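
To make the idea of a "close word" concrete before the full FXSL
version, here is a minimal standalone sketch. It is not the FXSL
implementation: it uses a plain recursive named template (the name
"closeBySwapOrDelete" and the parameter "pPos" are made up for this
illustration) and covers only two of the four edit kinds, single
deletions and adjacent transpositions. It ignores the source document
and works on the literal word 'can'.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <closewords>
      <xsl:call-template name="closeBySwapOrDelete">
        <xsl:with-param name="pWord" select="'can'"/>
      </xsl:call-template>
    </closewords>
  </xsl:template>

  <!-- Hypothetical helper: walks the word one position at a time -->
  <xsl:template name="closeBySwapOrDelete">
    <xsl:param name="pWord"/>
    <xsl:param name="pPos" select="1"/>

    <xsl:if test="$pPos &lt;= string-length($pWord)">
      <!-- Deletion: drop the character at position pPos -->
      <word>
        <xsl:value-of
         select="concat(substring($pWord, 1, $pPos - 1),
                        substring($pWord, $pPos + 1))"/>
      </word>

      <!-- Transposition: swap the characters at pPos and pPos + 1 -->
      <xsl:if test="$pPos &lt; string-length($pWord)">
        <word>
          <xsl:value-of
           select="concat(substring($pWord, 1, $pPos - 1),
                          substring($pWord, $pPos + 1, 1),
                          substring($pWord, $pPos, 1),
                          substring($pWord, $pPos + 2))"/>
        </word>
      </xsl:if>

      <!-- Recurse on the next position -->
      <xsl:call-template name="closeBySwapOrDelete">
        <xsl:with-param name="pWord" select="$pWord"/>
        <xsl:with-param name="pPos" select="$pPos + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

For 'can' this should produce the words an, acn, cn, cna and ca, which
is why the attributes named "an" and "cna" in words2.xml are accepted.
Replacements and additions follow the same pattern, with an inner loop
over the alphabet.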

Here's the transformation:

spelling.xsl:
------------
<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:vendor="urn:schemas-microsoft-com:xslt"
 xmlns:delLetter="f:delLetter"
 xmlns:addLetter="f:addLetter"
 xmlns:addLetterSingle="f:addLetterSingle"
 xmlns:repLetter="f:repLetter"
 xmlns:repLetterSingle="f:repLetterSingle"
 xmlns:transPair="f:transPair"
 exclude-result-prefixes="vendor delLetter addLetter 
 repLetter transPair repLetterSingle addLetterSingle"
 >
 
 <xsl:import href="str-foldl.xsl"/>
 
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 
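  <!-- Marker elements: each one is passed to str-foldl as a "function"
       handle and is dispatched to the template that matches it -->
  <delLetter:delLetter/>
  <addLetter:addLetter/>
  <repLetter:repLetter/>
  <transPair:transPair/>
  <repLetterSingle:repLetterSingle/>
  <addLetterSingle:addLetterSingle/>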
  
  <xsl:variable name="validChars"
                select="'abcdefghijklmnopqrstuvwxyz'"/>
  
  <xsl:template match="/">
    <xsl:variable name="vrtfCloseWords">
      <xsl:call-template name="closeWords">
        <xsl:with-param name="pWord" select="'can'"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:variable name="vCloseWords" 
                  select="vendor:node-set($vrtfCloseWords)/*"/>
    <elements>
      <xsl:for-each select="/elements/element">
        <xsl:copy>
          <xsl:for-each select="@*[name()=$vCloseWords]">
            <xsl:attribute name="can">
              <xsl:value-of select="."/>
            </xsl:attribute>
          </xsl:for-each>
        </xsl:copy>
      </xsl:for-each>
    </elements>
  </xsl:template>
  
  <xsl:template name="closeWords">
    <xsl:param name="pWord"/>
    
    <xsl:call-template name="delLetterWords">
      <xsl:with-param name="pWord" select="$pWord"/>
    </xsl:call-template>
    
    <xsl:call-template name="repLetterWords">
      <xsl:with-param name="pWord" select="$pWord"/>
    </xsl:call-template>
    

    <xsl:call-template name="addLetterWords">
      <xsl:with-param name="pWord" select="$pWord"/>
    </xsl:call-template>
    
    <xsl:call-template name="transPairWords">
      <xsl:with-param name="pWord" select="$pWord"/>
    </xsl:call-template>

  </xsl:template>
  
  <xsl:template name="transPairWords">
    <xsl:param name="pWord"/>
    
    <xsl:variable name="vftransPair" 
                  select="document('')/*/transPair:*[1]"/>
    <xsl:variable name="vrtf-accum">
      <accum>
        <position>1</position>
        <word><xsl:value-of select="$pWord"/></word>
        <closewords></closewords>
      </accum>  
    </xsl:variable>
    
    <xsl:variable name="vaccum" 
                  select="vendor:node-set($vrtf-accum)/*"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vftransPair"/>
        <xsl:with-param name="pA0" select="$vaccum"/>
        <xsl:with-param name="pStr" select="$pWord"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:copy-of select="vendor:node-set($vrtfResults)/closewords/*"/>
  </xsl:template>
  
  <xsl:template match="transPair:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
    
      <xsl:variable name="vPos" select="$arg1/position"/>
      <xsl:variable name="vWord" select="$arg1/word"/>
      <xsl:variable name="vCloseWords" select="$arg1/closewords"/>
      
      <xsl:variable name="vNewWord" 
       select="concat(substring($vWord, 1, $vPos - 1),
                      substring($vWord, $vPos + 1, 1),
                      $arg2,
                      substring($vWord, $vPos + 2)
                      )"/>
        <position><xsl:value-of select="$vPos + 1"/></position>
        <word><xsl:value-of select="$vWord"/></word>
        <closewords>
          <xsl:copy-of select="$vCloseWords/*"/>
          <word><xsl:value-of select="$vNewWord"/></word>
        </closewords>
  </xsl:template>

  <xsl:template name="delLetterWords">
    <xsl:param name="pWord"/>
    
    <xsl:variable name="vfDelLetter" 
                  select="document('')/*/delLetter:*[1]"/>
    <xsl:variable name="vrtf-accum">
      <accum>
        <position>1</position>
        <word><xsl:value-of select="$pWord"/></word>
        <closewords></closewords>
      </accum>  
    </xsl:variable>
    
    <xsl:variable name="vaccum" 
                  select="vendor:node-set($vrtf-accum)/*"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vfDelLetter"/>
        <xsl:with-param name="pA0" select="$vaccum"/>
        <xsl:with-param name="pStr" select="$pWord"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:copy-of select="vendor:node-set($vrtfResults)/closewords/*"/>
  </xsl:template>
  
  <xsl:template name="repLetterWords">
    <xsl:param name="pWord"/>
    
    <xsl:variable name="vfRepLetter" 
                  select="document('')/*/repLetter:*[1]"/>
    <xsl:variable name="vrtf-accum">
      <accum>
        <position>1</position>
        <word><xsl:value-of select="$pWord"/></word>
        <closewords></closewords>
      </accum>  
    </xsl:variable>
    
    <xsl:variable name="vaccum" 
         select="vendor:node-set($vrtf-accum)/*"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vfRepLetter"/>
        <xsl:with-param name="pA0" select="$vaccum"/>
        <xsl:with-param name="pStr" select="$pWord"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:copy-of select="vendor:node-set($vrtfResults)/closewords/*"/>
  
  </xsl:template>
  
  <xsl:template name="addLetterWords">
    <xsl:param name="pWord"/>
    
    <xsl:variable name="vfaddLetter" 
                  select="document('')/*/addLetter:*[1]"/>
    <xsl:variable name="vrtf-accum">
      <accum>
        <position>1</position>
        <word><xsl:value-of select="concat($pWord, ' ')"/></word>
        <closewords></closewords>
      </accum>  
    </xsl:variable>
    
    <xsl:variable name="vaccum" 
                  select="vendor:node-set($vrtf-accum)/*"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vfaddLetter"/>
        <xsl:with-param name="pA0" select="$vaccum"/>
        <xsl:with-param name="pStr" select="concat($pWord, ' ')"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:copy-of select="vendor:node-set($vrtfResults)/closewords/*"/>
  
  </xsl:template>
  
  <xsl:template match="delLetter:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
    
      <xsl:variable name="vPos" select="$arg1/position"/>
      <xsl:variable name="vWord" select="$arg1/word"/>
      <xsl:variable name="vCloseWords" select="$arg1/closewords"/>
      
      <xsl:variable name="vNewWord" 
       select="concat(substring($vWord, 1, $vPos - 1),
                      substring($vWord, $vPos + 1)
                      )"/>
        <position><xsl:value-of select="$vPos + 1"/></position>
        <word><xsl:value-of select="$vWord"/></word>
        <closewords>
          <xsl:copy-of select="$vCloseWords/*"/>
          <word><xsl:value-of select="$vNewWord"/></word>
        </closewords>
  </xsl:template>
  
  <xsl:template match="addLetter:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
    
    <xsl:variable name="vPos" select="$arg1/position"/>
    <xsl:variable name="vfaddLetter" 
                  select="document('')/*/addLetterSingle:*[1]"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vfaddLetter"/>
        <xsl:with-param name="pA0" select="$arg1"/>
        <xsl:with-param name="pStr" select="$validChars"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:variable name="vResults" 
                  select="vendor:node-set($vrtfResults)/*"/>
      
      <position><xsl:value-of select="$vPos + 1"/></position>
      <xsl:copy-of select="$vResults[not(self::position)]"/>     
  </xsl:template>
  
  <xsl:template match="addLetterSingle:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
  
    <xsl:variable name="vPos" select="$arg1/position"/>
    <xsl:variable name="vWord" select="$arg1/word"/>
    <xsl:variable name="vCloseWords" select="$arg1/closewords"/>
      
      <xsl:variable name="vNewWord" 
       select="concat(substring($vWord, 1, $vPos - 1),
                      $arg2,
                      substring($vWord, $vPos)
                      )"/>
        <position><xsl:value-of select="$vPos"/></position>
        <word><xsl:value-of select="normalize-space($vWord)"/></word>
        <closewords>
          <xsl:copy-of select="$vCloseWords/*"/>
          <word><xsl:value-of select="$vNewWord"/></word>
        </closewords>
  </xsl:template>

  
  <xsl:template match="repLetter:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
    
    <xsl:variable name="vPos" select="$arg1/position"/>
    <xsl:variable name="vfrepLetter" 
                  select="document('')/*/repLetterSingle:*[1]"/>
    
    <xsl:variable name="vrtfResults">
      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vfrepLetter"/>
        <xsl:with-param name="pA0" select="$arg1"/>
        <xsl:with-param name="pStr" 
             select="translate($validChars, $arg2, '')"/>
      </xsl:call-template>
    </xsl:variable>
    
    <xsl:variable name="vResults" 
                  select="vendor:node-set($vrtfResults)/*"/>
      
      <position><xsl:value-of select="$vPos + 1"/></position>
      <xsl:copy-of select="$vResults[not(self::position)]"/>     
  </xsl:template>
  
  <xsl:template match="repLetterSingle:*">
    <xsl:param name="arg1" select="/.."/> <!-- A0 -->
    <xsl:param name="arg2"/>
  
    <xsl:variable name="vPos" select="$arg1/position"/>
    <xsl:variable name="vWord" select="$arg1/word"/>
    <xsl:variable name="vCloseWords" select="$arg1/closewords"/>
      
      <xsl:variable name="vNewWord" 
       select="concat(substring($vWord, 1, $vPos - 1),
                      $arg2,
                      substring($vWord, $vPos + 1)
                      )"/>
        <position><xsl:value-of select="$vPos"/></position>
        <word><xsl:value-of select="$vWord"/></word>
        <closewords>
          <xsl:copy-of select="$vCloseWords/*"/>
          <word><xsl:value-of select="$vNewWord"/></word>
        </closewords>
  </xsl:template>
</xsl:stylesheet>
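
The matching step in the main template, @*[name()=$vCloseWords], relies
on the XPath rule that comparing a string with a node-set is true if
any node in the set has that string value. As a rough sketch of the
same test without the vendor node-set() extension, the candidates can
be kept in a space-delimited string. The variable "vCandidates" and its
hand-written, deliberately partial list of close words are assumptions
made for this illustration; the real list would come from the
closeWords template above.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Assumed, partial list of words close to 'can';
       note the leading and trailing space -->
  <xsl:variable name="vCandidates"
                select="' an cn ca acn cna cana con '"/>

  <xsl:template match="/elements">
    <elements>
      <xsl:apply-templates select="element"/>
    </elements>
  </xsl:template>

  <xsl:template match="element">
    <xsl:copy>
      <!-- Keep an attribute only if its name appears as a whole
           word in the candidate list, and rename it to 'can' -->
      <xsl:for-each
       select="@*[contains($vCandidates, concat(' ', name(), ' '))]">
        <xsl:attribute name="can">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to words2.xml, this sketch should rename the first four
attributes to "can" and drop "cbb", as in the result shown earlier.
Wrapping the name in spaces before calling contains() avoids accepting
a name that is merely a substring of a candidate.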


Hope this helped.



=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

