This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Selective escaping of special characters


My apologies if this question has been asked before; I haven't found any
posts that address this exact issue.

My problem is that I want to transform junk HTML generated by Microsoft
Word. This contains markup, of course, so my first instinct was to use
disable-output-escaping. However, this also disables escaping of other
special characters, such as the en dash (–). These are then written to the
output in a form my browser (Internet Explorer) doesn't understand (I use
"ISO-8859-1" as the output encoding).
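
For illustration, a minimal sketch of that first attempt. It assumes the
Word HTML arrives as escaped text inside an element called "content"; that
element name is hypothetical and used only for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- HTML output in the encoding mentioned above -->
  <xsl:output method="html" encoding="ISO-8859-1"/>

  <!-- "content" is a hypothetical element holding the Word HTML as text -->
  <xsl:template match="content">
    <!-- Escaping is switched off for the whole text node, not just for
         the markup characters -->
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```

In this sketch the whole text node bypasses escaping, which is why
characters outside ISO-8859-1, such as the en dash, are affected along with
the markup characters.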

I did work out a fix (pasted below) using a recursive named template, but
this is proving too slow for all but the smallest documents. (I use Saxon
6.5.1.)

My question is then: is there a fast way to disable escaping only for "<",
">" and "&"? Alternatively, can the named template below be optimized
significantly?

Thanks for any help.

Kyrre Wathne



<!-- Named template to output markup unescaped while still escaping other
     special characters -->

<xsl:template name="DUMP_TAG_STRING">
  <xsl:param name="str"/>
  <xsl:choose>
    <xsl:when test="not($str)">
      <!-- Empty string: nothing to output -->
    </xsl:when>
    <xsl:when test="not(contains($str, '&lt;')) and not(contains($str, '&gt;')) and not(contains($str, '&amp;'))">
      <!-- No markup characters left: output the rest with normal escaping -->
      <xsl:value-of select="$str"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Map every XML markup character to a single marker, the backspace
           symbol (U+2408), so the first one can be found with one search.
           This assumes the input never contains U+2408 itself. -->
      <xsl:variable name="escaped" select="translate($str, '&lt;&gt;&amp;', '&#9224;&#9224;&#9224;')"/>
      <!-- Position of the first markup character -->
      <xsl:variable name="cutPos" select="1 + string-length(substring-before($escaped, '&#9224;'))"/>
      <!-- Text before the first markup character -->
      <xsl:variable name="before" select="substring($str, 1, $cutPos - 1)"/>
      <!-- The markup character itself -->
      <xsl:variable name="replace" select="substring($str, $cutPos, 1)"/>
      <!-- Everything after the markup character -->
      <xsl:variable name="after" select="substring($str, $cutPos + 1)"/>
      <!-- Output the part before the match with normal escaping -->
      <xsl:value-of select="$before"/>
      <!-- Output &lt;, &gt; or &amp; as is, unescaped -->
      <xsl:value-of select="$replace" disable-output-escaping="yes"/>
      <xsl:if test="$after">
        <!-- Recurse with the remainder -->
        <xsl:call-template name="DUMP_TAG_STRING">
          <xsl:with-param name="str" select="$after"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
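
A sketch of how the named template above might be invoked. The fragment is
meant to sit in the same stylesheet as DUMP_TAG_STRING; the "content"
element name is an assumption made for this example, standing in for
whatever element holds the Word HTML as text.

```xml
<!-- "content" is a hypothetical element name; adjust it to the real source -->
<xsl:template match="content">
  <xsl:call-template name="DUMP_TAG_STRING">
    <!-- Pass the element's string value; the template walks it one markup
         character at a time -->
    <xsl:with-param name="str" select="string(.)"/>
  </xsl:call-template>
</xsl:template>
```

Each recursive call runs translate() and substring() over the whole
remaining string, so the work grows roughly with the string length times
the number of markup characters, which fits the slowdown seen on larger
documents.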


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

