This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: Regular expression functions (Was: Re: comments on December F&O draft)


> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of
> Jeni Tennison
> Sent: Wednesday 9 January 2002 23:32
> To: xsl-list@lists.mulberrytech.com
> Subject: Re: Regular expression functions (Was: Re: [xsl] comments on
> December F&O draft)
>
>
> Hi Steven,
>
> Very interesting :)

thanks Jeni, we hope so :-)

> Could you explain a little more about how the matchers work? You call
> them by name - does each of them search over the entire string, or do
> later matchers only match on what's left after matching the earlier
> ones? Did you try any other designs? What made you choose this one?

the principal matcher included with the root <element> is matched
against the entire input document

depending on its outcome, nodes (attributes and elements) are generated,
or additional matchers are called: implicitly on the entire matched
region (for-each like), or explicitly using regex groups (comparable to
the tokenization of requests in Cocoon): each "parenthesized" pattern
region can be addressed individually using an integer

this way, you can define which matcher has to be applied to which region
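
to make the flow above a bit more concrete, here is a rough sketch in Python. It is not our actual implementation: the MATCHERS table, the matcher names and the sample input are all made up for illustration. The principal matcher runs over the whole input, and each sub-matcher is pointed at a region by its group number (0 would mean the entire matched region, 1..n a parenthesized group).

```python
import re

# Hypothetical matcher table (an assumption for this sketch):
# name -> (regex, {group number -> name of the sub-matcher to apply there}).
MATCHERS = {
    "items": (r"<li>(\w+)=(\d+(?:,\d+)*)</li>", {2: "numbers"}),
    "numbers": (r"\d+", {}),
}

def apply_matcher(name, text):
    """Apply the named matcher to text.

    Returns a list of (matcher name, matched text, child results) tuples.
    """
    pattern, group_matchers = MATCHERS[name]
    results = []
    for m in re.finditer(pattern, text):
        children = []
        for group_no, sub_name in group_matchers.items():
            # The sub-matcher only ever sees the region selected by group_no.
            children.extend(apply_matcher(sub_name, m.group(group_no)))
        results.append((name, m.group(0), children))
    return results

# The principal matcher is applied to the entire input document.
doc = "<ul><li>a=1,2</li><li>b=3</li></ul>"
tree = apply_matcher("items", doc)
```

here the "numbers" matcher never sees the tags or the key, only the text captured by group 2 of "items".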

> > One of the things which doesn't work well currently is the
> > specification of the regex as an attribute to the <matcher> element.
> > We will avoid this by putting the regex inside a CDATA section of a
> > <regex> subelement (will be optional, we are testing this right
> > now). Not sure whether this is good practice, advice welcome. It is
> > only partially related to this discussion of course.
>
> I can see why you'd want to do that, given that you're matching HTML
> tags. Note that you're doing more escaping than you have to in the
> attribute value, though. Consider:

yes, I got lazy after a while and started to escape everything ;-)

> delimited the attribute with single-quotes. So you could have:
>
> <matcher
> regex='CLASS="story3">([^&lt;]+)&lt;BR>&lt;/SPAN>&lt;/FONT>&lt;/STRONG>
> &lt;FONT\sCOLOR="#333333"\sFACE="sans-serif,\sarial">&lt;SPAN\sCLASS="story">([^&lt;]+)&amp;nbsp;(.+)&lt;A\sHREF="([^"]+)">More'
> name="items">

I find this mixture even less readable somehow :-) but on the ' and ",
you are absolutely correct - it was just my XML IDE that uses double
quotes by default
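
a small check of the quoting point, sketched in Python with the standard xml.etree.ElementTree parser (the shortened regex is made up for the example): with single-quote delimiters the double quotes need no escaping, and the parser hands back the same attribute value either way.

```python
import xml.etree.ElementTree as ET

# Double-quoted attribute: every " inside the regex must be written as &quot;
double_quoted = '<matcher name="items" regex="HREF=&quot;([^&quot;]+)&quot;>More"/>'

# Single-quoted attribute: the " characters can stay literal.
single_quoted = """<matcher name='items' regex='HREF="([^"]+)">More'/>"""

a = ET.fromstring(double_quoted).get("regex")
b = ET.fromstring(single_quoted).get("regex")
```

both a and b hold the same regex string; only the source markup differs. Note that < still has to be escaped as &lt; in either form.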

> But I agree - if you've got regular expressions like this, it's best
> to put them in an element where you can use CDATA sections to at least
> make it look like the stuff you're matching.

and that is what we will do - a pity one cannot declare an attribute as
being of CDATA type, in the sense of CDATA sections at the document
content level
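
as an illustration of the trade-off (a sketch only, using Python's xml.etree.ElementTree; the shortened regex is invented, and the <regex> subelement layout is still what we are testing, so treat it as an assumption): the attribute form needs &lt; and &amp; escapes, while the CDATA form looks like the HTML being matched, and both give the parser the same string.

```python
import xml.etree.ElementTree as ET

# Regex as an attribute: < and & have to be escaped.
as_attribute = r"""<matcher name='items' regex='&lt;SPAN\sCLASS="story">([^&lt;]+)&amp;nbsp;'/>"""

# Regex inside a CDATA section of a <regex> subelement: no escaping needed.
as_element = r"""<matcher name='items'><regex><![CDATA[<SPAN\sCLASS="story">([^<]+)&nbsp;]]></regex></matcher>"""

from_attr = ET.fromstring(as_attribute).get("regex")
from_elem = ET.fromstring(as_element).findtext("regex")
```

from_attr and from_elem are identical, so the choice is purely about how readable the sheet is for whoever maintains it.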

> For XSLT, I think that attributes are more natural because attributes
> are used for this kind of thing elsewhere (matching nodes, for

indeed, and exactly the reason why we started off with atts for our
regexes

> instance). It would be handy if the regular expressions could be held
> in (global) variables because then they could be defined in content
> (with CDATA sections) rather than in an attribute. However, that would
> run up against the dynamic regular expression problem that David and I
> talked about yesterday. I don't think it'll be too big a problem,
> though - the regular expressions in XSLT are likely to be a lot
> smaller than these, and not include tags (hopefully!).

I will try to read and understand your discussion - because we already
thought of storing the regexes in such a way but threw that idea away
because it was affecting the readability of the regexslt
transformation sheet

I like all parameters to a certain action to be contained in the same
area, and storing the regexes inside 'global variables' would conflict
with that

thanks for your reaction,

Steven Noels
http://outerthought.org/
(+32)478 292900



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

