This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Regular expression functions (Was: Re: comments on December F&O draft)
- From: Jeni Tennison <jeni at jenitennison dot com>
- To: "Marc Portier" <mpo at outerthought dot org>
- Cc: "Steven Noels" <stevenn at outerthought dot org>, xsl-list at lists dot mulberrytech dot com
- Date: Fri, 11 Jan 2002 10:44:09 +0000
- Subject: Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
- Organization: Jeni Tennison Consulting Ltd
- References: <LOEAIGHAMOLFGJLAJOFPMEJBCFAA.mpo@outerthought.org>
- Reply-to: xsl-list at lists dot mulberrytech dot com
Hi Marc,
> you mean: the *[index] is throwing all named subregexes into one array
> and getting the second regardless of its name, right?
Yes.
> getting an actual parenthesis group out of a named subregex would be
> different, no?
I don't think it has to be, if you use elements with some standard
name to represent them...
Say you had:
<regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\s:fancy-number:">
...
</matcher>
And you were matching the string:
"12.5 3.4E-2"
I was imagining that you'd get a tree built that looked like this
(formatted for clarity; the only whitespace would actually be a
single space between the two fancy-number elements):
<fancy-number>
12
<rxp:match>.5</rxp:match>
</fancy-number>
<fancy-number>
3
<rxp:match>.4</rxp:match>
<rxp:match>E-2</rxp:match>
</fancy-number>
Here the rxp prefix is bound to some namespace such as (for XPath, anyway):
http://www.w3.org/2002/XPath/RegExp
So the values of the nodes selected by the following paths would be:
/ => ("12.5 3.4E-2")
/fancy-number => ("12.5", "3.4E-2")
/fancy-number[1] => ("12.5")
/fancy-number[1]/node() => ("12", ".5")
/fancy-number[1]/text() => ("12")
/fancy-number[1]/*[1] => (".5")
/fancy-number[1]/*[2] => ()
/fancy-number[2] => ("3.4E-2")
/fancy-number[2]/* => (".4", "E-2")
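The tree and the path values above can be reproduced with Python's xml.etree.ElementTree, which supports a small XPath subset including `*` and `[n]` position predicates. This is only a sketch under simplifying assumptions: it applies the single fancy-number regex repeatedly with re.finditer rather than the combined two-numbers pattern, hangs the result off a hypothetical `result` element standing in for the root node, and turns every parenthesized group that matched into an rxp:match child. ElementTree has no text-node steps, so text() is approximated by an element's `.text`, and string values are computed with itertext().

```python
import re
import xml.etree.ElementTree as ET

RXP = "http://www.w3.org/2002/XPath/RegExp"  # namespace suggested in the message
ET.register_namespace("rxp", RXP)  # lets ET.tostring() write the rxp: prefix
FANCY_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?")


def append_text(parent, last_child, text):
    """Append text after last_child (as its tail), or as parent.text if none."""
    if not text:
        return
    if last_child is None:
        parent.text = (parent.text or "") + text
    else:
        last_child.tail = (last_child.tail or "") + text


def build_tree(s):
    """One fancy-number element per match; each matched group -> rxp:match."""
    root = ET.Element("result")  # hypothetical stand-in for the root node
    pos, last = 0, None
    for m in FANCY_NUMBER.finditer(s):
        append_text(root, last, s[pos:m.start()])  # text between numbers
        number = ET.SubElement(root, "fancy-number")
        inner_pos, inner_last = m.start(), None
        # The groups in this regex do not nest, so they can be handled in order.
        for g in range(1, FANCY_NUMBER.groups + 1):
            if m.start(g) == -1:  # group did not participate: no element
                continue
            append_text(number, inner_last, s[inner_pos:m.start(g)])
            inner_last = ET.SubElement(number, "{%s}match" % RXP)
            inner_last.text = m.group(g)
            inner_pos = m.end(g)
        append_text(number, inner_last, s[inner_pos:m.end()])
        pos, last = m.end(), number
    append_text(root, last, s[pos:])
    return root


def value(elem):
    """String value: all descendant text, excluding the element's own tail."""
    return "".join(elem.itertext())


root = build_tree("12.5 3.4E-2")
print(value(root))                                            # 12.5 3.4E-2
print([value(e) for e in root.findall("fancy-number")])       # ['12.5', '3.4E-2']
print(root.find("fancy-number[1]").text)                      # 12
print([value(e) for e in root.findall("fancy-number[1]/*")])  # ['.5']
print(root.findall("fancy-number[1]/*[2]"))                   # []
print([value(e) for e in root.findall("fancy-number[2]/*")])  # ['.4', 'E-2']
```

ElementTree's `[n]` predicate counts siblings that share the candidate's tag, so `*[2]` agrees with XPath here only because every child of a fancy-number is an rxp:match.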
If you have named subexpressions within a named subexpression, that
just changes the name of the element created for that subexpression.
So if you had:
<regex name="mantissa">[0-9]+(\.[0-9]+)?</regex>
<regex name="exponent">[Ee][+-][0-9]+</regex>
<regex name="fancy-number">:mantissa::exponent:?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\s:fancy-number:">
...
</matcher>
Matching the same string would give you a tree like:
<fancy-number>
<mantissa>12<rxp:match>.5</rxp:match></mantissa>
</fancy-number>
<fancy-number>
<mantissa>3<rxp:match>.4</rxp:match></mantissa>
<exponent>E-2</exponent>
</fancy-number>
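The nested case can be sketched by splicing each :name: reference into the pattern as a capturing group, remembering which groups came from names, and then nesting elements according to the spans the groups matched. This Python sketch rests on assumptions rather than on any defined semantics: `expand`, `fill`, `match_tree` and the wrapper `result` element are hypothetical; :name: is treated as a plain macro, so colons must not otherwise appear in the patterns (which rules out POSIX classes like [[:digit:]]); groups that match nothing are omitted; and captured spans are assumed to nest properly, which holds here because no group is repeated. The separator is written `\s`, since `\w` matches a word character and would not match the space in the test string.

```python
import re
import xml.etree.ElementTree as ET

RXP = "http://www.w3.org/2002/XPath/RegExp"  # namespace suggested in the message


def expand(template, regexes):
    """Turn :name: references into named groups; map group number -> tag."""
    names = []  # element name for each generated group, in order of creation

    def sub(text):
        def repl(m):
            names.append(m.group(1))
            label = "g%d" % (len(names) - 1)  # Python group names must be identifiers
            return "(?P<%s>%s)" % (label, sub(regexes[m.group(1)]))
        return re.sub(r":([\w-]+):", repl, text)

    compiled = re.compile(sub(template))
    tags = {n: names[int(label[1:])] for label, n in compiled.groupindex.items()}
    return compiled, tags


def add_text(parent, last, text):
    """Append text after last (as its tail), or as parent.text if none."""
    if text:
        if last is None:
            parent.text = (parent.text or "") + text
        else:
            last.tail = (last.tail or "") + text


def fill(elem, s, lo, hi, spans):
    """Copy s[lo:hi] into elem, wrapping each top-level span in a child element.

    spans are (start, end, tag) triples inside [lo, hi), sorted by start and then
    by decreasing end; assuming they nest properly, a span belongs inside the
    current one exactly when it starts before the current one ends.
    """
    pos, last, i = lo, None, 0
    while i < len(spans):
        start, end, tag = spans[i]
        j = i + 1
        while j < len(spans) and spans[j][0] < end:
            j += 1
        add_text(elem, last, s[pos:start])
        last = ET.SubElement(elem, tag)
        fill(last, s, start, end, spans[i + 1:j])
        pos, i = end, j
    add_text(elem, last, s[pos:hi])


def match_tree(pattern, tags, s):
    """Build the result tree for a full match of s, or return None."""
    m = pattern.fullmatch(s)
    if m is None:
        return None
    spans = []
    for g in range(1, pattern.groups + 1):
        start, end = m.span(g)
        if start < end:  # skip groups that did not match or matched ""
            spans.append((start, end, tags.get(g, "{%s}match" % RXP)))
    spans.sort(key=lambda t: (t[0], -t[1]))  # stable: outer group stays first on ties
    root = ET.Element("result")  # hypothetical stand-in for the root node
    fill(root, s, 0, len(s), spans)
    return root


regexes = {
    "mantissa": r"[0-9]+(\.[0-9]+)?",
    "exponent": r"[Ee][+-][0-9]+",
    "fancy-number": r":mantissa::exponent:?",
}
pattern, tags = expand(r":fancy-number:\s:fancy-number:", regexes)
root = match_tree(pattern, tags, "12.5 3.4E-2")
print(["".join(e.itertext()) for e in root.findall("fancy-number")])  # ['12.5', '3.4E-2']
print([e.tag for e in root.find("fancy-number[2]")])   # ['mantissa', 'exponent']
print(root.find("fancy-number[1]/mantissa").text)      # 12
print(root.find("fancy-number[2]/exponent").text)      # E-2
```

Because the named groups get generated labels (g0, g1, ...), the same named regex can appear more than once in a matcher, which Python's own (?P<name>...) syntax would otherwise reject as a duplicate group name.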
I should note that nothing existing in XPath or XSLT automatically
creates a tree in this way. However, several EXSLT functions do (as a
means of returning 'sequences', in fact!). I suspect that the
introduction of user-defined functions in XSLT will lead to more
functions that do this, but I don't know whether people would feel it
was acceptable for a built-in function.
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list