This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Second try: Search and replace many strings that may not be present in target


Hi folks,

I thought I'd try sending this out again before falling back on one of my
molasses-slow solutions.

----- Forwarded message from Zack Brown <zbrown@tumblerings.org> -----

To: xsl-list@lists.mulberrytech.com
Reply-To: xsl-list@lists.mulberrytech.com
Subject: [xsl] Search and replace many strings that may not be present in target
From: Zack Brown <zbrown@tumblerings.org>
Date: Fri, 17 May 2002 18:43:02 -0700

Hi folks,

I'm trying to reproduce a feature using XSLT that I had working when I used my
deeply broken home-grown XML parser. I'm moving to 'xsltproc' and GNU Make,
which has so far shown itself equal to all challenges (thanks to some help ;-).

Situation:

I have a number of files that each contain a root element <kc>, with a number
of <section> elements. Within each <section> element there may be a number
of <quote who="firstname lastname">text</quote> elements.

Several instances of the raw text "firstname lastname" may also appear
in the raw text of each <section> tag. A "firstname lastname" text is only
significant to this feature if it has also appeared identically in a <quote>'s
"who" attribute in at least one of the files under consideration.

Problem:

Here is the feature: for each <section> tag in each file, I would like
to do a search and replace on the first occurrence of each "firstname
lastname" appearing in raw text.

Example:

Assume that the <quote>'s "who" attributes in the various files have named
"Tom Jones", "Terry Haywood", and "Isaac Asimov". And assume the
following <section> tag in one of the files:

------sample input------
<section>

<p>this is a section containing a name, George Eliot, that has not
appeared in a &lt;quote&gt; tag. Therefore it will not be acted on by
this feature.</p>

<p>This paragraph contains a &lt;quote&gt; tag naming Isaac Asimov,
thus: <quote who="Isaac Asimov">And here he is saying something. Hi
Mom!</quote></p>

<p>this paragraph contains a reference to Terry Haywood, who appears in
a &lt;quote&gt; tag in a different file. Here is another reference to
Isaac Asimov, but it should not be matched, because only the first
occurrence of a given name in a section should be matched.</p>

</section>
------------------------

In the above sample, only Isaac Asimov and Terry Haywood should be
identified. Tom Jones does not appear in the sample, so the search-and-replace
will not find him. Also, George Eliot appears in the sample, but is not in
the list of names that have appeared in <quote> tags in one of the files,
so she will also not be found by the search and replace. Assuming that the
search and replace will insert a link to another page corresponding to the
name, then the output from the sample input would look like this:

---- sample output -----
<section>

<p>this is a section containing a name, George Eliot, that has not
appeared in a &lt;quote&gt; tag. Therefore it will not be acted on by
this feature.</p>

<p>This paragraph contains a &lt;quote&gt; tag naming Isaac Asimov [<a
href="people/Isaac_Asimov.html">*</a>],
thus: <quote who="Isaac Asimov">And here he is saying something. Hi
Mom!</quote></p>

<p>this paragraph contains a reference to Terry Haywood [<a
href="people/Terry_Haywood.html">*</a>], who appears in a &lt;quote&gt;
tag in a different file. Here is another reference to Isaac Asimov, but it
should not be matched, because only the first occurrence of a given name in
a section should be matched.</p>

</section>
------------------------

Partial solution:

The assumption I've been making is that I will do a first pass through
all files to create metafiles containing lists of all the names appearing
in <quote> tags across all files. These metafiles will then be concatenated
into a single XML file.
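
A minimal sketch of what that first pass might look like in XSLT 1.0,
offered as an illustration rather than a tested solution. The output
element name <name> and the key name "by-who" are assumptions made up for
this example. Each run emits bare <name> elements with no XML declaration,
so the per-file results could be concatenated inside a single wrapper
element (plain concatenation of complete documents would leave several
root elements, which is not well-formed XML).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No XML declaration, so fragments from many files can be joined
       inside one wrapper element later. -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Index every <quote> by the value of its "who" attribute. -->
  <xsl:key name="by-who" match="quote" use="@who"/>

  <xsl:template match="/">
    <!-- Muenchian grouping: keep only the first <quote> for each
         distinct "who" value, so each name is listed once per file. -->
    <xsl:for-each select="//quote[@who][generate-id() =
                          generate-id(key('by-who', @who)[1])]">
      <name><xsl:value-of select="@who"/></name>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Names that appear in more than one file would still be repeated in the
combined list; this sketch only removes duplicates within a single file.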

I will then do a second pass, in which I process all files for HTML output. The
XSLT will also use document() to read in the large file just created. That
will theoretically give it all the data it needs to do the search and replace.
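
For concreteness, here is a rough sketch of a straightforward second pass
in XSLT 1.0. It is a sketch under stated assumptions, not a fast solution:
it assumes the combined name list is a file called names.xml of the form
<names><name>Isaac Asimov</name>...</names> (both the file name and the
element names are invented for this example), and the template name "mark"
is likewise hypothetical. It also links names found in the text inside
<quote> elements, which may or may not be wanted. A name broken across a
line break in the source text will not be matched, and names that overlap
one another are not handled.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: names.xml looks like
       <names><name>Isaac Asimov</name>...</names> -->
  <xsl:variable name="names"
      select="document('names.xml')/names/name[normalize-space()]"/>

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text inside a <section>: pass on only the names that occur in
       this text node, which keeps the recursion shallow. -->
  <xsl:template match="section//text()">
    <xsl:variable name="section" select="ancestor::section[1]"/>
    <xsl:call-template name="mark">
      <xsl:with-param name="text" select="string(.)"/>
      <xsl:with-param name="todo" select="$names[contains(current(), .)]"/>
      <!-- Text nodes that come earlier in the same section. -->
      <xsl:with-param name="earlier"
          select="preceding::text()[generate-id(ancestor::section[1])
                                    = generate-id($section)]"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Try the first remaining name against $text, then recurse. -->
  <xsl:template name="mark">
    <xsl:param name="text"/>
    <xsl:param name="todo"/>
    <xsl:param name="earlier"/>
    <xsl:choose>
      <!-- No names left to try: emit the text unchanged. -->
      <xsl:when test="not($todo)">
        <xsl:value-of select="$text"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="name" select="string($todo[1])"/>
        <xsl:variable name="rest" select="$todo[position() &gt; 1]"/>
        <xsl:choose>
          <!-- Link the name only if no earlier text node in this
               section already contains it. -->
          <xsl:when test="contains($text, $name)
                          and not($earlier[contains(., $name)])">
            <xsl:variable name="before"
                          select="substring-before($text, $name)"/>
            <xsl:call-template name="mark">
              <xsl:with-param name="text" select="$before"/>
              <xsl:with-param name="todo" select="$rest"/>
              <xsl:with-param name="earlier" select="$earlier"/>
            </xsl:call-template>
            <xsl:value-of select="$name"/>
            <xsl:text> [</xsl:text>
            <a href="people/{translate($name, ' ', '_')}.html">*</a>
            <xsl:text>]</xsl:text>
            <!-- After the match, drop this name and any name already
                 seen in $before, so each is linked at most once. -->
            <xsl:call-template name="mark">
              <xsl:with-param name="text"
                              select="substring-after($text, $name)"/>
              <xsl:with-param name="todo"
                  select="$rest[. != $name and not(contains($before, .))]"/>
              <xsl:with-param name="earlier" select="$earlier"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="mark">
              <xsl:with-param name="text" select="$text"/>
              <xsl:with-param name="todo" select="$rest"/>
              <xsl:with-param name="earlier" select="$earlier"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Assuming the stylesheet is saved as link-names.xsl (a made-up name), it
could be run as "xsltproc link-names.xsl somefile.xml". Every text node is
tested against every name, and the check against earlier text rescans the
section each time, so this is exactly the kind of approach that is likely
to be slow with hundreds of files and thousands of names.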

At that point my ideas break down. I can think of some very slow
solutions, but nothing that would be feasible for a situation in which
there are hundreds of files, thousands of names, and a Pentium III
processor.

Thanks a lot for any help.

Zack

-- 
Zack Brown

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


----- End forwarded message -----

-- 
Zack Brown


