This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: One texdocument in and several xmldocuments out?
- From: "Stuart Celarier" <stuart at ferncrk dot com>
- To: <xsl-list at lists dot mulberrytech dot com>
- Date: Mon, 6 May 2002 09:33:20 -0700
- Subject: RE: [xsl] One texdocument in and several xmldocuments out?
- Reply-to: xsl-list at lists dot mulberrytech dot com
You can convert a Word document to HTML using File / Save As... and
selecting HTML or Filtered HTML. The difference between the two is that
HTML preserves all of Word's information, such as the <span> tags that
mark spelling and grammar issues, whereas Filtered HTML drops the
Word-specific tags. Then follow the advice already provided here (e.g.,
Tidy) to turn the HTML into well-formed XML.
Cheers,
Stuart
-----Original Message-----
From: owner-xsl-list@lists.mulberrytech.com
[mailto:owner-xsl-list@lists.mulberrytech.com] On Behalf Of Robert
Koberg
Sent: Monday, May 06, 2002 06:43
To: xsl-list@lists.mulberrytech.com
Subject: Re: [xsl] One texdocument in and several xmldocuments out?
Hi,
Zack Brown wrote:
>On Mon, May 06, 2002 at 01:28:51PM +0200, Tove Nilstun wrote:
>
>
>>Hi
>>
>>I am a total beginner when it comes to XML, but in order to start
>>working with it, there are two things I need to sort out.
>>
>>I have a user guide (written in MS Word) with both text and pictures.
>>I would like to 1. convert this document to several xml documents, one
>>per headline and 2. create an additional xml file containing an index
>>of the files created in step one.
>>
>>Is this possible?
>>
>>
>
>Absolutely. Just create one XSLT file for each output file you desire.
>Then run the XML through your parser once for each XSLT file you've
>created.
>
You do not need an XSLT file for each page.
First you have to get the MSWord doc into XML. There are a few products
out there that convert Word to DocBook or some other XML vocabulary. A
neat trick we found when building our MSIE-based editor is that you can
paste an MSWord doc into an element that has contentEditable="true", and
IE converts it to HTML. We use JS to convert that to XML on the client,
but you could use Tidy to get well-formed HTML (XML). Then, hopefully,
there are clean separations to indicate where a new page should start.
Apply templates to (loop over) each page division and (you can) use the
extension elements built into Saxon or Xalan to create multiple output
documents from one source.
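
To make the split-and-index idea concrete, here is a rough sketch of the
logic. Note that it is Python, not XSLT, and it is not the Saxon or Xalan
mechanism mentioned above; it only illustrates the shape of the result. The
function name split_by_heading, the sectionN.xml file names, and the
<section>, <index> and <file> elements are all made up for illustration.
It assumes the Tidy output is well-formed, carries no XML namespace, and
uses <h1> as the page separator.

```python
import xml.etree.ElementTree as ET


def split_by_heading(xhtml_text, heading="h1"):
    """Split well-formed (X)HTML into one XML document per heading.

    Returns (documents, index_xml), where documents maps a hypothetical
    file name to the serialized XML for that section and index_xml lists
    those file names. Assumes un-namespaced elements; anything that
    appears before the first heading is skipped.
    """
    root = ET.fromstring(xhtml_text)
    body = root.find("body")
    if body is None:
        raise ValueError("expected a <body> element")

    sections = []  # list of (file name, <section> element)
    index = ET.Element("index")

    for child in body:
        if child.tag == heading:
            # Each heading starts a new output document and index entry.
            filename = "section%d.xml" % (len(sections) + 1)
            sections.append((filename, ET.Element("section")))
            entry = ET.SubElement(index, "file", href=filename)
            entry.text = "".join(child.itertext()).strip()
        if sections:
            # Attach the node to whichever section is currently open.
            sections[-1][1].append(child)

    documents = {
        name: ET.tostring(elem, encoding="unicode") for name, elem in sections
    }
    return documents, ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<h1>Install</h1><p>Run setup.</p>"
        "<h1>Use</h1><p>Click <b>Go</b>.</p>"
        "</body></html>"
    )
    docs, index_xml = split_by_heading(sample)
    for name in sorted(docs):
        print(name, docs[name])
    print(index_xml)
```

In a stylesheet the same two pieces would live in one transform: a template
that walks the page divisions and writes each one out as its own document,
plus the main output, which becomes the index.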
best,
-Rob
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list