This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: One texdocument in and several xmldocuments out?



Looking at the previous replies, there seems to be a little confusion, but I
have done something very similar so I hope this helps.

1) I have found the key to creating xml files from MS Word is to use Word's
'styles' feature to format headings, sub-headings, etc., and page/section
breaks to separate out the individual sections.  For example, each headline
in your user guide is formatted in a style called 'headline' and preceded by
a page break.

2) I then use a tool called UpCast from www.infinity-loop.de - the basic
version is free - to convert the Word file to XML.  There are various
options - I usually try to remove any visual formatting information, because
I can add that back in later.  The output file then looks something like
this:
- Each section of the Word document is enclosed by a <part> element.
- Each paragraph is enclosed by a <par> element, with the original Word
style retained as an attribute.

Something like:
<document>
<part>
  <par kind="headline">Your headline</par>
  ... ... ...
</part>
<part> 
  <par kind="headline">Your headline 2</par>
  ... ... ...
</part>
</document>

3) It's not quite that simple, but once you've got this far, it is fairly
straightforward to create multiple xml files using an xsl stylesheet.
Essentially, you need a template that selects each part, i.e.
"document/part", in a for-each statement and, using one of the xslt processor
extensions (I have used <xsl:document>, which works with Saxon 6.5), outputs
each part to a new xml file, or html file as appropriate.  You can pick out
the paragraph with the 'headline' style to create a variable, which you can
then use to build the filename for the output file.
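
As a rough sketch of step 3, assuming the input looks like the sample above:
the output element names (section, title, para) and the rule for turning a
headline into a filename are only illustrations, not what UpCast or Saxon
prescribe.  As far as I know, Saxon 6.5 only recognises <xsl:document> when
the stylesheet declares version="1.1".

```xml
<?xml version="1.0"?>
<!-- version="1.1" is what makes xsl:document available in Saxon 6.5 -->
<xsl:stylesheet version="1.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:for-each select="document/part">
      <!-- Hypothetical naming rule: headline text with spaces replaced by
           underscores, plus ".xml". Assumes every part has a headline,
           that headlines are unique, and that they contain no characters
           that are illegal in filenames. -->
      <xsl:variable name="filename"
          select="concat(translate(normalize-space(par[@kind='headline'][1]),
                                   ' ', '_'), '.xml')"/>

      <!-- Everything inside xsl:document is written to its own file -->
      <xsl:document href="{$filename}" method="xml" indent="yes">
        <section>
          <title>
            <xsl:value-of select="normalize-space(par[@kind='headline'][1])"/>
          </title>
          <!-- Copy the text of the remaining paragraphs of this part -->
          <xsl:for-each select="par[not(@kind='headline')]">
            <para><xsl:value-of select="."/></para>
          </xsl:for-each>
        </section>
      </xsl:document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input this should write Your_headline.xml and
Your_headline_2.xml next to the main output.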

You can create an index file from within the same xsl stylesheet, too.
Using the same variable that you use to create the filenames for the
output files, you can create a link to each file; the output of the template
that does this can then be written to a new file, too.
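
A sketch of the index idea, under the same assumptions as the sample input
above: the index and entry element names, the index.xml filename and the
filename rule are hypothetical, so adapt them to whatever your output needs.
The filename expression must match the one used when writing the part files,
otherwise the links will not resolve.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Write the index to its own file (hypothetical name index.xml) -->
    <xsl:document href="index.xml" method="xml" indent="yes">
      <index>
        <xsl:for-each select="document/part">
          <!-- Same hypothetical naming rule as for the part files -->
          <xsl:variable name="filename"
              select="concat(translate(normalize-space(par[@kind='headline'][1]),
                                       ' ', '_'), '.xml')"/>
          <!-- One entry per part: the link target plus the headline text -->
          <entry href="{$filename}">
            <xsl:value-of select="normalize-space(par[@kind='headline'][1])"/>
          </entry>
        </xsl:for-each>
      </index>
    </xsl:document>
  </xsl:template>

</xsl:stylesheet>
```

In practice you would probably put this <xsl:document> block in the same
root template that writes the part files, so one run produces everything.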

Exactly how you do this will depend on what you want to do with your output.

I hope this helps!

Steve Perriman
Internet/Intranet Technician
South Trafford College, UK





-----Original Message-----
From: Tove Nilstun [mailto:tove.nilstun@exallon.sigma.se]
Sent: 06 May 2002 12:29
To: 'XSL-List@lists.mulberrytech.com'
Subject: [xsl] One texdocument in and several xmldocuments out?


Hi

I am a total beginner when it comes to XML, but in order to start working
with it, there are two things I need to sort out.

I have a user guide (written in MS Word) with both text and pictures. I
would like to 1. convert this document to several xml documents, one per
headline and 2. create an additional xml file containing an index of the
files created in step one.

Is this possible?
Tove

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list




