This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Re: Any Doc to XML converter ?
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] Re: Any Doc to XML converter ?
- From: Peter Flynn <peter at silmaril dot ie>
- Date: Wed, 20 Jun 2001 23:51:22 +0100
- Organization: Silmaril Consultants
- References: <20010619165125.30010.qmail@web14508.mail.yahoo.com>
- Reply-To: xsl-list at lists dot mulberrytech dot com
On Tue, 19 Jun 2001, Dmitri wrote:
> Bob DuCharme wrote:
>
> > In his latest 'XML Deviant' column in XML.com
> > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds describes
> > and points to a recent thread on the topic.
>
> From a recent MSDN article 'Export a Word Document to XML' by Kevin McDowell
> (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
>
> 'The XML output by this application is very straightforward and very similar to the
> HTML output by Word itself, but it fully accounts for all styled text, tables, and
> lists. '
Which may very well be true, but the output is largely garbage.
This whole discussion misses the major points:
1) If, and only if, your Word document is formatted 100% exclusively
with named styles, robust conversion to meaningful XML is easily
possible with a number of packages, e.g. Enigma's DynaTag.
2) If your Word document uses arbitrary manual styling, no
amount of footling around with conversions is going to
produce anything other than an XML-syntax'd representation
of all the styles. You still have to undertake the hardest
part, which is translating all that styling cruft into
meaningful markup. XSLT could certainly be used at this
stage.
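To make point 1 concrete: when every paragraph carries a named style, conversion reduces to a lookup from style name to element name. The sketch below is only an illustration and assumes the paragraphs have already been read out of the Word file as (style, text) pairs (that step is not shown); the style names, element names and the `STYLE_MAP` table are hypothetical. Any style missing from the map is wrapped in a generic `unmapped` element and reported, which is the situation point 2 warns about.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Paragraph:
    style: str  # Word paragraph style name, e.g. "Heading 1"
    text: str


# Hypothetical mapping from Word style names to XML element names.
STYLE_MAP = {
    "Title": "title",
    "Heading 1": "section-head",
    "Normal": "para",
    "Quote": "blockquote",
}


def paragraphs_to_xml(paragraphs, style_map=STYLE_MAP):
    """Build an XML tree from named-style paragraphs.

    Returns the root element and a list of style names that had no mapping.
    """
    root = ET.Element("document")
    unmapped = []
    for p in paragraphs:
        tag = style_map.get(p.style)
        if tag is None:
            # No mapping: keep the text, record the style for a human to resolve.
            unmapped.append(p.style)
            elem = ET.SubElement(root, "unmapped", style=p.style)
        else:
            elem = ET.SubElement(root, tag)
        elem.text = p.text
    return root, unmapped


if __name__ == "__main__":
    doc = [
        Paragraph("Title", "Any Doc to XML converter?"),
        Paragraph("Normal", "Named styles make this a lookup."),
        Paragraph("Custom Red", "Manually styled text."),
    ]
    root, unmapped = paragraphs_to_xml(doc)
    print(ET.tostring(root, encoding="unicode"))
    print("unmapped styles:", unmapped)
```

The `unmapped` elements are where the hard interpretive work of point 2 begins; an XSLT stylesheet could then rewrite them once someone has decided what they mean.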
This assumes you do want meaningful markup. If all you need is
the XML representation of the manual styling, then there are
several solutions already discussed.
It may be instructive to note that someone last year wrote a short VB
script to turn any DOC file into XML, extracting all the style
info into a CSS stylesheet in a single pass...and it was written
on a laptop on the bus on the way to the airport after a
meeting. I'm sure it has long been superseded, but this is not
rocket science.
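The single-pass idea described above can be sketched roughly as follows. This is not the VB script mentioned, which is not reproduced here; it is a Python illustration under an assumed input model: each paragraph has already been extracted from the DOC file as a list of runs, each run being a piece of text plus a dict of CSS-style properties. The element names (`doc`, `p`, `span`) and the class-naming scheme `s1`, `s2`, ... are hypothetical choices.

```python
import xml.etree.ElementTree as ET


def extract_styles(paragraphs):
    """Turn manually formatted runs into XML plus a CSS stylesheet, in one pass.

    `paragraphs` is a list of paragraphs; each paragraph is a list of
    (text, props) runs, where props maps CSS property names to values.
    Each distinct set of properties becomes one CSS class (s1, s2, ...).
    """
    classes = {}  # frozenset of (property, value) pairs -> class name
    root = ET.Element("doc")
    for runs in paragraphs:
        p = ET.SubElement(root, "p")
        for text, props in runs:
            key = frozenset(props.items())
            if key not in classes:
                classes[key] = f"s{len(classes) + 1}"
            span = ET.SubElement(p, "span", {"class": classes[key]})
            span.text = text
    # One rule per class, properties sorted so the output is stable.
    rules = []
    for key, name in classes.items():
        body = "; ".join(f"{prop}: {value}" for prop, value in sorted(key))
        rules.append(f".{name} {{ {body} }}")
    return root, "\n".join(rules)


if __name__ == "__main__":
    doc = [
        [("Warning: ", {"font-weight": "bold", "color": "red"}),
         ("check the styles.", {"font-style": "italic"})],
        [("Again", {"color": "red", "font-weight": "bold"})],
    ]
    root, css = extract_styles(doc)
    print(ET.tostring(root, encoding="unicode"))
    print(css)
```

Identical property sets share one class regardless of the order the properties were listed in, which is what keeps the stylesheet small. The result is still only an XML-syntax'd representation of the styling: the class names carry no meaning, so turning them into meaningful markup remains a separate step.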
///Peter
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list