This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Re: Any Doc to XML converter ?
- To: <xsl-list at lists dot mulberrytech dot com>, <peter at silmaril dot ie>
- Subject: RE: [xsl] Re: Any Doc to XML converter ?
- From: "Joshua Allen" <joshuaa at microsoft dot com>
- Date: Wed, 20 Jun 2001 17:41:07 -0700
- Reply-To: xsl-list at lists dot mulberrytech dot com
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm
produces very clean XML for me; in what sense is it "mostly garbage"?
You're not thinking of the built-in "save as HTML" (or whatever it is
called), are you? This tool has all sorts of extra options you can flip
on that add more "garbage", but with the simple options it faithfully
represents the structure and does a good job with scenario #1 that you
listed below.
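
As a rough illustration of scenario #1 (a document formatted only with
named styles), here is a minimal Python sketch. It is not taken from the
MSDN tool or from any package mentioned in this thread. It assumes the
paragraphs have already been extracted as (style name, text) pairs; the
style names, the element names, and the paragraphs_to_xml function are
all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to semantic element names.
STYLE_TO_ELEMENT = {
    "Title": "title",
    "Heading 1": "section-title",
    "Body Text": "para",
    "Quote": "blockquote",
}


def paragraphs_to_xml(paragraphs, root_name="document"):
    """Build an XML tree from (style_name, text) pairs.

    Paragraphs whose style is not in the mapping are kept, but wrapped in
    an <unmapped> element that records the original style name, so they
    can be interpreted in a later step (for example with XSLT).
    """
    root = ET.Element(root_name)
    for style, text in paragraphs:
        name = STYLE_TO_ELEMENT.get(style)
        if name is None:
            child = ET.SubElement(root, "unmapped", {"style": style})
        else:
            child = ET.SubElement(root, name)
        child.text = text
    return root


if __name__ == "__main__":
    # Assumed sample input; a real extractor would supply these pairs.
    sample = [
        ("Title", "Any Doc to XML converter?"),
        ("Heading 1", "Named styles"),
        ("Body Text", "Every paragraph here carries a named style."),
        ("Normal + Bold 14pt", "A manually styled paragraph."),
    ]
    print(ET.tostring(paragraphs_to_xml(sample), encoding="unicode"))
```

With named styles the mapping is a plain lookup. The manually styled
paragraph ends up as <unmapped>, which is the part that still needs
interpretation by hand or by a later transformation.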
> -----Original Message-----
> From: Peter Flynn [mailto:peter@silmaril.ie]
> Sent: Wednesday, June 20, 2001 3:51 PM
> To: xsl-list@lists.mulberrytech.com
> Subject: Re: [xsl] Re: Any Doc to XML converter ?
>
> On Tue, 19 Jun 2001, Dmitri wrote:
> > Bob DuCharme wrote:
> >
> > > In his latest 'XML Deviant' column in XML.com
> > > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds
> > > describes and points to a recent thread on the topic.
> >
> > From a recent MSDN article 'Export a Word Document to XML' by
> > Kevin McDowell
> > (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > 'The XML output by this application is very straightforward and very
> > similar to the HTML output by Word itself, but it fully accounts for
> > all styled text, tables, and lists.'
>
> Which may very well be true, but the output is largely garbage.
> This whole discussion misses the major points:
>
> 1) Iff your Word document is formatted 100% exclusively with
> named styles, robust conversion to meaningful XML is easily
>    possible with a number of packages, e.g. Enigma's DynaTag.
>
> 2) If your Word document uses arbitrary manual styling, no
> amount of footling around with conversions is going to
> produce anything other than an XML-syntax'd representation
> of all the styles. You still have to undertake the hardest
> part, which is interpreting all the styling cruft into some
> meaningful markup. XSLT could certainly be used at this
> stage.
>
> This assumes you do want meaningful markup. If all you need is
> the XML representation of the manual styling, then there are
> several solutions already discussed.
>
> It may be instructive that someone last year wrote a short VB
> script to turn any DOC file into XML, extracting all the style
> info into a CSS stylesheet in a single pass...and it was written
> on a laptop in the bus on the way to the airport after a
> meeting. I'm sure it has long been superseded but this is not
> rocket science.
>
> ///Peter
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list