This is the mail archive of the docbook@lists.oasis-open.org mailing list for the DocBook project.



Re: converting to docbook


Hello DocBook-heads,
The following possibility may already have been explored on this list without my knowing, since I've only been on it a few months, but I'll throw it out there anyway. If not, I'd appreciate your scrutiny...

At 9:37 AM -0700 8/13/02, Bob Stayton wrote:
> On Tue, Aug 13, 2002 at 02:26:24AM -0700, jonathon wrote:
> > I have roughly 10 000 documents of various formats
> > [ plain ASCII, TeX, DocBook, HTML 4.01, XHTML 1.0,
> > Word, WordPerfect, PDF and a couple of others. ]
> >
> > Can anybody point me to something that will easily convert
> > these to docbook, and preserve some/most of their current
> > formatting?

[snip]

> For your PDF documents, I'd look for the source document
> that generated the PDF.  It is tough (impossible?)
> to convert PDF.

There is a beta "SaveAsXML" plug-in for Adobe Acrobat 5 that has customizable mapping tables. If you produce a _tagged_ PDF (e.g., from InDesign or the MS Office PDFMaker, which is Windows only), SaveAsXML will convert it to a generic XML document, mapping block styles to XML tags. If you use DocBook tag names for your block styles, you get a primitive form of DocBook, which I'm calling preDocBook. I customized the supplied XML mapping table to pick up Bold and convert it to <emphasis>, and Italic to <citetitle>. I've attached this table in case anyone wants to try it out.
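
To make the mapping idea concrete, here is a minimal Python sketch of the same kind of renaming done as a post-processing step on generic XML. It does not reproduce the SaveAsXML table format or its output; the style names (Heading1, Body, Bold, Italic) and the sample input are assumptions for illustration only. Only the Bold and Italic targets come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical style-name -> DocBook element mapping.
# Bold/Italic targets follow the message; the block names are assumed.
STYLE_TO_DOCBOOK = {
    "Bold": "emphasis",
    "Italic": "citetitle",
    "Heading1": "title",
    "Body": "para",
}

def to_predocbook(xml_text, mapping=STYLE_TO_DOCBOOK):
    """Rename every element whose tag is in `mapping`; leave the rest alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

# Assumed sample of "generic" XML whose tags are style names.
sample = (
    "<sect1><Heading1>Intro</Heading1>"
    "<Body>See <Italic>The Guide</Italic> for <Bold>details</Bold>.</Body>"
    "</sect1>"
)
result = to_predocbook(sample)
print(result)
```

Elements already named with DocBook tags (here, sect1) pass through unchanged, which is the point of naming block styles after DocBook elements in the authoring application.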

The main advantage I can see in this strategy is that you bypass converting HTML tags to DocBook and all the decisions that go with it (e.g., is H1 a <sect1> or a <chapter>?). Also, you can predesignate your blocks in the authoring application.

Feedback?
Marc Brierley

link to SaveAsXML plug-in:
http://www.adobe.com/support/downloads/89a6.htm
--


**************************************
Stanford Academic Computing Publications
Meyer Library 260
560 Escondido Mall
Stanford University
Stanford, CA 94305-3093
(650) 725-6883 voice
(650) 725-8495 fax

Attachment: XML-1-00preDocBook.xml
Description: Text document

