This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: How to translate HTML to DocBook
- From: Patrick Hartling <patrick at vrac dot iastate dot edu>
- To: dbook at centrum dot cz
- Cc: docbook-apps at lists dot oasis-open dot org
- Date: Tue, 12 Mar 2002 17:00:08 -0600
- Subject: Re: DOCBOOK-APPS: How to translate HTML to DocBook
- References: <20020312210028Z684242-9966+600@mail.centrum.cz>
dbook@centrum.cz wrote:
> Hello,
>
> I need to translate many HTML sources to DocBook markup. Do you
> know of a utility that can do this? I read something about
> DynaTag, but it seems that I must define the conversion rules there.
> Do you know of something "fully automatic"?
>
> Thank you.
>
I have had some success with html2db
(http://freshmeat.net/projects/html2db/), though it is not perfect. The
main problem I had is that HTML is not as expressive as DocBook, so the
translation can only go so far. It also helps if the source is "good"
HTML. Having closing tags such as </li> and </p>, and writing empty
elements in closed form (<br/> rather than a bare <br>), helps immensely.
Beyond any limitations of HTML, one thing to keep in mind is that html2db
uses DSSSL. Its output is thus based on the DocBook SGML DTD rather than
the XML DTD. sgmlnorm and sgml2xml come in handy if you want to go to XML.
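To illustrate why well-formed input matters and why HTML can only be mapped so far, here is a minimal Python sketch. It is not how html2db works. It uses the standard library's xml.etree.ElementTree to map a small subset of XHTML to DocBook XML. The element mapping and the function names (convert_inline, convert_block, html_to_docbook) are hypothetical, chosen only for illustration. The sketch assumes XHTML with no namespace declaration and drops anything it does not recognize.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML-to-DocBook mappings for illustration only; a real
# converter such as html2db covers many more elements and edge cases.
INLINE_MAP = {"em": "emphasis", "i": "emphasis", "code": "literal", "tt": "literal"}
LIST_MAP = {"ul": "itemizedlist", "ol": "orderedlist"}


def convert_inline(src, dst):
    """Copy the text and inline children of an HTML element into dst."""
    dst.text = src.text
    for child in src:
        # Unmapped inline tags fall back to a generic <phrase>.
        new = ET.SubElement(dst, INLINE_MAP.get(child.tag, "phrase"))
        convert_inline(child, new)
        new.tail = child.tail


def convert_block(src, parent):
    """Translate one HTML block element (p, ul, ol) into DocBook under parent.

    Any other block element is silently dropped.
    """
    if src.tag in LIST_MAP:
        lst = ET.SubElement(parent, LIST_MAP[src.tag])
        for li in src.findall("li"):
            item = ET.SubElement(lst, "listitem")
            convert_inline(li, ET.SubElement(item, "para"))
    elif src.tag == "p":
        convert_inline(src, ET.SubElement(parent, "para"))


def html_to_docbook(xhtml):
    """Convert a well-formed, namespace-free XHTML string to a DocBook <article>.

    ET.fromstring raises ET.ParseError on unclosed tags such as a bare <p>,
    so sloppy HTML has to be cleaned up before this step.
    """
    body = ET.fromstring(xhtml).find("body")
    article = ET.Element("article")
    container = article
    for el in body:
        if el.tag in ("h1", "h2"):
            # HTML headings are flat, so each h1/h2 just opens a new top-level
            # section; the nesting a DocBook document would express is lost.
            container = ET.SubElement(article, "section")
            convert_inline(el, ET.SubElement(container, "title"))
        else:
            convert_block(el, container)
    return article


if __name__ == "__main__":
    page = """<html><body>
<h1>Intro</h1>
<p>Use <code>html2db</code> with <em>care</em>.</p>
<ul><li>First</li><li>Second</li></ul>
</body></html>"""
    print(ET.tostring(html_to_docbook(page), encoding="unicode"))
```

The sketch emits XML directly. html2db itself is DSSSL-based and produces SGML. Feeding it "<p>one<p>two" fails at the parse step, which shows why closing tags help.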
-Patrick
--
Patrick L. Hartling | Research Assistant, VRAC
patrick@vrac.iastate.edu | 2624 Howe Hall -- (515)294-4916
http://www.137.org/patrick/ | http://www.vrac.iastate.edu/