This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: html to xml
- To: <xsl-list at mulberrytech dot com>
- Subject: RE: html to xml
- From: "Lisa van Gelder" <lisa at wirestation dot co dot uk>
- Date: Fri, 27 Oct 2000 10:09:56 +0100
- Reply-To: xsl-list at mulberrytech dot com
> a) as we know, authors scatter <h1>, <h3> etc across their document
> like pointers. my target DTD needs structured divisions.
> b) HTML allows PCDATA practically anywhere, so far as I can see. so
> I get
> <h3>Hello</h3>
> I am the walrus
The basic problem is that the HTML you are getting is not structured enough
for your purposes.
I had the same problem, and solved it by setting rules for how the HTML
may be structured, so that it can be converted into XML more easily. For
example, I do not allow any text that is not surrounded by tags.
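A rule like "no text outside tags" can be checked mechanically before any conversion is attempted. What follows is only a minimal sketch of such a check, not the method described above: it assumes Python and its standard html.parser module, and the names BareTextChecker, find_bare_text, TEXT_CONTAINERS and VOID_ELEMENTS are hypothetical. Which elements are allowed to hold text is also an assumption, and would depend on your own rules.

```python
from html.parser import HTMLParser

# Assumption: only these elements may contain text directly.
TEXT_CONTAINERS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td", "th"}
# Elements that never have a closing tag, so they are not tracked as open.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class BareTextChecker(HTMLParser):
    """Collects text that is not inside any allowed text container."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, outermost first
        self.bare_text = []  # offending text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy nesting.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not any(t in TEXT_CONTAINERS for t in self.stack):
            self.bare_text.append(text)


def find_bare_text(html):
    """Return the list of text fragments that break the rule."""
    checker = BareTextChecker()
    checker.feed(html)
    checker.close()
    return checker.bare_text
```

Run on the fragment quoted above, find_bare_text("<h3>Hello</h3>\nI am the walrus") returns ['I am the walrus'], while the same text wrapped in <p>...</p> returns an empty list. Documents that fail the check can be sent back to the author or fixed up before the conversion to XML.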
Whether that works for you depends on what you are trying to do, and how
much say you have over the HTML that is created.
Lisa
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list