This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: html to xml


> a) as we know, authors scatter <h1>, <h3> etc across their document
> like pointers. my target DTD needs structured divisions.

> b) HTML allows PCDATA practically anywhere, so far as I can see. so
> I get
>   <h3>Hello</h3>
>   I am the walrus

The basic problem is that the HTML you are getting is not structured enough
for your purposes.

I had the same problem, and solved it by setting rules for how the HTML
could be structured, so that it could be converted into XML more easily. In
particular, I do not allow any text that is not surrounded by tags.
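
A rule like "no text outside tags" can be checked mechanically before the
conversion runs. The following is only a rough sketch in Python, not what I
actually use: it assumes the input has already been tidied into well-formed
XHTML, and the function name find_bare_text and the "body" default are made
up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def find_bare_text(xhtml: str, container: str = "body") -> List[str]:
    """Return text runs that sit directly inside `container`
    without an element of their own wrapped around them.

    Assumption: `xhtml` is well-formed XML without namespaces.
    """
    root = ET.fromstring(xhtml)
    # The container may be the root element itself or a descendant.
    node = root if root.tag == container else root.find(f".//{container}")
    if node is None:
        raise ValueError(f"no <{container}> element found")

    bare = []
    # Text that appears before the first child element.
    if node.text and node.text.strip():
        bare.append(node.text.strip())
    # Text that follows a child element (ElementTree calls this the "tail").
    for child in node:
        if child.tail and child.tail.strip():
            bare.append(child.tail.strip())
    return bare


# The fragment from the question: the second line has no element around it.
sample = "<body><h3>Hello</h3>I am the walrus</body>"
print(find_bare_text(sample))  # ['I am the walrus']
```

An empty list means the document follows the rule; anything else can be
sent back to the author before the stylesheet ever sees it.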

Whether this works for you depends on what you are trying to do, and how
much say you have over the HTML that is created.

Lisa


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
