This is the mail archive of the docbook@lists.oasis-open.org mailing list for the DocBook project.



Re: [docbook] Re: Question about prettyprinting Docbook documents and character entities


[apologies for responding to an old message]

Norman Walsh <ndw@nwalsh.com> writes:

> / Taro Ikai <tikai@ABINITIO.COM> was heard to say:
> | I am having a few problems prettyprinting my Docbook documents. I am using 
> | Cygwin distribution of Tidy.
> 
> Beyond the fact that tidy is not designed to pretty print anything but HTML,

With respect, I think that ain't necessarily so :)

Tidy has an -xml option that tells it you're working with an XML file
(i.e., a non-XHTML file authored against some arbitrary DTD) instead of
an HTML file.

Beyond that, it even lets you specify elements whose contents you don't
want wrapped, indented, or pretty-printed (i.e., 'line specific'
environments in DocBook lingo).

I've made some limited use of Tidy myself in makefiles -- for the
purpose of pretty-printing some converted DocBook XML content -- and
found it works pretty well for the particular case in which I'm using it.
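
For anyone who wants to try this from a script rather than a makefile, here is a minimal sketch. It is not the makefile rule mentioned above: driving Tidy from Python, the function names, and the wrap column of 72 are all assumptions made for illustration. The flags themselves (-xml, -indent, -wrap, -quiet, -modify) are standard Tidy command-line options.

```python
import subprocess


def build_tidy_command(path, wrap=72):
    """Build an argument list asking Tidy to re-indent an XML file in place.

    The function name and the default wrap column are illustrative choices.
    """
    return [
        "tidy",
        "-xml",             # treat input as generic XML, not HTML
        "-indent",          # indent element content
        "-wrap", str(wrap), # wrap text at this column
        "-quiet",           # suppress nonessential output
        "-modify",          # write the result back to the input file
        path,
    ]


def tidy_in_place(path):
    """Run Tidy on path; requires a 'tidy' executable on PATH."""
    result = subprocess.run(build_tidy_command(path))
    # Tidy exits with 1 when it only issued warnings and 2 when it hit errors,
    # so only an exit status of 2 or more is treated as a failure here.
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors for {path}")
```

Since -modify overwrites the input file, it is worth trying this on a copy first.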

I could also see Tidy being useful as a means for "normalizing" XML
before checkins to a source-control system. 

What I mean is, many people work in environments where authors use
different editing applications to edit their XML content. Some of those
editors (e.g., Emacs/nxml or psgml) preserve the whitespace and line
breaks unless told to do otherwise, while others (e.g., Arbortext Epic)
cavalierly munge the original whitespace and formatting.

So, given that you've got no assurance that authors won't be using
editors that step all over the original whitespace/line breaks in your
XML source, in order to be able to get useful diffs out of your
source-control system, you need some way of 'normalizing' your source
each time you check in (or, you need some other way of generating diffs
that doesn't rely on your source-control system's built-in diff
capability, but that's a whole 'nother issue...)

I think 'tidy -xml' can be useful with XML content in such cases.
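
To make the diff argument concrete, here is a small sketch of the normalize-before-diff idea. It deliberately swaps in Python's standard xml.dom.minidom as a stand-in normalizer instead of Tidy, so that it runs without any external tool; the two sample documents and all function names are hypothetical.

```python
import difflib
import xml.dom.minidom


def strip_blank_text(node):
    """Drop whitespace-only text nodes so re-indentation starts from a clean tree."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)


def normalize(xml_text):
    """Re-indent xml_text in one fixed style (stand-in for 'tidy -xml')."""
    doc = xml.dom.minidom.parseString(xml_text)
    strip_blank_text(doc)
    return doc.toprettyxml(indent="  ")


def diff_lines(a, b):
    """Return the unified diff between two texts as a list of lines."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


# The same content as saved by two hypothetical editors with different
# whitespace habits.
from_editor_a = "<section>\n  <title>Intro</title>\n  <para>Hello.</para>\n</section>"
from_editor_b = "<section><title>Intro</title>\n\n<para>Hello.</para></section>"

raw_diff = diff_lines(from_editor_a, from_editor_b)
normalized_diff = diff_lines(normalize(from_editor_a), normalize(from_editor_b))

print(len(raw_diff) > 0)   # the raw files differ line by line
print(normalized_diff)     # after normalizing, nothing is left to diff
```

Note that this naive normalizer is only safe for element-only content like the sample above. On mixed content (inline elements inside a para) it would change significant whitespace, which is part of why pretty-printing DocBook is hard.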

That said, its handling of arbitrary XML files does have limitations --
notably, handling of general entities and stuff that Taro has mentioned
in previous messages.
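
As a general illustration of why general entities are awkward for tools that do not read the DTD (this is not necessarily the exact problem Taro reported, and it uses Python's standard expat-based parser, not Tidy): a named entity that is declared only in an external DTD is undefined as far as such a parser can tell, while the equivalent numeric character reference is always fine. The sample strings and the helper name are hypothetical.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parses_cleanly(xml_text):
    """Return True if the text is well-formed for a parser that sees no DTD."""
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# &nbsp; is not one of XML's five predefined entities, so without a DTD
# declaring it the parser rejects the document.
with_named_entity = "<para>a&nbsp;b</para>"

# A numeric character reference needs no declaration.
with_char_reference = "<para>a&#160;b</para>"

print(parses_cleanly(with_named_entity))
print(parses_cleanly(with_char_reference))
```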

> it is very, very hard to pretty print DocBook.

Be that as it may, I think Tidy, more than any other XML pretty-printing
solution, probably gives you a way to control what should be
pretty-printed/indented and what shouldn't be. If there were a way to
work around the problems with arbitrary general entities and with some
of the other problems that Taro has mentioned, I think Tidy would be an
ideal solution.

  --Mike


