This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



Re: Getting images into PDF




On Thu, Jul 26, 2001 at 12:16:03AM +1200, Dave Brooks, BCS Systems wrote:
> I've been trying to get graphics into PDF from Docbook.

This is something I wrote a while ago.  It's two messages from me to
this list -- I don't have the time to clean these up at the moment, but
I figure getting the info out is probably of more use.

The FreeBSD project uses this scheme and generates output in HTML, PS,
TXT, and PDF, with image support, with no problems.

On Thu, Feb 08, 2001 at 05:28:38PM -0500, Dan York wrote:
> However, because we also would like to have a version available that can
> be easily printed, I have been trying to generate a PDF or PostScript
> file.  So far, I have been unsuccessful.  The two major problems are:
>
> 1. Graphics do not appear in the PDF file. They are implemented as
> <mediaobject> in the DocBook file.

I've been meaning to write this for a while.  This is my guide to
including images in DocBook as painlessly as possible.

  Assuming that you want to create, as a minimum, HTML, PS, and PDF
  documents with the best quality images, you need to do the following.

  First of all, you need to choose your preferred image format(s).  This
  is not as simple as picking a single format.  The difference between
  bitmap and vector based image styles means that one single format
  won't suffice.

  Instead, you need to pick a format that's good for bitmap images, and
  a format that's good for vector images.

  The rest of this document assumes PNG for bitmaps, and EPS for vector.

  There's another wrinkle.  In my experience, PDF generation works best
  if you pass pdftex the name of a .pdf file, and not a .eps file.
  However, EPS files can still be the source from which the PDF is
  generated.

  If you look at DocBook's image inclusion support, you'll see the
  <mediaobject> element, which can contain one or more <imageobject>s.
  The original idea was that, for each image, you would have one
  <mediaobject>, which would contain several different <imageobject>s,
  each pointing to a file in a different format.

  The stylesheets would then select which <imageobject> to use.  This is
  the approach taken in Norm Walsh's stylesheets.

  However, in my experience, this doesn't work quite right, and I had
  difficulty getting the stylesheets to always select the correct
  <imageobject> element to use.  A better approach has been to never
  include the filename's extension in the <imageobject> element's
  attributes, and let the stylesheets add the extension, or not, as
  necessary.

  A useful side effect of this is that you only ever write one
  <imageobject> per <mediaobject>.

  So, some sample markup might look like this:

  <mediaobject>
    <imageobject>
      <imagedata fileref="image">  <!-- Filename, without extension -->
    </imageobject>

    <textobject>
      <phrase>An image</phrase>
    </textobject>
  </mediaobject>

  Of course, this assumes that you have image.{png,eps,pdf} in the
  current directory as well.
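
  A quick way to confirm that assumption before running jade is to
  check for all three files.  This is only a sketch:
  missing_image_formats is a made-up helper name, and the png/eps/pdf
  list simply mirrors the formats assumed in this guide.

```shell
# Sketch: print the name of every image file that is missing for a
# given base name (the fileref value, without extension).
# missing_image_formats is a hypothetical helper, not part of any tool.
missing_image_formats() {
    base=$1
    for ext in png eps pdf; do
        # Report the file only when it does not exist.
        [ -f "$base.$ext" ] || printf '%s\n' "$base.$ext"
    done
}
```

  Calling "missing_image_formats image" prints nothing when
  image.png, image.eps, and image.pdf are all present.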

  If you want to convert a document containing this to HTML, you need to
  use a stylesheet that customises Norm's sheets, and has

    (define %graphic-default-extension% "png")

  in it.

  HTML is easy in this respect.  PS and PDF are a little more
  complicated.

  Again, you need to use a customisation of Norm's stylesheets.  It
  needs to contain the following two functions (these work at least up
  to v1.61 of Norm's sheets).  These are re-writes of Norm's functions.

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------

  ; Norm's sheets try and work out which one of the <imageobject>s
  ; should be used.  However, we only ever have one, so just use
  ; the first one.
  ;
  ; XXX This can probably be made more efficient by dropping the let*
  ; clause.  One day, I'll get around to testing that.
  (define (find-displayable-object objlist notlist extlist)
    (let loop ((nl objlist))
      (if (node-list-empty? nl)
        (empty-node-list)
          (let* ((objdata  (node-list-filter-by-gi
                            (children (node-list-first nl))
                            (list (normalize "videodata")
                                  (normalize "audiodata")
                                  (normalize "imagedata"))))
                 (filename (data-filename objdata))
                 (extension (file-extension filename))
                 (notation (attribute-string (normalize "format") objdata)))
            (node-list-first nl)))))

  ; This function, given a graphic filename, looks at the filename's
  ; extension, and appends %graphic-default-extension% as necessary.
  ;
  ; However, given a bare filename (such as "image") TeX is perfectly
  ; capable of adding the .eps or the .pdf as necessary.  Rather than
  ; try and second guess TeX, don't do anything if the tex-backend
  ; variable is set.
  (define (graphic-file filename)
    (let ((ext (file-extension filename)))
      (if (or tex-backend   ;; TeX can work this out itself
              (not filename)
              (not %graphic-default-extension%)
              (member ext %graphic-extensions%))
           filename
           (string-append filename "." %graphic-default-extension%))))

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------

  You can see this in the FreeBSD customisation layer, at

    http://www.freebsd.org/cgi/cvsweb.cgi/doc/share/sgml/freebsd.dsl

  We keep both HTML and Print customisation layers in one file.  You can
  do the same thing, or use two different files if you want.

  OK, so suppose you have, in one directory, the following files:

     doc.sgml                   Your document

     html.dsl                   Your customisation layer for HTML docs,
                                which sets %graphic-default-extension%

     print.dsl                  Your customisation layer for print docs,
                                which contains the two functions
                                listed earlier.

     image.png                  PNG image

     image.eps                  EPS image

     image.pdf                  PDF image

  what are the command lines you need to use?

  As I said, HTML is easy.

    jade -c /your/path/to/the/catalog/files                     \
         -d html.dsl                                            \
         -t sgml                                                \
         doc.sgml

  Add things like "-Vnochunks" or whatever, depending on your
  preference.  This should have used the PNG images.

  PS is also relatively simple.

    jade -c /your/path/to/the/catalog/files                     \
         -d print.dsl                                           \
         -Vtex-backend                                          \
         -t tex                                                 \
         -o doc.tex                                             \
         doc.sgml

  Notice that you have to give the "-Vtex-backend" option.  I've also
  shown the use of -o to explicitly set the output file name.

  You can then run

    tex "&jadetex" doc.tex

  a few times, to generate the .dvi file, and then convert the DVI file
  to PS.

  PDF is a little more complex.  As shipped, when producing PDF files,
  teTeX will prefer to include a .png file over a .pdf file.  I don't
  know why that is.

  The way to work around this is to make sure that the line

    \catcode`@=11\def\Gin@extensions{.pdf,.png,.jpg,.mps,.tif}\catcode`@=12

  appears at the start of the .tex file, before you process it with
  pdftex.  There are many ways in which you can do this.
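
  One of those ways, as a minimal sketch, is a small POSIX shell
  function that writes the override line first and then the original
  contents.  The function name prepend_gin_extensions and the temporary
  file name are made up for illustration; neither jade nor teTeX
  provides them.

```shell
# Sketch: put the \Gin@extensions override at the top of a TeX file.
# prepend_gin_extensions is a hypothetical helper name.
prepend_gin_extensions() {
    texfile=$1
    override='\catcode`@=11\def\Gin@extensions{.pdf,.png,.jpg,.mps,.tif}\catcode`@=12'

    # Do nothing if the first line is already the override.
    if [ "$(sed -n 1p "$texfile")" = "$override" ]; then
        return 0
    fi

    tmpfile="$texfile.tmp.$$"
    # Emit the override line, then the original file, into a temporary
    # file, and only replace the original if that succeeded.
    { printf '%s\n' "$override"; cat "$texfile"; } > "$tmpfile" &&
        mv "$tmpfile" "$texfile"
}
```

  Run "prepend_gin_extensions doc.tex" after jade has written doc.tex
  and before the first pdftex pass.  Running it a second time leaves
  the file unchanged.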

  Anyway, your command line for generating PDF should look like this:

    jade -c /your/path/to/the/catalog/files                     \
         -d print.dsl                                           \
         -Vtex-backend                                          \
         -t tex                                                 \
         -o doc.tex                                             \
         doc.sgml

  As you can see, this is the same command line as for generating PS
  output.

  Once you've run this to generate doc.tex, edit doc.tex, and insert the
  earlier "\catcode..." line at the start of the file.  Then you can run

    pdftex "&pdfjadetex" doc.tex

  a few times, to generate the .pdf file.

  That's that, pretty much.  You can see BSD style .mk files that
  implement all of this, at

    http://www.freebsd.org/cgi/cvsweb.cgi/doc/share/mk/

  and pay particular attention to doc.docbook.mk.  I'm not aware of any
  Linux distributions (with scripts like dbtopdf) that make this level
  of customisation possible.  Of course, it would be trivial for you to
  implement your own replacement scripts which do this.  The FreeBSD
  make(1) approach is, of course, there for the taking, and I'm happy to
  discuss it further on either this list, or doc@freebsd.org.

  <columbo>Oh, and one more thing.</columbo>

  As you might be aware, you can use w3m (a text mode browser, with
  support for tables) to provide very good DocBook -> Text, by first
  going DocBook -> HTML (as one big file), and then using w3m to convert
  the HTML to plain text.

  Wouldn't it be neat if you could include ASCII art in your document as
  well, such that when you were going to produce plain text, instead of
  getting the ALT text on the image, you got ASCII art instead?  Well,
  you can.

  First, suppose that your markup looks like this

  <mediaobject>
    <imageobject>
      <imagedata fileref="image">  <!-- Filename, without extension -->
    </imageobject>

    <textobject>
      <para>+-----+
    |  A  |
    +-----+</para>
    </textobject>

    <textobject>
      <phrase>An image</phrase>
    </textobject>
  </mediaobject>

  (assume, for the moment, that your image is of a box with the letter A
  in it).

  The HTML stylesheets will search and make sure that the file

     image.%graphic-default-extension%

  exists.  If the image doesn't exist then the stylesheets will use the
  contents of the first <textobject> instead.

  So if you run something like

    jade -c /your/path/to/the/catalog/files                     \
         -d /path/to/nwalsh's/html/docbook.dsl                  \
         -t sgml                                                \
         -Vnochunks                                             \
         doc.sgml > doc.html

    w3m -T text/html -S -dump doc.html > doc.txt

  Then you will have doc.txt that contains your ASCII art instead.  This
  works because Norm's sheets, by default, do not define a value for
  %graphic-default-extension%.  In case he ever changes this, you might
  want to create another HTML stylesheet which explicitly sets the
  variable to #f.

  For an example of this in action, take a look at

    http://www.freebsd.org/cgi/cvsweb.cgi/doc/en_US.ISO_8859-1/article/vm-design

  and examine the files in there.

> 2. More importantly, outside of the missing graphics, the PDF file
> looks fine for the first 19 pages, until it gets to Chapter 4. In
> this chapter, I really just have the following construction:
>   <sect1>
>   <title></title>
>   <table>
>   ....
>   </table>
>   <table>
>   ....
>   </table>
>   </sect1>

I've downloaded your document, created a bunch of test images (I'm on a
56K dialup at the moment), and can't replicate this on FreeBSD, using
Jade 1.2.1, JadeTeX 2.2, and teTeX 1.0.7.

On Tue, Feb 20, 2001 at 04:04:27AM -0500, Adam Di Carlo wrote:
> I have an unreleased but I think pretty decent system, 'preheat'.
> This is a scheme-based system.  You specify a little scheme file with
> the local stuff to build and customization, for instance:

Oh God.  Just as soon as I think I've cleared some free time, something
new and interesting comes along.

:-)

We should really try and get together at some point and go through this
stuff in more detail.  I get the feeling there are several groups all
working towards similar goals, and there are just too many mailing lists
to keep track of.

> Anyhow, Nik, your system looks good -- good enough to go through and
> cull ideas from.

There's one change I had to make to the description I posted.  In
FreeBSD you now have to (or will have to very shortly, when I commit it)
write something like

    <imagedata fileref="figure1" format="PNG">

or '... format="EPS"', depending on the *source* format of the image.
You still only have one ImageData element per image, but you have to
specify the format.

Why?

I discovered that if I convert PNG images to EPS files and then run them
through TeX they appear about twice the size they should do.  For
example, an 80x24 xterm takes up almost half the page.

Scaling the images works, but then you have a problem.

                                     Image Format

  Output Format     PNG                           EPS

  HTML              Native format, looks OK       Is converted to PNG
                                                  using png2eps, looks
                                                  OK

  Postscript        Is converted to EPS,          Native format, looks OK
                    needs scaling by 50%

  PDF               Native format, needs          Is converted to PDF,
                    scaling by 50%                looks OK

The PNG images are the problem.  You need to scale them in the
Postscript and PDF case.  You can do this in one of two ways:

1.  Write

      <imagedata fileref="figure1" scale="50">

    in your document.  The problem with this is HTML images will be
    scaled, and so will the EPS and PDF images.  So then you have to
    create all your EPS images twice the size they need to be, *and*
    scale them all by 50% when you convert EPS to PNG.

    Unacceptable.

2.  Update the stylesheet to scale all images by 50% if the scale
    attribute is not set.  Less work for the author, but has the other
    problems that (1) has.

Neither of these is acceptable.

So, my third solution was to mandate the use of the 'format' attribute.
But we use it to specify what the original image format was.  So if you
have a PNG image, you write

    <imagedata fileref="figure1" format="PNG">

Then redefine the graphic handling in the stylesheet, like so:

    (define ($graphic$ fileref
                       #!optional (display #f) (format #f)
                                  (scale #f)   (align #f))
      (let* ((graphic-format (if format format ""))
             (graphic-scale  (if scale
                                 (/  (string->number scale) 100)
                                 (if (and tex-backend
                                          (equal? graphic-format "PNG"))
                                      0.5 1)))
             (graphic-align  (cond ((equal? align (normalize "center"))
                                    'center)
                                   ((equal? align (normalize "right"))
                                    'end)
                                   (else
                                    'start))))
       (make external-graphic
          entity-system-id: (graphic-file fileref)
          notation-system-id: graphic-format
          scale: graphic-scale
          display?: display
          display-alignment: graphic-align)))

which automatically scales the image by 50% if all of the following hold:

  1.  The author didn't specify a "scale" attribute themselves.

  2.  tex-backend is #t

  3.  The "format" attribute is "PNG"

A kludge, but it works.
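
To make the rule concrete, here is the same decision restated as a small
shell function.  This is an illustration only, not part of the
stylesheet: graphic_scale and its three arguments (the scale attribute,
the format attribute, and "yes" or "no" for tex-backend) are invented
for the example.

```shell
# Sketch of the scaling rule from the $graphic$ redefinition.
# graphic_scale is a hypothetical name; arguments are:
#   $1  scale attribute as a percentage, or empty if the author set none
#   $2  format attribute (source format of the image), e.g. PNG or EPS
#   $3  "yes" when tex-backend is set, anything else otherwise
graphic_scale() {
    scale=$1
    format=$2
    tex_backend=$3
    if [ -n "$scale" ]; then
        # An explicit scale always wins; convert percent to a factor.
        awk -v s="$scale" 'BEGIN { print s / 100 }'
    elif [ "$tex_backend" = yes ] && [ "$format" = PNG ]; then
        # PNG source going through TeX: halve it.
        echo 0.5
    else
        echo 1
    fi
}
```

For a PNG with no scale attribute, the function gives 0.5 under the TeX
backend and 1 for HTML; an EPS source gives 1 in both cases.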

> > I fetched the above mentioned FreeBSD Makefiles and adapted them
> > especially to Debian/GNU Linux (pathnames, etc.). Up to now there is no
> > additional documentation except the .mk files itself - and the original
> > posting of Nik of course!
> >
> > You will find a tarball at: http://www.miwie.org/docbkmake/
> >
> > Feedback is welcome :-)
>
> FYI, freebsd.dsl is already shipped on debian systems, in the
> docbook-stylesheets package.

Ah, I didn't know that.

What's the best way to make sure that you guys are informed when changes
are committed?

N
-- 
Internet connection, $19.95 a month.  Computer, $799.95.  Modem, $149.95.
Telephone line, $24.95 a month.  Software, free.  USENET transmission,
hundreds if not thousands of dollars.  Thinking before posting, priceless.
Somethings in life you can't buy.  For everything else, there's MasterCard.
  -- Graham Reed, in the Scary Devil Monastery



-- 
FreeBSD: The Power to Serve             http://www.freebsd.org/
FreeBSD Documentation Project           http://www.freebsd.org/docproj/

          --- 15B8 3FFC DDB4 34B0 AA5F  94B7 93A8 0764 2C37 E375 ---
