This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



Re: Normalizing spaces?


Norman asked, in reply to my question:

At 12:53 PM 5/8/01 -0400, you wrote:
/ "M. Wroth" <mark@astrid.upland.ca.us> was heard to say:
| Hmmm.  The behavior I asked about was the "normalization" (possibly
| not the right word) of spaces in element content (specifically,
| although not limited to, <para> elements), in the SGML version of
| DocBook processed with the Modular DSSSL Style Sheets.

The odd thing about your question is the example. SGML systems discard
whitespace from element content (the content of chapter, book,
procedure, etc.), but not from mixed content. A paragraph is mixed
content, so DSSSL shouldn't be throwing away spaces.

| I observe that multiple whitespace characters (newline, tab, and
| space) show up in the output file as a single space character.  This
| behavior is not universally present (i.e. in other DTD and other style
| sheets, such concatenation is not observed).  I'm trying to
| understand how this works.

Can you identify some specific content models in other DTDs where the
effect is different? And what other tools?

Here is an example of content:


         <p>It is not permissible under the Society's rules to
         fimbriate a chief. Laurel precedent (Laurel Alison, Dec 86
         and Aug 88) <q>however this is blazoned, in appearance it
         includes a fimbriated chief, which is not permitted for
         Society usage</q>. RFS VIII.3 limits fimbiration to simple
         geometric charges placed in the center of the field; while a
         chief is a simple geometric charge, it is not in the center
         of the field. </p>


Note that the indentation consists of space characters in the input file (put there by PSGML/Emacs).

The DTD content model is:

<!element p      o o (#PCDATA
                      | blazon | q | bk
                      | bq | sa | cite)* +(cite)>
<!ATTLIST P INCLUDEIN (MIN|LOI|BOTH|IGNORE) BOTH>

and I'm processing it with a homegrown DSSSL style sheet run through Jade:

C:\USR\DSSSL\JADE\JADE.EXE:I: Jade version "1.2.1"
C:\USR\DSSSL\JADE\JADE.EXE:I: SP version "1.3.3"

but the behavior is identical with:

C:\USR\DSSSL\OPENJA~1.3\BIN\OPENJADE.EXE:I: OpenJade version "1.3"
C:\USR\DSSSL\OPENJA~1.3\BIN\OPENJADE.EXE:I: OpenSP version "1.3.4"

The processing rule for a <p> is long and complicated, but it doesn't explicitly do anything to get rid of spaces; ultimately the content is processed with process-children.


(element p (make sequence
   (if (have-ancestor? "MINUTES")
     (make sequence
       (case (attribute-string "INCLUDEIN")
         (("MIN" )
                   (if (first-sibling?)
                       (make paragraph (process-children))
                       (make paragraph  ;The "else" clause
                          first-line-start-indent: 12pt
                          (process-children)
                       )
                   )  )
         (("LOI" )(empty-sosofo))
         (("BOTH" )
                    (if (first-sibling?)
                        (make paragraph (process-children))
                        (make paragraph  ;The "else" clause
                           first-line-start-indent: 12pt
                           (process-children)
                        )
                    )  )
         (("IGNORE" )(empty-sosofo))
         ((#f)
               (if (first-sibling?)
                   (make paragraph (process-children))
                   (make paragraph  ;The "else" clause
                      first-line-start-indent: 12pt
                      (process-children)
                   )
               )  )
       )
     ); end of the ``THEN'' clause
     (make sequence
      (case (attribute-string "INCLUDEIN")
         (("MIN" )(empty-sosofo))
         (("LOI" ) 
                    (if (first-sibling?)
                        (make paragraph (process-children))
                        (make paragraph  ;The "else" clause
                           first-line-start-indent: 12pt
                           (process-children)
                        )
                    ) )
         (("BOTH" )
                    (if (first-sibling?)
                        (make paragraph (process-children))
                        (make paragraph  ;The "else" clause
                           first-line-start-indent: 12pt
                           (process-children)
                        )
                    ) )
         (("IGNORE" )(empty-sosofo))
         ((#f)
               (if (first-sibling?)
                   (make paragraph (process-children))
                   (make paragraph  ;The "else" clause
                      first-line-start-indent: 12pt
                      (process-children)
                   )
               )  )
      )
     )
   )
 )
)

(I wouldn't object to suggestions for better ways to do this, but it's old code and works fine, so apart from this question I'm not particularly looking to improve it. It will likely be replaced when the underlying DTD changes in the not-too-distant future.)
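
Since every paragraph-producing branch of the rule is the same, it could probably be consolidated along the following lines. This is only a sketch and has not been run through Jade: p-paragraph is a made-up helper name, and the logic assumes INCLUDEIN only ever takes the values declared in the ATTLIST (or is unspecified).

```scheme
;; Sketch only. p-paragraph is a hypothetical helper, not part of the
;; existing style sheet.
(define (p-paragraph)
  (if (first-sibling?)
      (make paragraph (process-children))
      (make paragraph            ; not the first sibling: indent first line
        first-line-start-indent: 12pt
        (process-children))))

(element p
  (let ((incl (attribute-string "INCLUDEIN"))   ; #f if unspecified
        (in-minutes (have-ancestor? "MINUTES")))
    (cond
      ;; IGNORE suppresses the paragraph everywhere
      ((equal? incl "IGNORE") (empty-sosofo))
      ;; LOI-only paragraphs are suppressed inside MINUTES
      ((and in-minutes (equal? incl "LOI")) (empty-sosofo))
      ;; MIN-only paragraphs are suppressed outside MINUTES
      ((and (not in-minutes) (equal? incl "MIN")) (empty-sosofo))
      ;; BOTH, unspecified, and the remaining MIN/LOI cases
      (else (p-paragraph)))))
```

The outer (make sequence ...) wrappers are dropped here because each branch yields a single sosofo. None of this touches whitespace handling; the content still goes through process-children unchanged.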

Mark B. Wroth
<mark@astrid.upland.ca.us>

