This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: RE: Saxon's handling of line breaks


Mike's global statement was correct:
> Line breaks in the input document and the stylesheet are
> automatically converted to a single NL character by the
> XML parser - that's defined by the XML standard.

The normative reference can be read at [1].
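
The line-end rule in [1] can be observed with any conforming parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which is not part of the original discussion); the helper name parsed_text is made up for illustration:

```python
import xml.etree.ElementTree as ET

def parsed_text(raw: bytes) -> str:
    """Return the text content of the root element after parsing."""
    return ET.fromstring(raw).text

# The same two words separated by the three possible literal line breaks.
crlf = parsed_text(b"<t>foo\r\nbar</t>")
cr = parsed_text(b"<t>foo\rbar</t>")
lf = parsed_text(b"<t>foo\nbar</t>")

# After parsing, all three look identical to the application.
print(repr(crlf), repr(cr), repr(lf))
```

Whatever the editor wrote to disk, the application behind the parser (here Python, in this thread the XSLT processor) receives a single LF.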

However, I think he spoke a little hastily in each of these two sentences:

> The XSLT specification doesn't give the processor license to
> do anything else.

No, the XSLT specification doesn't forbid outputting CR, LF, or CRLF; any of
these is fine, because they will all be interpreted the same by an XML
parser.

> If you want to output CRLF, you must do it explicitly, by
> writing <xsl:text>
> </xsl:text>.

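No, that won't work of course: a literal line break inside <xsl:text> is
normalized by the XML parser to a single LF, whether it was typed as CR, LF,
or CRLF, so the XSLT processor never sees the difference.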
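
The literal case can be checked directly, and contrasted with numeric character references, which the line-end rule in [1] does not cover because they are expanded after normalization. This is only a sketch, again assuming Python's xml.etree.ElementTree as the parser; it shows what the parser hands to the processor, not what any particular XSLT serializer then writes out:

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A literal CRLF typed between the tags, as a Windows editor would save it.
literal_src = ('<xsl:text xmlns:xsl="%s">\r\n</xsl:text>' % XSL_NS).encode("utf-8")

# The same two characters written as numeric character references.
charref_src = ('<xsl:text xmlns:xsl="%s">&#13;&#10;</xsl:text>' % XSL_NS).encode("utf-8")

literal_text = ET.fromstring(literal_src).text
charref_text = ET.fromstring(charref_src).text

print(repr(literal_text))  # the CR has been normalized away
print(repr(charref_text))  # the CR is still present
```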

The upshot is that Saxon is right and Xalan is right, and a literal line
break in XML doesn't give you control over which line-ending sequence is
output. That's up to the discretion of the serializer (which might provide
various mechanisms for parameterization).
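
To illustrate what such parameterization might look like, here is a hypothetical serializer sketch in Python. The function serialize_text and its line_ending parameter are invented for this example and are not the API of Saxon, Xalan, or any XSLT processor; the only real library feature used is the newline argument of io.TextIOWrapper:

```python
import io

def serialize_text(text: str, line_ending: str = "\n") -> bytes:
    """Write already-normalized text, translating each LF to line_ending."""
    buf = io.BytesIO()
    # newline="\r\n" makes the wrapper translate "\n" to CRLF on output;
    # newline="\n" writes the characters unchanged.
    writer = io.TextIOWrapper(buf, encoding="utf-8", newline=line_ending)
    writer.write(text)
    writer.flush()
    writer.detach()  # leave buf open so its contents can still be read
    return buf.getvalue()

unix_bytes = serialize_text("foo\nbar\n")
windows_bytes = serialize_text("foo\nbar\n", line_ending="\r\n")
print(unix_bytes, windows_bytes)
```

The result tree is the same in both calls; only the serializer's setting decides which bytes end up in the file.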

Evan

[1] http://www.w3.org/TR/REC-xml#sec-line-ends

> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Salvatore
> Mangano
> Sent: Monday, May 06, 2002 3:13 PM
> To: Michael Kay
> Subject: Re: RE: [xsl] Saxon's handling of line breaks
>
>
> If you look at the sample I provide I do indeed output
> <xsl:text>
> </xsl:text>. Yet the result is as if the CR is stripped.
>
> Also, I do not mention notepad because it is my preferred
> editor. I mention it only as a tool for diagnosing the problem
> simply *because* it doesn't do what many other editors
> automatically do.
>
> To make the problem plain as day please consider the following
> stylesheet:
>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>
> <xsl:output method="text"/>
>
> <xsl:template match="/">
> foo<xsl:text>
> </xsl:text>bar
> </xsl:template>
>
> </xsl:stylesheet>
>
> According to your explanation foo and bar should be separated
> by whatever is enclosed in the xsl:text element. In this case
> it should be a CRLF combination because the stylesheet was
> created in an editor that writes out CR+LF at the end of line.
> However, after processing the stylesheet the CR was indeed
> stripped with saxon but not with xalan. Explain?
>
>
>
> > Line breaks in the input document and the stylesheet are automatically
> > converted to a single NL character by the XML parser - that's defined by the
> > XML standard.
> >
> > With the "text" output method, Saxon outputs the characters that it finds,
> > without change. The XSLT specification doesn't give the processor license to
> > do anything else. If you want to output CRLF, you must do it explicitly, by
> > writing <xsl:text>
> > </xsl:text>. You could make this
> > platform-dependent by putting it in an external entity or supplying it as a
> > stylesheet parameter.
> >
> > I think most modern text editors will understand NL as a newline character
> > even on the Windows platform: perhaps it's time you moved off Notepad.
> >
> > Michael Kay
> > Software AG
> > home: Michael.H.Kay@ntlworld.com
> > work: Michael.Kay@softwareag.com
> >
> > > -----Original Message-----
> > > From: owner-xsl-list@lists.mulberrytech.com
> > > [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Sal Mangano
> > > Sent: 06 May 2002 15:41
> > > To: xsl-list@lists.mulberrytech.com
> > > Subject: [xsl] Saxon's handling of line breaks
> > >
> > >
> > >
> > > Working with Saxon 6.5.1 on the Windows platform I noticed that line
> > > breaks literally represented as text elements are being output
> > > incorrectly for the Windows platform.
> > >
> > > For example,
> > >
> > > <xsl:stylesheet version="1.0"
> > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> > >
> > > <xsl:output method="text" encoding="UTF-8"/>
> > >
> > > <xsl:strip-space elements="*"/>
> > >
> > > <xsl:template match="number">
> > >   <xsl:value-of select="."/><xsl:text>
> > > </xsl:text>
> > > </xsl:template>
> > >
> > > </xsl:stylesheet>
> > >
> > > When I capture the output produced by this stylesheet in a file and open
> > > it in the Windows notepad editor it does not display correctly because
> > > notepad expects CR+NL pairs. Now if I open the stylesheet in notepad, it
> > > DOES display correctly which leads me to believe that the <text> element
> > > is actually enclosing a CR+NL pair. It seems that either the
> > > stylesheet parser or the output serializer in saxon is stripping the CR.
> > > When I use the same stylesheet with xalan it works correctly.
> > >
> > > Is this a bug in saxon or a misunderstanding on my part?
> > >
> > > In general, how are stylesheets supposed to deal with line breaks in a
> > > portable fashion?
> > >
> > > Thanks,
> > >
> > > Sal
> > >
> > >
> > >
> > >  XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

