This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: XSLT/XPath 2.0 (was "Identifying two tags...")


>But I have a real problem with the complexity of the subject matter that
>these standards describe. XSLT is evolving from a fairly straightforward
>little language, that can be described in a couple of pretty
>straightforward specs, into a language that has to be described in
>hundreds of pages full of terse and compact definitions. That's my
>problem.

The concern with complexity in technology is quite valid. To continue
with the C++ comparison, there is fair criticism that it is more
difficult to learn object-oriented programming from C++ than it is from
Java, Smalltalk, etc., because C++ is more complex. For this reason,
most college OOP courses are taught using Java. Much has been written
about how to teach C++ effectively while keeping the emphasis on the
OOP concepts, since those are the hardest part for most students to
grasp.

That said, I still am going to distinguish between the audience for a
specification and the audience for the technology represented in that
specification. The former are the implementers of the technology; the
latter are the users of the technology.

>But I don't think that XSLT has to be that hard to describe: I think
>that the dependency on the complexities of XML Schema gives me precious
>little benefit, compared with the headaches it causes me.

I don't know about your applications of XSLT, but XML Schema is a very
important component in my XML world. Here's my cosmology. (BTW, I am
just a constant reader of the specifications, not a W3C member or
anything like that.) 

The base XML specification defines three things: a data model, a
serialization format, and the DTD. The XML data model is a tree of
elements and their attributes. The serialization rules say how data in
the data model is written to a file. The document type definition (DTD)
provides a coarse mechanism for writing metadata, that is, data
describing the data in the XML document.
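To make that concrete, here is a hypothetical document with an internal
DTD (my own example, not from the thread). Notice how coarse the
constraints are: the DTD can require the price attribute, but it cannot
say that the value must be a number.

```xml
<!-- Hypothetical internal DTD: coarse metadata for an invoice. -->
<!DOCTYPE invoice [
  <!ELEMENT invoice (item+)>              <!-- one or more items -->
  <!ELEMENT item (#PCDATA)>               <!-- character data only -->
  <!ATTLIST item price CDATA #REQUIRED>   <!-- required, but just text -->
]>
<invoice>
  <item price="9.95">Widget</item>
</invoice>
```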

The data in an XML document can be accessed programmatically through
APIs like SAX and DOM. Those APIs are exposing the information set, or
infoset, represented by the data in the document with all artifacts of
the serialization removed.
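A small sketch of that point, using Python's standard-library
ElementTree as a stand-in for a DOM-style API: two documents that
differ only in surface syntax (attribute order, quote style, extra
whitespace) expose exactly the same information once parsed.

```python
# Two serializations of the same infoset: only surface syntax differs.
import xml.etree.ElementTree as ET

doc_a = '<book id="1" lang="en"><title>XSLT</title></book>'
doc_b = "<book lang='en' id='1' ><title>XSLT</title></book>"

tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

# The serialization artifacts are gone; the parsed trees carry
# identical information.
print(tree_a.attrib == tree_b.attrib)   # True
print(tree_a.find("title").text)        # XSLT
```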

XML Schema is a replacement for the DTD metadata. With XML Schema we can
associate programmatic type with content. The last three decades of
progress in software engineering can be characterized as the promotion
of type to a first-class programming concept -- programming by type lies
at the heart of object-oriented and component-based programming. XML
Schema also provides precise control over the data model, such as
specifying the exact contents of an element or the range of occurrences
allowed.
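Here is a hypothetical schema fragment (again my own example) showing
both capabilities: a typed value and an exact occurrence range, neither
of which a DTD can express.

```xml
<!-- Hypothetical XML Schema fragment: typed content plus
     precise occurrence constraints. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <!-- between 1 and 10 prices, each a decimal number -->
        <xs:element name="price" type="xs:decimal"
                    minOccurs="1" maxOccurs="10"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```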

All that is required of an XML parser is that it verify that the XML
document it is parsing is well-formed. It can disregard the DTD
entirely. That is called a non-validating parser. A validating parser
must read the document's metadata (DTD or Schema) and verify that the
XML document it is parsing is well-formed and valid according to the
metadata.
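Python's standard-library parser illustrates the non-validating case:
it enforces well-formedness only, raising on a malformed document, but
it will happily accept a document that violates its DTD or schema.

```python
# A non-validating parser checks well-formedness, nothing more.
import xml.etree.ElementTree as ET

well_formed = "<a><b/></a>"
malformed = "<a><b></a>"      # <b> is never closed

ET.fromstring(well_formed)    # parses without complaint

try:
    ET.fromstring(malformed)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)              # False: non-well-formed input is rejected
```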

If I use the XML parser to validate an XML document against XML Schema
metadata, then the data exposed by SAX or DOM is the PSVI, the Post
Schema Validation Infoset. In the PSVI, type information has already
been associated with the data, and every piece of data in the document
is known to be valid against the metadata.

When programming with XML, we can define programmatic types based on XML
Schema metadata. Then when we get data from the PSVI, we know that it is
valid against the metadata and we can use that data to construct an
object of that type, and do so without error. My data and all parts in
it conform to the requirements of the type.
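A sketch of that idea in Python. The names here are invented for
illustration, and since the standard library has no XML Schema
validator, assume validation has already happened upstream: the point
is that validated data can be lifted into a programmatic type without
per-field error handling.

```python
# Sketch: data already validated against its metadata can be turned
# into a typed object without defensive checks.
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Item:
    name: str
    price: float   # the (assumed) schema promised xs:decimal here

def item_from_element(el: ET.Element) -> Item:
    # Safe by construction: validation guaranteed that 'name' exists
    # and that 'price' holds a decimal, so float() cannot fail.
    return Item(name=el.findtext("name"),
                price=float(el.findtext("price")))

el = ET.fromstring("<item><name>Widget</name><price>9.95</price></item>")
item = item_from_element(el)
print(item)   # Item(name='Widget', price=9.95)
```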

Let's imagine that we want to write an XSLT processor. An XSLT processor
does not work with documents directly: it uses an XML parser to parse
the documents and present them through an API like DOM or SAX. The XSLT
processor only ever sees infosets. If we use a validating XML parser and
the documents carry XML Schema metadata, then all the data the XSLT
processor sees is PSVI. So it makes sense for the specifications to be
written in terms of the data actually provided to the application.
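That layering can be sketched in a few lines (the function names are
invented; a toy stand-in for a real XSLT processor): the "processor"
is handed a parsed tree, never the raw document text, so serialization
concerns stay entirely with the parser.

```python
# Toy illustration of the layering: parser owns the text,
# "processor" sees only nodes.
import xml.etree.ElementTree as ET

def transform(tree: ET.Element) -> str:
    # A trivial stand-in for an XSLT processor: it works on the
    # node tree, with no access to the original serialization.
    return ",".join(child.tag for child in tree)

source = "<doc><head/><body/></doc>"
infoset = ET.fromstring(source)   # the parser consumes the text
print(transform(infoset))         # head,body
```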

Returning to your observation, "the dependency on the complexities of
XML Schema gives me precious little benefit, compared with the
headaches...", I am trying to make the case that the benefit of the
specification making use of PSVI is that XSLT implementers can program
to that requirement and thus produce XSLT processors that are
interchangeable. To specify otherwise would let XSLT processor behavior
diverge, which would spell chaos. Count that as a big benefit to you.

XSLT 1.0 and XPath 1.0 became W3C Recommendations in November 1999. XML
Schema became a Recommendation in May 2001. That explains why XSLT 1.0
makes no reference to Schemas or the PSVI. But with XML Schema now in
place as a cornerstone of XML technology, it is important to make the
XSLT 2.0 and XPath 2.0 specifications consistent with XML Schema.

Does that make XSLT harder to describe? Not really, unless you plan to
use the XSLT specifications as a textbook. If you think that XSLT should
not be hard to describe, why not write about it yourself? The next great
book on XSLT is waiting to be written.

Cheers,
Stuart


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

