This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

RE: Future XSLT expansion. ( Re: Microsoft XSL and Conformance )


Hi Paul,

Paul said:
Actually, this example has exactly the same disadvantage that your first
example has.

Let us just spend a few seconds thinking about what really happens.

We have a client and server here.

In your scenario:

The client ( executing the XSLT stylesheet ) makes a request to the server
for some data. The client receives some heap of data from the server.
The client then uses only *part* of that heap ( filtering
the heap with some criteria ). But the entire heap has been
transferred to the client. Overhead.

In my scenario:

The client makes a request to the server and the *server* filters
the heap. That means the client gets only the data it needs.

Didier replies:
Yes, transferring the entire data set is not an efficient use of the
bandwidth. But, speaking of bandwidth, using XML, such a verbose meta
language, is probably not the most cost-effective way to do this either. So
is using Java to develop software; after all, Java is not as resource
efficient as, let's say, C++ or C. The main reasons we use XML are
interoperability, ease of development, etc. The main reason we use Java is
not to provide better stuff to the user but to reduce our costs: portability
costs, development costs (no memory leak checks, etc.). So yes, I agree,
this is not very bandwidth efficient, but then we didn't move to markup
technologies and Java for efficient usage of resources, did we?

An advantage of having the document function used client side is that the
server has less processing to do, and thus can scale more. In certain ways,
it is more efficient from the resource management point of view. The client
does the job of aggregating the content with an XPath expression containing
a document() function. Also, the main problem is that not all services will
support precise XML queries; therefore, the client will have to do some
processing to extract the right info. For instance, a query to an indexing
engine may return a set of RDF records instead of a single field, for
several reasons, one of them being that the URL request is less complicated.
Saying that you require a set of records about a particular topic and that
you want to exclude some information may lead to a big and complex URL. Do
not forget that not all web service APIs in the whole universe will be SQL
servers. Thus, the client will still need a way to extract the right
information if it receives information like this:

<article id="334657">
<url>http://www.zdnet.com/art334645.htm</url>
<description>this is a funny headline</description>
<date>march 12 2000 23:43 GMT</date>
<source>Ziff Davis Network</source>
</article>

If your app is interested only in the URL, and the server returns the
whole record (and you cannot change that; this is the way the server works),
then the client can extract the right information, which is, in this case,
the URL.

Let's analyze this further and see that doing the processing server side or
client side costs about the same thing in terms of bandwidth, but not the
same thing in terms of scalability and server resource usage.

Scenario 1: The processing is done server side. The user agent makes a
request, and the server (using any kind of method) asks an external server,
through a URL request, for an XML fragment. The server then has to extract
the right info (let's say the url from the sample above), aggregate it with
the other content, and send it to the client in a format the client
understands. Conclusion: we have a network request anyway, and all the
processing is done server side. Thus, you need more power on your server
and may experience scalability problems.

Scenario 2: The processing is done client side. The XSLT style sheet gets an
XML document fragment from a URL request made to an external server. The
XSLT style sheet processes the file client side. Conclusion: the server has
nearly nothing to do except transmit files. An XML document travels the net
as in scenario 1, but the destination is the client instead of the server,
and the processing is done on the client side, not on the server side.

So, in both cases we have the same document fragment occupying the same
bandwidth share, except that in scenario 1 we impose more work on the server
and in scenario 2 more work on the client. What is desirable? In scenario 1
you have a client, often as powerful as the server, sitting idle waiting for
an answer from an overburdened server. In scenario 2 the client's processor
cycles are at least used for something, and the server is now able to
respond to more clients without having to add more resources.

So yes Paul, there is a difference. Include in your analysis the following
factors:

a) a web service request is not necessarily made to a server inside the
intranet boundaries. The request may target a remote server located on the
internet (which may use a VPN to assure some privacy).
b) not all web service APIs will be SQL. You'll have, in fact, a plethora of
APIs and types of URL queries. Some APIs will even be based on more complex
structures and may require an XML document fragment as input (or something
else).
c) the client is often underused, and when the server does all the job, the
server becomes the bottleneck. The more simultaneous clients you have, the
more you overtax the server, and the bigger the bunch of clients doing
nothing but wait for the server to do the job.
d) finally, distributed computing does not mean that the server does all the
job, but that the processing load is shared as much as possible between the
client and the server.

Paul said:
The same collision takes place in PXSLServlet.

PXSLServlet creates XML from SQL. Then the XSLT stylesheet
performs rendering of that XML. When I need some data
sorted / filtered, I'm using the ORDER BY part of the SQL query, even
though I could sort at the level of XSLT. Yes, I could. But if the 'source'
can do some preprocessing ( ordering, filtering ), why should the
XSLT part bother with sorting and filtering?

Didier replies:
The server you are talking about is within the intranet boundaries. What if
you have to get data from a server located at the other end of the world? I
know what you're gonna say: that the server can make an SQL request using
the Java SPI (service provider interface) to request the data set. Fine.
Will this go through the firewall? We know the answer. Fine. What if the
request is not an SQL request, as in the example I mentioned in the
previous post?
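For reference, the XSLT-side sorting Paul alludes to (the in-stylesheet
equivalent of an ORDER BY) would look roughly like this; the record and
field names here are hypothetical, loosely modeled on the sample record
earlier in this message:

```xml
<!-- Hypothetical sketch: sorting in the stylesheet instead of in SQL.
     /records/article and <date> are illustrative names. -->
<xsl:for-each select="/records/article">
  <xsl:sort select="date" order="ascending"/>
  <xsl:value-of select="url"/>
</xsl:for-each>
```

Note that xsl:sort must be the first child of xsl:for-each.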

Paul said:
The last but not the least.

The functionality you are asking for:

document('http://www.moreover.com/parameters-here')/url

Equals to:

$var = document('http://www.moreover.com/parameters-here');
select $var/url

This could be already done with XSLT without polluting the semantics
of document() function.

Didier replies:
What? You tell me that using the document function is polluting the
semantics of the document function??? Come on Paul, take a walk, read the
specs again, or do something. I simply use the document function as it is
stated in the specs. The document function takes a URL as parameter and
returns a node list. Bottom line. The specs do not restrict the usage of
the document function to certain URLs. Moreover, the document function is a
valid construct to be included in an XPath step because it is part of
XPath!!! Isn't it? Please, take a look at the XPath specs again. An XPath
expression such as document(..)/elementx/elementy is a valid XPath
expression. Isn't it? And saying that

$var = document('http://www.moreover.com/parameters-here');
select $var/url

is better than
document('http://www.moreover.com/parameters-here')/url

is a matter of personal taste and programming style. I wouldn't say that the
former is better than the latter or vice versa. They are simply equivalent.
Paul, take a good walk, think about it, and find a better argument this time.
Moreover, the construct

$var = document('http://www.moreover.com/parameters-here');
select $var/url

is not a valid XSLT construct at all, Paul. So, please, have more rigor in
your demonstration, at least. If one of my graduate students were as sloppy
in their demonstrations when I was teaching, they probably wouldn't have
graduated. Please don't mix and match things. A bit of rigor, please.
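For the record, the closest valid XSLT 1.0 spelling of Paul's pseudocode is
a variable binding followed by a selection on it, for example:

```xml
<!-- Valid XSLT 1.0 equivalent of Paul's pseudocode, for comparison -->
<xsl:variable name="var"
              select="document('http://www.moreover.com/parameters-here')"/>
<xsl:value-of select="$var/url"/>
```

Which, as noted, selects exactly the same nodes as the single expression
document('http://www.moreover.com/parameters-here')/url.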

> Paul said:
> > Yes indeed, the document function is where the magic potion is hidden
>
> I don't think it is a good idea to turn XSLT into yet another monster
> with things like that.
>
> I think XSLT's extension elements could ( should ) do things like that.
>
> ( I actually think that it could be better to remove some stuff from XSLT
> to achieve better balancing between core XSLT and extensions, but that's
> another story. )
>
> Isn't it strange that the fundamental node-set 'typecast' is not in the
> core, but some other things are in there?
>
> Didier replies:
> So your opinion is that the document function part of the XSLT 1.0
> recommendation is a monster?

Paul said:
I think :

document() is simply a logical hack ( like most other solutions
from the XML world; the hack is nice and handy, of course ).

What it *really* is:

a. "get data"
b. "convert data to nodeset".

(a) could be a text file, identified by a URL, but it could also be any
other source.

Actually, an XSLT transformation could already be invoked over a DOM. In
this case, what if I have *2* DOMs? Do I understand right that having
multiple input files is 'legal', but having multiple DOMs is 'marginal'?

Didier replies:
Yes, an XSLT transformation can be invoked over the DOM. So far so good.
What do you mean by "in this case if we have two DOMs", and how is this
related to the actual demonstration? Sorry, I do not understand where you
are going. And yes, you are right, having multiple input files is "legal".
We cannot have two DOMs; the DOM is only the interface. Probably you mean
two document trees here. If that is the case, no, it is not marginal to
have more than one document tree, but there is only one output tree (at
least in accordance with the recommendations; you may have more than one
output tree with proprietary extensions). You should first give the context
on how the multiple DOMs are created. But using the document() function
_may_ result in the construction of a DOM tree, the latter to be used
alongside the already present DOM tree (the actual XML document being
processed).

Paul said:
(b) is not in the XSLT standard. *That's* a problem.

Didier replies:
Oh yeah? Thanks Paul, I didn't know :-)))

Here is an extract from the W3C recommendations:
This section describes XSLT-specific additions to the core XPath function
library. Some of these additional functions also make use of information
specified by top-level elements in the stylesheet; this section also
describes these elements.

12.1 Multiple Source Documents
Function: node-set document(object, node-set?)

The document function allows access to XML documents other than the main
source document.
[....] and so on and so forth...

It is an XPath function; more precisely, an XSLT-specific addition to the
core XPath function library. According to the XPath and XSLT
recommendations, it is valid to use the document() function as a step in an
XPath expression. Tsss, tsss, here is your homework, Paul. Think about and
answer these questions:

a) When an XPath engine processes an XPath expression, what is the engine
doing?
b) How is the output of processing a particular step fed into the input of
the next step? Do you know an analogy for this process in the Unix world?
c) What are the XPath composition rules?
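As an aside, the pipe analogy can be made concrete: each step in an XPath
expression consumes the node-set produced by the previous step, much as
each stage of a Unix pipeline filters the stream handed to it. A sketch,
with illustrative file and element names:

```xml
<!-- document() produces a node-set; each following step filters it,
     roughly analogous to:  cat items.xml | select article | select url -->
<xsl:value-of select="document('items.xml')/article/url"/>
```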

Didier said:
> this is an opinion and I would respect it even
> if I do not share it. Please Paul, think more about some concept that are
> appearing on the web now:
>
> a) the notion of web service: you do a URL request and get back an answer
in
> XML. Do you find that a monster?

Paul said:
No. When saying 'monster' I mean that instead of an elegant and expandable
core layer + some utility functions built on top of that layer, we are
receiving 'handy hacks' in the core layer.

Didier replies:
So, you find that the actual XPath and XSLT recommendations contain some
monstrous parts and get "handy hacks" in the core layer. I cannot comment on
this. This is a matter of taste, and taste is indisputable.

Paul said:
<monster_example>

If one gets a chance to dig into the internals of Tcl/Tk and Perl, there is
no question why Tcl now supports threads but Perl still has some problems
with threads. I'm not talking about the languages. I'm talking about the
implementation and some design principles.

Not even talking about Perl 4, Perl 5 had *a lot* of things hardcoded in
the core instead of placing those things into extensions. For example, one
could use read / write, send / recv, sysread / syswrite, print, etc. to
work with sockets, and *all* those functions were ( are? ) in the *core*
perl executable ( doing more or less the same thing ). Such a mess.

By contrast, the Tcl/Tk balancing between layers was *amazing*.
When I first realized that Tcl/Tk ( version 7? ) has:

1. A very small OS-specific C kernel for drawing windows / processing
mouse / keyboard.
2. On top of (1), some other C layer.
3. On top of (2), a Tcl layer.
4. On top of (3), the rest ( and that 'rest' is in Tcl! The behavior of
buttons has been coded in *Tcl*, not in C ).

I was really impressed to see how layering could be done *really*
elegantly, when things that should be in a core are in the core and
things that should be 'extensions' are extensions.

UNIX way  == pipes of components.

</monster_example>

Perl is a monster. Tcl/Tk was not. ( Even though I don't like the Tcl
language itself. )

Didier replies:
Good point.

Didier said:
> Is CORBA or DCOM better? if yes why?

Paul replies:
Ghm... we should also mention RMI, I think...

I would rather not start comparing one monster to another.

<aside>
I would give CORBA ( and especially EJB ) a chance. With some
ongoing marginal implementations of EJB, there are some nice
simplifications happening. EJBoss looks very interesting.
</aside>

In principle, I have nothing against URIs and URLs ( especially, I have
nothing against WebDAV; I think that could be the next amazing thing ).

Didier replies:
so far so good.

Paul said:
I just said that XSLT's 'document()' function had better not be polluted in
the way you are suggesting ;-) But if it is, I'll not cry either. I could
live with or without that 'document' hack. I'm not using 'document' with
XSLT at all. With PXSLServlet I'm bringing everything I need into one and
only one and always one XML document, and then XSLT does its job ==
rendering.

Didier replies:
Let's reset the clock to the same time here. Why are you saying that I am
polluting the specs when I am simply using a totally valid XPath
expression, and furthermore, using it within the boundaries of the
XSLT/XPath 1.0 recommendations? Please Paul, take some time, read both the
XPath and XSLT recommendations again, buy one or two good books on the
subject (I can recommend a good one, but I would be in a certain conflict
of interest situation :-)), and think again about what you just said.

Didier said:
> b) the notion of content aggregation. If a posted document is an XML
> document, a fragment of this document can be aggregated by another
> document using an XPath expression (i.e. using the document function as
> an XPath step).

Paul said:
XSLT is about transformations of a single document ( an XML tree ), not
about content management.

Didier replies:
Who said that? You? Is aggregating a document fragment doing content
management, or simply using content? We may not have the same definition of
the word management. But please check whether you have the same one as the
others ;-)

Paul said:
The document() function is a logical hack. Content management
( aggregation, addressing, storage, updating, versioning, etc., etc. )
is another problem domain. Trying to turn XSLT into a silver bullet for
*anything* is very understandable ( because to me, XSLT is in *much*
better shape than some other things ), but such an attempt has already
caused some absolutely useless features, bloating the engine.

Didier replies:
Saying that the document function is a logical hack is again a matter of
taste, or grounded in a comparison against better ways to do things. I
won't comment on that. Saying that content management is another domain is
right, and I do not contradict that. And finally, I won't comment on the
last opinion about the silver bullet, since I am technology agnostic and do
not favor one technique over another. I would simply say that there is a
wave of investments and tools, and I have to adapt to this wave and these
tools, as good or as bad as they may be. XML is not the most efficient
language, but at least we made tremendous progress by all agreeing on it.
Imagine if all the people on earth had simply agreed on a single alphabet:
even if the languages are different, we would at least have agreed on a
same alphabet. XSLT may be weird to learn at first, but at least it does a
good job of transforming a set of documents into different rendition
languages. This is very useful when you have to provide some information or
build an application for cell phones and classical browsers.

Paul said:
When it comes to merging multiple XML documents into one
XML document on the fly, the stylesheet becomes a mess.

Didier replies:
I won't comment on this.

Paul said:
I think it is obvious that having 'eval' and 'node-set' in the core
is *much* closer to the original XSLT purpose than getting a fancy
( 'non-standard' ) way of navigating multiple XML documents
( and their fragments ).

Didier replies:
Paul, I am a patient man, but please do some homework before talking with
such assurance. Please...

Paul said:
That's the issue of taste, of course.


Didier replies:
I cannot agree more on that.

Paul said:
With current XSLT extensibility features, almost everything could
be done with extensions.

There is actually no need for the beloved 'for' loop in XSLT,
because the same functionality could be easily implemented
with extension:range + extension:node-set.

That means if one needs to grab some part of a
separate document, this also could be done with

extension:give-me-part-of-the-document-or-some-data-from-database +
extension:node-set.

Didier replies:
True, but not in a portable and standard way, is it?
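For concreteness, one widely implemented shape of the extension pattern
Paul describes is the EXSLT node-set() function, which turns a result tree
fragment into a real node-set that can then be navigated. A rough sketch,
assuming an engine that supports the EXSLT common namespace:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common">
  <!-- A result tree fragment built inside the stylesheet... -->
  <xsl:variable name="rtf">
    <item>1</item>
    <item>2</item>
  </xsl:variable>
  <xsl:template match="/">
    <!-- ...converted to a node-set by the extension, then navigated -->
    <xsl:for-each select="exsl:node-set($rtf)/item">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Engines that do not implement the extension will reject the
exsl:node-set() call, which is exactly the portability point at issue.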

Didier said:
> Of course, in the case of b) you can say that there are some serious
> commercial problems.

Paul said:
No. I'm talking only about the design, about balancing functionality
between 'core' and 'layers'. XSLT is good at that balancing, but
not perfect. I think there are some useless things in there, while 'eval'
and 'node-set' are missing from the core ;-) For example.

Actually, that is all understandable. The idea was to make something
for 'documents'. Unfortunately, the world is not a heap of plain XML files.
Not at all, actually.

Didier replies:
So on one hand you say that XSLT is well balanced, and on the other hand
you say that the document() function is a hack. Hmm, I am getting a bit
confused here, Paul.

Didier said:
> Is this what you are saying or, once again, has the monster of
> email misinterpretation played some of its tricks? Note: Guys from
> Scotland assured me that the monster of email misinterpretation has no
> parenthood relationship with the Loch Ness monster - the former is the
> bad guy and the latter the good guy.

Paul said:
Monsters are just funny. Good software is elegant. grep and yacc
will never die, I think. Perl certainly will die, as happened with PL/I.

The funny thing is that we like XSLT because of the XPath part.
The XPath part is the only part of XSLT free of 'XML-mania'.
XPath is a good-old-UNIX-alike, command-line-grep-alike beast.

A small beast which requires you to type those */[] things. Verrrrrry
bad for the 'end user'.

Didier replies:
But you are precisely arguing against a particular XPath construct. Again, I
am confused.

Paul said:
PS. Indeed, I would like to type in some beloved csh:

ls  / / some

Instead of

find .  -name some

Didier replies:
Again, a matter of taste, and I won't comment on that.

Conclusion:
Please Paul, do me a favor: read the XPath and XSLT specs again before
replying.

Cheers
Didier PH Martin
----------------------------------------------
Email: martind@netfolder.com
Conferences: Web Chicago(http://www.mfweb.com)
             XML Europe (http://www.gca.org)
Book: XML Professional (http://www.wrox.com)
column: Style Matters (http://www.xml.com)
Products: http://www.netfolder.com


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
