This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: Ampersand for URLs


Thomas B. Passin wrote:
>  Just to put the last nail in this coffin, I hope, I tried it out.  I
> changed an "&" to a "%26" in the url query string in an anchor element.

*sigh* Why is this so hard for people to grasp?

URIs identify resources by name (URN) or location (URL). A URI is just a
sequence of ASCII characters, always. You can use the following
characters freely:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
- _ . ! ~ * ' ( )

In addition, you can use the following characters, but they are earmarked
as "reserved". Exactly what their reserved purpose is depends on the
scheme; e.g., URIs that begin with "http" use the HTTP scheme, and the
meaning of the reserved characters is defined in the HTTP spec.

; / ? : @ & = + $ , 
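The two character lists above can be written down as sets. This is a minimal Python sketch that assumes the grouping given here (the RFC 2396 one; RFC 3986 later regrouped some of these characters). `classify` is a hypothetical helper for illustration, not a library function:

```python
import string

# Character groups as listed above (RFC 2396 grouping).
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")
RESERVED = set(";/?:@&=+$,")


def classify(ch: str) -> str:
    """Say which group a single character falls into (hypothetical helper)."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    if ch == "%":
        return "escape"  # only legal as the start of a %XX sequence
    return "disallowed"  # must be written as %XX, e.g. space -> %20


if __name__ == "__main__":
    for ch in "a~&% \"":
        print(repr(ch), classify(ch))
```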

You can also use %, but only to indicate the beginning of an escape
sequence of the form %XX, where XX is two hex digits forming a number
from 00 to, theoretically, FF, although the meaning of anything above
7F (the upper limit of ASCII) is questionable.
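Forming and reading an escape is just hex arithmetic on the character's code. A minimal sketch; `pct_encode` and `pct_decode` are hypothetical names, and the encoder deliberately refuses anything above 7F:

```python
import string


def pct_encode(ch: str) -> str:
    """Turn one ASCII character into its %XX escape, e.g. '&' -> '%26'."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("this sketch only handles ASCII characters")
    return "%{:02X}".format(code)


def pct_decode(seq: str) -> str:
    """Turn a %XX escape back into the character it stands for."""
    if len(seq) != 3 or seq[0] != "%" or not all(c in string.hexdigits for c in seq[1:]):
        raise ValueError("expected a sequence of the form %XX")
    return chr(int(seq[1:], 16))


print(pct_encode("&"), pct_decode("%26"))  # %26 &
```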

The reason you would want to use a %XX escape sequence is so you can
  - represent a reserved character being used for something other than
    its reserved purpose
  - represent the other ASCII characters that exist but that are disallowed
    in URIs, like %20 for the space character, %22 for the double quote, etc.
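Python's standard `urllib.parse.quote` applies both of these rules. Passing `safe=""` tells it to escape reserved characters as well (by default it leaves `/` alone):

```python
from urllib.parse import quote

# A reserved character used as plain data: escaped so it loses its special role.
print(quote("AT&T", safe=""))      # AT%26T
# Characters that are not allowed in URIs at all.
print(quote('say "hi"', safe=""))  # say%20%22hi%22
```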

Therefore, "%26" and "&" in a URI are *not* equivalent, because "&" is a
reserved character; "%26" means just a "&", as if it were not a reserved
character. This is very similar to the concept of using "&lt;" instead of
"<" in XML where you want to say "<"-as-character-data-not-markup.

*pulls nail out of coffin*

In an HTTP URL that is being used to submit HTML form data, "&" in the
query part of the URL has the reserved purpose of being a separator
between each name-value pair for each form control. "&" is your only
option for writing this separator. "%26" is what you write if you want to
make the name or value in the form data contain an ampersand character.
Make sense? That's why your server gave you an error when you changed "&"
to "%26".
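A form-encoding routine follows exactly this rule: "&" between pairs, "%26" inside a value. Here is a sketch with Python's `urllib.parse.urlencode`; the foo/bar URL and field names are just the placeholders used in this message, and note that `urlencode` writes a space as "+", the form-encoding convention:

```python
from urllib.parse import urlencode

fields = {"field1": "Smith & Sons", "field2": "world"}
query = urlencode(fields)
print("http://foo/bar?" + query)
# http://foo/bar?field1=Smith+%26+Sons&field2=world
```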

To illustrate,

http://foo/bar?field1=hello&field2=world
means that field1 has the value "hello"
and field2 has the value "world".

http://foo/bar?field1=hello%26field2=world
means that field1 has the value "hello&field2=world"
(typical form-data parsers split each pair at its first "=")
and there is no field2 at all.
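You can check both query strings with Python's `urllib.parse.parse_qs`, as one example of a form-data parser. It splits on "&" before decoding "%26", so an escaped ampersand never acts as a separator. It also splits each pair at its first "=", so the escaped version comes out as a single field:

```python
from urllib.parse import parse_qs, urlsplit

for url in ("http://foo/bar?field1=hello&field2=world",
            "http://foo/bar?field1=hello%26field2=world"):
    print(parse_qs(urlsplit(url).query))
# {'field1': ['hello'], 'field2': ['world']}
# {'field1': ['hello&field2=world']}
```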

> Both IE5.5 and NS 4.73 on Windows accepted a plain '&' in the url string in
> the <a> element - they didn't require the &amp; .

They do this to retain compatibility with malformed HTML documents.

Since HTML doesn't have a concept of well-formedness, document authors
get away with using "&" to mean something other than the start of an
entity or character reference.

It does not follow that &amp; is not the correct way to do it, nor does
it follow that IE and NS won't accept it if you do it that way.
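The leniency ends as soon as an XML parser is involved, because XML does require well-formedness. A sketch using Python's `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper. The bare "&" is rejected and the "&amp;" form is accepted:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the string parses as a well-formed XML element."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


bare = '<a href="http://foo/bar?field1=hello&field2=world">x</a>'
escaped = '<a href="http://foo/bar?field1=hello&amp;field2=world">x</a>'
print(is_well_formed(bare), is_well_formed(escaped))  # False True
```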


To further clarify, even though it has been stated in this thread several
times already,

http://foo/bar?field1=hello&field2=world is the URI.

If you want to write this in an XML or HTML document, you need to write
http://foo/bar?field1=hello&amp;field2=world
or
http://foo/bar?field1=hello&#38;field2=world
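Producing either form from the plain URI is a job for an XML/HTML escaping routine, not a URI one. This sketch uses Python's `xml.sax.saxutils.escape` and `html.unescape` to show that the entity reference and the character reference stand for the same URI:

```python
from html import unescape
from xml.sax.saxutils import escape

uri = "http://foo/bar?field1=hello&field2=world"
as_entity = escape(uri)  # '&' -> '&amp;' (also '<' and '>', none here)
print(as_entity)  # http://foo/bar?field1=hello&amp;field2=world

# Both ways of writing it in markup decode to the same URI.
as_char_ref = "http://foo/bar?field1=hello&#38;field2=world"
print(unescape(as_entity) == unescape(as_char_ref) == uri)  # True
```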

HTML browsers will accept
http://foo/bar?field1=hello&field2=world
even though they shouldn't.

The thing you have to keep in mind is that the document is interpreted
before it is acted on. This is true for XML and HTML. When 
<a href="http://foo/bar?field1=hello&amp;field2=world"> is in the HTML,
it is interpreted to mean

an "a" element
with an "href" attribute
  having value "http://foo/bar?field1=hello&field2=world"

and this is later used to construct an anchor that, when activated,
will take the user to http://foo/bar?field1=hello&field2=world.
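You can watch the interpretation step directly. This sketch parses the anchor with Python's `xml.etree.ElementTree`, using XML parsing as a stand-in for what a browser's HTML parser does with the attribute:

```python
import xml.etree.ElementTree as ET

markup = '<a href="http://foo/bar?field1=hello&amp;field2=world">go</a>'
anchor = ET.fromstring(markup)
print(anchor.tag)          # a
print(anchor.get("href"))  # http://foo/bar?field1=hello&field2=world
```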

Given that this is the same in XML/XSLT as it is for HTML, I don't see
why so many people are thrown by it. They assume that the href value is
copied verbatim and used as the literal URL that will appear in the 
Address/Location section of their browser.
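The reverse direction explains what people see in their output. In this sketch, Python's `ElementTree` stands in for an XSLT processor's serializer, which is an assumption. The attribute value in the tree holds a plain "&", and serializing writes it back out as "&amp;":

```python
import xml.etree.ElementTree as ET

anchor = ET.Element("a", href="http://foo/bar?field1=hello&field2=world")
print(anchor.get("href"))  # value in the tree: plain '&'
print(ET.tostring(anchor, encoding="unicode"))
# <a href="http://foo/bar?field1=hello&amp;field2=world" />
```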

*pound*

I wrote a bit about using non-ASCII characters in URIs at
http://skew.org/xml/misc/URI-i18n/

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

