This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



RE: Sorting and non-en_US indexes


Thanks again, Jirka.

Ok, then here's my understanding of the status of using autoidx.xsl with
non-English indexes (assuming Saxon). Please correct me if I've got
anything wrong.

To use autoidx.xsl for non-English languages (in addition to using the
classes for Saxon mentioned below), I have to modify autoidx.xsl in two
ways:

1) Supply the uppercase and lowercase letters of the alphabet that autoidx
uses to create indexdivs. For languages lacking the distinction between
uppercase and lowercase, I just put the alphabet in both places so that
indexdivs are created. Any word beginning with a character not in the
alphabet provided here ends up in the symbol category.
2) Add an appropriate lang attribute to each xsl:sort in autoidx.xsl,
either hard-coded or taken from a @lang attribute somewhere in the
input document, so that Saxon sorts using the right Collator.

For languages with accented characters, my choices are:
a) Add the accented characters to &uppercase; and &lowercase;, so that
words beginning with an accented character end up in their own
indexdivs;
b) Don't add these characters to &uppercase; and &lowercase;, so that
words beginning with them end up in the Symbols indexdiv; or
c) Don't use words as indexterms if the first letter of the term has a
diacritical mark of some kind :) 
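The difference between options a) and b) can be modeled in a few lines of Java. This is a simplified, hypothetical stand-in for the first-letter test autoidx.xsl performs, not the stylesheet's actual code: a term goes into the division of its first letter if that letter appears in the supplied alphabet (the two strings must list matching letters in matching positions), and into Symbols otherwise. It only inspects the first UTF-16 char, so characters outside the Basic Multilingual Plane are not handled.

```java
public class IndexDivSketch {
    static final String EN_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String EN_LOWER = "abcdefghijklmnopqrstuvwxyz";
    // Option (a): the same alphabet with E-acute added, in matching positions.
    static final String A_UPPER = "ABCDE\u00C9FGHIJKLMNOPQRSTUVWXYZ";
    static final String A_LOWER = "abcde\u00E9fghijklmnopqrstuvwxyz";

    // Hypothetical helper: return the indexdiv label for a term.
    static String indexDivFor(String term, String uppercase, String lowercase) {
        if (term.isEmpty()) {
            return "Symbols";
        }
        char first = term.charAt(0);
        int pos = lowercase.indexOf(first);
        if (pos < 0) {
            pos = uppercase.indexOf(first);
        }
        // Lowercase and uppercase forms share the uppercase letter's division.
        return pos < 0 ? "Symbols" : String.valueOf(uppercase.charAt(pos));
    }

    public static void main(String[] args) {
        // "\u00E9lan" is "elan" with E-acute; "\u65E5\u672C" is the Japanese word Nihon.
        String[] terms = {"apple", "Zebra", "\u00E9lan", "\u65E5\u672C", "#define"};
        for (String t : terms) {
            System.out.println(t + ": (a) " + indexDivFor(t, A_UPPER, A_LOWER)
                + ", (b) " + indexDivFor(t, EN_UPPER, EN_LOWER));
        }
    }
}
```

With option a), the E-acute word gets its own division; with option b) it lands in Symbols. With only the English letters declared, a Japanese term always lands in Symbols as well.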

For Traditional Chinese, where I understand indexdivs are based on the
number of strokes rather than the initial character in the word,
autoidx.xsl doesn't support automatically generated indexdivs. To do
that, the stylesheet would have to be rewritten (and include the number
of strokes in an attribute on the <primary> element).
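If the stroke count were available in the source, the grouping step itself is simple. The Java sketch below assumes each entry carries the stroke count of its first character, supplied by hand here as a hypothetical attribute on <primary> would supply it, and builds one division per count. Entry and groupByStrokes are illustrative names only, and the sketch does not sort terms within a division.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StrokeDivSketch {
    // Hypothetical entry: a primary term plus the stroke count of its first
    // character, which the source document would have to provide.
    static final class Entry {
        final String term;
        final int strokes;

        Entry(String term, int strokes) {
            this.term = term;
            this.strokes = strokes;
        }
    }

    // One division per stroke count; TreeMap keeps divisions in ascending order.
    static Map<Integer, List<String>> groupByStrokes(List<Entry> entries) {
        Map<Integer, List<String>> divs = new TreeMap<>();
        for (Entry e : entries) {
            divs.computeIfAbsent(e.strokes, k -> new ArrayList<>()).add(e.term);
        }
        return divs;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("\u4E2D\u6587", 4),  // zhongwen: first character has 4 strokes
            new Entry("\u4EBA\u53E3", 2),  // renkou: 2 strokes
            new Entry("\u4E00\u81F4", 1),  // yizhi: 1 stroke
            new Entry("\u4E2D\u570B", 4),  // zhongguo: 4 strokes
            new Entry("\u5927\u5B78", 3)); // daxue: 3 strokes
        groupByStrokes(entries).forEach((strokes, terms) ->
            System.out.println(strokes + " strokes: " + terms));
    }
}
```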

I understand that there is currently no way to have the stylesheets
store multiple alphabets for &uppercase; and &lowercase; and pick the
appropriate one without the intervention of a processing system. I'm
thinking of something along these lines: store the uppercase and
lowercase declarations in per-language files (en.ent, fr.ent), declare
parameter entities that point to these files, add a reference to one of
them, and then have the processing system munge my customization of
autoidx.xsl so that it references the correct entity before the
stylesheet is used to process the document. The alternative is to have
a separate customization layer (with its own autoidx.xsl) for each
target language. 
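The selection step the processing system would perform amounts to a lookup keyed by language code. In the Java sketch below, the ALPHABETS map is a hypothetical stand-in for the contents of en.ent and fr.ent (the French letters chosen are illustrative only), and alphabetFor, also a made-up name, falls back to English for languages with no entry.

```java
import java.util.Locale;
import java.util.Map;

public class AlphabetByLangSketch {
    // Hypothetical stand-ins for en.ent and fr.ent: {uppercase, lowercase}
    // pairs keyed by language code, with matching letters in matching positions.
    static final Map<String, String[]> ALPHABETS = Map.of(
        "en", new String[] {"ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"},
        // Illustrative French set: C-cedilla, E-acute and E-grave get their own divisions.
        "fr", new String[] {"ABC\u00C7DE\u00C9\u00C8FGHIJKLMNOPQRSTUVWXYZ",
                            "abc\u00E7de\u00E9\u00E8fghijklmnopqrstuvwxyz"});

    // Pick the alphabet for a lang value such as "fr", "fr-CA" or "fr_FR";
    // fall back to English when the language is missing or unknown.
    static String[] alphabetFor(String lang) {
        String primary = lang == null ? "" : lang.replaceAll("[-_].*", "").toLowerCase(Locale.ROOT);
        return ALPHABETS.getOrDefault(primary, ALPHABETS.get("en"));
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"fr-CA", "de", null}) {
            System.out.println(lang + " -> " + alphabetFor(lang)[0]);
        }
    }
}
```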

Some of these things I'll understand better as we get further into our
experimentation, but it's helpful to know what behavior to expect, since
it saves you from debugging something that's actually working as
designed :) Once I've got this figured out, I'll write something up that
we can include somewhere in the docs or FAQ. 

Thanks,
David

-----Original Message-----
From: Jirka Kosek [mailto:jirka@kosek.cz]
Sent: Friday, September 20, 2002 1:24 PM
To: David Cramer
Cc: docbook-apps@lists.oasis-open.org
Subject: Re: DOCBOOK-APPS: Sorting and non-en_US indexes


David Cramer wrote:
> 
> Thanks Jirka. Just to make sure I understand how to use this: once I
> compile one class for each target language following the naming
> convention Compare_<replaceable>language code</replaceable> (Compare_ja,
> etc.), I run Saxon with the appropriate language code?
> 
> > java -Duser.language=<replaceable>language-code</replaceable> com.icl.saxon.StyleSheet...
> 
> ...and it should use the Compare class for the user.language?

I have never used user.language before, so I have no idea what it is
good for. I used <xsl:sort ... lang="<replaceable>language code</replaceable>"/>.
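For what it's worth, at the JVM level user.language only sets the initial default locale, while an explicit language code selects a collator directly. The Java sketch below illustrates that distinction; collatorFor is a hypothetical helper, and whether Saxon consults the JVM default locale when lang is omitted isn't established in this thread.

```java
import java.text.Collator;
import java.util.Locale;

public class UserLanguageSketch {
    // Hypothetical helper: an explicit language code (as in xsl:sort lang="...")
    // wins; without one, fall back to the JVM default locale, which
    // -Duser.language on the command line influences at startup.
    static Collator collatorFor(String lang) {
        return lang == null
            ? Collator.getInstance()
            : Collator.getInstance(Locale.forLanguageTag(lang));
    }

    public static void main(String[] args) {
        // Run with e.g. java -Duser.language=ja UserLanguageSketch to see these change.
        System.out.println("user.language  = " + System.getProperty("user.language"));
        System.out.println("default locale = " + Locale.getDefault());
        // Negative result: the German collator places the umlaut word before "apple".
        System.out.println(collatorFor("de").compare("\u00C4pfel", "apple"));
    }
}
```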
 
> In the case of Japanese, where there's no notion of upper/lowercase, I
> shouldn't have to edit the declarations for &uppercase; and &lowercase;,
> correct?

I think that you should, because each letter creates a separate division
of the index. As the declarations currently contain only English letters,
all Japanese index terms will end up in the symbol division.

				Jirka

-- 
-----------------------------------------------------------------
  Jirka Kosek  	                     
  e-mail: jirka@kosek.cz
  http://www.kosek.cz

