This is the mail archive of the kawa@sources.redhat.com mailing list for the Kawa project.


Re: A few tips

From: Per Bothner <per at bothner dot com>
To: Dominique Boucher <dboucher at nuecho dot com>
Cc: "'Kawa List'" <kawa at sources dot redhat dot com>
Date: Sun, 09 Nov 2003 20:47:14 -0800
Subject: Re: A few tips
References: <000501c39f55$a8ce8850$6400a8c0@Forman>

Dominique Boucher wrote:

> I run Kawa servlets with Tomcat 4.1 on Linux, using Sun’s JDK 2
> v1.4.1_01.

"Linux" means a lot of different things. My impression is that many distributions (including Red Hat for sure) are moving towards using UTF-8 as the standard/default encoding. If your files are instead ISO-8859-1, you'll probably have problems.

> Some of the configuration files contain
> Scheme strings with diacritics (French accents, for instance).

To be pedantic, the files only "contain" diacritics if interpreted using the correct encoding. Specifically, the files are encoded in ISO-Latin-1, but your software environment thinks they use some other encoding, probably UTF-8.
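To illustrate the pedantic point: é is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8, so the same file yields different strings depending on which decoder reads it. A minimal sketch; the class and method names are illustrative, and it uses `StandardCharsets` (Java 7+), so it will not compile on the JDK 1.4 mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Decodes raw file bytes the way a reader configured with `charset` would.
    // Bytes that are malformed for that charset are replaced with U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";                                   // "café"
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // 63 61 66 C3 A9

        // Matching encoding: the accent survives.
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1).equals(text)); // true
        // Latin-1 file read as UTF-8: the lone E9 byte is malformed.
        System.out.println(decode(latin1, StandardCharsets.UTF_8).contains("\uFFFD")); // true
        // UTF-8 file read as Latin-1: é turns into two characters, "Ã©".
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals("caf\u00c3\u00a9")); // true
    }
}
```

Neither wrong decoding throws an error, which is why such mismatches tend to show up only as odd characters in the output.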

This page http://fedora.redhat.com/docs/release-notes/ has some notes on encoding. Red Hat believes "In the long term, all systems are expected to migrate to UTF-8, eliminating this issue."

My guess is you might need to change your LANG environment variable, unless you're willing to migrate to UTF-8.

> 2. For the constant Scheme strings in the source code, you must make
> sure that Kawa reads them properly when compiling. So add the mutation
> to 'port-char-encoding' on the command-line:
>
>   shell> kawa -e '(set! port-char-encoding "ISO-8859-1")' -C sourcefile.scm

The same presumably also applies to files loaded with the -f flag, and presumably to expressions typed at the Kawa console. I don't think setting port-char-encoding will help with the latter, since the standard input has already been opened. For that you need to set LANG.
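The standard-input problem follows from how Java readers work: an `InputStreamReader` fixes its charset when it is constructed, so changing a global setting later cannot affect a stream that is already open. A minimal sketch, assuming (not confirmed here) that port-char-encoding is consulted only when a port is opened; `currentEncoding` is a hypothetical stand-in for it, not Kawa's API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderCharsetDemo {
    // Hypothetical global setting, standing in for something like port-char-encoding.
    static Charset currentEncoding = StandardCharsets.UTF_8;

    // Opens a reader with whatever the setting is at open time; the charset
    // of an InputStreamReader cannot be changed after construction.
    static Reader open(byte[] data) {
        return new InputStreamReader(new ByteArrayInputStream(data), currentEncoding);
    }

    // Reads the whole stream into a String (malformed bytes become U+FFFD).
    static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] latin1 = "\u00e9t\u00e9".getBytes(StandardCharsets.ISO_8859_1); // "été"

        Reader early = open(latin1);                   // like stdin: opened before the change
        currentEncoding = StandardCharsets.ISO_8859_1; // change the setting afterwards
        Reader late = open(latin1);                    // like a file opened after the change

        System.out.println(readAll(late).equals("\u00e9t\u00e9")); // true
        System.out.println(readAll(early).contains("\uFFFD"));     // true: still decoded as UTF-8
    }
}
```

This is why the console case has to be fixed in the environment (LANG) before the JVM opens standard input.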

> 3. Make sure the strings are not put in 'unescaped-data'.

I assume the "not" was unintended.

> This way, all the special characters (those with French diacritics) will
> be translated to their equivalent numerical entities (&#233; for é)
> properly.

Whether é is written as é or as &#233; should depend on the encoding used for the underlying PrintWriter. Unfortunately, I don't know of any reliable way to get that. As approximations, we can use port-char-encoding, or OutputStreamWriter's getEncoding method when the underlying writer is an OutputStreamWriter.
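Making the choice between a literal é and &#233; depend on the output encoding can be sketched in Java by asking a `CharsetEncoder` whether each character is representable, and emitting a decimal numeric character reference only when it is not. The `escape` helper is hypothetical, not Kawa's implementation, and it deliberately leaves markup characters such as `<` and `&` alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EntityEscaper {
    // Writes each code point literally if `target` can encode it,
    // otherwise as a numeric character reference such as &#233;.
    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);             // handles surrogate pairs
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "d\u00e9j\u00e0"; // "déjà"
        System.out.println(escape(s, StandardCharsets.US_ASCII));   // d&#233;j&#224;
        System.out.println(escape(s, StandardCharsets.ISO_8859_1)); // déjà, written literally
        System.out.println(escape(s, StandardCharsets.UTF_8));      // déjà, written literally
    }
}
```

The open question in the paragraph above remains: the caller still has to know which charset the underlying writer actually uses before it can pass `target`.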

Does anyone know how one finds out in Java what the default (system) encoding is?
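On finding the default encoding in Java, a few common approaches are sketched below. On JDK 1.4, the options are the `file.encoding` system property (widely used but not formally specified in older JDKs) and the name reported by an `OutputStreamWriter` created without an explicit charset. `Charset.defaultCharset()` only arrived in Java 5, `System.getenv` works again only from Java 5, and `native.encoding` is a Java 17 addition, so the sketch as a whole needs a modern JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Name reported by a writer created without an explicit charset.
    // getEncoding() returns the historical name, e.g. "UTF8" or "ISO8859_1".
    static String writerDefault() {
        return new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
    }

    public static void main(String[] args) {
        // Available on JDK 1.4, though not formally specified there.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("writer default  = " + writerDefault());
        // Java 5 and later.
        System.out.println("defaultCharset  = " + Charset.defaultCharset().name());
        // Java 17 and later: the encoding implied by the OS locale (null if absent).
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
        // The locale variable the JVM (e.g. Tomcat's) inherited.
        System.out.println("LANG            = " + System.getenv("LANG"));
    }
}
```

Note that since Java 18 (JEP 400) the default charset is UTF-8 regardless of the locale, so on a current JDK changing LANG no longer changes `defaultCharset()`; `native.encoding` still reports the locale's encoding.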

> [Note: these problems may be due to a special configuration of the C
> locale, but I can't modify it easily for Tomcat.

It's not a problem with the C locale per se. However, it may be a problem that Tomcat is *using* the C locale, or a UTF-8 locale. I bet with the correct environment variables (LANG? LC_ALL? LC_CTYPE?) you could fix that.
--
	--Per Bothner
per@bothner.com   http://per.bothner.com/

