This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Japanese and Unicode



I have a question that I would be grateful to have answered by you,
and by any other programmers on this list who are familiar with
Japanese text processing.

As I wrote recently, I am leaning towards prescribing the use of
Unicode in Guile, and backing up that prescription by providing
support functions, encouraging authors of general-purpose modules to
assume Unicode, etc.

However, every Japanese programmer I have met is unenthusiastic about
Unicode, which concerns me a great deal.  Japanese is an important
language to support, not least because I would like to draw on the
technical enthusiasm I have found in Japan.

I have asked these programmers to explain their objections, but I
don't feel I've understood the answers.  Could you explain the
situation to me?

I did some investigation of my own to check for some obvious
possibilities:

JIS-X-0208 and JIS-X-0212 can both be mapped to Unicode with no
collisions: no two distinct characters, whether both come from the
same set or one comes from each, map to the same Unicode character.
So you can convert text containing a mixture of 0208 and 0212 to
Unicode, and back, with no loss of information.
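
The collision check described above can be sketched as follows.  This
is only an illustration, not the procedure actually used: the function
names are made up for the example, and the two tiny tables are
excerpts that should be verified against the published mapping tables
before being relied on.  A real check would load the full tables.

```python
def find_collisions(*named_tables):
    """Return (unicode, first_source, second_source) for every clash.

    Each argument is a (name, table) pair, where table maps a two-byte
    JIS code to a Unicode code point.
    """
    seen = {}          # Unicode code point -> (set name, JIS code)
    collisions = []
    for name, table in named_tables:
        for code, ucs in table.items():
            source = (name, code)
            if ucs in seen and seen[ucs] != source:
                collisions.append((ucs, seen[ucs], source))
            else:
                seen[ucs] = source
    return collisions


def build_reverse(*named_tables):
    """Build Unicode -> (set name, JIS code); refuse if anything collides."""
    if find_collisions(*named_tables):
        raise ValueError("tables collide; round trip would lose information")
    return {ucs: (name, code)
            for name, table in named_tables
            for code, ucs in table.items()}


# Illustrative excerpts only (assumed values, check before use).
JIS0208_EXCERPT = {0x3021: 0x4E9C, 0x3022: 0x5516}
JIS0212_EXCERPT = {0x3021: 0x4E02, 0x3022: 0x4E04}

tables = {"0208": JIS0208_EXCERPT, "0212": JIS0212_EXCERPT}
reverse = build_reverse(*tables.items())

# Mixed 0208/0212 text, converted to Unicode and back again.
mixed = [("0208", 0x3021), ("0212", 0x3021), ("0208", 0x3022)]
as_unicode = [tables[name][code] for name, code in mixed]
round_trip = [reverse[ucs] for ucs in as_unicode]
```

Note that the same JIS code (0x3021 here) appears in both sets, so the
reverse table has to record which set a character came from, not just
its code.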

It is necessary to use lookup tables to translate between the JIS
character sets and Unicode, but if structured correctly, those tables
are small and efficient; tables for converting in both directions for
JIS-X-0208 occupy only 60k.
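
One plausible way to structure such tables is sketched below.  This is
an assumption about the layout, not a description of the tables that
were actually measured.  The forward table is a flat array with one
16-bit slot per cell of the 94 x 94 grid, which comes to 94 * 94 * 2 =
17,672 bytes.  The reverse table is split into 256-entry pages keyed by
the high byte of the Unicode value, and only pages that are in use get
allocated.  The names Tables, jis_index and UNMAPPED are invented for
this example.

```python
from array import array

ROWS = CELLS = 94      # JIS X 0208 is laid out as a 94 x 94 grid
UNMAPPED = 0xFFFF      # sentinel chosen for this sketch only


def jis_index(code):
    """Map a two-byte JIS code (0x2121..0x7E7E) to a flat table index."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a JIS X 0208 code: {code:#06x}")
    return (hi - 0x21) * CELLS + (lo - 0x21)


class Tables:
    def __init__(self):
        # Forward: one unsigned 16-bit slot per grid cell.
        self.forward = array("H", [UNMAPPED]) * (ROWS * CELLS)
        # Reverse: high byte of the Unicode value -> page of 256 slots.
        self.pages = {}

    def add(self, jis, ucs):
        """Record one mapping in both directions."""
        self.forward[jis_index(jis)] = ucs
        page = self.pages.get(ucs >> 8)
        if page is None:
            page = self.pages[ucs >> 8] = array("H", [UNMAPPED]) * 256
        page[ucs & 0xFF] = jis

    def to_unicode(self, jis):
        ucs = self.forward[jis_index(jis)]
        return None if ucs == UNMAPPED else ucs

    def to_jis(self, ucs):
        page = self.pages.get(ucs >> 8)
        if page is None or page[ucs & 0xFF] == UNMAPPED:
            return None
        return page[ucs & 0xFF]
```

Both lookups are a couple of array indexing steps, and the reverse
direction costs memory only for the 256-character blocks of Unicode
that actually contain mapped characters.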