This is the mail archive of the guile@cygnus.com mailing list for the guile project.
I have a question which I would be grateful if you, and any other programmers on this list familiar with Japanese text processing, could answer.

As I wrote recently, I am leaning towards prescribing the use of Unicode in Guile, and backing up that prescription by providing support functions, encouraging authors of general-purpose modules to assume Unicode, etc.

However, every Japanese programmer I have met is unenthusiastic about Unicode, which concerns me a great deal. Japanese is an important language to support, at least because I would like to take advantage of the general level of technical enthusiasm I have found over there.

I have asked these programmers to explain their objections, but I don't feel I've understood the answers. Could you explain the situation to me?

I did some investigation of my own to check for some obvious possibilities:

JIS-X-0208 and JIS-X-0212 can both be mapped to Unicode with no collisions --- that is, there are no two characters, each chosen independently from either 0208 or 0212, which map to the same Unicode character. So you can convert text containing a mixture of 0208 and 0212 to Unicode, and back, with no loss of information.

It is necessary to use lookup tables to translate between the JIS character sets and Unicode, but if structured correctly, those tables are small and efficient; tables for converting in both directions for JIS-X-0208 occupy only 60k.
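
To make the "no collisions" property concrete, here is a minimal sketch in Python of the kind of check and round trip I have in mind. It is not the actual conversion code: the table holds only a handful of entries excerpted for illustration, and the names `JIS_TO_UNICODE`, `build_reverse`, `to_unicode`, and `from_unicode` are hypothetical. Each JIS character is tagged with the set it came from, so that the same two-byte code in 0208 and 0212 stays distinct.

```python
# Illustrative excerpt only: a few entries, NOT the full mapping tables.
# Keys are (character set, JIS code); values are Unicode code points.
JIS_TO_UNICODE = {
    ("0208", 0x2121): 0x3000,  # IDEOGRAPHIC SPACE
    ("0208", 0x2422): 0x3042,  # HIRAGANA LETTER A
    ("0208", 0x3021): 0x4E9C,  # first kanji of JIS-X-0208
    ("0212", 0x3021): 0x4E02,  # same JIS code, different set
    ("0212", 0x3022): 0x4E04,
}


def build_reverse(forward):
    """Invert the table, failing if two JIS characters share a code point."""
    reverse = {}
    for key, code_point in forward.items():
        if code_point in reverse:
            raise ValueError(
                "collision at U+%04X: %r and %r" % (code_point, reverse[code_point], key)
            )
        reverse[code_point] = key
    return reverse


UNICODE_TO_JIS = build_reverse(JIS_TO_UNICODE)


def to_unicode(tagged_chars):
    """Convert a sequence of (set, code) pairs to a Unicode string."""
    return "".join(chr(JIS_TO_UNICODE[key]) for key in tagged_chars)


def from_unicode(text):
    """Convert a Unicode string back to (set, code) pairs."""
    return [UNICODE_TO_JIS[ord(ch)] for ch in text]


# Text mixing characters from both sets survives the round trip.
mixed = [("0208", 0x2422), ("0212", 0x3021), ("0208", 0x3021)]
assert from_unicode(to_unicode(mixed)) == mixed
```

Because `build_reverse` refuses to overwrite an existing entry, running it over the complete 0208 and 0212 tables would be one way to confirm the claim above; the sketch says nothing about how compactly the real tables can be stored.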