This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: EUC-JP and the Yen sign
- To: martin at loewis dot home dot cs dot tu-berlin dot de
- Subject: Re: EUC-JP and the Yen sign
- From: GOTO Masanori <gotom at debian dot or dot jp>
- Date: Mon, 16 Oct 2000 08:34:12 +0900
- Cc: gotom at debian dot or dot jp, drepper at cygnus dot com, eggert at twinsun dot com, haible at ilog dot fr, libc-alpha at sources dot redhat dot com
- References: <200010152230.AAA10716@loewis.home.cs.tu-berlin.de>
From: "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
Date: Mon, 16 Oct 2000 00:30:13 +0200
> > The Open Group in Japan has published advisory documentation:
> > http://www.opengroup.or.jp/jvc/cde/appendix.html
> > It also says that 0x5C is the yen sign.
>
> For those of us not fluent in Japanese, can you please explain the
> tables in http://www.opengroup.or.jp/jvc/cde/ucs-conv.html#ch3_1_2?
> There is one table saying that eucJP-open character 0x5C relates to
> U+00A5, and another saying that it relates to U+005C.
>
> Also, can you explain the relevance of the eucJP-*open* character name
> designation? The OpenGroup registry
> (ftp://ftp.opengroup.org/pub/code_set_registry/cs_registry1.2h)
> only knows of eucJP:1993; it comments on that character set
>
> Comments
> Implementation of the EUC (Extended UNIX Codes) encoding
> method, with ISO 646:1991 IRV assigned to CS0, JIS X0208:1990
> assigned to CS1, JIS X0201:1976 assigned to CS2, and
> JIS X0212:1990 assigned to CS3.
> end
>
> which, to me, says that the IRV is used for 05/12 (i.e. reverse
> solidus).
See the section "Detailed naming of code set" in
http://www.opengroup.or.jp/jvc/cde/sjis-euc-e.html.
The Open Group named its standard eucJP "eucJP-open" because even the
corporations that joined the Open Group did not share the same
character map (see http://www.opengroup.or.jp/jvc/cde/euc-e.html).
However, none of the Unix systems I know of supports a locale or
charset named "eucJP-open"; the name only denotes the Open Group's
definition of eucJP.
> Furthermore,
> http://www.y-adagio.com/public/standards/tr_xml_jpf/kaisetsu.htm lists
> a number of eucJP variants; it appears that x-eucjp-unicode-0.9,
> x-eucjp-jisx0221-1995, x-eucjp-open-19970715-ms all map character 5C
> to U+005C, whereas x-eucjp-open-19970715-0201 is listed as mapping it
> to U+00A5.
OK. Here is my English translation of section 3.1 of that page:
3.1 The range from 0x20 to 0x7E ([US-ASCII] or [JIS X 0201])
x-eucjp-unicode-0.9, x-eucjp-jisx0221-1995, x-eucjp-open-19970715-ms,
and x-eucjp-open-19970715-ascii define that the range 0x20 to 0x7E is
translated as [US-ASCII], following Japanese EUC.
The only exception is x-eucjp-open-19970715-0201, which defines the
translation rules below, following [JIS X 0201].
Table 3.1 x-eucjp-open-19970715-0201
Code value in EUC        Translated to
0x5C (REVERSE SOLIDUS)   U+00A5 (YEN SIGN)
0x7E (TILDE)             U+203E (OVERLINE)
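The difference between the variants in section 3.1 comes down to two code points, since JIS X 0201 Roman and ASCII agree everywhere else in 0x20..0x7E. A minimal sketch of that lookup, assuming the variant names from the table and the two overrides in Table 3.1; the function decode_cs0 and the constant names are hypothetical and do not come from any existing converter.

```python
# Hypothetical sketch: how the eucJP variants listed in section 3.1
# interpret a single CS0 byte in the range 0x20..0x7E.
# Assumption: only the two overrides from Table 3.1 differ from ASCII.

# JIS X 0201 Roman differs from ASCII only at these two positions.
JISX0201_OVERRIDES = {
    0x5C: "\u00A5",  # REVERSE SOLIDUS position -> YEN SIGN
    0x7E: "\u203E",  # TILDE position -> OVERLINE
}

# Variants that section 3.1 says translate the range as US-ASCII.
ASCII_VARIANTS = {
    "x-eucjp-unicode-0.9",
    "x-eucjp-jisx0221-1995",
    "x-eucjp-open-19970715-ms",
    "x-eucjp-open-19970715-ascii",
}


def decode_cs0(byte: int, variant: str) -> str:
    """Map one CS0 byte (0x20..0x7E) to a Unicode character."""
    if not 0x20 <= byte <= 0x7E:
        raise ValueError(f"0x{byte:02X} is outside the range 0x20..0x7E")
    if variant == "x-eucjp-open-19970715-0201":
        # The one exception: read the byte as JIS X 0201 Roman.
        return JISX0201_OVERRIDES.get(byte, chr(byte))
    if variant in ASCII_VARIANTS:
        # Identity mapping: the byte value is the Unicode code point.
        return chr(byte)
    raise ValueError(f"unknown variant: {variant}")
```

For example, decode_cs0(0x5C, "x-eucjp-open-19970715-ascii") returns a backslash, while the same byte under "x-eucjp-open-19970715-0201" returns the yen sign; every other byte such as 0x41 ("A") is the same in both.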
Returning to the Open Group document, see section 3.1.2 (b) and (c) of
http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html:
3.1.2 Code Set Conversion Rules
(snip)
b. Of the conversion specified in JIS X 0221, the conversion rules when
it is used in conjunction with JIS X 0201.
In this case, the conversion of yen sign and backslash are performed
as follows.
eucJP-open UCS
0x5C YEN SIGN (0x00A5)
(snip)
c. Of the conversion specified by JIS X 0221, conversion rules when it is
used in conjunction with ASCII.
In this case, the conversion of yen sign and backslash are performed
as follows.
eucJP-open UCS
0x5C REVERSE SOLIDUS (0x005C)
x-eucjp-open-19970715-"0201", as its suffix indicates, follows rule
3.1.2 (b). However, G0 of EUC-JP designates ASCII, not JIS X 0201,
so rule 3.1.2 (c) is the appropriate conversion rule.
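To make the choice concrete: in EUC-JP every byte of a multibyte character (CS1, CS2, CS3) has the high bit set, so 0x5C can only ever be a CS0 byte, and rules (b) and (c) differ only in how CS0 is read. A minimal sketch, assuming Python 3 and its built-in "euc_jp" codec for the multibyte code sets; decode_eucjp, its rule parameter, and the RULE_B/RULE_C tables are hypothetical, and the 0x7E entry of rule (b) is taken from Table 3.1 because that part of the quoted rule is snipped.

```python
# Hypothetical sketch (not glibc or Open Group code): decode EUC-JP with
# a selectable reading of CS0, after Open Group rules 3.1.2 (b) and (c).

# Rule (b): CS0 read as JIS X 0201 Roman. 0x5C comes from rule (b);
# 0x7E -> OVERLINE is assumed from Table 3.1 (snipped in the rule text).
RULE_B = {0x5C: "\u00A5", 0x7E: "\u203E"}
# Rule (c): CS0 read as ASCII, so every byte maps to itself.
RULE_C = {}


def decode_eucjp(data: bytes, rule: str = "c") -> str:
    overrides = {"b": RULE_B, "c": RULE_C}[rule]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # CS0: one byte; the only place the two rules differ.
            out.append(overrides.get(lead, chr(lead)))
            i += 1
            continue
        if lead == 0x8F:
            n = 3  # SS3 + two bytes: CS3 (JIS X 0212)
        else:
            n = 2  # SS2 (0x8E) + one byte: CS2, or two bytes: CS1
        # Delegate the multibyte character to the built-in codec; an
        # invalid or truncated sequence raises UnicodeDecodeError.
        out.append(data[i:i + n].decode("euc_jp"))
        i += n
    return "".join(out)
```

For example, decoding "日本\\".encode("euc_jp") with rule "b" yields "日本¥", while rule "c" returns the original string, which is also what Python's own euc_jp codec produces, since it reads CS0 as ASCII.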
Read section 3.1.1 (4), "The Yen Sign problem", of
http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html;
it explains why this problem arose.
> It may be clear to you; to me, it is not.
Ah, it is an unresolved problem.
However, this list is not the appropriate place to discuss it further.
Regards,
-- GOTO Masanori