This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
a more exhaustive iconv check
- To: libc-alpha at sources dot redhat dot com
- Subject: a more exhaustive iconv check
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 4 Sep 2000 14:47:08 +0200 (CEST)
Hi,
Until now, the gconv module for a particular encoding and its charmap have
been free to disagree, and they often did, because they come from
different sources. But it makes no sense for the locale tables (created
from the charmap) and the runtime conversion (done by the gconv module)
to disagree.
Therefore here is a new test that verifies that the charmap and iconv
(in the charset to unicode direction) agree. The reverse iconv direction
must generally agree as well, except for a few limited and known cases,
which can be stored in CHARSET.irreversible files.
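As a rough illustration of the reverse-direction check, here is a minimal
sketch. The function names expected_inverse and check_inverse are made up
for this example and do not appear in the patch. It assumes each table line
has the form "0x<bytes> 0x<UCS code>" and that no file contains duplicate
lines.

```shell
# Sketch only: expected_inverse and check_inverse are hypothetical helpers.

# Print the table expected from the Unicode-to-charset direction: the
# symmetric difference of the forward table and the irreversible list.
expected_inverse () {
  forward=$1
  irreversible=$2
  if test -f "$irreversible"; then
    # A line present in both inputs occurs twice after sorting,
    # so uniq -u drops it; lines present in only one input survive.
    cat "$forward" "$irreversible" | LC_ALL=C sort | uniq -u
  else
    # No known exceptions: both directions must agree exactly.
    cat "$forward"
  fi
}

# Succeed if the actual inverse table ($3) matches the expectation.
check_inverse () {
  expected_inverse "$1" "$2" | cmp -s - "$3"
}
```

An entry listed in both the forward table and the irreversible file is
expected to be missing from the reverse direction. An entry listed only in
the irreversible file is expected to appear in the reverse direction only.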
This patch uncovers a few bugs, which are fixed in the next mails. If you
do not apply one of these fixes, you have to comment out the corresponding
line in iconvdata/tst-tables.sh.
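To show what commenting out a line does, here is a small sketch of how the
charset list is filtered. list_enabled is a hypothetical helper, not part of
the patch. It mirrors the '#' skipping done by the loop in tst-tables.sh and
additionally skips empty lines, which the patch itself does not do.

```shell
# Sketch only: list_enabled is a hypothetical helper for illustration.
# Reads "charset [charmap]" lines on stdin and prints the charsets that
# would be tested; lines whose first field starts with '#' are skipped.
list_enabled () {
  while read -r charset charmap; do
    case "${charset}" in
      \#*|'') continue ;;
    esac
    echo "${charset}"
  done
}

# Usage: the commented-out EUC-KR line is skipped.
printf '%s\n' 'SJIS' '#EUC-KR Charmap contains extraneous entries' \
  'EUC-CN GB2312' | list_enabled
# prints:
# SJIS
# EUC-CN
```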
Bruno
New files to be "chmod a+x" before commit:
iconvdata/tst-tables.sh
iconvdata/tst-table.sh
iconvdata/tst-table-charmap.sh
2000-09-03 Bruno Haible <haible@clisp.cons.org>
* iconvdata/tst-tables.sh: New file.
* iconvdata/tst-table.sh: New file.
* iconvdata/tst-table-from.c: New file.
* iconvdata/tst-table-to.c: New file.
* iconvdata/tst-table-charmap.sh: New file.
* iconvdata/Makefile (test-srcs): Set to tst-table-from tst-table-to.
(distribute): Add tst-tables.sh, tst-table.sh, tst-table-charmap.sh,
tst-table-from.c, tst-table-to.c, EUC-JP.irreversible,
ISIRI-3342.irreversible, SJIS.irreversible.
(tests): Add dependency on tst-tables.out.
(tst-tables.out, tst-tables-clean): New rules.
(do-tests-clean, common-mostlyclean): Require tst-tables-clean.
* iconvdata/ISIRI-3342.irreversible: New file.
* iconvdata/EUC-JP.irreversible: New file.
* iconvdata/SJIS.irreversible: New file.
*** glibc-20000831/iconvdata/tst-tables.sh.bak Sun Sep 3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-tables.sh Sun Sep 3 15:51:47 2000
***************
*** 0 ****
--- 1,213 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB. If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+
+ # Checks that the iconv() implementation (in both directions) for the
+ # stateless encodings agrees with the corresponding charmap table.
+
+ common_objpfx=$1
+ objpfx=$2
+
+ status=0
+
+ cat <<EOF |
+ # Single-byte and other "small" encodings come here.
+ # Keep this list in the same order as gconv-modules.
+ #
+ # charset name table name comment
+ ASCII ANSI_X3.4-1968
+ ISO646-GB BS_4730
+ ISO646-CA CSA_Z243.4-1985-1
+ ISO646-CA2 CSA_Z243.4-1985-2
+ ISO646-DE DIN_66003
+ ISO646-DK DS_2089
+ ISO646-ES ES
+ ISO646-ES2 ES2
+ ISO646-CN GB_1988-80
+ ISO646-IT IT
+ ISO646-JP JIS_C6220-1969-RO
+ ISO646-JP-OCR-B JIS_C6229-1984-B
+ ISO646-YU JUS_I.B1.002
+ ISO646-KR KSC5636
+ ISO646-HU MSZ_7795.3
+ ISO646-CU NC_NC00-10
+ ISO646-FR NF_Z_62-010
+ ISO646-FR1 NF_Z_62-010_1973
+ ISO646-NO NS_4551-1
+ ISO646-NO2 NS_4551-2
+ ISO646-PT PT
+ ISO646-PT2 PT2
+ ISO646-SE SEN_850200_B
+ ISO646-SE2 SEN_850200_C
+ ISO-8859-1
+ ISO-8859-2
+ ISO-8859-3
+ ISO-8859-4
+ ISO-8859-5
+ ISO-8859-6
+ ISO-8859-7
+ ISO-8859-8
+ ISO-8859-9
+ ISO-8859-10
+ #ISO-8859-11 No corresponding table, nonstandard
+ ISO-8859-13
+ ISO-8859-14
+ ISO-8859-15
+ ISO-8859-16
+ T.61-8BIT
+ ISO_6937
+ #ISO_6937-2 ISO-IR-90 Handling of combining marks is broken
+ KOI-8
+ KOI8-R
+ LATIN-GREEK
+ LATIN-GREEK-1
+ HP-ROMAN8
+ EBCDIC-AT-DE
+ EBCDIC-AT-DE-A
+ EBCDIC-CA-FR
+ EBCDIC-DK-NO
+ EBCDIC-DK-NO-A
+ EBCDIC-ES
+ EBCDIC-ES-A
+ EBCDIC-ES-S
+ EBCDIC-FI-SE
+ EBCDIC-FI-SE-A
+ EBCDIC-FR
+ EBCDIC-IS-FRISS
+ EBCDIC-IT
+ EBCDIC-PT
+ EBCDIC-UK
+ EBCDIC-US
+ IBM037
+ IBM038
+ IBM256
+ IBM273
+ IBM274
+ IBM275
+ IBM277
+ IBM278
+ IBM280
+ IBM281
+ IBM284
+ IBM285
+ IBM290
+ IBM297
+ IBM420
+ IBM423
+ IBM424
+ IBM437
+ IBM500
+ IBM850
+ IBM851
+ IBM852
+ IBM855
+ IBM857
+ IBM860
+ IBM861
+ IBM862
+ IBM863
+ IBM864
+ IBM865
+ IBM866
+ IBM868
+ IBM869
+ IBM870
+ IBM871
+ IBM875
+ IBM880
+ IBM891
+ IBM903
+ IBM904
+ IBM905
+ IBM918
+ IBM1004
+ IBM1026
+ IBM1047
+ CP1250
+ CP1251
+ CP1252
+ CP1253
+ CP1254
+ CP1255
+ CP1256
+ CP1257
+ CP1258
+ IBM874
+ CP737
+ CP775
+ MACINTOSH
+ IEC_P27-1
+ ASMO_449
+ ISO-IR-99 ANSI_X3.110-1983
+ ISO-IR-139 CSN_369103
+ CWI
+ DEC-MCS
+ ECMA-CYRILLIC
+ ISO-IR-153 GOST_19768-74
+ GREEK-CCITT
+ GREEK7
+ GREEK7-OLD
+ INIS
+ INIS-8
+ INIS-CYRILLIC
+ ISO_2033 ISO_2033-1983
+ ISO_5427
+ ISO_5427-EXT
+ #ISO_5428 Handling of combining marks is broken
+ ISO_10367-BOX
+ MAC-IS
+ MAC-UK
+ NATS-DANO
+ NATS-SEFI
+ WIN-SAMI-2 SAMI-WS2
+ ISO-IR-197
+ TIS-620
+ KOI8-U
+ ISIRI-3342
+ #
+ # Multibyte encodings come here
+ #
+ SJIS
+ #EUC-KR Charmap contains extraneous entries
+ CP949
+ #JOHAB No charmap exists
+ BIG5
+ #BIG5HKSCS Broken, please fix it
+ EUC-JP
+ EUC-CN GB2312
+ #GBK Converter uses private area characters
+ EUC-TW
+ #GB18030 Broken, please fix it
+ #
+ # Stateful encodings not testable this way
+ #
+ #ISO-2022-JP
+ #ISO-2022-JP-2
+ #ISO-2022-KR
+ #ISO-2022-CN
+ #
+ EOF
+ while read charset charmap; do
+ case ${charset} in \#*) continue;; esac
+ echo "Testing ${charset}" 1>&2
+ ./tst-table.sh ${common_objpfx} ${objpfx} ${charset} ${charmap} \
+ || { echo "failed: ./tst-table.sh ${common_objpfx} ${objpfx} ${charset} ${charmap}"; status=1; }
+ done
+
+ exit $status
*** glibc-20000831/iconvdata/tst-table.sh.bak Sun Sep 3 01:00:10 2000
--- glibc-20000831/iconvdata/tst-table.sh Sun Sep 3 15:46:49 2000
***************
*** 0 ****
--- 1,75 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB. If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+
+ # Checks that the iconv() implementation (in both directions) for a
+ # stateless encoding agrees with the charmap table.
+
+ common_objpfx=$1
+ objpfx=$2
+ charset=$3
+ charmap=$4
+
+ GCONV_PATH=${common_objpfx}iconvdata
+ export GCONV_PATH
+ LC_ALL=C
+ export LC_ALL
+
+ set -e
+
+ # Get the charmap.
+ ./tst-table-charmap.sh ${charmap:-$charset} \
+ < ../localedata/charmaps/${charmap:-$charset} \
+ > ${objpfx}tst-${charset}.charmap.table
+
+ # Precompute expected differences between the two iconv directions.
+ if test ${charset} = EUC-TW; then
+ irreversible=${objpfx}tst-${charset}.irreversible
+ grep '^0x8EA1' ${objpfx}tst-${charset}.charmap.table > ${irreversible}
+ else
+ irreversible=${charset}.irreversible
+ fi
+
+ # iconv in one direction.
+ ${common_objpfx}elf/ld.so --library-path $common_objpfx \
+ ${objpfx}tst-table-from ${charset} \
+ > ${objpfx}tst-${charset}.table
+
+ # iconv in the other direction.
+ ${common_objpfx}elf/ld.so --library-path $common_objpfx \
+ ${objpfx}tst-table-to ${charset} | sort \
+ > ${objpfx}tst-${charset}.inverse.table
+
+ # Difference between the two iconv directions.
+ diff ${objpfx}tst-${charset}.table ${objpfx}tst-${charset}.inverse.table | \
+ grep '^[<>]' | sed -e 's,^. ,,' > ${objpfx}tst-${charset}.irreversible.table
+
+ # Check 1: charmap and iconv forward should be identical.
+ cmp -s ${objpfx}tst-${charset}.charmap.table ${objpfx}tst-${charset}.table
+
+ # Check 2: the difference between the two iconv directions.
+ if test -f ${irreversible}; then
+ cat ${objpfx}tst-${charset}.charmap.table ${irreversible} | sort | uniq -u \
+ > ${objpfx}tst-${charset}.tmp.table
+ cmp -s ${objpfx}tst-${charset}.tmp.table ${objpfx}tst-${charset}.inverse.table
+ else
+ cmp -s ${objpfx}tst-${charset}.table ${objpfx}tst-${charset}.inverse.table
+ fi
+
+ exit 0
*** glibc-20000831/iconvdata/tst-table-from.c.bak Sun Sep 3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-from.c Sun Sep 3 02:49:14 2000
***************
*** 0 ****
--- 1,225 ----
+ /* Copyright (C) 2000 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Library General Public License as
+ published by the Free Software Foundation; either version 2 of the
+ License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Library General Public License for more details.
+
+ You should have received a copy of the GNU Library General Public
+ License along with the GNU C Library; see the file COPYING.LIB. If not,
+ write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ Boston, MA 02111-1307, USA. */
+
+ /* Create a table from CHARSET to Unicode.
+ This is a good test for CHARSET's iconv() module, in particular the
+ FROM_LOOP BODY macro. */
+
+ #include <stddef.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <iconv.h>
+ #include <errno.h>
+
+ /* Converts a byte buffer to a hexadecimal string. */
+ static const char*
+ hexbuf (unsigned char buf[], unsigned int buflen)
+ {
+ static char msg[50];
+
+ switch (buflen)
+ {
+ case 1:
+ sprintf (msg, "0x%02X", buf[0]);
+ break;
+ case 2:
+ sprintf (msg, "0x%02X%02X", buf[0], buf[1]);
+ break;
+ case 3:
+ sprintf (msg, "0x%02X%02X%02X", buf[0], buf[1], buf[2]);
+ break;
+ case 4:
+ sprintf (msg, "0x%02X%02X%02X%02X", buf[0], buf[1], buf[2], buf[3]);
+ break;
+ default:
+ abort ();
+ }
+ return msg;
+ }
+
+ /* Attempts to convert a byte buffer BUF (BUFLEN bytes) to OUT (6 bytes)
+ using the conversion descriptor CD. Returns the number of written bytes,
+ or 0 if ambiguous, or -1 if invalid. */
+ static int
+ try (iconv_t cd, unsigned char buf[], unsigned int buflen, unsigned char *out)
+ {
+ const char *inbuf = (const char *) buf;
+ size_t inbytesleft = buflen;
+ char *outbuf = (char *) out;
+ size_t outbytesleft = 6;
+ size_t result = iconv (cd,
+ (char **) &inbuf, &inbytesleft,
+ &outbuf, &outbytesleft);
+ if (result == (size_t)(-1))
+ {
+ if (errno == EILSEQ)
+ {
+ return -1;
+ }
+ else if (errno == EINVAL)
+ {
+ return 0;
+ }
+ else
+ {
+ int saved_errno = errno;
+ fprintf (stderr, "%s: iconv error: ", hexbuf (buf, buflen));
+ errno = saved_errno;
+ perror ("");
+ exit (1);
+ }
+ }
+ else
+ {
+ if (inbytesleft != 0)
+ {
+ fprintf (stderr, "%s: inbytes = %ld, outbytes = %ld\n",
+ hexbuf (buf, buflen),
+ (long) (buflen - inbytesleft),
+ (long) (6 - outbytesleft));
+ exit (1);
+ }
+ return 6 - outbytesleft;
+ }
+ }
+
+ /* Returns the out[] buffer as a Unicode value. */
+ static unsigned int
+ utf8_decode (const unsigned char *out, unsigned int outlen)
+ {
+ return (outlen==1 ? out[0] :
+ outlen==2 ? ((out[0] & 0x1f) << 6) + (out[1] & 0x3f) :
+ outlen==3 ? ((out[0] & 0x0f) << 12) + ((out[1] & 0x3f) << 6) + (out[2] & 0x3f) :
+ outlen==4 ? ((out[0] & 0x07) << 18) + ((out[1] & 0x3f) << 12) + ((out[2] & 0x3f) << 6) + (out[3] & 0x3f) :
+ outlen==5 ? ((out[0] & 0x03) << 24) + ((out[1] & 0x3f) << 18) + ((out[2] & 0x3f) << 12) + ((out[3] & 0x3f) << 6) + (out[4] & 0x3f) :
+ outlen==6 ? ((out[0] & 0x01) << 30) + ((out[1] & 0x3f) << 24) + ((out[2] & 0x3f) << 18) + ((out[3] & 0x3f) << 12) + ((out[4] & 0x3f) << 6) + (out[5] & 0x3f) :
+ 0xfffd);
+ }
+
+ int
+ main (int argc, char *argv[])
+ {
+ const char *charset;
+ iconv_t cd;
+
+ if (argc != 2)
+ {
+ fprintf (stderr, "Usage: tst-table-from charset\n");
+ exit (1);
+ }
+ charset = argv[1];
+
+ cd = iconv_open ("UTF-8", charset);
+ if (cd == (iconv_t)(-1))
+ {
+ perror ("iconv_open");
+ exit (1);
+ }
+
+ {
+ unsigned char out[6];
+ unsigned char buf[4];
+ unsigned int i0, i1, i2, i3;
+ int result;
+
+ for (i0 = 0; i0 < 0x100; i0++)
+ {
+ buf[0] = i0;
+ result = try (cd, buf, 1, out);
+ if (result < 0)
+ {
+ }
+ else if (result > 0)
+ {
+ printf ("0x%02X\t0x%04X\n",
+ i0, utf8_decode (out, result));
+ }
+ else
+ {
+ for (i1 = 0; i1 < 0x100; i1++)
+ {
+ buf[1] = i1;
+ result = try (cd, buf, 2, out);
+ if (result < 0)
+ {
+ }
+ else if (result > 0)
+ {
+ printf ("0x%02X%02X\t0x%04X\n",
+ i0, i1, utf8_decode (out, result));
+ }
+ else
+ {
+ for (i2 = 0; i2 < 0x100; i2++)
+ {
+ buf[2] = i2;
+ result = try (cd, buf, 3, out);
+ if (result < 0)
+ {
+ }
+ else if (result > 0)
+ {
+ printf ("0x%02X%02X%02X\t0x%04X\n",
+ i0, i1, i2, utf8_decode (out, result));
+ }
+ else if (strcmp (charset, "UTF-8"))
+ {
+ for (i3 = 0; i3 < 0x100; i3++)
+ {
+ buf[3] = i3;
+ result = try (cd, buf, 4, out);
+ if (result < 0)
+ {
+ }
+ else if (result > 0)
+ {
+ printf ("0x%02X%02X%02X%02X\t0x%04X\n",
+ i0, i1, i2, i3,
+ utf8_decode (out, result));
+ }
+ else
+ {
+ fprintf (stderr,
+ "%s: incomplete byte sequence\n",
+ hexbuf (buf, 4));
+ exit (1);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ if (iconv_close (cd) < 0)
+ {
+ perror ("iconv_close");
+ exit (1);
+ }
+
+ if (ferror (stdin) || ferror (stdout))
+ {
+ fprintf (stderr, "I/O error\n");
+ exit (1);
+ }
+
+ exit (0);
+ }
*** glibc-20000831/iconvdata/tst-table-to.c.bak Sun Sep 3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-to.c Sun Sep 3 02:48:44 2000
***************
*** 0 ****
--- 1,107 ----
+ /* Copyright (C) 2000 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Library General Public License as
+ published by the Free Software Foundation; either version 2 of the
+ License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Library General Public License for more details.
+
+ You should have received a copy of the GNU Library General Public
+ License along with the GNU C Library; see the file COPYING.LIB. If not,
+ write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ Boston, MA 02111-1307, USA. */
+
+ /* Create a table from Unicode to CHARSET.
+ This is a good test for CHARSET's iconv() module, in particular the
+ TO_LOOP BODY macro. */
+
+ #include <stddef.h>
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <iconv.h>
+ #include <errno.h>
+
+ int
+ main (int argc, char *argv[])
+ {
+ const char *charset;
+ iconv_t cd;
+
+ if (argc != 2)
+ {
+ fprintf (stderr, "Usage: tst-table-to charset\n");
+ exit (1);
+ }
+ charset = argv[1];
+
+ cd = iconv_open (charset, "UCS-2");
+ if (cd == (iconv_t)(-1))
+ {
+ perror ("iconv_open");
+ exit (1);
+ }
+
+ {
+ unsigned int i;
+ unsigned char buf[10];
+
+ for (i = 0; i < 0x10000; i++)
+ {
+ unsigned short in = i;
+ const char *inbuf = (const char *) &in;
+ size_t inbytesleft = sizeof (unsigned short);
+ char *outbuf = (char *) buf;
+ size_t outbytesleft = sizeof (buf);
+ size_t result = iconv (cd,
+ (char **) &inbuf, &inbytesleft,
+ &outbuf, &outbytesleft);
+ if (result == (size_t)(-1))
+ {
+ if (errno != EILSEQ)
+ {
+ int saved_errno = errno;
+ fprintf (stderr, "0x%02X: iconv error: ", i);
+ errno = saved_errno;
+ perror ("");
+ exit (1);
+ }
+ }
+ else if (result == 0) /* ignore conversions with transliteration */
+ {
+ unsigned int j, jmax;
+ if (inbytesleft != 0 || outbytesleft == sizeof (buf))
+ {
+ fprintf (stderr, "0x%02X: inbytes = %ld, outbytes = %ld\n", i,
+ (long) (sizeof (unsigned short) - inbytesleft),
+ (long) (sizeof (buf) - outbytesleft));
+ exit (1);
+ }
+ jmax = sizeof (buf) - outbytesleft;
+ printf ("0x");
+ for (j = 0; j < jmax; j++)
+ printf ("%02X", buf[j]);
+ printf ("\t0x%04X\n", i);
+ }
+ }
+ }
+
+ if (iconv_close (cd) < 0)
+ {
+ perror ("iconv_close");
+ exit (1);
+ }
+
+ if (ferror (stdin) || ferror (stdout))
+ {
+ fprintf (stderr, "I/O error\n");
+ exit (1);
+ }
+
+ exit (0);
+ }
*** glibc-20000831/iconvdata/tst-table-charmap.sh.bak Sun Sep 3 00:19:30 2000
--- glibc-20000831/iconvdata/tst-table-charmap.sh Sun Sep 3 12:00:04 2000
***************
*** 0 ****
--- 1,35 ----
+ #!/bin/sh
+ # Copyright (C) 2000 Free Software Foundation, Inc.
+ # This file is part of the GNU C Library.
+ # Contributed by Bruno Haible <haible@clisp.cons.org>, 2000.
+ #
+ # The GNU C Library is free software; you can redistribute it and/or
+ # modify it under the terms of the GNU Library General Public License as
+ # published by the Free Software Foundation; either version 2 of the
+ # License, or (at your option) any later version.
+ #
+ # The GNU C Library is distributed in the hope that it will be useful,
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ # Library General Public License for more details.
+ #
+ # You should have received a copy of the GNU Library General Public
+ # License along with the GNU C Library; see the file COPYING.LIB. If not,
+ # write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ # Boston, MA 02111-1307, USA.
+
+ # Converts a glibc format charmap to a simple format .table file.
+
+ LC_ALL=C
+ export LC_ALL
+
+ case "$1" in
+ POSIX )
+ # Old POSIX/DKUUG borrowed format
+ grep '^<.*>.*/x[0-9A-Fa-f]*[ ]*<U....>.*$' | grep -v 'not a real character' | sed -e 's,^<.*>[ ]*\([/x0-9A-Fa-f]*\)[ ]*<U\(....\)>.*$,\1 0x\2,' | tr abcdef ABCDEF | sed -e 's,/x\([0-9A-F][0-9A-F]\),\1,g' | sed -e 's,^,0x,' | sort | uniq | grep -v '^0x00 0x\([1-9A-F]...\|.[1-9A-F]..\|..[1-9A-F].\|...[1-9A-F]\)'
+ ;;
+ *)
+ # New Unicode based format
+ sed -e 's,^%IRREVERSIBLE%,,' | grep '^<U....>[ ]*/x' | grep -v 'not a real character' | sed -e 's,<U\(....\)>[ ]*\([/x0-9A-Fa-f]*\).*$,\2 0x\1,' | tr abcdef ABCDEF | sed -e 's,/x\([0-9A-F][0-9A-F]\),\1,g' | sed -e 's,^,0x,' | sort | uniq | grep -v '^0x00 0x\([1-9A-F]...\|.[1-9A-F]..\|..[1-9A-F].\|...[1-9A-F]\)'
+ ;;
+ esac
*** glibc-20000831/iconvdata/Makefile.bak Wed Aug 30 23:43:37 2000
--- glibc-20000831/iconvdata/Makefile Sun Sep 3 16:31:27 2000
***************
*** 51,56 ****
--- 51,58 ----
tests = bug-iconv1 bug-iconv2
+ test-srcs := tst-table-from tst-table-to
+
include ../Makeconfig
libJIS-routines := jis0201 jis0208 jis0212
***************
*** 89,95 ****
distribute := gconv-modules extra-module.mk gap.awk gaptab.awk \
gen-8bit.sh gen-8bit-gap.sh gen-8bit-gap-1.sh \
TESTS $(filter-out testdata/CVS%, $(wildcard testdata/*)) \
! run-iconv-test.sh 8bit-generic.c 8bit-gap.c \
ansi_x3.110.c asmo_449.c big5.c cp737.c cp737.h \
cp775.c cp775.h ibm874.c cns11643.c cns11643.h \
cns11643l1.c cns11643l1.h cp1250.c cp1251.c cp1252.c cp1253.c \
--- 91,100 ----
distribute := gconv-modules extra-module.mk gap.awk gaptab.awk \
gen-8bit.sh gen-8bit-gap.sh gen-8bit-gap-1.sh \
TESTS $(filter-out testdata/CVS%, $(wildcard testdata/*)) \
! run-iconv-test.sh tst-tables.sh tst-table.sh \
! tst-table-charmap.sh tst-table-from.c tst-table-to.c \
! EUC-JP.irreversible ISIRI-3342.irreversible SJIS.irreversible \
! 8bit-generic.c 8bit-gap.c \
ansi_x3.110.c asmo_449.c big5.c cp737.c cp737.h \
cp775.c cp775.h ibm874.c cns11643.c cns11643.h \
cns11643l1.c cns11643l1.h cp1250.c cp1251.c cp1252.c cp1253.c \
***************
*** 244,250 ****
ifeq (no,$(cross-compiling))
ifeq (yes,$(build-shared))
! tests: $(objpfx)iconv-test.out
endif
endif
--- 249,255 ----
ifeq (no,$(cross-compiling))
ifeq (yes,$(build-shared))
! tests: $(objpfx)iconv-test.out $(objpfx)tst-tables.out
endif
endif
***************
*** 254,259 ****
--- 259,275 ----
$(addprefix $(objpfx),$(modules.so)) \
$(common-objdir)/iconv/iconv_prog TESTS
$(SHELL) -e $< $(common-objdir) > $@
+
+ $(objpfx)tst-tables.out: tst-tables.sh $(objpfx)gconv-modules \
+ $(addprefix $(objpfx),$(modules.so)) \
+ $(objpfx)tst-table-from $(objpfx)tst-table-to
+ $(SHELL) $< $(common-objpfx) $(common-objpfx)iconvdata/ > $@
+
+ do-tests-clean common-mostlyclean: tst-tables-clean
+
+ .PHONY: tst-tables-clean
+ tst-tables-clean:
+ -rm -f $(objpfx)tst-*.table $(objpfx)tst-EUC-TW.irreversible
ifdef objpfx
$(objpfx)gconv-modules: gconv-modules
*** glibc-20000831/iconvdata/ISIRI-3342.irreversible.bak Sun Sep 3 03:51:34 2000
--- glibc-20000831/iconvdata/ISIRI-3342.irreversible Sun Sep 3 03:50:02 2000
***************
*** 0 ****
--- 1,52 ----
+ 0x80 0x0000
+ 0x81 0x0001
+ 0x82 0x0002
+ 0x83 0x0003
+ 0x84 0x0004
+ 0x85 0x0005
+ 0x86 0x0006
+ 0x87 0x0007
+ 0x88 0x0008
+ 0x89 0x0009
+ 0x8A 0x000A
+ 0x8B 0x000B
+ 0x8C 0x000C
+ 0x8D 0x000D
+ 0x8E 0x000E
+ 0x8F 0x000F
+ 0x90 0x0010
+ 0x91 0x0011
+ 0x92 0x0012
+ 0x93 0x0013
+ 0x94 0x0014
+ 0x95 0x0015
+ 0x96 0x0016
+ 0x97 0x0017
+ 0x98 0x0018
+ 0x99 0x0019
+ 0x9A 0x001A
+ 0x9B 0x001B
+ 0x9C 0x001C
+ 0x9D 0x001D
+ 0x9E 0x001E
+ 0x9F 0x001F
+ 0xA0 0x0020
+ 0xA3 0x0021
+ 0xA6 0x002E
+ 0xA8 0x0029
+ 0xA9 0x0028
+ 0xAB 0x002B
+ 0xAD 0x002D
+ 0xAF 0x002F
+ 0xBA 0x003A
+ 0xBC 0x003C
+ 0xBD 0x003D
+ 0xBE 0x003E
+ 0xE2 0x005D
+ 0xE3 0x005B
+ 0xE4 0x007D
+ 0xE5 0x007B
+ 0xE8 0x002A
+ 0xEA 0x007C
+ 0xEB 0x005C
+ 0xFF 0x007F
*** glibc-20000831/iconvdata/EUC-JP.irreversible.bak Sun Sep 3 15:35:47 2000
--- glibc-20000831/iconvdata/EUC-JP.irreversible Sun Sep 3 12:17:13 2000
***************
*** 0 ****
--- 1,6 ----
+ 0x5C 0x00A5
+ 0x7E 0x203E
+ 0x8FA2B7 0x007E
+ 0x8FA2B7 0xFF5E
+ 0xA1C0 0x005C
+ 0xA1C0 0xFF3C
*** glibc-20000831/iconvdata/SJIS.irreversible.bak Sun Sep 3 15:36:00 2000
--- glibc-20000831/iconvdata/SJIS.irreversible Sun Sep 3 04:09:56 2000
***************
*** 0 ****
--- 1,7 ----
+ 0x5C 0x005C
+ 0x7E 0x007E
+ 0x815F 0x005C
+ 0x815F 0xFF3C
+ 0x8191 0xFFE0
+ 0x8192 0xFFE1
+ 0x81CA 0xFFE2