This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



bug in iconv's -c and //IGNORE handling



Hi,

POSIX says about the iconv -c option:

     "Omit any invalid characters from the output. When -c is not used,
      the results of encountering invalid characters in the input stream
      (either those that are not valid members of the fromcode or those
      that have no corresponding value in tocode) shall be specified in
      the system documentation. The presence or absence of -c shall not
      affect the exit status of iconv."

Currently glibc's iconv correctly omits the invalid characters, but the
exit status is unaffected by -c only if the conversion error occurs in a
TO_LOOP (i.e. for characters that have no corresponding value in tocode):

$ printf 'Ma\xde\nnehmen\n' > input
$ /usr/bin/iconv -f ISO-8859-1 -t ASCII < input
Ma/usr/bin/iconv: illegal input sequence at position 2
$ echo $?
1
$ /usr/bin/iconv -c -f ISO-8859-1 -t ASCII < input
Ma
nehmen
/usr/bin/iconv: illegal input sequence at position 11
$ echo $?
1

But the exit status is wrong if the conversion error occurs in a FROM_LOOP
(i.e. for bytes that are not valid members of fromcode):

$ printf 'Ma\xde\nnehmen\n' > input
$ /usr/bin/iconv -f ASCII -t ISO-8859-1 < input
Ma/usr/bin/iconv: illegal input sequence at position 2
$ echo $?
1
$ /usr/bin/iconv -c -f ASCII -t ISO-8859-1 < input
Ma
nehmen
$ echo $?
0

This is inconsistent and violates POSIX: for the same input, adding -c
changes the exit status from 1 to 0.

The inconsistency lies in the iconv() function, not in the iconv program,
as you can see by converting with descriptors from
iconv_open("ASCII//IGNORE","ISO-8859-1") and
iconv_open("ISO-8859-1//IGNORE","ASCII"): with the latter, iconv() fails
to report EILSEQ.
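A minimal C sketch of that check, using only the standard iconv_open(), iconv() and iconv_close() calls. The helper name convert_status is hypothetical: it returns 0 when iconv() converts the whole buffer, the errno value (such as EILSEQ) when iconv() returns (size_t) -1, and -1 when iconv_open() rejects the charset names. It assumes the input is small enough to fit a 256-byte output buffer.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert INPUT (LEN bytes) from FROMCODE to TOCODE
   and report how iconv() ended.  Returns 0 on full success, the errno
   value if iconv() returned (size_t) -1, or -1 if iconv_open() failed.
   The converted output is discarded.  */
static int
convert_status (const char *tocode, const char *fromcode,
                const char *input, size_t len)
{
  iconv_t cd = iconv_open (tocode, fromcode);
  if (cd == (iconv_t) -1)
    return -1;

  char outbuf[256];              /* assumed large enough for short inputs */
  char *inptr = (char *) input;  /* iconv() takes char **, not const */
  char *outptr = outbuf;
  size_t inleft = len;
  size_t outleft = sizeof outbuf;

  errno = 0;
  size_t n = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  int err = (n == (size_t) -1) ? errno : 0;

  iconv_close (cd);
  return err;
}
```

Per the report, with s = "Ma\xde\nnehmen\n" (11 bytes), the affected glibc gives EILSEQ for convert_status("ASCII//IGNORE", "ISO-8859-1", s, 11) but 0 for convert_status("ISO-8859-1//IGNORE", "ASCII", s, 11). Results on other glibc versions or other C libraries may differ.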

Here is a patch that changes the

                if (! ignore_errors_p ())                                     \
                  {                                                           \
                    result = __GCONV_ILLEGAL_INPUT;                           \
                    break;                                                    \
                  }                                                           \
                                                                              \
                ++inptr;                                                      \
                ++*irreversible;                                              \
                continue;                                                     \

pattern found in so many source files so that
"result = __GCONV_ILLEGAL_INPUT;" is executed before testing
ignore_errors_p(), the same way as it is done in STANDARD_ERR_HANDLER.

For maintainability (the same idiom occurs 123 times!) I also move this
pattern into a new macro, STANDARD_FROM_LOOP_ERR_HANDLER.
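To make the ordering change concrete, here is a self-contained toy in C. It is not glibc's loop.c: the status enum, the ignore_errors flag and the decode_ascii_old/decode_ascii_new functions are hypothetical stand-ins for __GCONV_ILLEGAL_INPUT, ignore_errors_p() and a FROM_LOOP BODY. It only shows that setting result before testing the ignore flag makes a skipped invalid byte still yield an error status.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for glibc's __GCONV_* status codes.  */
enum status { STATUS_OK, STATUS_ILLEGAL_INPUT };

/* Old ordering: the error is recorded only when errors are NOT ignored,
   so an ignored error leaves result == STATUS_OK.  Like the glibc idiom,
   the macro uses result, inptr, ignore_errors and irreversible from the
   enclosing function, and break/continue act on the enclosing loop.  */
#define OLD_FROM_LOOP_ERR_HANDLER(Incr)                                 \
  {                                                                     \
    if (!ignore_errors)                                                 \
      {                                                                 \
        result = STATUS_ILLEGAL_INPUT;                                  \
        break;                                                          \
      }                                                                 \
    inptr += (Incr);                                                    \
    ++*irreversible;                                                    \
    continue;                                                           \
  }

/* New ordering (a sketch of the idea behind
   STANDARD_FROM_LOOP_ERR_HANDLER): record the error first, then either
   stop or skip the offending input.  */
#define NEW_FROM_LOOP_ERR_HANDLER(Incr)                                 \
  {                                                                     \
    result = STATUS_ILLEGAL_INPUT;                                      \
    if (!ignore_errors)                                                 \
      break;                                                            \
    inptr += (Incr);                                                    \
    ++*irreversible;                                                    \
    continue;                                                           \
  }

/* Toy ASCII -> UCS-4 decoder using the old handler.  */
static enum status
decode_ascii_old (const unsigned char *inptr, const unsigned char *inend,
                  uint32_t *outptr, int ignore_errors, size_t *irreversible)
{
  enum status result = STATUS_OK;
  while (inptr < inend)
    {
      if (*inptr > 0x7f)        /* not a valid member of ASCII */
        OLD_FROM_LOOP_ERR_HANDLER (1);
      *outptr++ = *inptr++;
    }
  return result;
}

/* The same decoder using the new handler.  */
static enum status
decode_ascii_new (const unsigned char *inptr, const unsigned char *inend,
                  uint32_t *outptr, int ignore_errors, size_t *irreversible)
{
  enum status result = STATUS_OK;
  while (inptr < inend)
    {
      if (*inptr > 0x7f)        /* not a valid member of ASCII */
        NEW_FROM_LOOP_ERR_HANDLER (1);
      *outptr++ = *inptr++;
    }
  return result;
}
```

On the input "Ma\xde\nnehmen\n" with ignore_errors set, decode_ascii_old returns STATUS_OK while decode_ascii_new returns STATUS_ILLEGAL_INPUT; both skip the 0xde byte and count it as irreversible. That mirrors the FROM_LOOP behavior of "iconv -c" before and after the patch.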


2002-05-26  Bruno Haible  <bruno@clisp.org>

	* iconv/loop.c (STANDARD_FROM_LOOP_ERR_HANDLER): New macro.
	(STANDARD_TO_LOOP_ERR_HANDLER): Renamed from STANDARD_ERR_HANDLER.
	All callers changed.
	* iconv/gconv_simple.c (ascii_internal_loop): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(utf8_internal_loop): Likewise.
	(ucs2_internal_loop): Likewise.
	(internal_ucs2_loop): Perform error handling like in
	STANDARD_FROM_LOOP_ERR_HANDLER.
	* iconvdata/unicode.c (BODY for TO_LOOP): Perform error handling like
	in STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
	handling.
	* iconvdata/utf-16.c (BODY for TO_LOOP): Perform error handling like
	in STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
	handling.
	* iconvdata/utf-32.c (BODY for TO_LOOP): Perform error handling like
	in STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
	handling.
	* iconvdata/big5.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	* iconvdata/iso-2022-jp.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/8bit-gap.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/8bit-generic.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/ansi_x3.110.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/armscii-8.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/cp1255.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/cp1258.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/euc-cn.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/euc-jisx0213.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/euc-jp.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/euc-kr.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/euc-tw.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/big5hkscs.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/gb18030.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/gbk.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso-2022-cn.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso-2022-jp-3.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso-2022-kr.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso646.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso_6937-2.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/iso_6937.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/johab.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/shift_jisx0213.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/sjis.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/t.61.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/uhc.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/utf-7.c (BODY for FROM_LOOP): Likewise.
	* iconvdata/gbbig5.c (BODY for FROM_LOOP): Likewise. When ignoring
	an error, still set result = __GCONV_ILLEGAL_INPUT.
	(BODY for TO_LOOP): Likewise.
	* iconvdata/ibm930.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm932.c: Include <dlfcn.h> and <stdint.h>.
	(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
	handling.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm933.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm935.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm937.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm939.c (BODY for FROM_LOOP): For error handling use
	STANDARD_FROM_LOOP_ERR_HANDLER.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/ibm943.c: Include <dlfcn.h> and <stdint.h>.
	(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
	handling.
	(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
	* iconvdata/gbgbk.c (BODY for FROM_LOOP): Update.
	* iconvdata/iso8859-1.c (BODY for TO_LOOP): Update.
	* iconvdata/tcvn5712-1.c (BODY for TO_LOOP): Update.

[patch compressed for size]

Attachment: iconv-errhandling-patch.bz2

