This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/13541] New: iconv //IGNORE charsets are inconsistent about INBUF* state after EILSEQ


http://sourceware.org/bugzilla/show_bug.cgi?id=13541

             Bug #: 13541
           Summary: iconv //IGNORE charsets are inconsistent about INBUF*
                    state after EILSEQ
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: ezyang@mit.edu
    Classification: Unclassified


The iconv infopage says the following:

    `EILSEQ'
          The conversion stopped because of an invalid byte sequence in
          the input.  After the call, `*INBUF' points at the first byte
          of the invalid byte sequence.

However, this is clearly not the case when an //IGNORE target charset is
specified:

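    #include <iconv.h>
    #include <string.h>
    #include <stdio.h>
    #include <errno.h>
    int main(void) {
        iconv_t i = iconv_open("ascii//IGNORE", "utf-8");
        char inbuf[10000];
        char outbuf[10000];
        char *in = inbuf;
        char *out = outbuf;
        /* iconv() takes size_t * for both byte counts, not int * */
        size_t inleft = sizeof inbuf;
        size_t outleft = sizeof outbuf;
        size_t s;
        if (i == (iconv_t)-1) {
            perror("iconv_open");
            return 1;
        }
        /* Fill with ASCII 'w' (0x77), then put U+00A2 (0xC2 0xA2), which
           has no ASCII equivalent, at the very start of the input */
        memset(inbuf, 0x77, sizeof inbuf);
        inbuf[0] = (char)0xC2;
        inbuf[1] = (char)0xA2;
        s = iconv(i, &in, &inleft, &out, &outleft);
        printf("s = %ld, errno = %d, in[0] = %x, inleft = %zu\n",
               (long)s, errno, (unsigned char)*in, inleft);
        iconv_close(i);
        return 0;
    }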

Outputs the following:

    s = -1, errno = 84, in[0] = 77, inleft = 1839

'iconv' appears to have consumed 8161 bytes (10000 - 1839), well past the
unconvertible sequence at offset 0, before returning EILSEQ (84).

The documentation here cannot possibly be correct if we want 'IGNORE' to
actually do anything. So we have two options:

1. Claim that the semantics of EILSEQ change when the magic //IGNORE flag is
specified, and require user code to work around it properly. This is what the
'-c' flag in iconv_prog.c does, by magically "converting" these errors into
E2BIG errors, and re-running iconv appropriately.

2. Claim that this API is wrong, and modify the API such that an iconv
operating on an //IGNORE character set *never* returns EILSEQ (what one might
expect, since IGNORE is supposed to allow us to ignore sequences that are
illegal in the target). This would make glibc's iconv implementation consistent
with libiconv's.

I favor (2), since it makes client code considerably simpler and easier to
implement correctly.


