This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/9793] New: iconv() incorrectly handles E2BIG condition by partially processing output char


Hello,

POSIX requires that iconv() stop conversion if the output buffer isn't large 
enough to hold the entire converted input, returning (size_t)-1 with errno set 
to E2BIG. iconv() should stop "just prior to the input bytes that would cause 
the output buffer to overflow." 
Please see http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html .

This is helpful behavior, since it allows the application to lengthen the 
output buffer and then resume processing from where iconv() left off.
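A minimal sketch of that lengthen-and-resume pattern, for illustration only. 
The helper name convert_resumable() and its parameters are hypothetical; only 
iconv_open(), iconv(), and iconv_close() are real API. It assumes the pointers 
are left consistent on E2BIG, and that the target encoding is a single-byte or 
UTF-8 style encoding for which a trailing NUL byte makes sense.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts in[0..in_len) and returns a malloc'd,
   NUL-terminated result, or NULL on failure. `cap` is the initial
   output buffer size in bytes. */
char *convert_resumable(const char *to, const char *from,
                        const char *in, size_t in_len, size_t cap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;
    if (cap == 0)
        cap = 1;

    char *buf = malloc(cap + 1);          /* +1 for the trailing NUL */
    size_t used = 0;                      /* output bytes kept so far */
    char *src = (char *)in;               /* iconv() does not modify the input */
    size_t src_left = in_len;
    int flushing = 0;                     /* 1 once all input is consumed */

    while (buf != NULL) {
        char *dst = buf + used;
        size_t dst_left = cap - used;
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &dst, &dst_left)      /* emit reset sequence */
            : iconv(cd, &src, &src_left, &dst, &dst_left);
        used = cap - dst_left;
        assert(used <= cap);              /* iconv() must stay inside the buffer */

        if (rc != (size_t)-1) {
            if (flushing)
                break;                    /* done */
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) {             /* EILSEQ, EINVAL, ...: give up */
            free(buf);
            buf = NULL;
            break;
        }
        /* E2BIG: lengthen the buffer and resume from src, keeping `used`. */
        cap *= 2;
        char *bigger = realloc(buf, cap + 1);
        if (bigger == NULL)
            free(buf);
        buf = bigger;
    }

    iconv_close(cd);
    if (buf != NULL)
        buf[used] = '\0';
    return buf;
}
```

A call such as convert_resumable("UTF-8", "ASCII", text, strlen(text), 4) 
forces several E2BIG round trips on any input longer than four bytes. The 
loop never re-reads input it has already consumed, which is why it depends on 
iconv() leaving both pointers in agreement when it stops.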

GNU libiconv's iconv() seems to handle the E2BIG case correctly.

But glibc's iconv() does not let an application gracefully restart from E2BIG, 
because it partially converts as much of an output sequence as it can, then 
leaves the input and output pointers in an inconsistent state. In some cases 
(such as with a TRANSLIT conversion), iconv() partially advances the output 
pointer to reflect the portion of the incomplete multibyte sequence it output, 
but does not advance the input pointer.

When an application restarts conversion with a larger buffer, this leads to 
garbage in the output.

For example, when converting the UTF-8 registered trademark sign to 
ASCII//TRANSLIT, iconv() wants to write out the three-byte sequence "(R)". If 
it does not have room at the end of the buffer for this three-byte sequence, it 
should not convert the character at all. It should leave the output pointer at 
the end of the successfully converted output, and the input pointer at the 
start of the registered trademark character.

Instead, what iconv() actually does is output as much of "(R)" as it can (for 
example, "(R"), update the output pointer to reflect this partial output (e.g., 
by two bytes), but then NOT update the input pointer.

I have attached a code sample that demonstrates this behavior.

If iconv() is resumed after E2BIG, it converts the registered trademark sign 
again, leading to output like "(R(R)". The application has no way of knowing 
how many bytes prior to the output pointer are actually the partial output of 
an unsuccessfully converted multibyte sequence.
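The pointer state after E2BIG can be inspected with a small probe along these 
lines. This is an illustrative sketch, not the attached sample: the function 
name probe_e2big() and its return convention are hypothetical, and the input 
"A" followed by the registered trademark sign is chosen only for the example.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical probe: converts "A" + U+00AE (UTF-8) to ASCII//TRANSLIT
   into an output buffer of out_cap bytes (at most 16) and reports how far
   each pointer moved. Returns 1 if iconv() failed with E2BIG, 0 if the
   conversion completed, -1 on any other failure. */
int probe_e2big(size_t out_cap, size_t *in_consumed, size_t *out_written)
{
    char input[] = "A\xC2\xAE";           /* 3 bytes: 'A', then U+00AE */
    char output[16];

    *in_consumed = 0;
    *out_written = 0;
    if (out_cap > sizeof output)
        out_cap = sizeof output;

    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *src = input;
    size_t src_left = 3;
    char *dst = output;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    int saved_errno = errno;              /* iconv_close() may change errno */
    iconv_close(cd);

    *in_consumed = (size_t)(src - input);
    *out_written = (size_t)(dst - output);
    assert(*out_written <= out_cap);      /* never past the space offered */

    if (rc != (size_t)-1)
        return 0;
    return saved_errno == E2BIG ? 1 : -1;
}
```

With a 3-byte output buffer, an implementation that follows the POSIX wording 
would be expected to report 1 input byte consumed and 1 output byte written 
(just the "A"). The behavior described in this report would instead show more 
output bytes written while still only 1 input byte is consumed.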

The only workaround I have found is to keep increasing the output buffer size 
and restarting the conversion from scratch until the entire conversion works in 
one go. This is not very efficient and is not what POSIX seems to have 
intended.
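That workaround can be sketched roughly as follows. The helper name 
convert_restart() and its parameters are hypothetical, and the trailing NUL 
assumes a single-byte or UTF-8 style target encoding. On every E2BIG the 
output written so far is thrown away, the buffer is doubled, and the whole 
input is converted again from its first byte.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated conversion of
   in[0..in_len), or NULL on failure. `cap` is the first buffer size tried. */
char *convert_restart(const char *to, const char *from,
                      const char *in, size_t in_len, size_t cap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;
    if (cap == 0)
        cap = 1;

    char *buf = NULL;
    for (;;) {
        char *bigger = realloc(buf, cap + 1);   /* +1 for the trailing NUL */
        if (bigger == NULL) {
            free(buf);
            buf = NULL;
            break;
        }
        buf = bigger;

        /* Put the descriptor back in its initial state and start over. */
        iconv(cd, NULL, NULL, NULL, NULL);
        char *src = (char *)in;                 /* iconv() does not modify input */
        size_t src_left = in_len;
        char *dst = buf;
        size_t dst_left = cap;

        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        if (rc != (size_t)-1)
            rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush shift state */
        if (rc != (size_t)-1) {
            assert(dst <= buf + cap);
            *dst = '\0';
            break;                              /* whole input fit in one go */
        }
        if (errno != E2BIG) {                   /* EILSEQ, EINVAL, ... */
            free(buf);
            buf = NULL;
            break;
        }
        cap *= 2;    /* discard the partial output and retry from scratch */
    }

    iconv_close(cd);
    return buf;
}
```

Because no partial output ever survives an E2BIG, the result cannot contain a 
duplicated fragment such as "(R(R)". The cost is that the input may be 
converted several times before a large enough buffer is found.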

I'm using the latest CVS glibc (gnu_get_libc_version() reports 2.9.90). I 
configured with --enable-add-ons=nptl --enable-kernel=2.6.24 and then added 
"CPPFLAGS += -fno-stack-protector" to configparms. This is on Linux 2.6.24 and 
other versions. I compiled with gcc 4.2.4 and ld 2.18.0.20080103 from Ubuntu.

Thanks for your attention to this,
Keith Winstein
keithw@mit.edu

-- 
           Summary: iconv() incorrectly handles E2BIG condition by partially
                    processing output char
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: keithw at mit dot edu
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=9793


