This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Re: iconv_open behaviour on EILSEQ
- From: Jungshik Shin <jshin at mailaps dot org>
- To: libc-alpha at sources dot redhat dot com
- Date: Sun, 5 May 2002 00:53:04 -0400 (EDT)
- Subject: Re: iconv_open behaviour on EILSEQ
On Sat, 4 May 2002, Andreas Schwab wrote:
> Stefan Hoffmeister <bug.glibc-gnu.org@econos.de> writes:
> |> Empirically, I can see that *outbuf and *outbytesleft have been modified
> |> to reflect the successful conversions up to the point where the
> |> character triggering EILSEQ is located.
> |> SUSv2 is completely silent about the state of anything in the presence
> |> of EILSEQ; same problem in the last publicly accessible draft of SUSv3.
>
> POSIX.1-2001 says:
>
> If a sequence of input bytes does not form a valid character in the
> specified codeset, conversion shall stop after the previous
> successfully converted character. [...] The variable pointed to by
> outbuf shall be updated to point to the byte following the last byte
> of converted output data. The value pointed to by outbytesleft shall
> be decremented to reflect the number of bytes still available in the
> output buffer. [...]
>
> This is pretty unambiguous, IMHO. Even in the presence of errors the
> argument pointers must be updated.
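A minimal sketch of the behaviour quoted above, assuming glibc and its built-in UTF-8 and ASCII converters. The byte 0xff can never appear in valid UTF-8, so for the input "ab\xff" "cd" iconv() should stop after "ab", return (size_t) -1 with errno set to EILSEQ, and still advance the four argument pointers past the two characters it did convert. `convert_once` and `struct conv_result` are hypothetical names used only for this illustration.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical record of what a single iconv() call did. */
struct conv_result {
    size_t rc;        /* iconv() return value */
    int err;          /* errno after the call, or 0 if rc != (size_t) -1 */
    size_t consumed;  /* input bytes consumed: len - inbytesleft */
    size_t produced;  /* output bytes written: buffer size - outbytesleft */
    char out[64];     /* converted output, NUL-terminated */
};

/* Hypothetical helper: convert 'len' bytes of 'src' from 'from' to 'to'
   in one iconv() call and record how far the conversion got.
   Returns -1 if iconv_open() fails, 0 otherwise. */
static int convert_once(const char *to, const char *from,
                        const char *src, size_t len, struct conv_result *r)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;

    char *inbuf = (char *) src;                /* iconv() takes char ** */
    size_t inbytesleft = len;
    char *outbuf = r->out;
    size_t outbytesleft = sizeof r->out - 1;   /* leave room for the NUL */

    errno = 0;
    r->rc = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    r->err = (r->rc == (size_t) -1) ? errno : 0;   /* save before iconv_close */
    r->consumed = len - inbytesleft;
    r->produced = (sizeof r->out - 1) - outbytesleft;
    *outbuf = '\0';   /* outbuf points just past the last converted byte */

    iconv_close(cd);
    return 0;
}
```

Note that the literal has to be split as "ab\xff" "cd": written as "ab\xffcd", the hex escape would swallow the following "cd".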

Yes, it seems pretty clear. Now I have another question about
iconv()'s behavior: what is it supposed to do when it encounters a *valid*
byte sequence in the specified source codeset that cannot be converted
to the specified target codeset? For instance, what would happen if I
try to convert a UTF-8 string to one of the legacy encodings with iconv()
and the UTF-8 string happens to contain characters not covered by the
repertoire of the target encoding/codeset?

To borrow Stefan's expression :-), I found empirically that
iconv() returns (size_t) -1 and sets errno to EILSEQ in that case.
I also found that *inbuf, *inbytesleft, *outbuf and *outbytesleft are
updated to reflect that the conversion stopped when it came across a byte
sequence that is valid in the source codeset but not convertible to the
target codeset. Is this documented in POSIX.1-2001?
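A small test case for this, assuming glibc (other iconv implementations may behave differently here): "é" (U+00E9) is the valid two-byte UTF-8 sequence C3 A9 but has no ASCII equivalent. The sketch reports the byte offset at which iconv() stopped with EILSEQ, which should be where *inbuf was left pointing. `eilseq_offset` is a hypothetical helper, not part of any library.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: return the byte offset in 'src' at which iconv()
   stopped with EILSEQ, 'len' if all input converted, or -1 on any other
   failure. Converted output is written to a scratch buffer and discarded. */
static ssize_t eilseq_offset(const char *to, const char *from,
                             const char *src, size_t len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;

    char *in = (char *) src;
    size_t inleft = len;
    ssize_t result = -1;

    for (;;) {
        char scratch[64];
        char *out = scratch;
        size_t outleft = sizeof scratch;
        size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
        if (rc != (size_t) -1) {      /* all input converted */
            result = (ssize_t) len;
            break;
        }
        if (errno == E2BIG)           /* scratch buffer full: keep going */
            continue;
        if (errno == EILSEQ)          /* 'in' points at the offending bytes */
            result = (ssize_t) (in - src);
        break;                        /* EILSEQ, or EINVAL (truncated input) */
    }

    iconv_close(cd);
    return result;
}
```

With glibc, eilseq_offset("ASCII", "UTF-8", "ab\xc3\xa9" "cd", 6) should return 2, the offset of the "é", matching the empirical observation above; an invalid byte such as 0xff at the same position gives the same offset, so the return value alone does not tell the two cases apart.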
Thank you in advance for any illuminating reply,
Jungshik Shin