This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Bug in libiconv?


On Jan 24 22:09, Charles Wilson wrote:
> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
> > Here's what happens on Cygwin:
> > 
> >   $ gcc -g -o ic ic.c -liconv
> >   $ ./ic
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   iconv: 138 <Invalid or incomplete multibyte or wide character>
> >   in = <Liian pitkä sana>, inbuf = <ä sana>, inbytesleft = 7, outbytesleft = 492
> >   in = <Liian pitkä sana>, inbuf = <>, inbytesleft = 0, outbytesleft = 480
> 
> Confirmed.
> 
> > So, AFAICS, there are two problems:
> > 
> >   - Even though iconv_open has been called explicitly with "UTF-8" as
> >     input codeset, the conversion still depends on the current application
> >     codeset.  That doesn't make sense.
> > 
> >   - Even though the last parameter to iconv is defined in bytes, the
> >     value of outbytesleft after the conversion is the number of remaining
> >     wchar_t's, not the number of remaining bytes.  That's contrary to what
> >     POSIX defines, see
> >     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
> > 
> > Is this analysis correct?  Is there by any chance a newer version of
> > libiconv2 which does not have these problems?
> 
> Well, iconv's behavior is very dependent on detailed characteristics of
> the system on which it was compiled -- e.g. it's very finicky about the
> platform's behavior vis-à-vis character sets.

Ok, but that doesn't mean it has to stumble over its own feet if the
current locale's codeset is different from the codeset which has to
be converted.

I found that gencat uses the return value of the nl_langinfo call
after it called setlocale, like this:

  setlocale (LC_ALL, "");
  codeset = nl_langinfo (CODESET);
  setlocale (LC_ALL, "C");
  [...]

This is plain wrong.  See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html

  "Calls to setlocale() with a category corresponding to the category of
   item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
   array pointed to by the return value."

That's what happens in newlib, but not in glibc.  Maybe that's
libiconv's problem as well?

I also found that

  iconv_close ((iconv_t) -1);

crashes the application with a SEGV.  It's clearly the fault of the
application, but it doesn't deserve a SEGV, imho.

FYI, I examined the libiconv sources cursorily, and I found a couple of
code snippets with Cygwin-specific code which is rather questionable.

- Why on earth is libiconv on Cygwin using Windows functions in some
  places?

  - libcharset/lib/relocatable.c
  - srclib/progreloc.c
  - srclib/relocatable.c
  - lib/relocatable.c

- libcharset/lib/relocatable.c and srclib/relocatable.c define their own
  DllMain and use Windows functions, as well as the old
  cygwin_conv_to_posix_path function.


- The usage of a fixed table instead of the charset.alias file in
  libcharset/lib/localcharset.c, function get_charset_aliases() is
  not good, not good at all.

- Same file, function locale_charset() contains old Cygwin-specific
  code which is outdated.  AFAICS it shouldn't hurt, though, since
  Cygwin no longer returns "US-ASCII".

- lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
  ei_ucs2internal encoding table.  I'm not sure if that's right or
  wrong, but it looks worrying.  Please note that I defined
  __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define has been
  missing since 1.7.2.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


