This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Bug in libiconv?


On 1/25/2011 6:15 AM, Corinna Vinschen wrote:
> On Jan 24 22:09, Charles Wilson wrote:
>> On 1/24/2011 10:41 AM, Corinna Vinschen wrote:
>>> So, AFAICS, there are two problems:
>>>
>>>   - Even though iconv_open has been opened explicitely with "UTF-8" as
>>>     input string, the conversion still depends on the current application
>>>     codeset.  That dsoesn't make sense.
>>>
>>>   - Even though the last parameter to iconv is defined in bytes, the
>>>     value of outbytesleft after the conversion is the number of remaining
>>>     wchar"t's, not the number of remaining bytes.  That's contrary to what
>>>     POSIX defines, see
>>>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html
>>>
>>> Is this analyzes correct?  Is there by any chance a newer version of
>>> libiconv2 which does not have these problems?
>>
>> Well, iconv's behavior is very dependent on detailed characteristics of
>> the system on which it was compiled -- e.g. it's very finicky about the
>> platform's behavior vis character sets.
> 
> Ok, but that doesn't mean it has to stumble over its own feet if the
> current locale's codeset is different from the codeset which has to
> be converted.

True, of course. I was just thinking that *maybe* just recompiling
libiconv now that cygwin's i18n stuff has become more stable...might help.

> I found that gencat uses the return value of the nl_langinfo call
> after it called setlocale, like this:
> 
>   setlocale (LC_ALL, "");
>   codeset = nl_langinfo (CODESET);
>   setlocale (LC_ALL, "C");
>   [...]
> 
> This is plain wrong.  See
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/nl_langinfo.html
> 
>   "Calls to setlocale() with a category corresponding to the category of
>    item (see <langinfo.h>), or to the category LC_ALL , may overwrite the
>    array pointed to by the return value."
> 
> That's what happens in newlib, but not in glibc.  Maybe that's
> libiconv's problem as well?

Hmm. I'll look for that.

> I also found that
> 
>   iconv_close ((iconv_t) -1);
> 
> crashes the application with a SEGV.  It's clearly the fault of the
> application, but it doesn't deserve a SEGV, imho.

Yeah, that's bad.

> FYI, I examined the libiconv sources cursorily, and I found a couple of
> code snippets with Cygwin-specific code which is rather questionable.
> 
> - Why on earth is libiconv on Cygwin using Windows functions in some
>   places?
> 
>   - libcharset/lib/relocatable.c
>   - srclib/progreloc.c
>   - srclib/relocatable.c
>   - lib/relocatable.c

whoo boy. That's...a long story. It's all part of Bruno's magic
relocatability machinery.  However, on cygwin it should be using unixish
mechanisms (at least for exe's -- looking at /proc/$pid/exe.  For
DLLs...I think it needs to keep using the DllMain approach).

> - libcharset/lib/relocatable.c and srclib/relocatable.c define their own
>   DllMain and use Windows functions.  And the old
>   cygwin_conv_to_posix_path function as well.

Well, yes.  It's how DLLs determine their installation path, so they can
then automatically deduce the relative path to <whatever>.  And since
that requires using a win32 function (GetModuleFileName) it needs to
convert to cygwin format.  These days it ought to use the new
functions...I'll prepare a gnulib patch, and from there it will work its
way down into libiconv/gettext.  I'm not sure if the gnulib guys want to
preserve compat with 1.5 (e.g. check for cygwin_conv_path() and only use
if present, otherwise use deprecated?) or not.

> - The usage of a fixed table instaed of the charset.alias file in
>   libcharset/lib/localcharset.c, function get_charset_aliases() is
>   not good, not good at all.

Yeah, you're right.  It looks like there's been some bitrot with respect
to some of the "&& !CYGWIN" guards on WIN32.  Both libiconv and gettext,
IIRC, jump thru hoops to ensure that [_]*WIN32 is defined for both
"regular" win32 and for cygwin...which means defined(CYGWIN) guards are
necessary.

> - Same file, function locale_charset() contains old Cygwin-specific
>   code which is outdated.  AFAICS it shouldn't hurt, though, since
>   Cygwin no longer returns "US-ASCII".
> 
> - lib/iconv_open1.h and lib/iconv.c exclude Cygwin from the usage of the
>   ei_ucs2internal encoding table.  I'm not sure if that's right or
>   wrong, but it looks worrying.

Well, remember (A) upstream libiconv itself hasn't been updated since
30-Jun-2009, which predated cygwin 1.7.1 (23 Dec 2009), and (B) even our
most recent version (1.13.1-1) was released almost simultaneously (23
Dec 2009) -- and there was a LOT of shakeup in all that stuff from 1.7.1
thru 1.7.5.

> Please note that I defined
> __STDC_ISO_10646__ for Cygwin 1.7.8 yesterday.  This define is
> missing since 1.7.2.

Hmmm...maybe I should (re)build libiconv against a snapshot?

I don't routinely use extended character sets, and have to rely on the
test suites.  They passed, so...I thought good enough.  Perhaps not...

--
Chuck

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]