This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



[PATCH] fix surrogate pair handling


Hi again,

My co-worker asked me to forward a bug report and a fix to you.
He found that the iconv UTF-16 module does not work correctly when
converting specific surrogate pairs from UTF-16.  Converting from UCS-4
to UTF-16 does not have any problems.

 The test case is as follows:

$ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | od -bx 
0000000 330 077 337 377
        3fd8 ffdf
0000004

$ printf "\x00\x01\xff\xff" | iconv -f UCS-4BE -t UTF-16BE | iconv -f UTF-16BE -t UCS-4BE | od -bx 
iconv: illegal input sequence at position 0
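
 The same round trip can be driven through the iconv(3) API.  The
following is only a minimal sketch, not the attached utf16.c; the helper
name convert() is hypothetical, and it assumes the C library accepts the
encoding names "UCS-4BE" and "UTF-16BE" as the iconv program above does.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert INLEN bytes at IN from encoding FROM to
   encoding TO, writing at most OUTCAP bytes to OUT.  Returns the number
   of bytes written, or (size_t) -1 if the conversion descriptor cannot
   be opened, iconv reports an error, or input is left unconverted.  */
static size_t
convert (const char *from, const char *to,
         const char *in, size_t inlen, char *out, size_t outcap)
{
  /* Note the argument order: destination encoding first.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *inptr = (char *) in;    /* iconv takes a non-const pointer.  */
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = outcap;

  size_t rc = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  iconv_close (cd);

  if (rc == (size_t) -1 || inleft != 0)
    return (size_t) -1;
  return outcap - outleft;
}
```

 Calling convert ("UCS-4BE", "UTF-16BE", ...) on the bytes 00 01 ff ff and
then convert ("UTF-16BE", "UCS-4BE", ...) on the result mirrors the two
commands above; on an affected library the second call would be expected
to return (size_t) -1.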

 According to the Unicode specification, the range of the high surrogate
(first word) is U+D800 through U+DBFF, and the range of the low surrogate
(second word) is U+DC00 through U+DFFF.  However, the UTF-16 module does
not seem to respect these ranges.
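
 To make the ranges concrete, here is a small standalone sketch of the
surrogate arithmetic.  It is not the code in iconvdata/utf-16.c; the
macro and function names are hypothetical and chosen only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Surrogate ranges as given in the Unicode specification.  */
#define HIGH_SURROGATE_MIN 0xD800u
#define HIGH_SURROGATE_MAX 0xDBFFu
#define LOW_SURROGATE_MIN  0xDC00u
#define LOW_SURROGATE_MAX  0xDFFFu

/* Split a supplementary code point (U+10000..U+10FFFF) into a
   high/low surrogate pair.  */
static void
encode_surrogate_pair (uint32_t cp, uint16_t *high, uint16_t *low)
{
  assert (cp >= 0x10000u && cp <= 0x10FFFFu);
  cp -= 0x10000u;
  *high = (uint16_t) (HIGH_SURROGATE_MIN + (cp >> 10));
  *low = (uint16_t) (LOW_SURROGATE_MIN + (cp & 0x3FFu));
}

/* Combine a surrogate pair into a code point.  Returns 0 on success,
   or -1 if either word lies outside its range.  Both upper bounds are
   inclusive, so a low surrogate of 0xDFFF is valid.  */
static int
decode_surrogate_pair (uint16_t high, uint16_t low, uint32_t *cp)
{
  if (high < HIGH_SURROGATE_MIN || high > HIGH_SURROGATE_MAX)
    return -1;
  if (low < LOW_SURROGATE_MIN || low > LOW_SURROGATE_MAX)
    return -1;
  *cp = 0x10000u
        + ((uint32_t) (high - HIGH_SURROGATE_MIN) << 10)
        + (uint32_t) (low - LOW_SURROGATE_MIN);
  return 0;
}
```

 For the code point in the test case, U+1FFFF, this yields the pair
0xD83F 0xDFFF, which matches the bytes d8 3f df ff in the od output; the
low surrogate sits exactly on the upper bound of its range.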

I have attached a more detailed test case and a fix to this mail.
What do you think of them?

2003-02-19  Jiro Sekiba  <jir at yamato dot ibm dot com>

	* iconvdata/utf-16.c (gconv_end): Fix range of low surrogate.

Thanks,
-- 
Isamu Hasegawa
IBM Japan, Ltd.

Attachment: utf-16.patch
Description: Binary data

Attachment: utf16.c
Description: Binary data

