This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Unicode width data inconsistent/outdated

From: Thomas Wolff <towo at towo dot net>
To: cygwin at cygwin dot com
Date: Sat, 5 Aug 2017 22:53:22 +0200
Subject: Re: Unicode width data inconsistent/outdated
Authentication-results: sourceware.org; auth=none
References: <f3c1b415-7a26-8bbe-a67f-5619d356f058@towo.net> <20170726080859.GA24312@calimero.vinschen.de> <5d3cb047-49f8-26a6-d816-387a71486e99@cygwin.com> <20170726095016.GA25666@calimero.vinschen.de> <289bd98b-e644-888d-07f8-8965b6538373@towo.net> <20170728195826.GI24013@calimero.vinschen.de> <1244bd24-bb27-d185-1f24-61beae02c2cd@towo.net> <20170804170156.GL25551@calimero.vinschen.de> <30486790-c59d-9a78-6000-b3c20fb86d9d@towo.net> <1f320064-0f25-8a41-4ded-49bd750edae5@SystematicSw.ab.ca>

Am 05.08.2017 um 22:24 schrieb Brian Inglis:

On 2017-08-05 13:06, Thomas Wolff wrote:
...

Which other platforms do actually use newlib?

Many historical uPs and current uCs used in embedded systems that support gcc but
do not run Linux, including RTEMS, devKits for Nintendo and Sony game systems, some
Android, Google NaCl.

Do they all handle wchar_t to be encoded locale-specifically? I doubt that.
https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html
particularly points out Solaris and FreeBSD, no others.
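As a minimal sketch of how a platform could declare this at compile time: standard C defines the macro __STDC_ISO_10646__ when wchar_t values are Unicode code points. The helper name wchar_is_unicode is hypothetical and not part of newlib.

```c
#include <wchar.h>

/* Hypothetical helper: reports whether the implementation promises that
   wchar_t values are ISO 10646 (Unicode) code points.  The macro
   __STDC_ISO_10646__ is the standard C way to state that promise. */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;   /* wchar_t is Unicode in every locale */
#else
    return 0;   /* wchar_t encoding may depend on the locale */
#endif
}
```

A platform that leaves the macro undefined would be a candidate for the explicit list of non-Unicode wchar_t platforms.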

Issue 3 is the special conversion jp2uc which seems to be half-bred;
there is no such handling for Chinese or Korean.

This shouldn't matter to you, just keep it in place. It's a historical, low
footprint conversion for Japanese characters without pulling in the Unicode
stuff. Not used on Cygwin so just ignore.

I had noticed meanwhile that this is not active in Cygwin, but it's broken
anyway for multiple reasons:
* platforms for which wchar_t is not Unicode should be explicitly listed
* if used, the transformation needs to be applied to all non-Unicode locales
(also Chinese, Korean, and even 8-bit locales such as *.CP1252)
* for towupper and towlower, the result must be back-transformed into the
respective locale encoding
* in particular, the locale-specific _l functions are inconsistent: they do not
use the transformation but carry this note:

We're using a locale-independent representation of upper/lower case based
on Unicode data. Thus, the locale doesn't matter.

So I'd suggest dropping that stuff unless someone would like to fix it.
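To illustrate the back-transformation point, here is a minimal sketch of what a correct towupper would need to do on a platform whose wchar_t holds locale-encoded values. All names are hypothetical and none of this is newlib code. It assumes a CP1252 locale and uses only a four-entry excerpt of the mapping table and a toy case table; every other value is treated as mapping to itself, which is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the CP1252 <-> Unicode mapping. */
struct map_entry { uint32_t local; uint32_t ucs; };

static const struct map_entry cp1252_excerpt[] = {
    { 0x8A, 0x0160 },  /* LATIN CAPITAL LETTER S WITH CARON */
    { 0x9A, 0x0161 },  /* LATIN SMALL LETTER S WITH CARON */
    { 0x8E, 0x017D },  /* LATIN CAPITAL LETTER Z WITH CARON */
    { 0x9E, 0x017E },  /* LATIN SMALL LETTER Z WITH CARON */
};
#define N_ENTRIES (sizeof cp1252_excerpt / sizeof cp1252_excerpt[0])

/* Locale encoding -> Unicode; unlisted values map to themselves
   (simplifying assumption for this sketch). */
static uint32_t local_to_ucs(uint32_t c)
{
    for (size_t i = 0; i < N_ENTRIES; i++)
        if (cp1252_excerpt[i].local == c)
            return cp1252_excerpt[i].ucs;
    return c;
}

/* Unicode -> locale encoding; unlisted values map to themselves. */
static uint32_t ucs_to_local(uint32_t u)
{
    for (size_t i = 0; i < N_ENTRIES; i++)
        if (cp1252_excerpt[i].ucs == u)
            return cp1252_excerpt[i].local;
    return u;
}

/* Toy stand-in for the Unicode-based case table: ASCII plus two letters. */
static uint32_t ucs_toupper(uint32_t u)
{
    if (u >= 'a' && u <= 'z')
        return u - 0x20;
    if (u == 0x0161)
        return 0x0160;
    if (u == 0x017E)
        return 0x017D;
    return u;
}

/* Three steps: transform to Unicode, map case, transform back. */
uint32_t sketch_towupper_local(uint32_t c)
{
    return ucs_to_local(ucs_toupper(local_to_ucs(c)));
}
```

Without the final ucs_to_local step, the input 0x9A would come back as the Unicode value 0x0160 instead of the CP1252 value 0x8A, which is the missing back-transformation described in the list above.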

Looks like JIS support is under newlib/iconvdata

So maybe the conversion can call jisx0201_to_ucs4 etc. from there, and
also the back-conversion for towupper/lower is available.
But then the stuff is still broken for the other reasons. I could map
the _l functions properly, if that's really desired, but how to handle
other encodings and on which platforms?


Thomas


