This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.

Displaying Unicode

To: Eli Zaretskii <eliz at is dot elta dot co dot il>
Subject: Displaying Unicode
From: bstell at ix dot netcom dot com
Date: Sat, 11 Nov 2000 10:50:30 -0800
CC: bstell at netscape dot com, gdb at sources dot redhat dot com
References: <3A0B4885.B2384AE7@netscape.com> <200011101039.FAA29785@indy.delorie.com> <3A0C2791.A33BE9AC@ix.netcom.com> <200011102009.PAA00126@indy.delorie.com> <3A0C8E19.A8A6D355@netscape.com> <200011110732.CAA00523@indy.delorie.com>

[This came from a different thread.]
[Was: extending Gdb to display app specific data]

Eli Zaretskii wrote:
> Could you please post a list of the types which you'd like to be able
> to display, and tell how each one of these types should look on the
> screen when displayed by GDB?

Let's look at Unicode.

UCS-2
=============
Each character is 16 bits, which lets programs (more
easily) handle languages with more than 256 characters
(e.g., Japanese). The ASCII values are simply zero-extended
to 16 bits. For example, "A", which is 0x41 in ASCII, is
0x0041 in UCS-2.

The simplest UCS-2 display would be to display the ASCII
as ASCII and the non-ASCII as hex. This way the
non-internationalization (i18n) engineers continue to see
the strings as before. The i18n engineers will have to
look up the values (which they often need to do anyway).

A more sophisticated display routine would check whether each
non-ASCII character is displayable in the current locale and,
if so, convert it to the current locale's encoding for
display. This way developers who can read Japanese, etc.
can see (hopefully something close to) the intended
text.

To my knowledge there is no universally used type for
UCS-2, but we could require that, for GDB to display it,
the app use "UCS2 *".
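
As a rough illustration of the locale-aware display described above, here
is a minimal sketch in C. The function name ucs2_to_locale and its buffer
interface are hypothetical, not anything in GDB. It assumes that wchar_t
holds Unicode code points (true on glibc, not guaranteed everywhere) and
that the caller has already selected the user's locale, for example with
setlocale(LC_CTYPE, ""). Characters the locale cannot represent fall back
to hex.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: format a 0-terminated UCS-2 string into `out`
   using the current locale's encoding.  Characters that the locale
   cannot encode are written as \xHHHH.  Returns the number of bytes
   written, not counting the terminating NUL.
   Assumption: wchar_t values are Unicode code points. */
size_t
ucs2_to_locale(const uint16_t *s, char *out, size_t outsz)
{
  mbstate_t st;
  size_t n = 0;

  if (outsz == 0)
    return 0;
  memset(&st, 0, sizeof st);

  for (; *s; s++) {
    char buf[MB_LEN_MAX + 8];
    size_t len = wcrtomb(buf, (wchar_t)*s, &st);

    if (len == (size_t)-1) {
      /* Not representable in this locale: reset the conversion
         state and fall back to a hex escape. */
      memset(&st, 0, sizeof st);
      len = (size_t)snprintf(buf, sizeof buf, "\\x%04x", (unsigned)*s);
    }
    if (n + len >= outsz)
      break;                    /* stop rather than overflow `out` */
    memcpy(out + n, buf, len);
    n += len;
  }
  out[n] = '\0';
  return n;
}
```

In the default "C" locale a Japanese character is not representable, so it
comes out as hex; in a Japanese or UTF-8 locale the same call would emit
the character in that locale's encoding.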

UTF-16
========
For purposes of this discussion it is the same as UCS-2.
UTF-16 supports more than 64K characters by allowing
an extended value to be composed of two 16-bit units (a
surrogate pair). Display it the same as UCS-2.

To my knowledge there is no universally used type for
UTF-16, but we could require that, for GDB to display it,
the app use "UTF16 *".
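
To make the "two 16-bit values" point concrete, here is a minimal sketch
of how a display routine could combine a surrogate pair into one extended
value. The name utf16_decode and its interface are hypothetical; the
arithmetic is the standard UTF-16 surrogate formula.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from a UTF-16 buffer of
   `len` 16-bit units.  Stores the code point in *cp and returns the
   number of units consumed (1 or 2), or 0 if the buffer is empty.
   An unpaired surrogate is passed through unchanged as one unit. */
size_t
utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
  if (len == 0)
    return 0;

  uint16_t hi = s[0];
  if (hi >= 0xD800 && hi <= 0xDBFF && len >= 2) {
    uint16_t lo = s[1];
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      /* High surrogate carries the top 10 bits, low surrogate the
         bottom 10 bits, offset by 0x10000. */
      *cp = 0x10000
            + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
      return 2;
    }
  }
  *cp = hi;
  return 1;
}
```

A display routine that only knows UCS-2 can skip this step and show each
half of the pair as hex, which is what "display the same as UCS-2" amounts
to.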

UTF-8
=========
An alternate (8-bit multibyte) encoding, popular because
UTF-8 data never contains a 0 byte except for the NUL
character itself. For display, convert it to UCS-2 and then
display the UCS-2.

To my knowledge there is no universally used type for
UTF-8, but we could require that, for GDB to display it,
the app use "UTF8 *".
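
The "convert it to UCS-2" step could look roughly like the sketch below.
The name utf8_to_ucs2 is hypothetical. This version is deliberately
simple: it replaces truncated sequences and anything outside the 16-bit
range with U+FFFD, and it does not reject overlong encodings, which a
production decoder should.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UCS-2.
   Writes at most `max` units to `out`, including the terminating 0,
   and returns the number of units written, not counting it.
   Truncated sequences and code points above 0xFFFF become U+FFFD.
   Overlong encodings are not detected in this sketch. */
size_t
utf8_to_ucs2(const unsigned char *in, uint16_t *out, size_t max)
{
  size_t n = 0;

  if (max == 0)
    return 0;

  while (*in && n + 1 < max) {
    unsigned char b = *in++;
    uint32_t cp;
    int extra;                  /* continuation bytes still expected */

    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else                         { cp = 0xFFFD;   extra = 0; }

    /* Each continuation byte (10xxxxxx) contributes 6 more bits. */
    while (extra > 0 && (*in & 0xC0) == 0x80) {
      cp = (cp << 6) | (uint32_t)(*in++ & 0x3F);
      extra--;
    }
    if (extra > 0 || cp > 0xFFFF)
      cp = 0xFFFD;

    out[n++] = (uint16_t)cp;
  }
  out[n] = 0;
  return n;
}
```

The resulting buffer can then be handed to whatever UCS-2 display routine
is in use.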

Here is a UCS-2 display routine I use:
==============================================
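#include <stdio.h>

/* UChar is the app's 16-bit unsigned character type. */

void
dump_UCharString(const UChar *uChar_str)
{
  while (*uChar_str) {
    if (*uChar_str < 0x7F) {
      /* ASCII: print the character itself */
      printf("%c", *uChar_str);
    }
    else {
      /* non-ASCII (and DEL): print as hex, high byte first */
      printf("\\x%02x%02x ", (*uChar_str)>>8, (*uChar_str)&0xFF);
    }
    uChar_str++;
  }
}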

Follow-Ups:
- Re: Displaying Unicode
  - From: Eli Zaretskii

References:
- extending Gdb to display app specific data
  - From: Brian Stell
- Re: extending Gdb to display app specific data
  - From: Eli Zaretskii
- Re: extending Gdb to display app specific data
  - From: bstell
- Re: extending Gdb to display app specific data
  - From: Eli Zaretskii
- Re: extending Gdb to display app specific data
  - From: Brian Stell
- Re: extending Gdb to display app specific data
  - From: Eli Zaretskii
