This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
RE: [RFC-v2] Allow explicit 16 or 32 char in 'x /s'
- From: "Pierre Muller" <pierre dot muller at ics-cnrs dot unistra dot fr>
- To: "'Eli Zaretskii'" <eliz at gnu dot org>
- Cc: <tromey at redhat dot com>, <gdb-patches at sourceware dot org>
- Date: Thu, 1 Apr 2010 11:34:00 +0200
- Subject: RE: [RFC-v2] Allow explicit 16 or 32 char in 'x /s'
- References: <11484.4708740295$1268865815@news.gmane.org> <m3mxy5z3j8.fsf@fleche.redhat.com> <83r5ngix6d.fsf@gnu.org> <15103.6087111153$1269298497@news.gmane.org> <m3r5n1v9c0.fsf@fleche.redhat.com> <006101cad0ec$cb7915d0$626b4170$%muller@ics-cnrs.unistra.fr> <83tyrwxy72.fsf@gnu.org>
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org]
> On Behalf Of Eli Zaretskii
> > +the unit size defaults to @samp{b}, unless it is explicitly given.
> > +Use @kbd{x /hs} to display 16-bit char strings and @kbd{x /ws} to display
> > +32-bit strings. The next use of @kbd{x /s} will again display 8-bit strings.
>
> This is okay, but I still think we should mention that the encoding is
> UTF-16 and UCS-4, respectively, and that it cannot be changed.
According to the c_emit_char function, it is
UTF-16 (LE or BE, depending on target endianness)
or UTF-32 (also LE or BE).
Is UCS-4 exactly the same as UTF-32?
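To illustrate why the byte order matters here, a minimal sketch of how the
same two target bytes give different 16-bit code units. The helper name and
its big_endian flag are made up for this example and are not GDB code:

```c
#include <stdint.h>

/* Hypothetical helper, not part of GDB: combine two bytes read from
   the target into one UTF-16 code unit.  BIG_ENDIAN_TARGET is nonzero
   when the target stores the high byte first.  */
uint16_t
utf16_unit_from_bytes (const unsigned char *buf, int big_endian_target)
{
  if (big_endian_target)
    return (uint16_t) ((buf[0] << 8) | buf[1]);

  /* Little-endian: the low byte comes first in memory.  */
  return (uint16_t) ((buf[1] << 8) | buf[0]);
}
```

With the bytes 0x3A 0x26, this gives 0x263A when read as LE and 0x3A26 when
read as BE.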
Furthermore, this is c_emit_char, which means that this
is language-specific output.
Several languages have their own emit_char functions,
and several of those start with a
c &= 0xFF;
line, which discards the higher bytes of the character value
(found in f-lang.c:86, m2-lang.c:45, objc-lang.c:287 and p-lang.c:161).
Of course these implementations would benefit from
using the more up-to-date c-lang.c implementation, but that is another
story.
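As a small sketch of what that mask does to a wide character (mask_to_byte
is a hypothetical stand-in for illustration, not one of the functions named
above):

```c
/* Hypothetical stand-in for the first statement of those emit_char
   functions: only the low 8 bits of the character value survive.  */
int
mask_to_byte (int c)
{
  c &= 0xFF;	/* Discard everything above the low byte.  */
  return c;
}
```

A 16-bit value such as 0x263A comes out as 0x3A, which is ':', while plain
ASCII values pass through unchanged.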
This means that UTF-16 and UTF-32 will only be used
for the c, cplus, assembler and minimal languages.
The Java language seems to use another scheme to represent
extended characters: it uses
fprintf_unfiltered (stream, "\\u%.4x", (unsigned int) c);
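For reference, a sketch of what that format string produces. It uses the
standard snprintf in place of GDB's fprintf_unfiltered, and the helper name
is an assumption made for this example:

```c
#include <stdio.h>

/* Hypothetical helper: write C into BUF using the same "\\u%.4x"
   format string.  The .4 precision zero-pads the value to at least
   four lowercase hex digits.  Returns snprintf's result.  */
int
format_java_escape (char *buf, size_t size, int c)
{
  return snprintf (buf, size, "\\u%.4x", (unsigned int) c);
}
```

For c = 0x263A the buffer holds the six characters \u263a, and for
c = 0x41 it holds \u0041.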
To summarize, I don't think that saying that 'x /hs' uses UTF-16
without specifying that this is language specific is correct.
Should I just mention that the output is language dependent
and uses UTF-16 or UTF-32 for c, cplus, assembler and minimal languages?
Pierre Muller