This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


RE: [RFC-v2] Allow explicit 16 or 32 char in 'x /s'

From: "Pierre Muller" <pierre dot muller at ics-cnrs dot unistra dot fr>
To: "'Eli Zaretskii'" <eliz at gnu dot org>
Cc: <tromey at redhat dot com>, <gdb-patches at sourceware dot org>
Date: Thu, 1 Apr 2010 11:34:00 +0200
Subject: RE: [RFC-v2] Allow explicit 16 or 32 char in 'x /s'
References: <11484.4708740295$1268865815@news.gmane.org> <m3mxy5z3j8.fsf@fleche.redhat.com> <83r5ngix6d.fsf@gnu.org> <15103.6087111153$1269298497@news.gmane.org> <m3r5n1v9c0.fsf@fleche.redhat.com> <006101cad0ec$cb7915d0$626b4170$%muller@ics-cnrs.unistra.fr> <83tyrwxy72.fsf@gnu.org>


> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org]
> On Behalf Of Eli Zaretskii
> > +the unit size defaults to @samp{b}, unless it is explicitly given.
> > +Use @kbd{x /hs} to display 16-bit char strings and @kbd{x /ws} to display
> > +32-bit strings.  The next use of @kbd{x /s} will again display 8-bit strings.
> 
> This is okay, but I still think we should mention that the encoding is
> UTF-16 and UCS-4, respectively, and that it cannot be changed.


   According to the c_emit_char function, it is
UTF-16 (LE or BE depending on target endianness)
or UTF-32 (LE or BE also).
  Is UCS-4 exactly the same as UTF-32?
  Furthermore, this is c_emit_char, which means that this
is language-specific output.
  Several languages have their own emit_char functions,
and several of them start with a
  c &= 0xFF;
line, which discards the higher bytes of the character value
(found in f-lang.c:86, m2-lang.c:45, objc-lang.c:287 and p-lang.c:161).
Of course these implementations would benefit from
using the more up-to-date c-lang.c implementation, but that is another
story.
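
  To illustrate the effect of that masking line, here is a minimal
standalone sketch. It is not GDB code: the function name
mask_to_low_byte is hypothetical, and it only mimics the
"c &= 0xFF;" step in isolation.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from GDB: reproduces only the
   "c &= 0xFF;" step that some language-specific emit_char
   functions apply before printing.  */
uint32_t
mask_to_low_byte (uint32_t c)
{
  c &= 0xFF;	/* Keep the lowest 8 bits, drop everything above.  */
  return c;
}
```

  With this sketch, a 16-bit value such as 0x20AC comes back as 0xAC,
while an ASCII value such as 0x41 is unchanged, so wide characters
cannot survive such a function.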

  This means that UTF-16 and UTF-32 will only be used
for the c, cplus, assembler and minimal languages.
  The Java language seems to use another scheme to represent
extended characters: it uses
  fprintf_unfiltered (stream, "\\u%.4x", (unsigned int) c);
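
  As a rough sketch of what that format string produces, the same
conversion can be tried with the standard snprintf. The helper name
format_u_escape is hypothetical and snprintf stands in for
fprintf_unfiltered; only the format string is taken from the line above.

```c
#include <stdio.h>

/* Hypothetical helper, not GDB code: formats C into BUF using the
   same "\\u%.4x" format string, with standard snprintf standing in
   for fprintf_unfiltered.  Returns what snprintf returns.  */
int
format_u_escape (char *buf, size_t size, int c)
{
  /* "%.4x" prints at least four lowercase hex digits, zero-padded.  */
  return snprintf (buf, size, "\\u%.4x", (unsigned int) c);
}
```

  For a value of 0x20AC this yields a backslash, a 'u' and the four
digits 20ac; smaller values such as 0x41 are zero-padded to 0041.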

  To summarize, I don't think it is correct to say that 'x /hs' uses
UTF-16 without specifying that this is language specific.

  Should I just mention that the output is language dependent
and uses UTF-16 or UTF-32 for the c, cplus, assembler and minimal languages?

Pierre Muller
