This is the mail archive of the
gdb@sourceware.org
mailing list for the GDB project.
Re: [RFC] Signed/unsigned character arrays are not strings
- From: Jim Blandy <jimb at codesourcery dot com>
- To: Mark Kettenis <mark dot kettenis at xs4all dot nl>
- Cc: eliz at gnu dot org, dewar at adacore dot com, nickrob at snap dot net dot nz, jan dot kratochvil at redhat dot com, Mathieu dot Lacage at sophia dot inria dot fr, gdb at sourceware dot org
- Date: Wed, 28 Feb 2007 16:43:14 -0800
- Subject: Re: [RFC] Signed/unsigned character arrays are not strings
- References: <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net> <20070226004457.GA9926@caradoc.them.org> <17892.4014.160191.285423@kahikatea.snap.net.nz> <45E42969.1030007@adacore.com> <20070227131442.GA20718@caradoc.them.org> <ulkij2tva.fsf@gnu.org> <20070227215316.GA26262@caradoc.them.org> <200702272211.l1RMBVvI028239@brahms.sibelius.xs4all.nl> <20070228134640.GA11261@caradoc.them.org>
Daniel Jacobowitz <drow@false.org> writes:
> On Tue, Feb 27, 2007 at 11:11:32PM +0100, Mark Kettenis wrote:
>> Anyway, I'm in favour of restore the traditional gdb behaviour of
>> printing all three as strings.
>
> While I haven't seen responses to some of my arguments in favor of the
> new behavior, it's obvious that I'm in the minority here. So, the
> behavior ought to change.
>
> I was concerned about Jim's patch, but thinking about it some more, I
> would be happy with it (probably with Paul's suggestion). That's
> because "typedef char *name" is reasonably common, but "typedef char
> name_unit;" is much less so. What do those who prefer the old
> behavior think of that?
For whatever it's worth, here's a revised patch that implements Paul's
suggestions and has better comments (more accurately apologetic):
gdb/ChangeLog:
2007-02-27 Jim Blandy <jimb@codesourcery.com>
* c-valprint.c (textual_element_type): New function.
(c_val_print): Use textual_element_type to decide whether to print
arrays, pointer types, and integer types as strings and characters.
(c_value_print): Doc fix.
Index: gdb/c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.42
diff -u -r1.42 c-valprint.c
--- gdb/c-valprint.c 26 Jan 2007 20:54:16 -0000 1.42
+++ gdb/c-valprint.c 28 Feb 2007 23:09:29 -0000
@@ -56,6 +56,66 @@
}
+/* Apply a heuristic to decide whether an array of TYPE or a pointer
+ to TYPE should be printed as a textual string. Return non-zero if
+ it should, or zero if it should be treated as an array of /pointer
+ to integers.
+
+ It's a shame that we need to use a heuristic here, but C hasn't
+ historically provided distinct types for bytes and characters; you
+ get 'char', which is guaranteed to occupy a byte. You can put
+ qualifiers on it, but people use the qualified char types for text
+ too, so the qualifier's presence doesn't really tell you anything.
+
+ What's especially a shame here is that the heuristic is fixed. The
+ user knows which of their types are textual and which are numeric,
+ but we don't give them a way to tell us, beyond using '/s' every time
+ they print something. */
+static int
+textual_element_type (struct type *type)
+{
+ /* GDB doesn't use TYPE_CODE_CHAR for the C 'char' types; instead,
+ it uses one-byte TYPE_CODE_INT types, with TYPE_NAMEs like
+ "char", "unsigned char", etc. and appropriate flags. For various
+ reasons, this works out well in some places. */
+
+
+ struct type *true_type = check_typedef (type);
+
+ /* TYPE_CODE_CHAR is always textual. But I don't think it ever
+ occurs in C code. */
+ if (TYPE_CODE (true_type) == TYPE_CODE_CHAR)
+ return 1;
+
+ /* If a one-byte TYPE_CODE_INT has a TYPE_NAME of "char" or
+ something ending with " char", then we treat it as text;
+ otherwise, we assume it's being used as data. This makes all our
+ SIMD types like builtin_type_v8_int8 and the <stdint.h> types
+ like uint8_t print numerically, but all 'char' types print
+ textually. Code which says what it means does well.
+ We also recognize types ending in 'char_t'. */
+ if (TYPE_CODE (true_type) == TYPE_CODE_INT
+ && TYPE_LENGTH (true_type) == 1)
+ {
+ int name_len;
+
+ /* All integer types should have names. */
+ gdb_assert (TYPE_NAME (type));
+
+ name_len = strlen (TYPE_NAME (type));
+
+ if (strcmp (TYPE_NAME (type), "char") == 0
+ || (name_len > 5
+ && strcmp (TYPE_NAME (type) + name_len - 5, " char") == 0)
+ || (name_len >= 6
+ && strcmp (TYPE_NAME (type) + name_len - 6, " char_t") == 0))
+ return 1;
+ }
+
+ return 0;
+}
+
+
/* Print data of type TYPE located at VALADDR (within GDB), which came from
the inferior at address ADDRESS, onto stdio stream STREAM according to
FORMAT (a letter or 0 for natural format). The data at VALADDR is in
@@ -94,11 +154,9 @@
{
print_spaces_filtered (2 + 2 * recurse, stream);
}
- /* For an array of chars, print with string syntax. */
- if (eltlen == 1 &&
- ((TYPE_CODE (elttype) == TYPE_CODE_INT && TYPE_NOSIGN (elttype))
- || ((current_language->la_language == language_m2)
- && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+
+ /* Print arrays of textual chars with a string syntax. */
+ if (textual_element_type (TYPE_TARGET_TYPE (type))
&& (format == 0 || format == 's'))
{
/* If requested, look for the first null char and only print
@@ -191,12 +249,11 @@
deprecated_print_address_numeric (addr, 1, stream);
}
- /* For a pointer to char or unsigned char, also print the string
+ /* For a pointer to a textual type, also print the string
pointed to, unless pointer is null. */
/* FIXME: need to handle wchar_t here... */
- if (TYPE_LENGTH (elttype) == 1
- && TYPE_CODE (elttype) == TYPE_CODE_INT
+ if (textual_element_type (TYPE_TARGET_TYPE (type))
&& (format == 0 || format == 's')
&& addr != 0)
{
@@ -398,7 +455,7 @@
Since we don't know whether the value is really intended to
be used as an integer or a character, print the character
equivalent as well. */
- if (TYPE_LENGTH (type) == 1)
+ if (textual_element_type (type))
{
fputs_filtered (" ", stream);
LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
@@ -500,7 +557,9 @@
|| TYPE_CODE (type) == TYPE_CODE_REF)
{
/* Hack: remove (char *) for char strings. Their
- type is indicated by the quoted string anyway. */
+ type is indicated by the quoted string anyway.
+ (Don't use textual_element_type here; quoted strings
+ are always exactly (char *). */
if (TYPE_CODE (type) == TYPE_CODE_PTR
&& TYPE_NAME (type) == NULL
&& TYPE_NAME (TYPE_TARGET_TYPE (type)) != NULL