This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
RE: [RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed
- From: "Pierre Muller" <pierre dot muller at ics-cnrs dot unistra dot fr>
- To: <gdb-patches at sourceware dot org>
- Date: Sat, 14 Sep 2013 00:25:49 +0200
- Subject: RE: [RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed
- Authentication-results: sourceware.org; auth=none
- References: <000f01ce9e7e$6bd8a0b0$4389e210$ at muller@ics-cnrs.unistra.fr>
Ping?
Nobody reacted to this email yet...
Pierre Muller
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Pierre Muller
> Sent: Wednesday, 21 August 2013 16:55
> To: gdb-patches@sourceware.org
> Subject: [RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed
>
> Not all byte values from 0 to 255
> are valid UTF-8 chars:
>
> http://fr.wikipedia.org/wiki/UTF-8
>
> This seems to imply that a single byte with a value between 128 and 255
> cannot, on its own, be a valid UTF-8 character.
>
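To make the point about bytes 128-255 concrete, here is a minimal sketch in C that assumes nothing beyond the UTF-8 encoding rules. The names classify_utf8_byte and lone_byte_marker are hypothetical and are not GDB functions. The mapping to the <invalid>/<incomplete> wording is only an illustration and does not claim to reproduce what wchar_iterate reports for each byte.

```c
/* Hypothetical names for illustration only; none of this is GDB code.  */
enum byte_class
{
  BYTE_ASCII,         /* 0x00..0x7F: a complete one-byte character.  */
  BYTE_CONTINUATION,  /* 0x80..0xBF: only valid after a lead byte.  */
  BYTE_LEAD,          /* 0xC2..0xF4: starts a 2-, 3- or 4-byte sequence.  */
  BYTE_INVALID        /* 0xC0, 0xC1, 0xF5..0xFF: never occur in UTF-8.  */
};

/* Classify a single byte according to the role it can play in UTF-8.  */
static enum byte_class
classify_utf8_byte (unsigned char b)
{
  if (b <= 0x7F)
    return BYTE_ASCII;
  if (b <= 0xBF)
    return BYTE_CONTINUATION;
  if (b >= 0xC2 && b <= 0xF4)
    return BYTE_LEAD;
  return BYTE_INVALID;
}

/* Return the marker a byte would deserve when it is the whole input:
   a lead byte is missing its continuation bytes, while a stray
   continuation byte or a forbidden value cannot be decoded at all.  */
static const char *
lone_byte_marker (unsigned char b)
{
  switch (classify_utf8_byte (b))
    {
    case BYTE_ASCII:
      return "";
    case BYTE_LEAD:
      return "<incomplete>";
    default:
      return "<invalid>";
    }
}
```

Under these rules, lone_byte_marker (0xE9) gives "<incomplete>", since 0xE9 starts a three-byte sequence, and lone_byte_marker (0x80) gives "<invalid>". No value from 128 to 255 maps to the empty marker.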
> Nonetheless,
> testsuite/gdb.base/printcmds.exp
> seems to rely on those values simply being displayed as octal escapes
> in the print ctable1[XX] tests.
>
> This test was failing a lot for a mingw-built GDB,
> and while trying to understand why this was not the case on Linux,
> I noticed that the test only completed successfully because
> UTF-8 is the default target-charset on the Linux system I tested.
>
> The test itself seems to rely on the
> set sevenbit-strings
> command to ensure that all chars in the 128-255 interval
> are displayed as octal escapes.
> But the variable sevenbit_strings
> is not handled at all in generic_emit_char,
> which is called by the 'print ctable1[XX]' commands above.
>
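For reference, the behaviour the test appears to expect from sevenbit-strings can be sketched as follows. This is a hypothetical stand-alone helper and not GDB's implementation: format_byte and its sevenbit parameter are assumed names, with sevenbit standing in for the state of the sevenbit_strings variable.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of GDB.  Write the display form of the
   single byte C into OUT, which must hold at least 5 bytes.  When
   SEVENBIT is nonzero, every byte in the 128-255 interval is written as
   a three-digit octal escape; control characters are always escaped.  */
static void
format_byte (unsigned char c, int sevenbit, char *out, size_t outlen)
{
  int is_control = (c < 0x20 || c == 0x7F);
  int is_high = (c >= 0x80);

  if (is_control || (sevenbit && is_high))
    snprintf (out, outlen, "\\%03o", (unsigned int) c);
  else
    snprintf (out, outlen, "%c", c);
}
```

With sevenbit set, format_byte (128, 1, buf, sizeof buf) leaves \200 in buf, which is the octal form the ctable1 tests look for. With sevenbit clear, the byte is passed through unchanged and its interpretation is left to the target charset.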
> The patch below adds an <invalid> or <incomplete> marker
> to 1-byte chars that are not valid in UTF-8.
>
> This means that this patch will create regressions
> in testsuite runs, but I think that it's the
> test that is wrong, not my patch.
>
> Comments are most welcome,
>
> Pierre Muller
> GDB pascal language maintainer
>
>
> 2013-08-21 Pierre Muller <muller@sourceware.org>
>
> * valprint.c (generic_emit_char): Handle RESULT value
> and display information if a problem occurred inside the wchar_iterate
> call.
>
> Index: src/gdb/valprint.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/valprint.c,v
> retrieving revision 1.138
> diff -u -p -r1.138 valprint.c
> --- src/gdb/valprint.c 17 Jul 2013 20:35:11 -0000 1.138
> +++ src/gdb/valprint.c 21 Aug 2013 14:38:49 -0000
> @@ -2012,6 +2012,7 @@ generic_emit_char (int c, struct type *t
> struct cleanup *cleanups;
> gdb_byte *buf;
> struct wchar_iterator *iter;
> + char *info = NULL;
> int need_escape = 0;
>
> buf = alloca (TYPE_LENGTH (type));
> @@ -2035,6 +2036,23 @@ generic_emit_char (int c, struct type *t
> enum wchar_iterate_result result;
>
> num_chars = wchar_iterate (iter, &result, &chars, &buf, &buflen);
> + switch (result)
> + {
> + case wchar_iterate_ok:
> + /* Do not change it if it has been set before. */
> + break;
> + case wchar_iterate_invalid:
> + info = "<invalid>";
> + break;
> + case wchar_iterate_incomplete:
> + info = "<incomplete>";
> + break;
> + case wchar_iterate_eof:
> + /* info = "<eof>"; This is expected as last call. */
> + break;
> + default:
> + info = "<inconsistent>";
> + }
> if (num_chars < 0)
> break;
> if (num_chars > 0)
> @@ -2081,6 +2099,9 @@ generic_emit_char (int c, struct type *t
>
> fputs_filtered (obstack_base (&output), stream);
>
> + if (info)
> + fputs_filtered (info, stream);
> +
> do_cleanups (cleanups);
> }
>