This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Improving gas error messages

To: cgen at sources dot redhat dot com
Subject: Improving gas error messages
From: "Frank Ch. Eigler" <fche at redhat dot com>
Date: Tue, 12 Dec 2000 18:49:21 -0500

Hi -


I came across a really amusing behavior of cgen assemblers.  As you may
know, when a cgen assembler attempts to parse an instruction line, it tries
to do so using all cgen instructions permutations that may possibly
match.  It backtracks and goes to the next alternative in case of parse
or insertion errors.

This sounds okay, but consider:

- The way that @arch@_cgen_assemble_insn (opcodes) looks up candidate
  instructions is by traversing all buckets of a hash table entry.
  The first successful parse/insert pair wins.
- By default, the hash table bucket is chosen (and initialized) by looking
  only at the FIRST CHARACTER of the mnemonic.  This means that all
  instructions such as "st", "st4", "suck", "scheisse" will be attempted
  in turn for any input line beginning with "s".  (see CGEN_ASM_HASH*.)
- There is no good way to distinguish minor parse errors (operand out of
  range) from major ones (missing fields, "++" punctuation, etc.).

The end result is that the error messages one may want to see (in case the
input is nearly right) often get masked by spurious error messages that
come from later parse errors due to backtracking.

Sucks, eh?  So, what to do?

First, the CGEN_ASM_HASH* routines have to be fixed in cgen (where they are
generated) or overridden in the target-specific .opc files.  I include a
sample clause below for your ".opc" file that will override the default
hasher.  It should nearly eliminate needless alternative parses (though
for distinct instructions with shared mnemonics, the problem will persist).

Second, I plan to commit a change to @arch@_cgen_assemble_insn in 
`opcodes/cgen-asm.in' that will attempt to bias error messages toward the
insertion-time failures over parse-time ones.  The rationale is that if
any alternative went so far as to get a correct parse, then an insertion
error (typically a ifield value overflow) is more likely what a user wants
to see than any old later parse error.  I attach the patch here too.

With these two together, errors messages become a tad more meaningful,
and assembling goes a slight bit quicker.

- FChE


/* -- opc.h */
#include <ctype.h>

#define CGEN_VERBOSE_ASSEMBLER_ERRORS

#define CGEN_DIS_HASH_SIZE /* REMEMBER TO OVERRIDE THIS */
#define CGEN_DIS_HASH(buf,value) /* AND THIS */

#define CGEN_ASM_HASH_SIZE 127 /* SOME SMALLISH PRIME */
#define CGEN_ASM_HASH(insn) YOURPORT_asm_hash(insn)

extern unsigned int YOURPORT_asm_hash (const char *insn);

/* -- opc.c */

/* A better hash function for instruction mnemonics. */
unsigned int
YOURPORT_asm_hash (insn)
     const char* insn;
{
  unsigned int hash;
  const char* m = insn;

  for (hash = 0; *m && !isspace(*m); m++)
    hash = (hash << 5) ^ (0x1F && tolower(*m));

  return hash % CGEN_ASM_HASH_SIZE;
}




Index: cgen-asm.in
===================================================================
--- cgen-asm.in	2000/08/03 14:34:34	1.28
+++ cgen-asm.in	2000/12/12 23:41:05
@@ -195,17 +195,18 @@
      CGEN_CPU_DESC cd;
      const char *str;
      CGEN_FIELDS *fields;
      CGEN_INSN_BYTES_PTR buf;
      char **errmsg;
 {
   const char *start;
   CGEN_INSN_LIST *ilist;
-  const char *tmp_errmsg = NULL;
+  const char *parse_errmsg = NULL;
+  const char *insert_errmsg = NULL;
 
   /* Skip leading white space.  */
   while (isspace (* str))
     ++ str;
 
   /* The instructions are stored in hashed lists.
      Get the first in the list.  */
   ilist = CGEN_ASM_LOOKUP_INSN (cd, str);
@@ -230,40 +231,42 @@
       if (CGEN_INSN_ATTR_VALUE (insn, CGEN_INSN_RELAX) != 0)
 	continue;
 
       str = start;
 
       /* Allow parse/insert handlers to obtain length of insn.  */
       CGEN_FIELDS_BITSIZE (fields) = CGEN_INSN_BITSIZE (insn);
 
-      tmp_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
-      if (tmp_errmsg != NULL)
+      parse_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
+      if (parse_errmsg != NULL)
 	continue;
 
       /* ??? 0 is passed for `pc' */
-      tmp_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
-					      (bfd_vma) 0);
-      if (tmp_errmsg != NULL)
+      insert_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
+						 (bfd_vma) 0);
+      if (insert_errmsg != NULL)
         continue;
 
       /* It is up to the caller to actually output the insn and any
          queued relocs.  */
       return insn;
     }
 
-  /* Make sure we leave this with something at this point. */
-  if (tmp_errmsg == NULL)
-    tmp_errmsg = "unknown mnemonic";
-
   {
     static char errbuf[150];
+    const char *tmp_errmsg;
 
 #ifdef CGEN_VERBOSE_ASSEMBLER_ERRORS
-    /* if verbose error messages, use errmsg from CGEN_PARSE_FN */
+    /* If requesting verbose error messages, use insert_errmsg.
+       Failing that, use parse_errmsg */
+    tmp_errmsg = (insert_errmsg ? insert_errmsg :
+		  parse_errmsg ? parse_errmsg :
+		  "unrecognized instruction");
+
     if (strlen (start) > 50)
       /* xgettext:c-format */
       sprintf (errbuf, "%s `%.50s...'", tmp_errmsg, start);
     else 
       /* xgettext:c-format */
       sprintf (errbuf, "%s `%.50s'", tmp_errmsg, start);
 #else
     if (strlen (start) > 50)

PGP signature

Follow-Ups:
- Improving gas error messages
  - From: Doug Evans

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]