This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Improving gas error messages


Frank Ch. Eigler writes:
 > Hi -
 > 
 > 
 > I came across a really amusing behavior of cgen assemblers.  As you may
 > know, when a cgen assembler attempts to parse an instruction line, it tries
 > to do so using all cgen instructions permutations that may possibly
 > match.  It backtracks and goes to the next alternative in case of parse
 > or insertion errors.
 > 
 > This sounds okay, but consider:
 > 
 > - The way that @arch@_cgen_assemble_insn (opcodes) looks up candidate
 >   instructions is by traversing all buckets of a hash table entry.
 >   The first successful parse/insert pair wins.
 > - By default, the hash table bucket is chosen (and initialized) by looking
 >   only at the FIRST CHARACTER of the mnemonic.  This means that all

Picking the first character was just an easy/quick solution.
A better hashing was always left as an exercise for the reader
or the spirit to move me to implement a fancier solution.
[that really wasn't obvious to anyone?]

 >   instructions such as "st", "st4", "suck", "scheisse" will be attempted
 >   in turn for any input line beginning with "s".  (see CGEN_ASM_HASH*.)
 > - There is no good way to distinguish minor parse errors (operand out of
 >   range) from major ones (missing fields, "++" punctuation, etc.).
 > 
 > The end result is that the error messages one may want to see (in case the
 > input is nearly right) often get masked by spurious error messages that
 > come from later parse errors due to backtracking.
 > 
 > Sucks, eh?  So, what to do?

I wasn't trying to build Rome in a day.
Better error messages from the assembler is something
that would certainly be welcome.

 > First, the CGEN_ASM_HASH* routines have to be fixed in cgen (where they are
 > generated) or overridden in the target-specific .opc files.  I include a
 > sample clause below for your ".opc" file that will override the default
 > hasher.  It should nearly eliminate needless alternative parses (though
 > for distinct instructions with shared mnemonics, the problem will persist).
 > 
 > Second, I plan to commit a change to @arch@_cgen_assemble_insn in 
 > `opcodes/cgen-asm.in' that will attempt to bias error messages toward the
 > insertion-time failures over parse-time ones.  The rationale is that if
 > any alternative went so far as to get a correct parse, then an insertion
 > error (typically a ifield value overflow) is more likely what a user wants
 > to see than any old later parse error.  I attach the patch here too.
 > 
 > With these two together, errors messages become a tad more meaningful,
 > and assembling goes a slight bit quicker.

Great.  Go for it.

 > /* -- opc.h */
 > #include <ctype.h>
 > 
 > #define CGEN_VERBOSE_ASSEMBLER_ERRORS
 > 
 > #define CGEN_DIS_HASH_SIZE /* REMEMBER TO OVERRIDE THIS */
 > #define CGEN_DIS_HASH(buf,value) /* AND THIS */
 > 
 > #define CGEN_ASM_HASH_SIZE 127 /* SOME SMALLISH PRIME */
 > #define CGEN_ASM_HASH(insn) YOURPORT_asm_hash(insn)
 > 
 > extern unsigned int YOURPORT_asm_hash (const char *insn);
 > 
 > /* -- opc.c */
 > 
 > /* A better hash function for instruction mnemonics. */
 > unsigned int
 > YOURPORT_asm_hash (insn)
 >      const char* insn;
 > {
 >   unsigned int hash;
 >   const char* m = insn;
 > 
 >   for (hash = 0; *m && !isspace(*m); m++)
 >     hash = (hash << 5) ^ (0x1F && tolower(*m));
 > 
 >   return hash % CGEN_ASM_HASH_SIZE;
 > }
 > 
 > 
 > 
 > 
 > Index: cgen-asm.in
 > ===================================================================
 > --- cgen-asm.in	2000/08/03 14:34:34	1.28
 > +++ cgen-asm.in	2000/12/12 23:41:05
 > @@ -195,17 +195,18 @@
 >       CGEN_CPU_DESC cd;
 >       const char *str;
 >       CGEN_FIELDS *fields;
 >       CGEN_INSN_BYTES_PTR buf;
 >       char **errmsg;
 >  {
 >    const char *start;
 >    CGEN_INSN_LIST *ilist;
 > -  const char *tmp_errmsg = NULL;
 > +  const char *parse_errmsg = NULL;
 > +  const char *insert_errmsg = NULL;
 >  
 >    /* Skip leading white space.  */
 >    while (isspace (* str))
 >      ++ str;
 >  
 >    /* The instructions are stored in hashed lists.
 >       Get the first in the list.  */
 >    ilist = CGEN_ASM_LOOKUP_INSN (cd, str);
 > @@ -230,40 +231,42 @@
 >        if (CGEN_INSN_ATTR_VALUE (insn, CGEN_INSN_RELAX) != 0)
 >  	continue;
 >  
 >        str = start;
 >  
 >        /* Allow parse/insert handlers to obtain length of insn.  */
 >        CGEN_FIELDS_BITSIZE (fields) = CGEN_INSN_BITSIZE (insn);
 >  
 > -      tmp_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
 > -      if (tmp_errmsg != NULL)
 > +      parse_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
 > +      if (parse_errmsg != NULL)
 >  	continue;
 >  
 >        /* ??? 0 is passed for `pc' */
 > -      tmp_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
 > -					      (bfd_vma) 0);
 > -      if (tmp_errmsg != NULL)
 > +      insert_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
 > +						 (bfd_vma) 0);
 > +      if (insert_errmsg != NULL)
 >          continue;
 >  
 >        /* It is up to the caller to actually output the insn and any
 >           queued relocs.  */
 >        return insn;
 >      }
 >  
 > -  /* Make sure we leave this with something at this point. */
 > -  if (tmp_errmsg == NULL)
 > -    tmp_errmsg = "unknown mnemonic";
 > -
 >    {
 >      static char errbuf[150];
 > +    const char *tmp_errmsg;
 >  
 >  #ifdef CGEN_VERBOSE_ASSEMBLER_ERRORS
 > -    /* if verbose error messages, use errmsg from CGEN_PARSE_FN */
 > +    /* If requesting verbose error messages, use insert_errmsg.
 > +       Failing that, use parse_errmsg */
 > +    tmp_errmsg = (insert_errmsg ? insert_errmsg :
 > +		  parse_errmsg ? parse_errmsg :
 > +		  "unrecognized instruction");
 > +
 >      if (strlen (start) > 50)
 >        /* xgettext:c-format */
 >        sprintf (errbuf, "%s `%.50s...'", tmp_errmsg, start);
 >      else 
 >        /* xgettext:c-format */
 >        sprintf (errbuf, "%s `%.50s'", tmp_errmsg, start);
 >  #else
 >      if (strlen (start) > 50)
 > 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]