This is the mail archive of the
cgen@sources.redhat.com
mailing list for the CGEN project.
Improving gas error messages
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Subject: Improving gas error messages
- From: Doug Evans <dje at transmeta dot com>
- Date: Tue, 12 Dec 2000 16:02:26 -0800 (PST)
- Cc: cgen at sources dot redhat dot com
- References: <20001212184921.G18334@redhat.com>
Frank Ch. Eigler writes:
> Hi -
>
>
> I came across a really amusing behavior of cgen assemblers. As you may
> know, when a cgen assembler attempts to parse an instruction line, it tries
> to do so using all cgen instructions permutations that may possibly
> match. It backtracks and goes to the next alternative in case of parse
> or insertion errors.
>
> This sounds okay, but consider:
>
> - The way that @arch@_cgen_assemble_insn (opcodes) looks up candidate
> instructions is by traversing all buckets of a hash table entry.
> The first successful parse/insert pair wins.
> - By default, the hash table bucket is chosen (and initialized) by looking
> only at the FIRST CHARACTER of the mnemonic. This means that all
Picking the first character was just an easy/quick solution.
A better hashing was always left as an exercise for the reader
or the spirit to move me to implement a fancier solution.
[that really wasn't obvious to anyone?]
> instructions such as "st", "st4", "suck", "scheisse" will be attempted
> in turn for any input line beginning with "s". (see CGEN_ASM_HASH*.)
> - There is no good way to distinguish minor parse errors (operand out of
> range) from major ones (missing fields, "++" punctuation, etc.).
>
> The end result is that the error messages one may want to see (in case the
> input is nearly right) often get masked by spurious error messages that
> come from later parse errors due to backtracking.
>
> Sucks, eh? So, what to do?
I wasn't trying to build Rome in a day.
Better error messages from the assembler is something
that would certainly be welcome.
> First, the CGEN_ASM_HASH* routines have to be fixed in cgen (where they are
> generated) or overridden in the target-specific .opc files. I include a
> sample clause below for your ".opc" file that will override the default
> hasher. It should nearly eliminate needless alternative parses (though
> for distinct instructions with shared mnemonics, the problem will persist).
>
> Second, I plan to commit a change to @arch@_cgen_assemble_insn in
> `opcodes/cgen-asm.in' that will attempt to bias error messages toward the
> insertion-time failures over parse-time ones. The rationale is that if
> any alternative went so far as to get a correct parse, then an insertion
> error (typically a ifield value overflow) is more likely what a user wants
> to see than any old later parse error. I attach the patch here too.
>
> With these two together, errors messages become a tad more meaningful,
> and assembling goes a slight bit quicker.
Great. Go for it.
> /* -- opc.h */
> #include <ctype.h>
>
> #define CGEN_VERBOSE_ASSEMBLER_ERRORS
>
> #define CGEN_DIS_HASH_SIZE /* REMEMBER TO OVERRIDE THIS */
> #define CGEN_DIS_HASH(buf,value) /* AND THIS */
>
> #define CGEN_ASM_HASH_SIZE 127 /* SOME SMALLISH PRIME */
> #define CGEN_ASM_HASH(insn) YOURPORT_asm_hash(insn)
>
> extern unsigned int YOURPORT_asm_hash (const char *insn);
>
> /* -- opc.c */
>
> /* A better hash function for instruction mnemonics. */
> unsigned int
> YOURPORT_asm_hash (insn)
> const char* insn;
> {
> unsigned int hash;
> const char* m = insn;
>
> for (hash = 0; *m && !isspace(*m); m++)
> hash = (hash << 5) ^ (0x1F && tolower(*m));
>
> return hash % CGEN_ASM_HASH_SIZE;
> }
>
>
>
>
> Index: cgen-asm.in
> ===================================================================
> --- cgen-asm.in 2000/08/03 14:34:34 1.28
> +++ cgen-asm.in 2000/12/12 23:41:05
> @@ -195,17 +195,18 @@
> CGEN_CPU_DESC cd;
> const char *str;
> CGEN_FIELDS *fields;
> CGEN_INSN_BYTES_PTR buf;
> char **errmsg;
> {
> const char *start;
> CGEN_INSN_LIST *ilist;
> - const char *tmp_errmsg = NULL;
> + const char *parse_errmsg = NULL;
> + const char *insert_errmsg = NULL;
>
> /* Skip leading white space. */
> while (isspace (* str))
> ++ str;
>
> /* The instructions are stored in hashed lists.
> Get the first in the list. */
> ilist = CGEN_ASM_LOOKUP_INSN (cd, str);
> @@ -230,40 +231,42 @@
> if (CGEN_INSN_ATTR_VALUE (insn, CGEN_INSN_RELAX) != 0)
> continue;
>
> str = start;
>
> /* Allow parse/insert handlers to obtain length of insn. */
> CGEN_FIELDS_BITSIZE (fields) = CGEN_INSN_BITSIZE (insn);
>
> - tmp_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
> - if (tmp_errmsg != NULL)
> + parse_errmsg = CGEN_PARSE_FN (cd, insn) (cd, insn, & str, fields);
> + if (parse_errmsg != NULL)
> continue;
>
> /* ??? 0 is passed for `pc' */
> - tmp_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
> - (bfd_vma) 0);
> - if (tmp_errmsg != NULL)
> + insert_errmsg = CGEN_INSERT_FN (cd, insn) (cd, insn, fields, buf,
> + (bfd_vma) 0);
> + if (insert_errmsg != NULL)
> continue;
>
> /* It is up to the caller to actually output the insn and any
> queued relocs. */
> return insn;
> }
>
> - /* Make sure we leave this with something at this point. */
> - if (tmp_errmsg == NULL)
> - tmp_errmsg = "unknown mnemonic";
> -
> {
> static char errbuf[150];
> + const char *tmp_errmsg;
>
> #ifdef CGEN_VERBOSE_ASSEMBLER_ERRORS
> - /* if verbose error messages, use errmsg from CGEN_PARSE_FN */
> + /* If requesting verbose error messages, use insert_errmsg.
> + Failing that, use parse_errmsg */
> + tmp_errmsg = (insert_errmsg ? insert_errmsg :
> + parse_errmsg ? parse_errmsg :
> + "unrecognized instruction");
> +
> if (strlen (start) > 50)
> /* xgettext:c-format */
> sprintf (errbuf, "%s `%.50s...'", tmp_errmsg, start);
> else
> /* xgettext:c-format */
> sprintf (errbuf, "%s `%.50s'", tmp_errmsg, start);
> #else
> if (strlen (start) > 50)
>