This is the mail archive of the
cgen@sources.redhat.com
mailing list for the CGEN project.
generalizing the delay rtx function
- To: cgen at sources dot redhat dot com
- Subject: generalizing the delay rtx function
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- Date: Thu, 8 Mar 2001 16:01:06 -0500
Hi -
As you may be aware, the delay rtx function is used, despite its
work-in-progress designation, to model delayed branches in
constructs like
    (delay 1
      (set pc (add pc 42)))
The DELAY-SLOT insn attribute is inferred from this for use by
simulator mainlines. That's the extent of the effect of the delay
rtx.
In order to model architectures with exposed pipelines (i.e., no
or limited pipeline interlocks), and related effects like delayed
loads, I'd like to take it beyond this, by coupling it to the
parallel-write mechanism.
As you might be aware, ports that have VLIW features tend to use
the "parallel-write" mechanism in their semantic blocks in order
to queue updates to registers/memory until after all concurrently
executed instructions have been processed. This lets multiple
reader instructions execute together with a writer instruction,
without detailed worry about the evaluation sequence.
Anyway, how about a scheme such as this:
- Provide a clear definition for the DELAY rtx:
The numeric argument is the number of instruction cycles
after the current one, at which the enclosed set expressions
take effect.
- Restrict the DELAY rtx to contain only SET expressions whose
destinations are hardware registers or memory. Forbid other calculations.
- Possibly, require (DELAY 0 ...) to express VLIW concurrency,
at least in new ports.
- Infer "parallel-write?" (or a new equivalent) from the presence of
DELAY rtxs.
- Eliminate special treatment of PC by fitting delayed branches into
this model.
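
As a rough illustration of the restriction and inference steps in the list above, here is a small Python sketch. It is not cgen code: the nested-tuple encoding of rtx forms and the names delay_depths and analyze are invented for this example. It rejects DELAY bodies whose top-level forms are not SET, and derives a parallel-write flag plus the number of write-queue slots from the DELAY forms it finds.

```python
def delay_depths(rtx):
    """Yield N for every (delay N ...) form found in a nested rtx.

    Hypothetical encoding: each rtx form is a tuple whose first element
    is the operator name; a list holds a plain run of forms.
    """
    if not isinstance(rtx, (list, tuple)) or not rtx:
        return
    if rtx[0] == "delay":
        n, body = rtx[1], rtx[2:]
        if not isinstance(n, int) or n < 0:
            raise ValueError("delay count must be a non-negative integer")
        for expr in body:
            # Only the top-level forms are checked; the rvalue of a set
            # may still be an arbitrary expression such as (add pc 42).
            if not (isinstance(expr, (list, tuple)) and expr and expr[0] == "set"):
                raise ValueError("delay may only contain set expressions")
        yield n
        return
    for sub in rtx:
        yield from delay_depths(sub)


def analyze(semantics):
    """Infer a parallel-write flag and the queue depth from DELAY usage."""
    depths = list(delay_depths(semantics))
    return {
        "parallel_write": bool(depths),
        # Slots 0..max(N) are needed, hence max + 1 entries.
        "slots": max(depths) + 1 if depths else 0,
    }
```

With this encoding, a semantic block that uses delays of 0, 1 and 2 would need three queue slots, and a block with no DELAY forms would need none.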
Then, the generated simulator code would be changed, so that:
- Semantic functions, instead of taking a single parexec structure
pointer (for write queueing), take an array of them. Within
(DELAY <N> RTX*) blocks, define OPRND to point to the appropriate
elements in the parexec array.
- The insn evaluation loop would keep an array of parexec structs
as a rotating buffer, always running the writeback code on the first
one, then rotating the set, then passing it to the next insn. cgen
could compute the maximum index needed.
This way, code like
    (set reg1 1)
    (delay 0 (set reg1 3))
    (delay 1 (set reg2 5))
    (delay 2 (set reg1 6))
would each be well-defined and useful.
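
To make the intended timing concrete, here is a toy Python model of the rotating write queue applied to the four statements above. It is only a sketch under assumptions: the class DelayedWriteQueue and the function run_example are hypothetical names, plain dictionaries stand in for parexec structures, and a SET outside any DELAY is assumed to take effect immediately. The real generated simulator code would be C and would look quite different.

```python
from collections import deque


class DelayedWriteQueue:
    """Toy stand-in for the rotating array of parexec structures."""

    def __init__(self, max_delay):
        # One pending-write list per delay distance 0..max_delay.
        self.slots = deque([] for _ in range(max_delay + 1))

    def queue(self, delay, reg, value):
        # Record a write that takes effect `delay` cycles after this one.
        self.slots[delay].append((reg, value))

    def end_of_cycle(self, regs):
        # Run writeback on the first slot, then rotate the buffer so the
        # next insn sees the remaining slots shifted down by one.
        for reg, value in self.slots.popleft():
            regs[reg] = value
        self.slots.append([])


def run_example():
    regs = {"reg1": 0, "reg2": 0}
    queue = DelayedWriteQueue(max_delay=2)
    history = []

    # Cycle 0: the insn containing the four statements.
    regs["reg1"] = 1             # (set reg1 1), assumed immediate
    queue.queue(0, "reg1", 3)    # (delay 0 (set reg1 3))
    queue.queue(1, "reg2", 5)    # (delay 1 (set reg2 5))
    queue.queue(2, "reg1", 6)    # (delay 2 (set reg1 6))

    # Cycles 1 and 2 are modelled as insns that write nothing.
    for _ in range(3):
        queue.end_of_cycle(regs)
        history.append((regs["reg1"], regs["reg2"]))
    return history
```

Under these assumptions, reg1 holds 1 for the rest of the issuing insn's semantic code, 3 once that cycle's writeback has run, and 6 two cycles later; reg2 becomes 5 after the following cycle's writeback.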
An alternative cgen syntax would be to introduce a
    (delayed-set N lvalue rvalue)
rtx.
Any advice?
- FChE