This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Scheduling x86 dispatch windows


On 06/10/10 13:52, H.J. Lu wrote:
On Thu, Jun 10, 2010 at 11:05 AM, Quentin Neill
<quentin.neill.gnu@gmail.com> wrote:
Cross-posting Reza's call for feedback to the binutils list since it
is relevant -
see the last few paragraphs regarding how to "solve the alignment problem".

Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402

Not sure if followups should occur on one list or both.
--
Quentin Neill


On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani<yazdani_reza@yahoo.com> wrote:
Hi,

We are in the process of adding a feature to GCC to take advantage
of a new hardware feature in the latest AMD micro processor. This
feature requires a certain mix, ordering and alignments in
instruction sequences to obtain the expected hardware performance.

I am asking the community to review this high level implementation
design and give me direction or advice.

The new hardware issues two windows of the size N bytes of
instructions in every cycle. It goes into accelerate mode if the
windows have the right combination of instructions or alignments. Our
goal is to maximize the IPC by proper instruction scheduling and
alignments.

Here is a summary of the most important requirements:

a) Maximum of N instructions per window.
b) An instruction may cross the first window.
c) Each window can have maximum of x memory loads and y memory
    stores .
d) The total number of immediate constants in the instructions
    of a window should not exceed k.
e) The first window must be aligned on 16 byte boundary.
f) A Window set terminates when a branch exists in a window.
g) The number of allowed prefixes varies for instructions.
h) A window set needs to be padded by prefixes in instructions
    or terminated by nops to ensure adherence to the rules.

We have the following implementation plan for GCC:

1) Modify the Haifa scheduler to make the desired arrangement of
    instructions for the two dispatch windows. The scheduler is called
    once before and once after register allocation as usual. In both
    cases it performs dispatch scheduling along with its normal job of
    instruction scheduling.

The advantage of doing it before register allocation is avoiding
extra dependencies caused by register allocation which may become
an obstacle to movement of instructions.  The advantage of doing
it after register allocation is a consideration for spilling code
which may be generated by the register allocator.

The algorithm we use is:

a) Considering the current dispatch window set, choose the first
    instruction from ready queue that does not violate dispatch rules.
b) When an instruction is selected and scheduled, inform the
    dispatcher code about the instruction. This step keeps track of the
    instruction content of windows for future evaluation. It also manages
    the window set by closing and opening new virtual dispatch windows.

2) Insertion of alignment code.

In x86 alignment is done by inserting prefixes or by generating
nops. As the object code is generated by the assembler in GCC, some
information such as sizes of branches are unknown until assembly or
link time. To do alignments related to dispatch correctly in GCC,
we need to iteratively compute prefixes and branch sizes until
its convergence. This pass currently does not exist in GCC, but it
exists in the assembler.

There are two possible approaches to solve alignment problem.

a)  Let the assembler performs the alignments and padding needed
     to adhere with the new machine dispatching rules and avoid an extra
     pass in GCC.
b)  Add a new pass to mimic what assembler does before generating
     the assembly listing in GCC and insert the required alignments.

I appreciate your comments on the proposed implementation procedure
and the choices a or b above.
I don't this should be done in assembler. Assembler should just assemble
the assembly input.
That adds quite a bit of complication to the compiler though -- getting the instruction lengths right (and thus proper packing & alignment) can be extremely difficult. I did some experiments with this on a target with *fixed* instruction lengths a while back and even though the port tried hard to get lengths right, it would routinely miss something. Ultimately I decided that it forcing the compiler to know instruction lengths with a very high degree of accuracy wasn't a sane thing to do. Dealing with variable instruction lengths just adds yet another complexity to the situation. Then add the complication of needing to add specific prefixes or nops and it just gets downright ugly.

I'd probably approach this by having the compiler emit a directive which states what the desired alignment at a particular point should be, then allow the assembler to select the best method to get the desired alignment.

jeff




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]