This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Questions regarding address relaxation on IA-64


Hi,

Recently, while analyzing a code generation regression in the new instruction scheduler that we are developing for GCC, I ran into a problem with compiler-linker interaction.

It seems that on IA-64 the addresses of global variables are loaded with two instructions: "addl rXX = <offset>, r1" and "ld8 rXX = [rXX]", with the latter later being changed to a "nop" by the linker. This raises the following questions:

* Is the purpose of the "ld8" instruction to load the correct offset if it does not fit into the "addl" immediate operand?

* Is it possible to use "movl rXX = <offset>" (move long immediate, in an MLX bundle) + "add rXX = r1, rXX" for the same purpose?

* Is it possible to tell the compiler and the linker that offsets will be small enough that only "addl rXX = <offset>, r1" is needed (and if it is not possible, why not)?
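
The range question in the list above can be made concrete with a small sketch. It assumes that the "addl" immediate is a 22-bit signed field (imm22); the function name and the constant are made up for illustration and do not correspond to anything in GCC or binutils.

```python
# Assumption: the "addl" immediate operand is a 22-bit signed value (imm22).
IMM22_BITS = 22


def fits_addl_immediate(offset: int) -> bool:
    """Return True if `offset` is representable as a signed 22-bit immediate."""
    lowest = -(1 << (IMM22_BITS - 1))       # -2**21
    highest = (1 << (IMM22_BITS - 1)) - 1   # 2**21 - 1
    return lowest <= offset <= highest
```

Under this assumption, an offset from r1 of up to about 2 MB in either direction would fit into a single "addl"; anything larger would need another mechanism, such as a load or "movl".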

I have noticed that with -mno-pic GCC generates "movl rXX = <address>" (MLX bundle). This raises a couple of questions, too:

* Is it possible to use "mov rXX = <offset-or-address>" (short immediate form) + "ld8 rXX = [rXX]", with the ld8 being changed to a "nop" by the linker if necessary?

* Why is mov+ld8 preferred in PIC code, while movl is preferred in non-PIC code?

The problem is as follows: the benchmark contains a very frequently called function that accesses a number of global variables. For loads of those variables' addresses, GCC generates something like this:
addl r46 = <offset1>, r1
addl r47 = <offset2>, r1
...
addl r56 = <offsetN>, r1
ld8 r46 = [r46]
...
ld8 r56 = [r56]


On Itanium 2, 8-byte loads can issue only from memory ports 0 and 1, so our scheduler places a stop bit after each pair of ld8s to avoid stalls due to resource oversubscription. The previous scheduler did not take this into account, and that turned out to be a significant advantage for it: the linker changed all the ld8s to nops, and the code generated by the new scheduler waited unnecessarily on the extra stop bits.
What can you suggest to solve this problem? Perhaps the linker should be taught to delete the stop bit following a bundle if relaxation has left that bundle consisting only of nops and there is a stop bit preceding the bundle?
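
The stop-bit removal suggested above can be sketched on a toy model. This is only an illustration of the rule as stated, not binutils code: the Bundle class, its fields and both function names are hypothetical, and the model ignores the fact that real IA-64 stops are encoded in the bundle template and may also occur in the middle of a bundle.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Bundle:
    """Toy bundle: three instruction slots plus a stop bit at the end."""
    slots: Tuple[str, str, str]
    stop_after: bool = False


def is_all_nops(bundle: Bundle) -> bool:
    return all(slot.startswith("nop") for slot in bundle.slots)


def drop_redundant_stops(bundles: List[Bundle]) -> List[Bundle]:
    """Clear the stop after an all-nop bundle that is itself preceded by a stop.

    The check uses the already-updated predecessor, so in a run of all-nop
    bundles only every other stop is removed (a conservative reading).
    """
    result: List[Bundle] = []
    for bundle in bundles:
        preceded_by_stop = bool(result) and result[-1].stop_after
        if is_all_nops(bundle) and bundle.stop_after and preceded_by_stop:
            bundle = replace(bundle, stop_after=False)
        result.append(bundle)
    return result


def count_groups(bundles: List[Bundle]) -> int:
    """Number of instruction groups: one per stop, plus an unterminated tail."""
    if not bundles:
        return 0
    stops = sum(1 for b in bundles if b.stop_after)
    return stops if bundles[-1].stop_after else stops + 1


# Two ld8 pairs already relaxed to nops, each still followed by a stop bit.
relaxed = [
    Bundle(("addl", "addl", "addl"), stop_after=True),
    Bundle(("nop.m", "nop.m", "nop.i"), stop_after=True),
    Bundle(("nop.m", "nop.m", "nop.i"), stop_after=True),
    Bundle(("add", "nop.m", "nop.i"), stop_after=True),
]
cleaned = drop_redundant_stops(relaxed)
```

In this toy input the number of instruction groups drops from four to three. Checking against the original stop bits instead of the updated ones would remove both stops in the nop run; which variant is preferable is left open here.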


Thanks in advance.

Alexander Monakov

