This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Re: [RFC PATCH] .bundle_align_mode


On Wed, Feb 15, 2012 at 2:18 AM, nick clifton <nickc@redhat.com> wrote:
> * It might be useful to mention in the documentation why this feature is
> needed.

Oh, in the documentation?  I was all set to explain it to you here.  But
for the documentation, I think it's appropriate to say not a whole lot more
than exactly how the assembler behaves.  (The GAS manual is not the place
to describe the details of target ABIs.)  The short version is, "For some
targets, it's an ABI requirement that no instruction may span a certain
aligned boundary."  Is that what you had in mind?

To explain it here, indeed the short version is that it's a target ABI
requirement that no instruction may span a bundle boundary.

The background is that the system disassembles the code at load time and
validates it against a set of ABI constraints.  The constraints include
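that only a certain subset of instructions is valid.  On machines with
variable-sized instructions, validating this is nontrivial in the face of
computed jumps, since a jump into the middle of an instruction would decode
as a different instruction stream than the one that was validated.  To
ensure that only recognized instruction boundaries can be jump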
targets, the system enforces that a computed jump must target an aligned
address.  Hence the rule that instructions may not span a boundary (so you
can never jump into the middle of what the validator thought was an
instruction).
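
As a rough illustration of the padding rule (a sketch only, not the actual
GAS implementation), the following Python model lays out a list of
instruction sizes so that none spans a bundle boundary.  The function name
`layout` and the 32-byte default bundle size are assumptions made for the
example.

```python
def layout(sizes, bundle_size=32):
    """Return (offset, size) pairs such that no instruction spans a boundary.

    Hypothetical model: padding is represented only by a jump in the offset.
    """
    placed = []
    offset = 0
    for size in sizes:
        if size > bundle_size:
            raise ValueError("instruction larger than a bundle")
        # Bytes remaining in the current bundle (a full bundle if aligned).
        room = bundle_size - offset % bundle_size
        if size > room:
            # Models nop padding up to the next bundle boundary.
            offset += room
        placed.append((offset, size))
        offset += size
    return placed


if __name__ == "__main__":
    # Four 10-byte instructions: the fourth is pushed from offset 30 to 32.
    print(layout([10, 10, 10, 10]))
```

With these inputs the fourth instruction would otherwise occupy offsets 30
through 39, so the model pads two bytes and starts it at offset 32.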

The system also enforces that certain instructions may appear only as part
of a constrained multi-instruction sequence.  For example, to enforce the
aforementioned constraint on jumps, a computed jump on x86 must look like:
	and $-32, %reg
	jmp *%reg
That constraint would be defeated if you could jump to the second
instruction of the sequence.  Hence the bundle-lock feature is required to
ensure that such multi-instruction sequences stick together and don't span
bundle boundaries.  (I also intend to use it for prescribed cases such as
the TLS sequences, where the linker expects to see exact known instruction
sequences and would be thrown off by implicit nops inserted in the middle.)
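
To make the masking and the need for locking concrete, here is a small
Python sketch.  It is only a model: `masked_target` mimics the effect of
`and $-32, %reg`, and `place_locked` is a hypothetical helper that decides
where a locked sequence would start.  The sizes 3 and 2 used in the example
assume the 32-bit encodings of `and $-32, %eax` and `jmp *%eax`.

```python
BUNDLE_SIZE = 32  # matches the $-32 mask in the sequence above


def masked_target(addr, bundle_size=BUNDLE_SIZE):
    """Model 'and $-32, %reg': clear the low bits so the target is aligned."""
    return addr & -bundle_size


def place_locked(offset, sizes, bundle_size=BUNDLE_SIZE):
    """Return the start offset of a locked sequence at or after `offset`.

    The whole sequence is treated as one unit that must fit in one bundle.
    """
    total = sum(sizes)
    if total > bundle_size:
        raise ValueError("locked sequence larger than a bundle")
    room = bundle_size - offset % bundle_size
    return offset if total <= room else offset + room


if __name__ == "__main__":
    print(hex(masked_target(0x1237)))   # 0x1220
    print(place_locked(29, [3, 2]))     # 32: the pair moves to the next bundle
    print(place_locked(27, [3, 2]))     # 27: the pair still fits
```

Because every masked target is a multiple of 32 and the locked pair never
straddles a boundary, the `jmp` can only be reached by falling through from
the `and`, unless the pair happens to start exactly at a bundle boundary, in
which case the `and` is the jump target.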

> * What about instructions with delay slots ?

That's a good question.  I hadn't thought about it, since so far I've only
considered CPUs that don't have delay slots.  The only machine with delay
slots that I know anything about is sparc, so if my logic doesn't hold for
some other machine you know about, you'll have to educate me.

On machines like sparc, where all instructions are the same size, I don't
think there is any problem.  If an instruction with a delay slot is the
last instruction in a bundle, then there won't be any padding and so the
delay slot will be the first instruction of the next bundle.  If instead
it's not the last instruction in the bundle, then there is always room for
the delay slot inside the bundle.  In no case would a nop be inserted
between an instruction and its delay slot.  It doesn't actually matter to
the ABI constraints whether an instruction and its delay slot fall in the
same bundle or not.  (That is, except if it's a particular case where the
ABI says the latter instruction can only be used as part of a prescribed
multi-instruction sequence, in which case you'd use .bundle_lock for the
instruction and its delay slot just like any other multi-instruction
sequence.)
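
The fixed-size argument can be checked with a small Python model.  The
4-byte instruction size and 16-byte bundle size are assumptions for
illustration; the same reasoning applies whenever the bundle size is a
multiple of the instruction size.

```python
def padding_needed(offset, size, bundle_size):
    """Bytes of padding required before an instruction of `size` at `offset`."""
    room = bundle_size - offset % bundle_size
    return room if size > room else 0


def any_padding(count, insn_size=4, bundle_size=16):
    """Return True if laying out `count` fixed-size instructions ever pads."""
    offset = 0
    for _ in range(count):
        if padding_needed(offset, insn_size, bundle_size):
            return True
        offset += insn_size
    return False


if __name__ == "__main__":
    print(any_padding(1000))          # False: fixed-size code never needs padding
    print(padding_needed(14, 4, 16))  # 2: only a misaligned start would pad
```

Since no padding is ever inserted, nothing can land between an instruction
and its delay slot in this model.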

If there is a machine that has delay slots and variable-sized instructions,
then it becomes an issue.  It wouldn't matter to the ABI constraints
whether an instruction in a delay slot got pushed into the next bundle.
But it would matter to correctness, since a nop in the delay slot wouldn't
do the right thing if the first instruction were a branch.  If there is
such a machine, then it might make sense to automagically bundle-lock a
delay-slot-using instruction with its following instruction, so the code
wouldn't have to use .bundle_lock explicitly.  But I don't feel the need to
worry about that before we actually define such an ABI for such a machine.
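
For a purely hypothetical machine with both delay slots and variable-sized
instructions, the following Python sketch shows the failure mode and what
automatic locking would change.  Each inner list is laid out as one locked
unit; the instruction sizes and the 32-byte bundle are invented for the
example.

```python
def layout(groups, bundle_size=32):
    """Lay out groups of instruction sizes; each group stays in one bundle.

    Returns a flat list of (offset, size) pairs.
    """
    placed = []
    offset = 0
    for group in groups:
        total = sum(group)
        if total > bundle_size:
            raise ValueError("group larger than a bundle")
        room = bundle_size - offset % bundle_size
        if total > room:
            offset += room  # models nop padding before the whole group
        for size in group:
            placed.append((offset, size))
            offset += size
    return placed


if __name__ == "__main__":
    # Branch (4 bytes) at offset 26, delay-slot instruction (4 bytes) after it.
    unlocked = layout([[10], [10], [6], [4], [4]])
    locked = layout([[10], [10], [6], [4, 4]])
    print(unlocked[-2:])  # [(26, 4), (32, 4)]: padding lands in the delay slot
    print(locked[-2:])    # [(32, 4), (36, 4)]: the pair stays contiguous
```

In the unlocked layout the branch ends at offset 30 and two bytes of padding
precede the intended delay-slot instruction, which is exactly the
correctness problem described above.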

> * Would it be considered an error, or just unusual, if the
> .bundle_align_mode directive was used inside a non-executable section ?

I think I'd call it just unusual.  It doesn't seem worth trying to make it
an error.  For the purpose of the target ABIs in question, it only matters
for instructions, not miscellaneous data.  But one could imagine using
assembled instructions in a non-executable section to produce a code blob
that will be copied somewhere to be made executable later.  In that case,
you could well want the same features for your instructions in a
non-executable section.

>> +@section @code{.bundle_lock} and @code{.bundle_unlock}
>
> * Can these directives be nested ?

As I've specified it, no.  I can't really see why it would be useful.  The
inner directive wouldn't mean anything, since the whole sequence inside the
outermost pair has to go into a single bundle.  I guess I could imagine
writing some macros where it becomes relevant:

	.macro foo
	  .bundle_lock
	    insn 1
	    insn 2
	  .bundle_unlock
	.endm
	.macro bar
	  .bundle_lock
	  foo
	  insn 3
	  .bundle_unlock
	.endm

But for the ABIs in question, I can't imagine a case like that being
useful.  If it were, it would be trivial to support, since the
inner ones have no effect but to increment/decrement a nesting count so as
to notice when you've hit the outermost .bundle_unlock.  I can add that if
the need ever arises.  For now, I've added a couple of sentences to the
documentation to make it explicit that nesting is not allowed.
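
The nesting-count idea can be sketched in a few lines of Python.  This
models the hypothetical nested behavior described above, not what the patch
as specified does (it rejects nesting); the function name `locked_groups`
and the list-of-strings input are assumptions for the example.

```python
def locked_groups(lines):
    """Group instructions, treating nested lock/unlock pairs as a depth count.

    Only the outermost .bundle_unlock closes a group; inner pairs merely
    increment and decrement the counter.
    """
    groups = []
    current = None
    depth = 0
    for line in lines:
        text = line.strip()
        if text == ".bundle_lock":
            if depth == 0:
                current = []
            depth += 1
        elif text == ".bundle_unlock":
            if depth == 0:
                raise ValueError(".bundle_unlock without .bundle_lock")
            depth -= 1
            if depth == 0:
                groups.append(current)
                current = None
        elif text:
            if depth:
                current.append(text)
            else:
                groups.append([text])  # unlocked instruction stands alone
    if depth:
        raise ValueError("unterminated .bundle_lock")
    return groups


if __name__ == "__main__":
    # The expansion of the 'bar' macro above.
    expanded = [".bundle_lock", ".bundle_lock", "insn 1", "insn 2",
                ".bundle_unlock", "insn 3", ".bundle_unlock"]
    print(locked_groups(expanded))  # [['insn 1', 'insn 2', 'insn 3']]
```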

> * Presumably once a bundle has been created by the assembler, it needs to be
> preserved by the linker.  Thus it seems to me that you are going to need a
> reloc or two to tell the linker about the bundle.

I don't think there's any need for this, but perhaps you have something in
mind that I don't know about.  Even if the linker does some instruction
rewriting, I wouldn't expect it to move instructions around.  It always
just rewrites a sequence to one of the same length, or to a shorter one
padded with nops, doesn't it?  If it moved things around, then there
would need to be relocs for every local jump (not to mention more obscure
cases), which there aren't.
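
The length-preserving rewrite assumed here can be modelled as follows.  This
is an illustrative Python sketch, not linker code; `rewrite_in_place` is a
hypothetical name, and the single-byte x86 nop (0x90) is used for padding.

```python
NOP = b"\x90"  # single-byte x86 nop


def rewrite_in_place(code, start, old_len, new_bytes):
    """Replace code[start:start+old_len] without changing the total length.

    A shorter replacement is padded with nops, so nothing after it moves.
    """
    if len(new_bytes) > old_len:
        raise ValueError("replacement is longer than the original sequence")
    padded = new_bytes + NOP * (old_len - len(new_bytes))
    return code[:start] + padded + code[start + old_len:]


if __name__ == "__main__":
    code = bytes(range(10))
    out = rewrite_in_place(code, 2, 4, b"\xaa\xbb")
    print(out.hex())  # 0001aabb909006070809
```

Because every byte after the rewritten span keeps its offset, bundle
boundaries established by the assembler stay where they were.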


Thanks,
Roland

