This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Details on the Kprobes for ARM patch


The kprobes for ARM patch I submitted takes a new approach to
porting kprobes.  The ARM architecture has a limitation not
encountered on any architecture kprobes has supported to date:
ARM has neither a Next-PC register nor a single-stepping state, so
there is no way to regain control after executing an instruction
that modifies the PC.

To get around the limitation of no Next-PC and no single-stepping,
I split all kprobe'd instructions into three camps: simulation,
emulation, and rejected.  "Simulated" instructions are what I call
instructions that are completely simulated by C code.  These are
ones that modify the PC.  They use no instruction slot.  "Emulated"
instructions are ones that use the instruction slot.  "Rejected"
instructions are ones that I just didn't put the effort into either
simulating or emulating; attempting to place a kprobe on one of
them will fail.  These are more work to simulate or emulate and
should be rare, if encountered at all, in the kernel.  There is no
fundamental limitation preventing their support if someone wants
to put in the effort.
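
As a rough illustration of the three camps, here is a minimal sketch
of a classifier.  It is not the decode logic from the patch: the
names (probe_class, classify_arm_insn) are hypothetical, and the
decoding is deliberately coarse.  It treats anything that can write
the PC as simulated, the coprocessor/SWI space and the 0b1111
condition space as rejected, and everything else as emulated.  A real
decoder has to look at many more encodings than this.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum probe_class { PROBE_SIMULATE, PROBE_EMULATE, PROBE_REJECT };

/* Coarse classification of a 32-bit ARM (not Thumb) instruction word. */
static enum probe_class classify_arm_insn(uint32_t insn)
{
    uint32_t cond  = insn >> 28;            /* bits 31:28 */
    uint32_t op3   = (insn >> 25) & 0x7;    /* bits 27:25 */
    uint32_t op2   = (insn >> 26) & 0x3;    /* bits 27:26 */
    uint32_t rd    = (insn >> 12) & 0xF;    /* bits 15:12 */
    uint32_t l_bit = (insn >> 20) & 0x1;    /* load bit for LDR/LDM */

    /* Condition 0b1111 is the unconditional extension space:
     * rejected in this sketch. */
    if (cond == 0xF)
        return PROBE_REJECT;

    /* B / BL (bits 27:25 == 0b101) always write the PC. */
    if (op3 == 0x5)
        return PROBE_SIMULATE;

    /* Coprocessor and SWI space (bits 27:26 == 0b11):
     * rejected in this sketch. */
    if (op2 == 0x3)
        return PROBE_REJECT;

    /* Bits 27:26 == 0b00 with 0b1111 in bits 15:12: covers
     * data-processing instructions whose destination is the PC
     * (e.g. mov pc, lr) and also bx. */
    if (op2 == 0x0 && rd == 15)
        return PROBE_SIMULATE;

    /* LDR with the PC as the destination register. */
    if (op2 == 0x1 && l_bit && rd == 15)
        return PROBE_SIMULATE;

    /* LDM (bits 27:25 == 0b100) with the PC in the register list. */
    if (op3 == 0x4 && l_bit && (insn & (1u << 15)))
        return PROBE_SIMULATE;

    /* Everything else can run from the instruction slot. */
    return PROBE_EMULATE;
}
```

For example, 0xE1A0F00E (mov pc, lr) lands in the simulated camp,
while 0xE0854006 (add r4, r5, r6) lands in the emulated camp.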

The ARM implementation also does not follow the usual kprobes
double-exception approach, in which the initial kprobes breakpoint
runs kprobe_handler(), returns from the exception to execute the
instruction in its original context, then immediately re-enters
through a second breakpoint (or single-stepping exception) into
post_kprobe_handler().  This implementation only ever executes one
kprobes exception.  All side-effects from the kprobe'd instruction
are resolved before returning from that initial exception.  As a
result, this code is effectively _always_ boosted regardless of
the instruction and even regardless of whether or not there is a
post-handler.

To do the above, at kprobes registration time I examine the
instruction in question, assign an execution handler to it based
on the type of instruction, and rewrite the copy stored in the
instruction slot by altering which registers it uses.  If it is a
simulated instruction, no rewriting occurs and its simulation
handler just modifies the "regs" contents directly to resolve the
effects of the instruction.  If it is an emulated instruction, the
handler loads into the current context just the registers from
"regs" needed for that class of instruction, makes a call to its
slot, and on return writes back to the "regs" bank just the
registers modified by the instruction in the slot.
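
To make the two handler styles a bit more concrete, here is a small
sketch under stated assumptions.  It is not the patch's code: struct
fake_regs stands in for the kernel's saved register bank, the
function names are hypothetical, and the choice of r0/r1/r2 as the
rewritten registers is mine.  simulate_b_bl() resolves a B or BL
purely by editing the saved registers, and rewrite_dp_regs() shows
the kind of register-field rewrite that could be applied to the slot
copy of a simple data-processing instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved "regs" bank: r0..r15 only. */
struct fake_regs {
    uint32_t r[16];
};

/*
 * Simulation: resolve B / BL entirely in C.  The condition field is
 * ignored here; this sketch assumes the condition already passed.
 */
static void simulate_b_bl(uint32_t insn, uint32_t insn_addr,
                          struct fake_regs *regs)
{
    /* imm24 is a signed word offset relative to insn_addr + 8. */
    uint32_t imm24 = insn & 0x00FFFFFFu;
    int32_t offset = (int32_t)(imm24 << 2);

    if (imm24 & 0x00800000u)        /* sign-extend the 26-bit value */
        offset -= 0x04000000;

    if (insn & (1u << 24))          /* L bit: BL also sets the link register */
        regs->r[14] = insn_addr + 4;

    regs->r[15] = insn_addr + 8 + (uint32_t)offset;
}

/*
 * Emulation prep: rewrite the Rn, Rd and Rm fields of a
 * data-processing instruction with a plain register operand
 * (bit 25 == 0, bit 4 == 0) so the slot copy uses Rn = r1,
 * Rd = r0, Rm = r2.  A handler would then load r1 and r2 from the
 * original Rn and Rm in "regs", call the slot, and store r0 back
 * to the original Rd.
 */
static uint32_t rewrite_dp_regs(uint32_t insn)
{
    insn &= ~((0xFu << 16) | (0xFu << 12) | 0xFu);  /* clear Rn, Rd, Rm */
    insn |= (1u << 16) | (0u << 12) | 2u;           /* Rn=r1, Rd=r0, Rm=r2 */
    return insn;
}
```

With these definitions, rewriting 0xE0854006 (add r4, r5, r6) gives
0xE0810002 (add r0, r1, r2), which no longer depends on which
registers the probed code happened to use.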

This approach may sound like a lot of code, but all the software to
decode, examine, and rewrite instructions, plus all the execution
handlers, compiles down to less than 4KB of text.

Because this approach is never split across two exceptions, all
the code in arch/arm/kprobes.c to support the second exception
completely goes away.  There is no prepare_single_step(), no
post_kprobe_handler(), and no having to save, modify, and restore
processor flag states across the two exceptions.  I didn't measure
the savings, but I'm sure it's noticeable and makes the net cost
of that 4KB that much less.

Since there is also no "classic" boosting model, there is no
need for the garbage collector.  It will never be invoked.  It
could be commented out if anyone wants to.

Also, because there is no split exception model and no classic
boosting model, the code is a lot simpler.  The code is MP clean
and there are no ifdefs for MP systems or for preemptible
kernels (CONFIG_PREEMPT).

All the changes are encapsulated within architecture specific areas.
The model requires no changes whatsoever to the generic kprobes
code.

Since ARM is a RISC architecture with a fixed instruction size, it
will be pretty straightforward to continue enhancing this code to
use the djprobe model.  I looked into it a little bit.  As long as
the djprobe handler is within +/-32MB of the kprobe'd instruction,
it can use a branch instruction instead of the undef instruction.
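
The +/-32MB figure comes from the ARM B instruction's signed 24-bit
word offset.  As a sketch of what the reachability check might look
like (encode_arm_branch is a hypothetical helper, not part of the
patch or of djprobe), the function below tries to build an
unconditional B from a probe address to a handler address and reports
whether the target is in range.  Note that the offset is measured
from the probe address plus 8, since that is what the PC reads as
when the branch executes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode "b target" for a branch placed at
 * insn_addr.  Returns false if the target is misaligned or outside
 * the reach of the 24-bit signed word offset.
 */
static bool encode_arm_branch(uint32_t insn_addr, uint32_t target,
                              uint32_t *out)
{
    /* Offset is relative to insn_addr + 8 (the PC as seen by B). */
    int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);

    if (offset % 4 != 0)
        return false;               /* ARM instructions are word aligned */

    /* Reachable byte offsets: -32MB .. +32MB - 4. */
    if (offset < -0x02000000LL || offset > 0x01FFFFFCLL)
        return false;

    /* cond = AL (0xE), bits 27:24 = 0b1010, then imm24 = offset / 4. */
    *out = 0xEA000000u | ((uint32_t)(offset / 4) & 0x00FFFFFFu);
    return true;
}
```

If the check fails, the probe would have to fall back to the undef
instruction as it does today.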

Since this is a radical departure from existing architecture-specific
kprobe implementations, and this is the first change I've made to the
Linux kernel contributed back to the community, I hope people will
review this work and give me any feedback at all on any aspect of
this effort.

I'll be giving a talk on this work next week at the CELF Embedded
Linux Conference in Santa Clara scheduled for 3:40pm-4:30pm on April
17th.  http://www.celinux.org/elc2007/

Quentin

