This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: x86_64 kprobes wart removal
- From: William Cohen <wcohen at redhat dot com>
- To: Jim Keniston <jkenisto at us dot ibm dot com>
- Cc: SystemTAP <systemtap at sources dot redhat dot com>
- Date: Fri, 08 Apr 2005 11:20:03 -0400
- Subject: Re: x86_64 kprobes wart removal
- References: <1112911296.2231.53.camel@dyn9047018078.beaverton.ibm.com>
Jim Keniston wrote:
This email is for x86_64 kprobes wonks. Remember get_insn_slot() and
free_insn_slot()? These functions are a constant headache because they
can sleep. That's because get_insn_slot() occasionally has to allocate
a readable, writable, executable page to hold the instruction-copy for a
new kprobe. That's because x86_64 won't single-step (or otherwise
execute) an instruction on a page that isn't mapped executable.
I propose the following alternative:
- Allocate one executable page at the beginning of time. [See note 1.]
- Store the instruction copy in the kprobe object, as in other
architectures.
- When it comes time to single-step an instruction, just copy the
instruction from the kprobe object to the executable page.
- In resume_execution, adjust copy_rip accordingly.
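The steps proposed above can be modeled roughly in user-space C. This is only a sketch under stated assumptions, not the kprobes code: struct kprobe_model, exec_slot, prepare_singlestep() and fixup_rip() are hypothetical names, an ordinary static buffer stands in for the executable page, and the rip fixup ignores instructions that change control flow or use rip-relative addressing, which a real resume_execution would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSN_SIZE 15  /* roughly the longest x86-64 instruction */

/* Hypothetical stand-in for the kprobe object: it carries its own copy
 * of the probed instruction, as on other architectures. */
struct kprobe_model {
    uintptr_t orig_addr;            /* address of the probed instruction */
    size_t    len;                  /* length of the copied instruction */
    uint8_t   insn[MAX_INSN_SIZE];  /* instruction copy kept in the object */
};

/* Stand-in for the single executable page allocated up front.
 * In the kernel this would be memory mapped executable. */
static uint8_t exec_slot[MAX_INSN_SIZE];

/* Just before single-stepping: copy the instruction from the kprobe
 * object into the shared slot and return the address to step at. */
uintptr_t prepare_singlestep(const struct kprobe_model *p)
{
    assert(p->len > 0 && p->len <= MAX_INSN_SIZE);
    memcpy(exec_slot, p->insn, p->len);
    return (uintptr_t)exec_slot;
}

/* After the single step: rip points into the slot, so translate it
 * back to the matching offset from the original instruction. */
uintptr_t fixup_rip(const struct kprobe_model *p, uintptr_t rip_after_step)
{
    return p->orig_addr + (rip_after_step - (uintptr_t)exec_slot);
}
```

For a 3-byte instruction probed at address 0x1000, prepare_singlestep() returns the slot address, and fixup_rip() maps "slot + 3" back to 0x1003. The memcpy() on every probe hit is the extra cost this scheme trades for never having to allocate a page at registration time.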
Copying the instruction just before each single step could be expensive,
because it behaves much like self-modifying code.
Note 1: If we go to per-CPU locking, we may need to allocate enough
space for NR_CPUS instructions. Also, we still want to use Roland's
trick of allocating the memory close to where the modules live.
Wouldn't each allocation need to be large enough to fill a cache line, to
avoid false sharing and cache lines bouncing between processors?
Cache lines are significantly larger than the 15 bytes or so of the
largest x86-64 instruction.
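To make the sizing concern concrete, here is a small C sketch of per-CPU slots padded out to a cache line each. The 64-byte line size, the 4-CPU count, and the names insn_slot, slots and slot_cache_line() are assumptions for illustration only; real code would take the line size and NR_CPUS from the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_SIZE   15  /* roughly the longest x86-64 instruction */
#define CACHE_LINE_SIZE 64  /* assumed line size, not taken from the thread */
#define NR_CPUS_MODEL   4   /* hypothetical stand-in for NR_CPUS */

/* One slot per CPU. Aligning the member to the line size forces the
 * whole struct to be aligned and padded to CACHE_LINE_SIZE bytes. */
struct insn_slot {
    _Alignas(CACHE_LINE_SIZE) uint8_t insn[MAX_INSN_SIZE];
};

struct insn_slot slots[NR_CPUS_MODEL];

/* Index of the cache line (relative to the start of the array) that
 * holds the slot for the given CPU. */
size_t slot_cache_line(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return ((uintptr_t)&slots[cpu] - (uintptr_t)&slots[0]) / CACHE_LINE_SIZE;
}
```

With this layout every CPU's slot lands on its own line, so a copy made on one processor does not invalidate the line another processor is stepping from. The cost is 64 bytes per CPU instead of 15, which is the memory overhead raised above.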
I don't have a patch yet, but does that sound like the right approach?
I wish I'd thought of this a year ago. :-}
It sounds like this approach might be slower and consume more memory.
-Will