This is the mail archive of the mailing list for the systemtap project.

Re: [RFC]-Approaches to user space probes


We should keep in mind that we need to support the 4G/4G split.

In a 4G/4G split environment, the kernel-space portion
cannot touch the user-space portion, so instrumentation
code should be placed in user space. Otherwise we need to
add a way to acquire memory that is placed in the
kernel/user common space.

On the other hand, acquiring an area for instrumentation
code in user space is sometimes difficult, because
applications may use up their user address space.

By the way, I want to introduce the Pannus project.

It is similar to, or exactly the same as, #2.

Pannus does live patching of applications. It tries to
take the target application out of scheduling temporarily,
and then tries to find available user space using a patched
do_mmap_pgoff(). If it finds space, it loads the
instrumentation code into user space, changes the original
code to switch to the patching code, and pins down the
changed page.

I haven't tested this patch, so this approach might have
some problems. But I think it is worth discussing.


Prasanna S Panchamukhi wrote:
> Hi,
> As per yesterday's conf call discussion, I have listed a few approaches
> for dynamic instrumentation of applications/libraries. Please provide
> your suggestions about the listed approaches and any other approaches
> you know.
> Thanks
> Prasanna
> 	1. Attaching or loading the application into the tool.
> 	2. Using a jump instruction to a trampoline and trampoline
> 	   executing the instrumented code.
> 	3. Using a breakpoint instruction and changing the instruction
> 	   pointer to the instrumentation code, which is part of the user
> 	   address space.
> 	4. Using a breakpoint instruction and executing the
> 	   instrumentation code within the breakpoint handler.
> 1. Attaching or loading the application into the tool.
> 	In this method the user application must be loaded into the
> tool, or the tool must be attached to an already running application.
> Before the user can instrument an application he must decide what that
> instrumentation will consist of. Dynaprof uses such a mechanism. There
> are currently
> two probes shipped with Dynaprof, the PAPI Probe and the Wallclock
> Probe.  PAPI uses the processor's hardware performance counters to
> measure specific hardware events like cache misses, branch
> mispredictions and floating point instructions. The Wallclock probe
> measures elapsed real-time which is sometimes referred to as wallclock
> time.
> Dynaprof inserts instrumentation directly into the application's
> address space. This is accomplished through a run-time code generation
> and patching mechanism based upon either Dyninst or DPCL, IBM's
> derivative effort. Whenever a function is instrumented, all its
> children are instrumented as well. This is to enable the probe to
> generate both inclusive and exclusive metrics.
> 2. Using a jump instruction to a trampoline and trampoline executing
> the instrumented code.
> 	In this method the instrumentation code must be loaded into the
> user address space dynamically. The major challenges are generating
> instrumentation code at run time and allocating space for the
> dynamically generated code. To insert this code, the application
> process is stopped, and the code and data are installed into the
> application address space using operating system facilities such as
> ptrace and the /proc file system. These small code fragments are called
> trampolines. Each active probe has an associated base trampoline,
> and each block of instrumentation code is placed in its own
> mini-trampoline. The base trampoline contains the relocated original
> instructions from the probe point in the application program,
> instructions to save and restore registers, slots where jumps to
> mini-trampolines are inserted, and a jump to return to the application
> code. When the probe fires, the base trampoline is executed; it saves
> the register state and then executes the individual mini-trampolines.
> After they return, the base trampoline restores the register state and
> normal execution continues.
> Eg: Paradyn tool.
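
To make the base-trampoline/mini-trampoline flow above concrete, here is a minimal Python sketch. It is only a toy model, not Paradyn or Dyninst code: registers are a dict, instructions are plain functions, and every name (BaseTrampoline, fire, count_hit, original_inc_eax) is hypothetical. It also assumes the relocated original instruction runs after the registers have been restored, which is one possible ordering.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Registers = Dict[str, int]
Hook = Callable[[Registers], None]


@dataclass
class BaseTrampoline:
    """Toy model of the base trampoline for one probe point."""
    relocated_insn: Hook                       # original instruction moved out of the probe point
    mini_trampolines: List[Hook] = field(default_factory=list)

    def fire(self, regs: Registers) -> None:
        saved = dict(regs)                     # save the register state
        for mini in self.mini_trampolines:     # run each block of instrumentation
            mini(regs)                         # (it may clobber registers)
        regs.clear()
        regs.update(saved)                     # restore the register state
        self.relocated_insn(regs)              # execute the displaced original instruction
        # A real trampoline would now jump back to the application code.


hits: List[int] = []


def count_hit(regs: Registers) -> None:
    """Hypothetical instrumentation: record the probe address, clobber eax."""
    hits.append(regs["ip"])
    regs["eax"] = 0xDEAD


def original_inc_eax(regs: Registers) -> None:
    """Stand-in for the original instruction at the probe point."""
    regs["eax"] += 1


regs = {"ip": 0x1000, "eax": 41}
BaseTrampoline(original_inc_eax, [count_hit]).fire(regs)
```

Because the register state is restored before the relocated instruction runs, the application sees the same result as if no probe had been inserted, while the instrumentation still records the hit.
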
> Issues with methods 1 and 2 are:
> 	* Induces Intel erratum E49, where the other processors see
> 	  stale data while one processor replaces the jump instruction.
> 	* Instruction can only be replaced atomically if the size of
> 	  the jump instruction is greater than or equal to the original
> 	  instruction.
> 	* Other processors need to be stopped if the jump instruction size
> 	  is less than the original instruction.
> 3. Using a breakpoint instruction and changing the instruction pointer
> In this method a breakpoint instruction is inserted at the probe point
> and the original instruction is copied into the user address space.
> When the probe fires, the breakpoint handler changes the instruction
> pointer to jump to a trampoline that is part of the user address space.
> After the trampoline executes the instrumentation code, it restores the
> registers and process stack and jumps back to the original routine.
> The issue with this approach is allocating a separate space in the
> user address space to hold the instrumentation code and the original
> instruction.
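
The control flow of method 3 can be sketched roughly as follows. This is a toy Python model under loud assumptions: "code" is a dict from address to an instruction string, the breakpoint is a marker string, and the helper names (insert_probe, run_probe_hit) are made up for illustration. Nothing here corresponds to an existing kernel interface.

```python
from typing import Dict, List, Tuple

BREAKPOINT = "BREAKPOINT"   # stand-in for the architecture's breakpoint instruction


def insert_probe(code: Dict[int, str], probe_addr: int, tramp_addr: int) -> Dict[int, int]:
    """Plant a breakpoint and copy the original instruction to the trampoline."""
    code[tramp_addr] = code[probe_addr]   # copy of the original instruction (user space)
    code[probe_addr] = BREAKPOINT
    return {probe_addr: tramp_addr}       # lookup table used by the breakpoint handler


def run_probe_hit(code: Dict[int, str], probes: Dict[int, int], ip: int,
                  insn_len: int, log: List[Tuple[str, object]]) -> int:
    """Follow one probe hit and return the ip where normal execution resumes."""
    assert code[ip] == BREAKPOINT
    tramp_ip = probes[ip]                    # handler rewrites the instruction pointer
    log.append(("instrumentation", ip))      # trampoline runs the instrumentation code
    log.append(("execute", code[tramp_ip]))  # then the copied original instruction
    return ip + insn_len                     # and jumps back past the probe point


code = {0x1000: "push %ebp", 0x1001: "mov %esp,%ebp"}
probes = insert_probe(code, probe_addr=0x1000, tramp_addr=0x8000)
log: List[Tuple[str, object]] = []
resume_ip = run_probe_hit(code, probes, ip=0x1000, insn_len=1, log=log)
```

The address 0x8000 stands for the separately allocated user-space area, which is exactly the allocation problem mentioned above.
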
> 4. Using a breakpoint instruction
> 	Using a breakpoint instruction and executing the instrumentation
> code from within the breakpoint handler in the interrupt context.
> The issue with this approach is single-stepping the original
> instruction out-of-line.
> In kernel space probes, single stepping out-of-line is achieved by
> copying the instruction to some location within the kernel address
> space and then single stepping from that location. But for userspace
> probes, an instruction copied into the kernel address space cannot be
> single stepped, hence the instruction should be copied to the user
> address space. The solution is to find free space in the current
> process address space, copy the original instruction there and single
> step it.
> User processes use stack space to store local variables, arguments and
> return values. Normally the stack space either below or above the stack
> pointer is free. If the stack grows downwards, the space below the
> stack pointer is the unused stack space, and if the stack grows
> upwards, the space above the stack pointer is the unused stack space.
> The instruction to be single stepped can itself modify the stack,
> hence sufficient stack space must be left free before the unused
> stack space is used. The instruction is copied to the bottom of the
> page and a check is made that the copied instruction does not cross
> the page boundary. The copied instruction is then single stepped.
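
A minimal sketch of that slot selection, written in Python for readability. The assumptions are mine and not from the patch: the stack grows downwards, pages are 4096 bytes, "bottom of the page" means the lowest address of the page containing the stack pointer, and a 128-byte gap is left below the stack pointer for the stepped instruction's own stack use. pick_stack_slot is a hypothetical name.

```python
from typing import Optional

PAGE_SIZE = 4096    # assumption: 4 KiB pages (must be a power of two)
SAFETY_GAP = 128    # assumption: bytes kept free below sp for the stepped instruction


def pick_stack_slot(sp: int, insn_len: int,
                    page_size: int = PAGE_SIZE,
                    gap: int = SAFETY_GAP) -> Optional[int]:
    """Return the address to copy the instruction to, or None if it does not fit.

    Assumes a downward-growing stack, so everything below sp is unused.
    """
    page_start = sp & ~(page_size - 1)   # lowest address of the page containing sp
    slot = page_start                    # copy to the bottom of that page
    # The copy must end below the region the stepped instruction may still
    # push into. Since sp lies inside this page, passing this check also
    # means the copy cannot cross the page boundary.
    if slot + insn_len > sp - gap:
        return None
    return slot


roomy = pick_stack_slot(sp=0x7FFFF800, insn_len=5)   # plenty of room below sp
tight = pick_stack_slot(sp=0x7FFFF040, insn_len=5)   # sp too close to the page bottom
```

When the function returns None, one of the fallbacks (expanding the vma, or executing inline) would be needed.
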
> Several architectures do not allow instructions to be executed
> from a stack location, since the no-exec bit is set for the stack
> pages. On those architectures, the page table entry corresponding to
> the stack page is identified and the no-exec bit is cleared, allowing
> the instruction on that stack page to be executed.
> There are situations where even the unused stack space is not
> enough for the user instruction to be copied and single stepped. In
> such situations, the virtual memory area (vma) can be expanded beyond
> the current stack vma. This expanded stack can be used to copy the
> original instruction and single step out-of-line.
> If even the vma cannot be extended, then the instruction must be
> executed inline, by replacing the breakpoint instruction with the
> original instruction.
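
The fallback order described above (unused stack space, then an expanded vma, then inline execution) could be written down as a small decision function. This is only an illustrative Python sketch; StepPlan, choose_step_plan and the 128-byte gap are hypothetical and do not come from the prototype.

```python
from enum import Enum


class StepPlan(Enum):
    STACK_SLOT = "copy into unused stack space and single step out-of-line"
    EXPANDED_VMA = "expand the stack vma, copy there and single step out-of-line"
    INLINE = "restore the original instruction and execute it inline"


def choose_step_plan(free_stack_bytes: int, insn_len: int,
                     can_expand_vma: bool, gap: int = 128) -> StepPlan:
    """Pick how to single step the original instruction of a user-space probe.

    `gap` is an assumed margin left for the stepped instruction's own stack use.
    """
    if free_stack_bytes >= insn_len + gap:
        return StepPlan.STACK_SLOT
    if can_expand_vma:
        return StepPlan.EXPANDED_VMA
    return StepPlan.INLINE


plan_roomy = choose_step_plan(free_stack_bytes=2048, insn_len=5, can_expand_vma=True)
plan_grow = choose_step_plan(free_stack_bytes=16, insn_len=5, can_expand_vma=True)
plan_inline = choose_step_plan(free_stack_bytes=16, insn_len=5, can_expand_vma=False)
```

The inline case is the least attractive one, since the breakpoint is temporarily removed while the original instruction executes.
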
> Eg: Dprobes implemented this approach, but did not provide single
> stepping out-of-line.
> Methods 3 and 4 require a similar breakpoint insertion/removal
> mechanism, both for pages that are present in memory and for pages
> that are not present in memory when the probes are inserted. URL of
> the initial patches are:
> Method 4 requires a mechanism for single stepping the original
> instruction out-of-line. URL of the prototype implementation is:
