
utrace-based uprobes


Enclosed are updated versions of the uprobes patches that have been
sitting on the shelf for 1.5 months.  (It's a long story.)  These should
apply cleanly against any -mm kernel that includes utrace.

As noted in my email to Roland a few minutes ago, we hope to soon
publish an update that completely reworks single-stepping out of line.

I also anticipate that we'll store the uretprobe trampoline in the same
vm area that holds the instruction slots, so if you review this stuff,
don't spend a lot of time worrying over init_uretprobes(),
do_init_uretprobes(), and uretprobe_set_trampoline().  The user won't
have to specify a trampoline address.

Jim Keniston

-------- Forwarded Message --------
From: Jim Keniston <jkenisto@us.ibm.com>
To: RedHat_perftools <external-perftools-list@redhat.com>
Subject: utrace-based uprobes
Date: 26 Jan 2007 18:34:03 -0800

As promised yesterday, here, in patch form, is our utrace-based
implementation of user-space probes.

Patch #1 implements the basic uprobes feature set for i386, and provides
Documentation/uprobes.txt, a user's guide.

Patch #2 adds user-space return probes.

Patch #3 adds the ability to register or unregister probes from probe
handlers.

The only existing kernel files tweaked by these patches are a couple of
Makefiles and arch/i386/Kconfig, so these patches should work for any
kernel that includes utrace.

In systemtap cvs,
- private/uprobes/patches contains these patches;
- private/uprobes/src contains the source files (which are currently in
sync with the patches); and
- private/uprobes/test and private/wilder/uprobes/test contain tests.

Comments welcome.

Jim Keniston
IBM LTC-RAS

Uprobes supplements utrace and kprobes, enabling a kernel module
to probe user-space applications in much the same way that
a kprobes-based module probes the kernel.

Uprobes enables you to dynamically break into any routine in a
user application and collect debugging and performance information
non-disruptively. You can trap at any code address, specifying a
kernel handler routine to be invoked when the breakpoint is hit.

Uprobes is layered on top of utrace.

The registration function, register_uprobe(), specifies which process is
to be probed, where the probe is to be inserted, and what handler is to
be called when the probe is hit. Refer to Documentation/uprobes.txt in
this patch for usage examples.
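
To give a feel for the interface, here is a minimal, untested sketch
of a module that registers a single probe.  The pid and vaddr values
below are placeholders; the complete, parameterized examples are in
Documentation/uprobes.txt in the patch.

/*
 * Minimal sketch of the Uprobes API described above.  The pid and
 * vaddr values are placeholders; see uprobe_example.c in
 * Documentation/uprobes.txt for the complete example.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/ptrace.h>
#include <linux/uprobes.h>

static struct uprobe usp;

/* Runs in the context of whichever thread hits the breakpoint. */
static void my_handler(struct uprobe *u, struct pt_regs *regs)
{
	printk(KERN_INFO "thread %d hit probe at %#lx\n",
		current->pid, u->vaddr);
}

static int __init example_init(void)
{
	usp.pid = 8156;		/* placeholder: pid of the probed process */
	usp.vaddr = 0x080484a8;	/* placeholder: probed instruction address */
	usp.handler = my_handler;
	return register_uprobe(&usp);
}

static void __exit example_exit(void)
{
	unregister_uprobe(&usp);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");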

Salient points:

o Uprobes uses a breakpoint instruction underneath to break into the
program execution. By hooking utrace's signal callback, uprobes
recognizes a probe hit and runs the user-specified handler. The handler
may sleep.

o The uprobe breakpoint insertion is via access_process_vm() and hence
is copy-on-write and per-process.

o As uprobes uses utrace, a unique engine exists for every thread of a
process. Any newly created thread inherits all the probes and gets an
engine of its own. Upon thread exit, the engine is detached.

o As of now, uprobes aren't inherited across fork()s.

o i386 has an implementation where the original instruction at
the probepoint is single-stepped out of line (using space in the
stack vma). Other architectures have challenges in implementing
such a scheme.  (E.g., rip-relative addressing on x86_64 requires
the target operand to be within 2GB of the instruction, which the
stack never is.)  Single-stepping inline works for a single-threaded
process, but presents problems (missed probepoints, at the very least)
for a multithreaded process.  Our thoughts on this problem will be
discussed in a separate email.

o Probe registration and unregistration in the context of a
multithreaded application are asynchronous events. All threads
need to be QUIESCED before the program text is modified to insert
the breakpoint.

---

 Documentation/uprobes.txt  |  736 +++++++++++++++++++++++
 arch/i386/Kconfig          |    9 
 arch/i386/kernel/Makefile  |    1 
 arch/i386/kernel/uprobes.c |  186 ++++++
 include/asm-i386/uprobes.h |   63 ++
 include/linux/uprobes.h    |  249 ++++++++
 kernel/Makefile            |    1 
 kernel/uprobes.c           | 1383 +++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 2628 insertions(+)

diff -puN /dev/null Documentation/uprobes.txt
--- /dev/null	2007-03-12 08:02:26.505925865 -0700
+++ linux-2.6.21-rc3-jimk/Documentation/uprobes.txt	2007-03-12 13:42:52.000000000 -0700
@@ -0,0 +1,736 @@
+Title	: User-Space Probes (Uprobes)
+Author	: Jim Keniston <jkenisto@us.ibm.com>
+
+CONTENTS
+
+1. Concepts: Uprobes, Return Probes
+2. Architectures Supported
+3. Configuring Uprobes
+4. API Reference
+5. Uprobes Features and Limitations
+6. Interoperation with Kprobes
+7. Interoperation with Utrace
+8. Probe Overhead
+9. TODO
+10. Uprobes Team
+11. Uprobes Example
+12. Uretprobes Example
+
+1. Concepts: Uprobes, Return Probes
+
+Uprobes enables you to dynamically break into any routine in a
+user application and collect debugging and performance information
+non-disruptively. You can trap at any code address, specifying a
+kernel handler routine to be invoked when the breakpoint is hit.
+
+There are currently two types of user-space probes: uprobes and
+uretprobes (also called return probes).  A uprobe can be inserted on
+any instruction in the application's virtual address space.  A return
+probe fires when a specified user function returns.  These two probe
+types are discussed in more detail later.
+
+A registration function such as register_uprobe() specifies which
+process is to be probed, where the probe is to be inserted, and what
+handler is to be called when the probe is hit.
+
+Typically, Uprobes-based instrumentation is packaged as a kernel
+module.  In the simplest case, the module's init function installs
+("registers") one or more probes, and the exit function unregisters
+them.  However, probes can be registered or unregistered in response
+to other events as well.  For example:
+- A probe handler itself can register and/or unregister probes.
+- You can establish Utrace callbacks to register and/or unregister
+probes when a particular process forks, clones a thread,
+execs, enters a system call, receives a signal, exits, etc.
+See Documentation/utrace.txt.
+
+1.1 How Does a Uprobe Work?
+
+When a uprobe is registered, Uprobes makes a copy of the probed
+instruction, stops the probed application, replaces the first byte(s)
+of the probed instruction with a breakpoint instruction (e.g., int3
+on i386 and x86_64), and allows the probed application to continue.
+(When inserting the breakpoint, Uprobes uses the same copy-on-write
+mechanism that ptrace uses, so that the breakpoint affects only that
+process, and not any other process running that program.  This is
+true even if the probed instruction is in a shared library.)
+
+When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
+user-mode registers are saved, and a SIGTRAP signal is generated.
+Uprobes intercepts the SIGTRAP and finds the associated uprobe.
+It then executes the handler associated with the uprobe, passing the
+handler the addresses of the uprobe struct and the saved registers.
+The handler may sleep, but keep in mind that the probed thread remains
+stopped while your handler runs.
+
+Next, Uprobes single-steps its copy of the probed instruction and
+resumes execution of the probed process at the instruction following
+the probepoint.  (It might be simpler to single-step the actual
+instruction in place, but then Uprobes would have to temporarily
+remove the breakpoint instruction.  This would create problems in a
+multithreaded application.  For example, it would open a time window
+when another thread could sail right past the probepoint.)
+
+1.2 The Role of Utrace
+
+When a probe is registered on a previously unprobed process,
+Uprobes establishes a tracing "engine" with Utrace (see
+Documentation/utrace.txt) for each thread (task) in the process.
+Uprobes uses the Utrace "quiesce" mechanism to stop all the threads
+prior to insertion or removal of a breakpoint.  Utrace also notifies
+Uprobes of breakpoint and single-step traps and of other interesting
+events in the lifetime of the probed process, such as fork, clone,
+exec, and exit.
+
+1.3 How Does a Return Probe Work?
+
+When you call register_uretprobe(), Uprobes establishes a uprobe
+at the entry to the function.  When the probed function is called
+and this probe is hit, Uprobes saves a copy of the return address,
+and replaces the return address with the address of a "trampoline"
+-- a piece of code that contains a breakpoint instruction.
+
+When the probed function executes its return instruction, control
+passes to the trampoline and that breakpoint is hit.  Uprobes'
+trampoline handler calls the user-specified handler associated with the
+uretprobe, then sets the saved instruction pointer to the saved return
+address, and that's where execution resumes upon return from the trap.
+
+1.3.1 Establishing the Return-Probe Trampoline
+
+In the current implementation of uretprobes, before registering
+uretprobes on a particular probed process, the user must call
+init_uretprobes() for that process to specify the address of the
+trampoline.  This address must be on an executable page in the
+process's virtual address space, at a location that will not be
+subsequently accessed by the process.  init_uretprobes() inserts a
+breakpoint at the trampoline.
+
+If the process has already entered main(), then the first instruction
+of main() is a good place for the trampoline.  If you want to begin
+probing as soon as a process execs, your exec handler can set a
+uprobe on main().  Upon entry to main(), your associated uprobe
+handler can then call init_uretprobes() to set the trampoline at
+main(); the trampoline will not be set until after main()'s first
+instruction executes.
+
+1.4 Multithreaded Applications
+
+Uprobes supports the probing of multithreaded applications.  Uprobes
+imposes no limit on the number of threads in a probed application.
+All threads in a process use the same text pages, so every probe
+in a process affects all threads; of course, each thread hits the
+probepoint (and runs the handler) independently.  Multiple threads
+may run the same handler simultaneously.  If you want a particular
+thread or set of threads to run a particular handler, your handler
+should check current or current->pid to determine which thread has
+hit the probepoint.
+
+When a process clones a new thread, that thread automatically shares
+all current and future probes established for that process.
+
+Keep in mind that when you register or unregister a probe, the
+breakpoint is not inserted or removed until Utrace has stopped all
+threads in the process.  The register/unregister function returns
+after the breakpoint has been inserted/removed (but see the next
+section).
+
+1.5 Registering Probes within Probe Handlers
+
+A uprobe or uretprobe handler can call any of the functions
+in the Uprobes API (init_uretprobes(), [un]register_uprobe(),
+[un]register_uretprobe()).  A handler can even unregister its own
+probe.  However, when invoked from a handler, the actual [un]register
+operations do not take place immediately.  Rather, they are queued up
+and executed after all handlers for that probepoint have been run.
+In the handler, the [un]register call returns -EINPROGRESS.  If you
+set the registration_callback field in the uprobe object, that callback
+will be called when the [un]register operation completes.
+
+2. Architectures Supported
+
+Uprobes and uretprobes are implemented on the following
+architectures:
+
+- i386
+- x86_64 (AMD-64, EM64T)	// in progress
+- ppc64				// in progress
+// - ia64 			// not started
+- s390x				// in progress
+
+3. Configuring Uprobes
+
+// TODO: The patch actually puts Uprobes configuration under "Instrumentation
+// Support" with Kprobes.  Need to decide which is the better place.
+
+When configuring the kernel using make menuconfig/xconfig/oldconfig,
+ensure that CONFIG_UPROBES is set to "y".  Under "Process debugging
+support," select "Infrastructure for tracing and debugging user
+processes" to enable Utrace, then select "Uprobes".
+
+So that you can load and unload Uprobes-based instrumentation modules,
+make sure "Loadable module support" (CONFIG_MODULES) and "Module
+unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
+
+4. API Reference
+
+The Uprobes API includes a "register" function and an "unregister"
+function for each type of probe.  Here are terse, mini-man-page
+specifications for these functions and the associated probe handlers
+that you'll write.  See the latter half of this document for examples.
+
+4.1 register_uprobe
+
+#include <linux/uprobes.h>
+int register_uprobe(struct uprobe *u);
+
+Sets a breakpoint at virtual address u->vaddr in the process whose
+pid is u->pid.  When the breakpoint is hit, Uprobes calls u->handler.
+
+register_uprobe() returns 0 on success, -EINPROGRESS if
+register_uprobe() was called from a uprobe or uretprobe handler
+(and therefore delayed), or a negative errno otherwise.
+
+Section 4.5, "User's Callback for Delayed Registrations",
+explains how to be notified upon completion of a delayed
+registration.
+
+User's handler (u->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void handler(struct uprobe *u, struct pt_regs *regs);
+
+Called with u pointing to the uprobe associated with the breakpoint,
+and regs pointing to the struct containing the registers saved when
+the breakpoint was hit.
+
+4.2 init_uretprobes
+
+#include <linux/uprobes.h>
+int init_uretprobes(pid_t pid, unsigned long vaddr);
+
+Establishes a uretprobe trampoline at virtual address vaddr in the
+process whose pid is pid.  This function must be called before any
+call to register_uretprobe() on the same process.  See Section 1.3.1,
+"Establishing the Return-Probe Trampoline", for guidance on placing
+the uretprobe trampoline.
+
+init_uretprobes() returns 0 on success, -EINPROGRESS if
+init_uretprobes() was called from a uprobe or uretprobe handler,
+or a negative errno otherwise.  In particular, a return value of
+-EEXIST specifies that the uretprobe trampoline for this process
+has already been established at a different address (e.g., by a
+different module probing the same process).
+
+As previously mentioned, you can call init_uretprobes() from a
+uprobe or uretprobe handler.  You can also call register_uretprobe()
+for that process from that same handler instance, so long as the
+init_uretprobes() call comes first.
+
+Eventually, uprobes may be smart enough, at least on certain
+architectures, to establish a process's uretprobe trampoline without
+guidance from the user.  For example, on i386, one trampoline on
+the vdso page could serve for all processes. For such architectures,
+the user could call init_uretprobes() with vaddr=0 and let uprobes
+select the location of the trampoline.
+
+4.3 register_uretprobe
+
+#include <linux/uprobes.h>
+int register_uretprobe(struct uretprobe *rp);
+
+Establishes a return probe in the process whose pid is rp->u.pid for
+the function whose address is rp->u.vaddr.  When that function returns,
+Uprobes calls rp->handler.
+
+Before your first call to register_uretprobe() on a particular probed
+process, you must call init_uretprobes() (see Section 4.2).
+
+register_uretprobe() returns 0 on success, -EINPROGRESS if
+register_uretprobe() was called from a uprobe or uretprobe handler
+(and therefore delayed), or a negative errno otherwise.
+
+Section 4.5, "User's Callback for Delayed Registrations",
+explains how to be notified upon completion of a delayed
+registration.
+
+User's return-probe handler (rp->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void uretprobe_handler(struct uretprobe_instance *ri, struct pt_regs *regs);
+
+regs is as described for the user's uprobe handler.  ri points to
+the uretprobe_instance object associated with the particular function
+instance that is currently returning.  The following fields in that
+object may be of interest:
+- ret_addr: the return address
+- rp: points to the corresponding uretprobe object
+
+In ptrace.h, the regs_return_value(regs) macro provides a simple
+abstraction to extract the return value from the appropriate register
+as defined by the architecture's ABI.
+
+4.4 unregister_*probe
+
+#include <linux/uprobes.h>
+void unregister_uprobe(struct uprobe *u);
+void unregister_uretprobe(struct uretprobe *rp);
+
+Removes the specified probe.  The unregister function can be called
+at any time after the probe has been registered, and can be called
+from a uprobe or uretprobe handler.
+
+4.5 User's Callback for Delayed Registrations
+
+#include <linux/uprobes.h>
+void registration_callback(struct uprobe *u, int reg,
+	enum uprobe_type type, int result);
+
+As previously mentioned, the functions described in Section 4 can be
+called from within a uprobe or uretprobe handler.  When that happens,
+the [un]registration operation is delayed until all handlers associated
+with that handler's probepoint have been run.  Upon completion of the
+[un]registration operation, Uprobes checks the registration_callback
+member of the associated uprobe: u->registration_callback
+for [un]register_uprobe or rp->u.registration_callback for
+[un]register_uretprobe.  Uprobes calls that callback function, if any,
+passing it the following values:
+
+- u = the address of the uprobe object.  (For a uretprobe, you can use
+container_of(u, struct uretprobe, u) to obtain the address of the
+uretprobe object.)
+
+- reg = 1 for register_u[ret]probe() or 0 for unregister_u[ret]probe()
+
+- type = UPTY_UPROBE or UPTY_URETPROBE
+
+- result = the return value that register_u[ret]probe() would have
+returned if this weren't a delayed operation.  This is always 0
+for unregister_u[ret]probe().
+
+NOTE: Uprobes calls the registration_callback ONLY in the case of a
+delayed [un]registration.
+
+5. Uprobes Features and Limitations
+
+The user is expected to assign values to the following members
+of struct uprobe: pid, vaddr, handler, and (as needed)
+registration_callback.  Other members are reserved for Uprobes' use.
+Uprobes may produce unexpected results if you:
+- assign non-zero values to reserved members of struct uprobe;
+- change the contents of a uprobe or uretprobe object while it is
+registered; or
+- attempt to register a uprobe or uretprobe that is already registered.
+
+Uprobes allows any number of probes (uprobes and/or uretprobes)
+at a particular address.  For a particular probepoint, handlers are
+run in the order in which they were registered.
+
+Any number of kernel modules may probe a particular process
+simultaneously, and a particular module may probe any number of
+processes simultaneously.
+
+Probes are shared by all threads in a process (including newly created
+threads).
+
+If a probed process exits or execs, Uprobes automatically unregisters
+all uprobes and uretprobes associated with that process.  Subsequent
+attempts to unregister these probes will be treated as no-ops.
+
+On the other hand, if a probed memory area is removed from the
+process's virtual memory map (e.g., via dlclose(3) or munmap(2)),
+it's currently up to you to unregister the probes first.
+
+There is no way to specify that probes should be inherited across fork;
+Uprobes removes all probepoints in the newly created child process.
+See Section 7, "Interoperation with Utrace", for more information on
+this topic.
+
+On at least some architectures, Uprobes makes no attempt to verify
+that the probe address you specify actually marks the start of an
+instruction.  If you get this wrong, chaos may ensue.
+
+To avoid interfering with interactive debuggers, Uprobes will refuse
+to insert a probepoint where a breakpoint instruction already exists,
+unless it was Uprobes that put it there.  Some architectures may
+refuse to insert probes on other types of instructions.
+
+If you install a probe in an inline-able function, Uprobes makes
+no attempt to chase down all inline instances of the function and
+install probes there.  gcc may inline a function without being asked,
+so keep this in mind if you're not seeing the probe hits you expect.
+
+A probe handler can modify the environment of the probed function
+-- e.g., by modifying data structures, or by modifying the
+contents of the pt_regs struct (which are restored to the registers
+upon return from the breakpoint).  So Uprobes can be used, for example,
+to install a bug fix or to inject faults for testing.  Uprobes, of
+course, has no way to distinguish the deliberately injected faults
+from the accidental ones.  Don't drink and probe.
+
+Since a return probe is implemented by replacing the return
+address with the trampoline's address, stack backtraces and calls
+to __builtin_return_address() will typically yield the trampoline's
+address instead of the real return address for uretprobed functions.
+
+If the number of times a function is called does not match the
+number of times it returns (e.g., if a function exits via longjmp()),
+registering a return probe on that function may produce undesirable
+results.
+
+When you register the first probe at a probepoint or unregister the
+last probe at a probepoint, Uprobes asks Utrace to "quiesce"
+the probed process so that Uprobes can insert or remove the breakpoint
+instruction.  If the process is not already stopped, Utrace sends it
+a SIGSTOP.  If the process is running an interruptible system call,
+this may cause the system call to finish early or fail with EINTR.
+(The PTRACE_ATTACH request of the ptrace system call has this same
+limitation.)
+
+When Uprobes establishes a probepoint on a previously unprobed page
+of text, Linux creates a new copy of the page via its copy-on-write
+mechanism.  When probepoints are removed, Uprobes makes no attempt
+to consolidate identical copies of the same page.  This could affect
+memory availability if you probe many, many pages in many, many
+long-running processes.
+
+6. Interoperation with Kprobes
+
+Uprobes is intended to interoperate usefully with Kprobes (see
+Documentation/kprobes.txt).  For example, an instrumentation module
+can make calls to both the Kprobes API and the Uprobes API.
+
+A uprobe or uretprobe handler can register or unregister kprobes,
+jprobes, and kretprobes, as well as uprobes and uretprobes.  On the
+other hand, a kprobe, jprobe, or kretprobe handler must not sleep, and
+therefore cannot register or unregister any of these types of probes.
+(Ideas for removing this restriction are welcome.)
+
+Note that the overhead of a u[ret]probe hit is several times that of
+a k[ret]probe hit.
+
+7. Interoperation with Utrace
+
+As mentioned in Section 1.2, Uprobes is a client of Utrace.  For each
+probed thread, Uprobes establishes a Utrace engine, and registers
+callbacks for the following types of events: clone/fork, exec, exit,
+and "core-dump" signals (which include breakpoint traps).  Uprobes
+establishes this engine when the process is first probed, or when
+Uprobes is notified of the thread's creation, whichever comes first.
+
+An instrumentation module can use both the Utrace and Uprobes APIs (as
+well as Kprobes).  When you do this, keep the following facts in mind:
+
+- For a particular event, Utrace callbacks are called in the order in
+which the engines are established.  Utrace does not currently provide
+a mechanism for altering this order.
+
+- When Uprobes learns that a probed process has forked, it removes
+the breakpoints in the child process.
+
+- When Uprobes learns that a probed process has exec-ed or exited,
+it disposes of its data structures for that process (first allowing
+any outstanding [un]registration operations to terminate).
+
+- When a probed thread hits a breakpoint or completes single-stepping
+of a probed instruction, engines with the UTRACE_EVENT(SIGNAL_CORE)
+flag set are notified.  The Uprobes signal callback prevents (via
+UTRACE_ACTION_HIDE) this event from being reported to engines later
+in the list.  But if your engine was established before Uprobes's,
+you will see this event.
+
+If you want to establish probes in a newly forked child, you can use
+the following procedure:	// TODO: Test this.
+
+- Register a report_clone callback with Utrace.  In this callback,
+the CLONE_THREAD flag distinguishes between the creation of a new
+thread vs. a new process.
+
+- In your report_clone callback, set the engine's UTRACE_ACTION_QUIESCE
+flag.  The new process will quiesce at a point where it is ready to
+be probed.
+
+- In your report_quiesce callback, register the desired probes.
+(Note that you cannot use the same probe object for both parent
+and child.  If you want to duplicate the probepoints, you must
+create a new set of u[ret]probe objects.)
+
+8. Probe Overhead
+
+// TODO: Adjust as other architectures are tested.
+On a typical CPU in use in 2007, a uprobe hit takes 4 to 5
+microseconds to process.  Specifically, a benchmark that hits the same
+probepoint repeatedly, firing a simple handler each time, reports
+200,000 to 250,000 hits per second, depending on the architecture.
+A return-probe hit typically takes 50% longer than a uprobe hit.
+When you have a return probe set on a function, adding a uprobe at
+the entry to that function adds essentially no overhead.
+
+Here are sample overhead figures (in usec) for different architectures.
+u = uprobe; r = return probe; ur = uprobe + return probe
+
+i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
+u = 4.3 usec; r = 6.2 usec; ur = 6.3 usec
+
+x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
+// TODO
+
+ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
+// TODO
+
+9. TODO
+
+a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
+programming interface for probe-based instrumentation.  SystemTap
+already supports kernel probes.  It could exploit Uprobes as well.
+b. Support for other architectures.
+
+10. Uprobes Team
+
+The following people have made major contributions to Uprobes:
+Jim Keniston - jkenisto@us.ibm.com
+Ananth Mavinakayanahalli - ananth@in.ibm.com
+Prasanna Panchamukhi - prasanna@in.ibm.com
+Dave Wilder - dwilder@us.ibm.com
+
+11. Uprobes Example
+
+Here's a sample kernel module showing the use of Uprobes to count the
+number of times an instruction at a particular address is executed,
+and optionally (unless verbose=0) report each time it's executed.
+----- cut here -----
+/* uprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+
+/*
+ * Usage: insmod uprobe_example.ko pid=<pid> vaddr=<address> [verbose=0]
+ * where <pid> identifies the probed process and <address> is the virtual
+ * address of the probed instruction.
+ */
+
+static int pid = 0;
+module_param(pid, int, 0);
+MODULE_PARM_DESC(pid, "pid");
+
+static int verbose = 1;
+module_param(verbose, int, 0);
+MODULE_PARM_DESC(verbose, "verbose");
+
+static long vaddr = 0;
+module_param(vaddr, long, 0);
+MODULE_PARM_DESC(vaddr, "vaddr");
+
+static int nhits;
+static struct uprobe usp;
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+	nhits++;
+	if (verbose)
+		printk(KERN_INFO "Hit #%d on probepoint at %#lx\n",
+			nhits, u->vaddr);
+}
+
+int init_module(void)
+{
+	int ret;
+	usp.pid = pid;
+	usp.vaddr = vaddr;
+	usp.handler = uprobe_handler;
+	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	ret = register_uprobe(&usp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
+		return -1;
+	}
+	return 0;
+}
+
+void cleanup_module(void)
+{
+	printk(KERN_INFO "Unregistering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	printk(KERN_INFO "Probepoint was hit %d times\n", nhits);
+	unregister_uprobe(&usp);
+}
+MODULE_LICENSE("GPL");
+----- cut here -----
+
+You can build the kernel module, uprobe_example.ko, using the following
+Makefile:
+----- cut here -----
+obj-m := uprobe_example.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+	rm -f *.mod.c *.ko *.o .*.cmd
+	rm -rf .tmp_versions
+----- cut here -----
+
+For example, if you want to run myprog and monitor its calls to myfunc(),
+you can do the following:
+
+$ make			// Build the uprobe_example module.
+...
+$ nm -p myprog | awk '$3=="myfunc"'
+080484a8 T myfunc
+$ ./myprog &
+$ ps
+  PID TTY          TIME CMD
+ 4367 pts/3    00:00:00 bash
+ 8156 pts/3    00:00:00 myprog
+ 8157 pts/3    00:00:00 ps
+$ su -
+...
+# insmod uprobe_example.ko pid=8156 vaddr=0x080484a8
+
+In /var/log/messages and on the console, you will see a message of the
+form "kernel: Hit #1 on probepoint at 0x80484a8" each time myfunc()
+is called.  To turn off probing, remove the module:
+
+# rmmod uprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "Probepoint was hit 5 times".
+
+12. Uretprobes Example
+
+Here's a sample kernel module showing the use of a return probe to
+report a function's return values.
+----- cut here -----
+/* uretprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+
+/*
+ * Usage:
+ * insmod uretprobe_example.ko pid=<pid> func=<addr1> tramp=<addr2> [verbose=0]
+ * where <pid> identifies the probed process, <addr1> is the virtual
+ * address of the probed function, and <addr2> is the virtual address
+ * of main() (which we use for the uretprobe trampoline).
+ */
+
+static int pid = 0;
+module_param(pid, int, 0);
+MODULE_PARM_DESC(pid, "pid");
+
+static int verbose = 1;
+module_param(verbose, int, 0);
+MODULE_PARM_DESC(verbose, "verbose");
+
+static long func = 0;
+module_param(func, long, 0);
+MODULE_PARM_DESC(func, "func");
+
+static long tramp = 0;
+module_param(tramp, long, 0);
+MODULE_PARM_DESC(tramp, "tramp");
+
+static int ncall, nret;
+static struct uprobe usp;
+static struct uretprobe rp;
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+	ncall++;
+	if (verbose)
+		printk(KERN_INFO "Function at %#lx called\n", u->vaddr);
+}
+
+static void uretprobe_handler(struct uretprobe_instance *ri,
+	struct pt_regs *regs)
+{
+	nret++;
+	if (verbose)
+		printk(KERN_INFO "Function at %#lx returns %#lx\n",
+			ri->rp->u.vaddr, regs_return_value(regs));
+}
+
+int init_module(void)
+{
+	int ret;
+	/* Establish the uretprobe trampoline -- e.g., at main(). */
+	ret = init_uretprobes(pid, tramp);
+	if (ret != 0) {
+		printk(KERN_ERR
+			"init_uretprobes(%d, %#lx) failed, returned %d\n",
+			pid, tramp, ret);
+		return -1;
+	}
+
+	/* Register the entry probe. */
+	usp.pid = pid;
+	usp.vaddr = func;
+	usp.handler = uprobe_handler;
+	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	ret = register_uprobe(&usp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
+		return -1;
+	}
+
+	/* Register the return probe. */
+	rp.u.pid = pid;
+	rp.u.vaddr = func;
+	rp.handler = uretprobe_handler;
+	printk(KERN_INFO "Registering return probe on pid %d, vaddr %#lx\n",
+		rp.u.pid, rp.u.vaddr);
+	ret = register_uretprobe(&rp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uretprobe() failed, returned %d\n",
+			ret);
+		unregister_uprobe(&usp);
+		return -1;
+	}
+	return 0;
+}
+
+void cleanup_module(void)
+{
+	printk(KERN_INFO "Unregistering probes on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	printk(KERN_INFO "%d calls, %d returns\n", ncall, nret);
+	unregister_uprobe(&usp);
+	unregister_uretprobe(&rp);
+}
+MODULE_LICENSE("GPL");
+----- cut here -----
+
+Build the kernel module as shown in the above uprobe example.  Since
+we're using return probes, we'll need to specify an address for the
+uretprobe trampoline.  We'll use main() for the trampoline:
+
+$ nm -p myprog | awk '$3=="main"'
+080484bc T main
+$ nm -p myprog | awk '$3=="myfunc"'
+080484a8 T myfunc
+$ ./myprog &
+$ ps
+  PID TTY          TIME CMD
+ 4367 pts/3    00:00:00 bash
+ 9156 pts/3    00:00:00 myprog
+ 9157 pts/3    00:00:00 ps
+$ su -
+...
+# insmod uretprobe_example.ko pid=9156 func=0x080484a8 tramp=0x080484bc
+
+In /var/log/messages and on the console, you will see messages such
+as the following:
+kernel: Function at 0x80484a8 called
+kernel: Function at 0x80484a8 returns 0x3
+To turn off probing, remove the module:
+
+# rmmod uretprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "73 calls, 73 returns".
diff -puN arch/i386/Kconfig~1-uprobes-base arch/i386/Kconfig
--- linux-2.6.21-rc3/arch/i386/Kconfig~1-uprobes-base	2007-03-12 13:41:39.000000000 -0700
+++ linux-2.6.21-rc3-jimk/arch/i386/Kconfig	2007-03-12 13:44:56.000000000 -0700
@@ -1262,6 +1262,15 @@ config KPROBES
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+config UPROBES
+	bool "Userspace probes (EXPERIMENTAL)"
+	depends on UTRACE && EXPERIMENTAL && MODULES
+	help
+	  Uprobes allows you to trap at a userspace text address and
+	  execute a specified handler in the kernel.  For more
+	  information, refer to Documentation/uprobes.txt.
+	  If in doubt, say "N".
+
 source "kernel/Kconfig.marker"
 
 endmenu
diff -puN arch/i386/kernel/Makefile~1-uprobes-base arch/i386/kernel/Makefile
--- linux-2.6.21-rc3/arch/i386/kernel/Makefile~1-uprobes-base	2007-03-12 13:41:39.000000000 -0700
+++ linux-2.6.21-rc3-jimk/arch/i386/kernel/Makefile	2007-03-12 13:46:37.000000000 -0700
@@ -39,6 +39,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MARKERS_ENABLE_OPTIMIZATION)	+= marker.o
+obj-$(CONFIG_UPROBES)		+= uprobes.o
 
 obj-$(CONFIG_VMI)		+= vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o
diff -puN /dev/null arch/i386/kernel/uprobes.c
--- /dev/null	2007-03-12 08:02:26.505925865 -0700
+++ linux-2.6.21-rc3-jimk/arch/i386/kernel/uprobes.c	2007-03-12 13:42:52.000000000 -0700
@@ -0,0 +1,186 @@
+/*
+ *  Userspace Probes (UProbes)
+ *  arch/i386/kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/uprobes.h>
+#include <linux/mm.h>
+#include <linux/dcache.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <asm/kdebug.h>
+#include <asm/uprobes.h>
+
+#ifdef SS_OUT_OF_LINE
+/*
+ * This routine finds an appropriate address for the instruction-copy that's
+ * to be single-stepped out of line.  Between that address and the top of stack,
+ * there must be room for the instruction (size bytes) and potential
+ * growth (and overwriting) of the stack due to an instruction such as push.
+ * If there's not enough room on the page containing the top of stack,
+ * select the adjacent page.  (access_process_vm() will expand the stack
+ * vma as necessary.)
+ */
+static __always_inline long find_stack_space(long tos, int size)
+{
+	long page_addr = tos & PAGE_MASK;
+
+	if ((page_addr + size + SLOT_BEFORE_ORIGINSN) > tos)
+		page_addr -= PAGE_SIZE;
+
+	return page_addr;
+}
+
+/*
+ * This routine provides the functionality of single stepping
+ * out-of-line.  FIXME: handle the case where single stepping
+ * out-of-line cannot be achieved.
+ */
+int uprobe_prepare_singlestep(struct uprobe_kimg *uk,
+		struct uprobe_task *utask, struct pt_regs *regs)
+{
+	long addr = 0, stack_addr = regs->esp;
+	int ret, size = sizeof(uprobe_opcode_t) * MAX_UINSN_SIZE;
+	long *source = (long *)uk->insn;
+	struct vm_area_struct *vma;
+	struct task_struct *tsk = utask->tsk;
+
+	/*
+	 * Get free stack space to copy original instruction, so as to
+	 * single step out-of-line.
+	 */
+	addr = find_stack_space(stack_addr, size);
+	if (!addr)
+		return 1;
+	/*
+	 * Copy original instruction on this per process stack
+	 * page so as to single step out-of-line.
+	 */
+	ret = access_process_vm(tsk, addr, source, size, 1);
+	if (ret < size)
+		return 1;
+
+	regs->eip = addr;
+	utask->singlestep_addr = regs->eip;
+
+	down_write(&tsk->mm->mmap_sem);
+	vma = find_vma(tsk->mm, addr);
+	BUG_ON(!vma);
+	utask->orig_vm_flags = vma->vm_flags;
+	vma->vm_flags |=  (VM_EXEC | VM_EXECUTABLE);
+	up_write(&tsk->mm->mmap_sem);
+
+	return 0;
+}
+
+/*
+ * Called by uprobe_resume_execution to adjust the return address
+ * pushed by a call instruction executed out-of-line.
+ */
+static void adjust_ret_addr(long esp, long correction)
+{
+	int nleft;
+	long ra;
+
+	nleft = copy_from_user(&ra, (const void __user *) esp, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	ra +=  correction;
+	nleft = copy_to_user((void __user *) esp, &ra, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	return;
+
+fail:
+	printk(KERN_ERR
+		"uprobes: Failed to adjust return address after"
+		" single-stepping call instruction;"
+		" pid=%d, esp=%#lx\n", current->pid, esp);
+	BUG();
+}
+/*
+ * Called after single-stepping.  uk->vaddr is the address of the
+ * instruction whose first byte has been replaced by the "int 3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is utask->singlestep_addr.
+ *
+ * This function prepares to return from the post-single-step
+ * interrupt.  We have to fix up the stack as follows:
+ *
+ * 0) Typically, the new eip is relative to the copied instruction.  We
+ * need to make it relative to the original instruction.  Exceptions are
+ * return instructions and absolute or indirect jump or call instructions.
+ *
+ * 1) If the single-stepped instruction was a call, the return address
+ * that is atop the stack is the address following the copied instruction.
+ * We need to make it the address following the original instruction.
+ */
+void uprobe_resume_execution(struct uprobe_kimg *uk,
+				struct uprobe_task *utask, struct pt_regs *regs)
+{
+	long next_eip = 0;
+	long copy_eip = utask->singlestep_addr;
+	long orig_eip = uk->vaddr;
+	struct vm_area_struct *vma;
+	struct task_struct *tsk = utask->tsk;
+
+	switch (uk->insn[0]) {
+	case 0xc3:		/* ret/lret */
+	case 0xcb:
+	case 0xc2:
+	case 0xca:
+		next_eip = regs->eip;
+		/* eip is already adjusted, no more changes required*/
+		break;
+	case 0xe8:		/* call relative - Fix return addr */
+		adjust_ret_addr(regs->esp, (orig_eip - copy_eip));
+		break;
+	case 0xff:
+		if ((uk->insn[1] & 0x30) == 0x10) {
+			/* call absolute, indirect */
+			/* Fix return addr; eip is correct. */
+			next_eip = regs->eip;
+			adjust_ret_addr(regs->esp, (orig_eip - copy_eip));
+		} else if (((uk->insn[1] & 0x31) == 0x20) ||
+			   ((uk->insn[1] & 0x31) == 0x21)) {
+			/* jmp near or jmp far  absolute indirect */
+			/* eip is correct. */
+			next_eip = regs->eip;
+		}
+		break;
+	case 0xea:		/* jmp absolute -- eip is correct */
+		next_eip = regs->eip;
+		break;
+	default:
+		break;
+	}
+
+	if (next_eip)
+		regs->eip = next_eip;
+	else
+		regs->eip = orig_eip + (regs->eip - copy_eip);
+
+	down_write(&tsk->mm->mmap_sem);
+	vma = find_vma(tsk->mm, copy_eip);
+	BUG_ON(!vma);
+	vma->vm_flags = utask->orig_vm_flags;
+	up_write(&tsk->mm->mmap_sem);
+}
+#endif	/* SS_OUT_OF_LINE */
diff -puN /dev/null include/asm-i386/uprobes.h
--- /dev/null	2007-03-12 08:02:26.505925865 -0700
+++ linux-2.6.21-rc3-jimk/include/asm-i386/uprobes.h	2007-03-12 13:42:52.000000000 -0700
@@ -0,0 +1,63 @@
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+#ifndef SS_OUT_OF_LINE
+#define SS_OUT_OF_LINE
+#endif
+/*
+ *  Userspace Probes (UProbes)
+ *  include/asm-i386/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+typedef u8 uprobe_opcode_t;
+#define BREAKPOINT_INSTRUCTION	0xcc
+#define BP_INSN_SIZE 1
+#define MAX_UINSN_SIZE 16
+#define SLOT_IP 12	/* instruction pointer slot from include/asm/elf.h */
+
+/* Architecture specific switch for where the IP points after a bp hit */
+#define ARCH_BP_INST_PTR(inst_ptr)	(inst_ptr - BP_INSN_SIZE)
+
+/*
+ * Leave some stack space after TOS for instructions
+ * like push/pusha before copying the original instruction.
+ */
+#define SLOT_BEFORE_ORIGINSN 32
+
+struct uprobe_kimg;
+
+/* Caller prohibits probes on int3. We currently allow everything else */
+static inline int arch_validate_probed_insn(struct uprobe_kimg *uk)
+{
+	return 0;
+}
+
+/* On i386, the int3 trap leaves eip pointing past the int instruction. */
+static inline unsigned long arch_get_probept(struct pt_regs *regs)
+{
+	return (unsigned long) (regs->eip - BP_INSN_SIZE);
+}
+
+static inline void arch_reset_ip_for_sstep(struct pt_regs *regs)
+{
+	regs->eip -= BP_INSN_SIZE;
+}
+
+#endif				/* _ASM_UPROBES_H */
diff -puN /dev/null include/linux/uprobes.h
--- /dev/null	2007-03-12 08:02:26.505925865 -0700
+++ linux-2.6.21-rc3-jimk/include/linux/uprobes.h	2007-03-12 13:42:52.000000000 -0700
@@ -0,0 +1,249 @@
+#ifndef _LINUX_UPROBES_H
+#define _LINUX_UPROBES_H
+/*
+ * Userspace Probes (UProbes)
+ * include/linux/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/list.h>
+#include <linux/smp.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/wait.h>
+
+struct pt_regs;
+struct task_struct;
+struct utrace_attached_engine;
+struct uprobe_kimg;
+struct uprobe;
+
+/*
+ * This is what the user supplies us.
+ */
+struct uprobe {
+	/*
+	 * The pid of the probed process.  Currently, this can be the
+	 * thread ID (task->pid) of any active thread in the process?
+	 * TODO: Verify.
+	 */
+	pid_t pid;
+
+	/* location of the probe point */
+	unsigned long vaddr;
+
+	/* Handler to run when the probepoint is hit */
+	void (*handler)(struct uprobe*, struct pt_regs*);
+
+	/* Subsequent members are for internal use only. */
+
+	/*
+	 * -EBUSY while we're waiting for all threads to quiesce so the
+	 * associated breakpoint can be inserted or removed.
+	 * 0 if the insert/remove operation has succeeded, or -errno
+	 * otherwise.
+	 */
+	volatile int status;
+
+	/* All uprobes with this pid and vaddr map to uk. */
+	struct uprobe_kimg *uk;
+
+	/* on uprobe_kimg's list */
+	struct list_head list;
+
+	/* This simplifies mapping uprobe to uprobe_process. */
+	pid_t tgid;
+};
+
+#ifdef CONFIG_UPROBES
+#include <asm/uprobes.h>
+
+enum uprobe_state {
+	UPROBE_INSERTING,	// process quiescing prior to insertion
+	UPROBE_BP_SET,		// breakpoint in place
+	UPROBE_REMOVING,	// process quiescing prior to removal
+	UPROBE_DISABLED,	// removal completed
+	UPROBE_FREEING		// being deallocated
+};
+
+enum uprobe_task_state {
+	UPTASK_QUIESCENT,
+	UPTASK_SLEEPING,	// used when task may not be able to quiesce
+	UPTASK_RUNNING,
+	UPTASK_BP_HIT,
+	UPTASK_SSTEP_AFTER_BP
+};
+
+#define UPROBE_HASH_BITS 5
+#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS)
+
+/*
+ * uprobe_process -- not a user-visible struct.
+ * A uprobe_process represents a probed process.  A process can have
+ * multiple probepoints (each represented by a uprobe_kimg) and
+ * one or more threads.
+ */
+struct uprobe_process {
+	/* mutex protects all accesses to uprobe_process */
+	/*
+	 * TODO: Consider going back to a more precise locking model:
+	 * utable_mutex: protects uprobe_table, nuk, pending_uprobes.
+	 * (Note that a per-hash-bucket mutex probably would help
+	 * only in situations where different tasks in the same
+	 * process are very frequently hitting different uprobes.)
+	 * tlist_mutex: protects thread_list, nthreads, n_quiescent_threads.
+	 */
+	struct mutex mutex;
+
+	/* Table of uprobe_kimgs registered for this process */
+	/* TODO: Switch to list_head[] per Ingo. */
+	struct hlist_head uprobe_table[UPROBE_TABLE_SIZE];
+	int nuk;	/* number of uprobe_kimgs */
+
+	/* List of uprobe_kimgs awaiting insertion or removal */
+	struct list_head pending_uprobes;
+
+	/* List of uprobe_tasks in this task group */
+	struct list_head thread_list;
+	int nthreads;
+	int n_quiescent_threads;
+
+	/* this goes on the uproc_table */
+	struct hlist_node hlist;
+
+	/*
+	 * All threads (tasks) in a process share the same uprobe_process.
+	 * We do NOT assume that the task with pid=tgid is still alive.
+	 */
+	pid_t tgid;
+
+	/* Threads in SLEEPING state wait here to be roused. */
+	wait_queue_head_t waitq;
+};
+
+/*
+ * uprobe_kimg -- not a user-visible struct.
+ * Abstraction to store kernel's internal uprobe data.
+ * Corresponds to a probepoint, at which several uprobes can be registered.
+ */
+struct uprobe_kimg {
+	/*
+	 * Object is read-locked to run handlers so that multiple threads
+	 * in a process can run handlers for same probepoint simultaneously.
+	 */
+	struct rw_semaphore rwsem;
+
+	/* vaddr copied from (first) uprobe */
+	unsigned long vaddr;
+
+	/* The uprobe(s) associated with this uprobe_kimg */
+	struct list_head uprobe_list;
+
+	volatile enum uprobe_state state;
+
+	/* Saved opcode (which has been replaced with breakpoint) */
+	uprobe_opcode_t opcode;
+
+	/* Saved original instruction */
+	uprobe_opcode_t insn[MAX_UINSN_SIZE];
+
+	/* The corresponding struct uprobe_process */
+	struct uprobe_process *uproc;
+
+	/*
+	 * uk goes in the uprobe_process->uprobe_table when registered --
+	 * even before the breakpoint has been inserted.
+	 */
+	struct hlist_node ut_node;
+
+	/*
+	 * uk sits in the uprobe_process->pending_uprobes queue while
+	 * awaiting insertion or removal of the breakpoint.
+	 */
+	struct list_head pd_node;
+
+	/* [un]register_uprobe() waits 'til bkpt inserted/removed. */
+	wait_queue_head_t waitq;
+};
+
+/*
+ * uprobe_task -- not a user-visible struct.
+ * Corresponds to a thread in a probed process.
+ */
+struct uprobe_task {
+	/* Lives on the thread_list for the uprobe_process */
+	struct list_head list;
+
+	/* This is a back pointer to the task_struct for this task */
+	struct task_struct *tsk;
+
+	/* The utrace engine for this task */
+	struct utrace_attached_engine *engine;
+
+	/* Back pointer to the associated uprobe_process */
+	struct uprobe_process *uproc;
+
+	volatile enum uprobe_task_state state;
+
+	/*
+	 * quiescing = 1 means this task has been asked to quiesce.
+	 * It may not be able to comply immediately if it's hit a bkpt.
+	 */
+	volatile int quiescing;
+
+	/* Saved address of copied original instruction */
+	long singlestep_addr;
+
+	/* Saved vm_flags for the period of single stepping */
+	unsigned long orig_vm_flags;
+
+	/* Task currently running quiesce_all_threads() */
+	struct task_struct *quiesce_master;
+
+	/* Set before running handlers; cleared after single-stepping. */
+	struct uprobe_kimg *active_probe;
+
+	/* [un]registrations initiated by handlers must be asynchronous. */
+	struct list_head deferred_registrations;
+
+	/* LIFO -- active instances */
+	struct hlist_head uretprobe_instances;
+
+	struct mutex mutex;
+};
+
+int register_uprobe(struct uprobe *u);
+void unregister_uprobe(struct uprobe *u);
+
+#ifdef SS_OUT_OF_LINE
+extern void uprobe_resume_execution(struct uprobe_kimg *uk,
+			struct uprobe_task *utask, struct pt_regs *regs);
+extern int uprobe_prepare_singlestep(struct uprobe_kimg *uk,
+			struct uprobe_task *utask, struct pt_regs *regs);
+#endif
+
+#else	/* CONFIG_UPROBES */
+
+static inline int register_uprobe(struct uprobe *u)
+{
+	return -ENOSYS;
+}
+static inline void unregister_uprobe(struct uprobe *u)
+{
+}
+#endif	/* CONFIG_UPROBES */
+#endif	/* _LINUX_UPROBES_H */
diff -puN kernel/Makefile~1-uprobes-base kernel/Makefile
--- linux-2.6.21-rc3/kernel/Makefile~1-uprobes-base	2007-03-12 13:41:39.000000000 -0700
+++ linux-2.6.21-rc3-jimk/kernel/Makefile	2007-03-12 13:42:52.000000000 -0700
@@ -59,6 +59,7 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayac
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_UTRACE) += utrace.o
 obj-$(CONFIG_PTRACE) += ptrace.o
+obj-$(CONFIG_UPROBES) += uprobes.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -puN /dev/null kernel/uprobes.c
--- /dev/null	2007-03-12 08:02:26.505925865 -0700
+++ linux-2.6.21-rc3-jimk/kernel/uprobes.c	2007-03-12 13:42:52.000000000 -0700
@@ -0,0 +1,1383 @@
+/*
+ *  Userspace Probes (UProbes)
+ *  kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/rcupdate.h>
+#include <linux/err.h>
+#include <linux/utrace.h>
+#include <linux/uprobes.h>
+#include <linux/tracehook.h>
+#include <asm/tracehook.h>
+#include <asm/errno.h>
+
+#define SET_ENGINE_FLAGS	1
+#define RESET_ENGINE_FLAGS	0
+
+extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
+	void *buf, int len, int write);
+
+/*
+ * Locking hierarchy:
+ * uproc_mutex
+ * uprobe_process->mutex
+ * uprobe_task->mutex
+ * uprobe_kimg->rwsem
+ * E.g., don't unconditionally grab uprobe_process->mutex while holding
+ * uprobe_task->mutex.
+ */
+
+/* Table of currently probed processes, hashed by tgid. */
+static struct hlist_head uproc_table[UPROBE_TABLE_SIZE];
+
+/*
+ * Protects uproc_table during uprobe (un)registration, initiated
+ * either by user or internally.
+ */
+static DEFINE_MUTEX(uproc_mutex);
+
+/* p_uprobe_utrace_ops = &uprobe_utrace_ops.  Fwd refs are a pain w/o this. */
+static const struct utrace_engine_ops *p_uprobe_utrace_ops;
+
+/* Runs with the uproc_mutex held.  Returns with uproc->mutex held. */
+struct uprobe_process *uprobe_find_process(pid_t tgid)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct uprobe_process *uproc;
+
+	head = &uproc_table[hash_long(tgid, UPROBE_HASH_BITS)];
+	hlist_for_each_entry(uproc, node, head, hlist) {
+		if (uproc->tgid == tgid) {
+			mutex_lock(&uproc->mutex);
+			return uproc;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * In the given uproc's hash table of uprobes, find the one with the
+ * specified virtual address.
+ * If lock == 0, returns with uk unlocked.
+ * If lock == 1, returns with uk read-locked.
+ * If lock == 2, returns with uk write-locked.
+ * Runs with uproc->mutex held.
+ */
+struct uprobe_kimg *find_uprobe(struct uprobe_process *uproc,
+		unsigned long vaddr, int lock)
+{
+	struct uprobe_kimg *uk;
+	struct hlist_node *node;
+	struct hlist_head *head = &uproc->uprobe_table[hash_long(vaddr,
+		UPROBE_HASH_BITS)];
+
+	hlist_for_each_entry(uk, node, head, ut_node) {
+		if (uk->vaddr == vaddr && uk->state != UPROBE_FREEING
+				&& uk->state != UPROBE_DISABLED) {
+			if (lock == 1)
+				down_read(&uk->rwsem);
+			else if (lock == 2)
+				down_write(&uk->rwsem);
+			return uk;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * set_bp
+ * Set a breakpoint at the given vaddr. This routine is called
+ *	- From quiesce, after we have saved the original opcode at vaddr
+ *	- From the signal handler, to restore the bp after a uprobe hit
+ *	(but only if single-stepping inline).
+ * Returns BP_INSN_SIZE on success.
+ *
+ * NOTE: BREAKPOINT_INSTRUCTION on all archs is the same size as
+ * uprobe_opcode_t.
+ */
+static int set_bp(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+	uprobe_opcode_t bp_insn = BREAKPOINT_INSTRUCTION;
+	return access_process_vm(tsk, uk->vaddr, &bp_insn, BP_INSN_SIZE, 1);
+}
+
+/*
+ * set_orig_insn
+ * Set back vaddr to the original instruction. This routine is called
+ *	- From quiesce, during unregistration
+ *	- From the signal handler, so we can singlestep the instruction
+ *	(but only if single-stepping inline).
+ * Returns BP_INSN_SIZE on success.
+ */
+static int set_orig_insn(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+	return access_process_vm(tsk, uk->vaddr, &uk->opcode, BP_INSN_SIZE, 1);
+}
+
+static void bkpt_insertion_failed(struct uprobe_kimg *uk, const char *why)
+{
+	printk(KERN_ERR "Can't place uprobe at pid %d vaddr %#lx: %s\n",
+			uk->uproc->tgid, uk->vaddr, why);
+}
+
+/*
+ * Save a copy of the original instruction (so it can be single-stepped
+ * out of line), insert the breakpoint instruction, and awake
+ * register_uprobe().
+ * Runs with uk write-locked.
+ */
+static void insert_bkpt(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+	struct uprobe *u;
+	long result = 0;
+	int len;
+
+	if (!tsk) {
+		/* No surviving tasks associated with uk->uproc */
+		result = -ESRCH;
+		goto out;
+	}
+
+	/*
+	 * If access_process_vm() transfers fewer bytes than the maximum
+	 * instruction size, assume that the probed instruction is smaller
+	 * than the max and near the end of the last page of instructions.
+	 * But there must be room at least for a breakpoint-size instruction.
+	 */
+	len = access_process_vm(tsk, uk->vaddr, uk->insn,
+		       MAX_UINSN_SIZE * sizeof(uprobe_opcode_t), 0);
+	if (len < MAX_UINSN_SIZE) {
+		bkpt_insertion_failed(uk, "error reading original instruction");
+		result = -EIO;
+		goto out;
+	}
+	memcpy(&uk->opcode, uk->insn, BP_INSN_SIZE);
+	if (uk->opcode == BREAKPOINT_INSTRUCTION) {
+		bkpt_insertion_failed(uk, "bkpt already exists at that addr");
+		result = -EEXIST;
+		goto out;
+	}
+
+	if ((result = arch_validate_probed_insn(uk)) < 0) {
+		bkpt_insertion_failed(uk, "instruction type cannot be probed");
+		goto out;
+	}
+
+	len = set_bp(uk, tsk);
+	if (len < BP_INSN_SIZE) {
+		bkpt_insertion_failed(uk, "failed to insert bkpt instruction");
+		result = -EIO;
+		goto out;
+	}
+out:
+	uk->state = (result ? UPROBE_DISABLED : UPROBE_BP_SET);
+	list_for_each_entry(u, &uk->uprobe_list, list)
+		u->status = result;
+	wake_up_all(&uk->waitq);
+}
+
+/* Runs with uk write-locked. */
+static void remove_bkpt(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+	int len;
+
+	if (tsk) {
+		len = set_orig_insn(uk, tsk);
+		if (len < BP_INSN_SIZE) {
+			printk(KERN_ERR
+				"Error removing uprobe at pid %d vaddr %#lx:"
+				" can't restore original instruction\n",
+				tsk->tgid, uk->vaddr);
+			/*
+			 * This shouldn't happen, since we were previously
+			 * able to write the breakpoint at that address.
+			 * There's not much we can do besides let the
+			 * process die with a SIGTRAP the next time the
+			 * breakpoint is hit.
+			 */
+		}
+	}
+	/* Wake up unregister_uprobe(). */
+	uk->state = UPROBE_DISABLED;
+	wake_up_all(&uk->waitq);
+}
+
+/*
+ * Runs with all of uproc's threads quiesced and uproc->mutex held.
+ * As specified, insert or remove the breakpoint instruction for each
+ * uprobe_kimg on uproc's pending list.
+ * tsk = one of the tasks associated with uproc.  Could be NULL.
+ * It's OK for uproc->pending_uprobes to be empty here.  It can happen
+ * if a register and an unregister are requested (by different probers)
+ * simultaneously for the same pid/vaddr.
+ */
+static void handle_pending_uprobes(struct uprobe_process *uproc,
+	struct task_struct *tsk)
+{
+	struct uprobe_kimg *uk, *tmp;
+
+	list_for_each_entry_safe(uk, tmp, &uproc->pending_uprobes, pd_node) {
+		down_write(&uk->rwsem);
+		switch (uk->state) {
+		case UPROBE_INSERTING:
+			insert_bkpt(uk, tsk);
+			break;
+		case UPROBE_REMOVING:
+			remove_bkpt(uk, tsk);
+			break;
+		default:
+			BUG();
+		}
+		list_del(&uk->pd_node);
+		up_write(&uk->rwsem);
+	}
+}
+
+static void utask_adjust_flags(struct uprobe_task *utask, int set,
+	unsigned long flags)
+{
+	unsigned long newflags, oldflags;
+
+	newflags = oldflags = utask->engine->flags;
+
+	if (set)
+		newflags |= flags;
+	else
+		newflags &= ~flags;
+
+	if (newflags != oldflags)
+		utrace_set_flags(utask->tsk, utask->engine, newflags);
+}
+
+/* Opposite of quiesce_all_threads().  Same locking applies. */
+static void rouse_all_threads(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		mutex_lock(&utask->mutex);
+		if (utask->quiescing) {
+			utask->quiescing = 0;
+			if (utask->state == UPTASK_QUIESCENT) {
+				utask_adjust_flags(utask, RESET_ENGINE_FLAGS,
+					UTRACE_ACTION_QUIESCE |
+					UTRACE_EVENT(QUIESCE));
+				utask->state = UPTASK_RUNNING;
+				uproc->n_quiescent_threads--;
+			}
+		}
+		mutex_unlock(&utask->mutex);
+	}
+	/* Wake any threads that decided to sleep rather than quiesce. */
+	wake_up_all(&uproc->waitq);
+}
+
+/*
+ * If all of uproc's surviving threads have quiesced, do the necessary
+ * breakpoint insertions or removals and then un-quiesce everybody.
+ * tsk is a surviving thread, or NULL if there is none.  Runs with
+ * uproc->mutex held.
+ */
+static void check_uproc_quiesced(struct uprobe_process *uproc,
+		struct task_struct *tsk)
+{
+	if (uproc->n_quiescent_threads >= uproc->nthreads) {
+		handle_pending_uprobes(uproc, tsk);
+		rouse_all_threads(uproc);
+	}
+}
+
+/*
+ * Quiesce all threads in the specified process -- e.g., prior to
+ * breakpoint insertion.  Runs with uproc->mutex held.
+ * Returns the number of threads that haven't died yet.
+ */
+static int quiesce_all_threads(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+	struct task_struct *survivor = NULL;	// any survivor
+	int survivors = 0;
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		mutex_lock(&utask->mutex);
+		survivor = utask->tsk;
+		survivors++;
+		if (!utask->quiescing) {
+			/*
+			 * If utask is currently handling a probepoint, it'll
+			 * check utask->quiescing and quiesce when it's done.
+			 */
+			utask->quiescing = 1;
+			if (utask->state == UPTASK_RUNNING) {
+				utask->quiesce_master = current;
+				utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+					UTRACE_ACTION_QUIESCE
+					| UTRACE_EVENT(QUIESCE));
+				utask->quiesce_master = NULL;
+			}
+		}
+		mutex_unlock(&utask->mutex);
+	}
+	/*
+	 * If any task was already quiesced (in utrace's opinion) when we
+	 * called utask_adjust_flags() on it, uprobe_report_quiesce() was
+	 * called, but wasn't in a position to call check_uproc_quiesced().
+	 */
+	check_uproc_quiesced(uproc, survivor);
+	return survivors;
+}
+
+/* Runs with uproc_mutex and uproc->mutex held. */
+static void uprobe_free_process(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask, *tmp;
+
+	if (!hlist_unhashed(&uproc->hlist))
+		hlist_del(&uproc->hlist);
+
+	list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) {
+		/* Give any last report_* callback a chance to complete. */
+		mutex_lock(&utask->mutex);
+		/*
+		 * utrace_detach() is OK here (required, it seems) even if
+		 * utask->tsk == current and we're in a utrace callback.
+		 */
+		if (utask->engine)
+			utrace_detach(utask->tsk, utask->engine);
+		mutex_unlock(&utask->mutex);
+		kfree(utask);
+	}
+
+	mutex_unlock(&uproc->mutex);	// So kfree doesn't complain
+	kfree(uproc);
+}
+
+/*
+ * Free up the uprobe_process unless one or more probepoints remain,
+ * one or more threads have deferred registrations pending,
+ * or somebody (e.g., unregister_uprobe() or utask_quiesce_in_callback())
+ * is still awaiting removal of a breakpoint.
+ *
+ * Returns 1 if we freed uproc, or 0 otherwise.
+ *
+ * Called with uproc_mutex and uproc->mutex held.  Returns with
+ * uproc_mutex held.
+ */
+static int uprobe_maybe_free_process(struct uprobe_process *uproc)
+{
+	int sleeper;
+	struct uprobe_task *utask;
+
+	if (uproc->nuk > 0)
+		goto cant_free;
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		mutex_lock(&utask->mutex);
+		sleeper = (utask->state == UPTASK_SLEEPING);
+		mutex_unlock(&utask->mutex);
+		if (sleeper)
+			goto cant_free;
+	}
+	uprobe_free_process(uproc);
+	return 1;
+
+cant_free:
+	mutex_unlock(&uproc->mutex);
+	return 0;
+}
+
+/*
+ * Allocate a uprobe_task object for t and add it to uproc's list.
+ * Called with t "got" and uproc->mutex locked.  Called in one of
+ * the following cases:
+ * - before setting the first uprobe in t's process
+ * - we're in uprobe_report_clone() and t is the newly added thread
+ * Returns:
+ * - pointer to new uprobe_task on success
+ * - NULL if t dies before we can utrace_attach it
+ * - negative errno otherwise
+ */
+static struct uprobe_task *uprobe_add_task(struct task_struct *t,
+		struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+	struct utrace_attached_engine *engine;
+
+	utask = (struct uprobe_task *)kzalloc(sizeof *utask, GFP_USER);
+	if (unlikely(utask == NULL))
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&utask->mutex);
+	mutex_lock(&utask->mutex);
+	utask->tsk = t;
+	utask->state = UPTASK_RUNNING;
+	utask->quiescing = 0;
+	utask->uproc = uproc;
+	utask->active_probe = NULL;
+
+	engine = utrace_attach(t, UTRACE_ATTACH_CREATE, p_uprobe_utrace_ops,
+		(unsigned long)utask);
+	if (IS_ERR(engine)) {
+		long err = PTR_ERR(engine);
+		printk("uprobes: utrace_attach failed, returned %ld\n", err);
+		mutex_unlock(&utask->mutex);
+		kfree(utask);
+		if (err == -ESRCH)
+			 return NULL;
+		return ERR_PTR(err);
+	}
+	utask->engine = engine;
+	/*
+	 * Always watch for traps, clones, execs and exits. Caller must
+	 * set any other engine flags.
+	 */
+	utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+			UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(EXEC) |
+			UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT));
+	INIT_LIST_HEAD(&utask->list);
+	list_add_tail(&utask->list, &uproc->thread_list);
+	/*
+	 * Note that it's OK if t dies just after utrace_attach, because
+	 * with the engine in place, the appropriate report_* callback
+	 * should handle it after we release uprobe->mutex.
+	 */
+	mutex_unlock(&utask->mutex);
+	return utask;
+}
+
+/* See comment in uprobe_mk_process(). */
+static struct task_struct *find_next_thread_to_add(struct uprobe_process *uproc,
+		struct task_struct *start)
+{
+	struct task_struct *t;
+	struct uprobe_task *utask;
+
+	read_lock(&tasklist_lock);
+	t = start;
+	do {
+		list_for_each_entry(utask, &uproc->thread_list, list) {
+			if (utask->tsk == t)
+				goto t_already_added;
+		}
+		/* Found thread/task to add. */
+		get_task_struct(t);  // OK to nest if t=p, right?
+		read_unlock(&tasklist_lock);
+		return t;
+t_already_added:
+		t = next_thread(t);
+	} while (t != start);
+
+	read_unlock(&tasklist_lock);
+	return NULL;
+}
+
+/* Runs with uproc_mutex held; returns with uproc->mutex held. */
+static struct uprobe_process *uprobe_mk_process(struct task_struct *p)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_task *utask;
+	struct task_struct *add_me;
+	int i;
+	long err;
+
+	uproc = (struct uprobe_process *)kzalloc(sizeof *uproc, GFP_USER);
+	if (unlikely(uproc == NULL))
+		return ERR_PTR(-ENOMEM);
+
+	/* Initialize fields */
+	mutex_init(&uproc->mutex);
+	mutex_lock(&uproc->mutex);
+	init_waitqueue_head(&uproc->waitq);
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+		INIT_HLIST_HEAD(&uproc->uprobe_table[i]);
+	uproc->nuk = 0;
+	INIT_LIST_HEAD(&uproc->pending_uprobes);
+	INIT_LIST_HEAD(&uproc->thread_list);
+	uproc->nthreads = 0;
+	uproc->n_quiescent_threads = 0;
+	INIT_HLIST_NODE(&uproc->hlist);
+	uproc->tgid = p->tgid;
+
+	/*
+	 * Create and populate one utask per thread in this process.  We
+	 * can't call uprobe_add_task() while holding tasklist_lock, so we:
+	 *	1. Lock task list.
+	 *	2. Find the next task, add_me, in this process that's not
+	 *	already on uproc's thread_list.  (Start search at previous
+	 *	one found.)
+	 *	3. Unlock task list.
+	 *	4. uprobe_add_task(add_me, uproc)
+	 *	Repeat 1-4 'til we have utasks for all tasks.
+	 */
+	add_me = p;
+	while ((add_me = find_next_thread_to_add(uproc, add_me)) != NULL) {
+		utask = uprobe_add_task(add_me, uproc);
+		put_task_struct(add_me);
+		if (IS_ERR(utask)) {
+			err = PTR_ERR(utask);
+			goto fail;
+		}
+		if (utask)
+			uproc->nthreads++;
+	}
+
+	if (uproc->nthreads == 0) {
+		/* All threads -- even p -- are dead. */
+		err = -ESRCH;
+		goto fail;
+	}
+	return uproc;
+
+fail:
+	uprobe_free_process(uproc);
+	return ERR_PTR(err);
+}
+
+/*
+ * Creates a uprobe_kimg and connects it to u and uproc.
+ * Runs with uproc->mutex held.  Returns with uprobe_kimg unlocked.
+ */
+static struct uprobe_kimg *uprobe_add_kimg(struct uprobe *u,
+	struct uprobe_process *uproc)
+{
+	struct uprobe_kimg *uk;
+
+	uk = (struct uprobe_kimg *)kzalloc(sizeof *uk, GFP_USER);
+	if (unlikely(uk == NULL))
+		return ERR_PTR(-ENOMEM);
+	init_rwsem(&uk->rwsem);
+	down_write(&uk->rwsem);
+	init_waitqueue_head(&uk->waitq);
+
+	/* Connect to u. */
+	INIT_LIST_HEAD(&uk->uprobe_list);
+	list_add_tail(&u->list, &uk->uprobe_list);
+	u->uk = uk;
+	u->status = -EBUSY;
+	uk->vaddr = u->vaddr;
+
+	/* Connect to uproc. */
+	uk->state = UPROBE_INSERTING;
+	uk->uproc = uproc;
+	INIT_LIST_HEAD(&uk->pd_node);
+	list_add_tail(&uk->pd_node, &uproc->pending_uprobes);
+	INIT_HLIST_NODE(&uk->ut_node);
+	hlist_add_head(&uk->ut_node,
+		&uproc->uprobe_table[hash_long(uk->vaddr, UPROBE_HASH_BITS)]);
+	uproc->nuk++;
+	up_write(&uk->rwsem);
+	return uk;
+}
+
+/*
+ * Called with uk write-locked.  uk->uproc may also be freed if this
+ * is the last uk.
+ */
+static void uprobe_free_kimg(struct uprobe_kimg *uk)
+{
+	struct uprobe_process *uproc = uk->uproc;
+
+	/* Come down through the top to preserve lock ordering. */
+	uk->state = UPROBE_FREEING;
+	up_write(&uk->rwsem);
+
+	mutex_lock(&uproc_mutex);
+	mutex_lock(&uproc->mutex);
+	down_write(&uk->rwsem);	// So other CPUs have time to see UPROBE_FREEING
+	hlist_del(&uk->ut_node);
+	up_write(&uk->rwsem);	// So kfree doesn't complain
+	kfree(uk);
+	uproc->nuk--;
+	if (uproc->nuk <= 0)
+		uprobe_maybe_free_process(uproc);
+	else
+		mutex_unlock(&uproc->mutex);
+	mutex_unlock(&uproc_mutex);
+}
+
+/* Note that we never free u, because the user owns that. */
+static void purge_uprobe(struct uprobe *u)
+{
+	struct uprobe_kimg *uk;
+
+	uk = u->uk;
+	down_write(&uk->rwsem);
+	list_del(&u->list);
+	u->uk = NULL;
+	if (list_empty(&uk->uprobe_list)) {
+		uprobe_free_kimg(uk);
+		return;
+	}
+	up_write(&uk->rwsem);
+}
+
+/*
+ * See Documentation/uprobes.txt.
+ */
+int register_uprobe(struct uprobe *u)
+{
+	struct task_struct *p;
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	int survivors, ret = 0, uproc_is_new = 0;
+/* We should be able to access at least a bkpt-size insn at u->addr */
+#define NBYTES_TO_TEST BP_INSN_SIZE
+	char buf[NBYTES_TO_TEST];
+
+	if (!u || !u->handler)
+		return -EINVAL;
+	if (u->uk && u->status == -EBUSY)
+		/* Looks like register or unregister is already in progress. */
+		return -EAGAIN;
+	u->uk = NULL;
+
+	rcu_read_lock();
+	p = find_task_by_pid(u->pid);
+	if (p)
+		get_task_struct(p);
+	rcu_read_unlock();
+
+	if (!p)
+		return -ESRCH;
+	u->tgid = p->tgid;
+
+	/* Exit early if vaddr is bad -- i.e., we can't even read from it. */
+	if (access_process_vm(p, u->vaddr, buf, NBYTES_TO_TEST, 0)
+			!= NBYTES_TO_TEST) {
+		ret = -EINVAL;
+		goto fail_tsk;
+	}
+
+	/* Get the uprobe_process for this pid, or make a new one. */
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(p->tgid);
+
+	if (uproc)
+		mutex_unlock(&uproc_mutex);
+	else {
+		uproc = uprobe_mk_process(p);
+		if (IS_ERR(uproc)) {
+			ret = (int) PTR_ERR(uproc);
+			mutex_unlock(&uproc_mutex);
+			goto fail_tsk;
+		}
+		/* Hold uproc_mutex until we've added uproc to uproc_table. */
+		uproc_is_new = 1;
+	}
+
+	INIT_LIST_HEAD(&u->list);
+
+	/* See if we already have a uprobe at the vaddr. */
+	uk = (uproc_is_new ? NULL : find_uprobe(uproc, u->vaddr, 2));
+	if (uk) {
+		/* uk is write-locked. */
+		/* Breakpoint is already in place, or soon will be. */
+		u->uk = uk;
+		list_add_tail(&u->list, &uk->uprobe_list);
+		switch (uk->state) {
+		case UPROBE_INSERTING:
+			u->status = -EBUSY;
+			break;
+		case UPROBE_REMOVING:
+			/* Wait!  Don't remove that bkpt after all! */
+			uk->state = UPROBE_BP_SET;
+			list_del(&uk->pd_node);	// Remove from pending list.
+			wake_up_all(&uk->waitq);// Wake unregister_uprobe().
+			/*FALLTHROUGH*/
+		case UPROBE_BP_SET:
+			u->status = 0;
+			break;
+		default:
+			BUG();
+		}
+		up_write(&uk->rwsem);
+		mutex_unlock(&uproc->mutex);
+		put_task_struct(p);
+		if (u->status == 0)
+			return 0;
+		goto await_bkpt_insertion;
+	} else {
+		uk = uprobe_add_kimg(u, uproc);
+		if (IS_ERR(uk)) {
+			ret = (int) PTR_ERR(uk);
+			goto fail_uproc;
+		}
+	}
+
+	if (uproc_is_new) {
+		hlist_add_head(&uproc->hlist,
+			&uproc_table[hash_long(uproc->tgid, UPROBE_HASH_BITS)]);
+		mutex_unlock(&uproc_mutex);
+	}
+	put_task_struct(p);
+	survivors = quiesce_all_threads(uproc);
+	mutex_unlock(&uproc->mutex);
+
+	if (survivors == 0) {
+		purge_uprobe(u);
+		return -ESRCH;
+	}
+
+await_bkpt_insertion:
+	wait_event(uk->waitq, uk->state != UPROBE_INSERTING);
+	ret = u->status;
+	if (ret != 0)
+		purge_uprobe(u);
+	return ret;
+
+fail_uproc:
+	if (uproc_is_new) {
+		uprobe_free_process(uproc);
+		mutex_unlock(&uproc_mutex);
+	}
+
+fail_tsk:
+	put_task_struct(p);
+	return ret;
+}
+
+void unregister_uprobe(struct uprobe *u)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	int survivors;
+
+	if (!u)
+		return;
+
+	if (!u->uk)
+		/*
+		 * This probe was never successfully registered, or
+		 * has already been unregistered.
+		 */
+		return;
+
+	if (u->status == -EBUSY)
+		/* Looks like register or unregister is already in progress. */
+		return;
+
+	/* As with unregister_kprobe, assume that u points to a valid probe. */
+	/* Grab the global mutex to ensure that uproc doesn't disappear. */
+	mutex_lock(&uproc_mutex);
+	uk = u->uk;
+	uproc = uk->uproc;
+	mutex_lock(&uproc->mutex);
+	down_write(&uk->rwsem);
+	mutex_unlock(&uproc_mutex);
+
+	list_del(&u->list);
+	u->uk = NULL;
+	if (!list_empty(&uk->uprobe_list)) {
+		up_write(&uk->rwsem);
+		mutex_unlock(&uproc->mutex);
+		return;
+	}
+
+	/*
+	 * The last uprobe at uk's probepoint is being unregistered.
+	 * Queue the breakpoint for removal.
+	 */
+	uk->state = UPROBE_REMOVING;
+	list_add_tail(&uk->pd_node, &uproc->pending_uprobes);
+	up_write(&uk->rwsem);
+
+	survivors = quiesce_all_threads(uproc);
+	mutex_unlock(&uproc->mutex);
+	if (survivors)
+		wait_event(uk->waitq, uk->state != UPROBE_REMOVING);
+
+	down_write(&uk->rwsem);
+	if (likely(uk->state == UPROBE_DISABLED))
+		uprobe_free_kimg(uk);
+	else
+		/* Somebody else's register_uprobe() resurrected uk. */
+		up_write(&uk->rwsem);
+}
+
+/*
+ * utrace engine report callbacks
+ */
+
+/*
+ * We are in a utrace callback for utask, and we've been asked to quiesce.
+ * Somebody requested our quiescence and we hit a probepoint first.
+ * We'd like to just set the UTRACE_ACTION_QUIESCE and
+ * UTRACE_EVENT(QUIESCE) flags and coast into quiescence.  Unfortunately,
+ * it's possible to hit a probepoint again before we quiesce.  When
+ * processing the SIGTRAP, utrace would call uprobe_report_quiesce(),
+ * which must decline to take any action so as to avoid removing the
+ * uprobe just hit.  As a result, however, we could keep hitting breakpoints
+ * and never quiescing.
+ *
+ * So here we do essentially what we'd prefer to do in uprobe_report_quiesce().
+ * If we're the last thread to quiesce, handle_pending_uprobes() and
+ * rouse_all_threads().  Otherwise, pretend we're quiescent and sleep until
+ * the last quiescent thread handles that stuff and then wakes us.
+ *
+ * Called and returns with no mutexes held.  Returns 1 if we free utask->uproc,
+ * else 0.
+ */
+static int utask_quiesce_in_callback(struct uprobe_task *utask)
+{
+	struct uprobe_process *uproc = utask->uproc;
+	enum uprobe_task_state prev_state = utask->state;
+
+	mutex_lock(&uproc->mutex);
+	if (uproc->n_quiescent_threads == uproc->nthreads-1) {
+		/* We're the last thread to "quiesce." */
+		handle_pending_uprobes(uproc, utask->tsk);
+		rouse_all_threads(uproc);
+		mutex_unlock(&uproc->mutex);
+		return 0;
+	} else {
+		int uproc_freed;
+
+		mutex_lock(&utask->mutex);
+		utask->state = UPTASK_SLEEPING;
+		mutex_unlock(&utask->mutex);
+		uproc->n_quiescent_threads++;
+		mutex_unlock(&uproc->mutex);
+
+		wait_event(uproc->waitq, !utask->quiescing);
+
+		/*
+		 * Note that it's possible for quiesce_all_threads() and
+		 * even rouse_all_threads() to run again while we go for
+		 * these mutexes.  No need to treat that case specially;
+		 * we may hit a probepoint again before quiescing (or
+		 * sleeping), but that's OK.
+		 */
+		mutex_lock(&uproc_mutex);
+		mutex_lock(&uproc->mutex);
+
+		mutex_lock(&utask->mutex);
+		utask->state = prev_state;
+		uproc->n_quiescent_threads--;
+		mutex_unlock(&utask->mutex);
+
+		/*
+		 * If uproc's last uprobe has been unregistered, and
+		 * unregister_uprobe() woke up before we did, it's up
+		 * to us to free uproc.
+		 */
+		uproc_freed = uprobe_maybe_free_process(uproc);
+		mutex_unlock(&uproc_mutex);
+		return uproc_freed;
+	}
+}
+
+/*
+ * Signal callback:
+ *
+ * We get called here with:
+ *	state = UPTASK_RUNNING => we are here due to a breakpoint hit
+ *		- Figure out which probepoint, based on regs->IP
+ *		- Set state = UPTASK_BP_HIT
+ *		- Reset regs->IP to beginning of the insn, if necessary
+ *		- Invoke handler for each uprobe at this probepoint
+ *		- Set state = UPTASK_SSTEP_AFTER_BP
+ *		- Set singlestep in motion (UTRACE_ACTION_SINGLESTEP)
+ *
+ *	state = UPTASK_SSTEP_AFTER_BP => here after singlestepping
+ *		- Validate we are here per the state machine
+ *		- Clean up after singlestepping
+ *		- Set state = UPTASK_RUNNING
+ *		- If it's time to quiesce, take appropriate action.
+ *
+ *	state = ANY OTHER STATE
+ *		- Not our signal, pass it on (UTRACE_ACTION_RESUME)
+ */
+static u32 uprobe_report_signal(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, struct pt_regs *regs, u32 action,
+		siginfo_t *info, const struct k_sigaction *orig_ka,
+		struct k_sigaction *return_ka)
+{
+	struct uprobe_task *utask;
+	struct uprobe_kimg *uk;
+	struct uprobe_process *uproc;
+	struct uprobe *u;
+#ifndef SS_OUT_OF_LINE
+	int len;
+#endif
+	u32 ret;
+	unsigned long probept;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	BUG_ON(!utask);
+
+	if (action != UTRACE_SIGNAL_CORE || info->si_signo != SIGTRAP)
+		goto no_interest;
+
+	mutex_lock(&utask->mutex);
+	switch (utask->state) {
+	case UPTASK_RUNNING:
+		uproc = utask->uproc;
+		probept = arch_get_probept(regs);
+
+		/* Honor locking hierarchy: uproc before utask. */
+		if (!mutex_trylock(&uproc->mutex)) {
+			mutex_unlock(&utask->mutex);
+			mutex_lock(&uproc->mutex);
+			mutex_lock(&utask->mutex);
+			/*
+			 * Note that we don't need to recheck utask->state,
+			 * because only the task itself can change its state
+			 * from UPTASK_RUNNING to something else.
+			 */
+		}
+
+		uk = find_uprobe(uproc, probept, 1);
+		mutex_unlock(&uproc->mutex);
+		if (!uk) {
+			mutex_unlock(&utask->mutex);
+			goto no_interest;
+		}
+		utask->active_probe = uk;
+		utask->state = UPTASK_BP_HIT;
+
+		if (likely(uk->state == UPROBE_BP_SET)) {
+			list_for_each_entry(u, &uk->uprobe_list, list) {
+				if (u->handler)
+					u->handler(u, regs);
+			}
+		}
+		up_read(&uk->rwsem);
+
+#ifdef SS_OUT_OF_LINE
+		ret = uprobe_prepare_singlestep(uk, utask, regs);
+		BUG_ON(ret);
+#else
+		arch_reset_ip_for_sstep(regs);
+		len = set_orig_insn(uk, tsk);
+		if (!len) {
+			printk("Failed to temporarily restore original "
+				"instruction for single-stepping: "
+				"pid/tgid=%d/%d, vaddr=%#lx\n",
+				tsk->pid, tsk->tgid, uk->vaddr);
+			// FIXME: Locking problems?
+			do_exit(SIGSEGV);
+		}
+#endif
+		utask->state = UPTASK_SSTEP_AFTER_BP;
+		mutex_unlock(&utask->mutex);
+		/*
+		 * No other engines must see this signal, and the
+		 * signal shouldn't be passed on either.
+		 */
+		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN |
+			UTRACE_ACTION_SINGLESTEP | UTRACE_ACTION_NEWSTATE;
+		break;
+	case UPTASK_SSTEP_AFTER_BP:
+		uk = utask->active_probe;
+		BUG_ON(!uk);
+#ifndef SS_OUT_OF_LINE
+		len = set_bp(uk, tsk);
+		if (!len) {
+			printk("Couldn't restore bp: pid/tgid=%d/%d, addr=%#lx\n",
+				tsk->pid, tsk->tgid, uk->vaddr);
+			uk->state = UPROBE_DISABLED;
+		}
+#else
+		uprobe_resume_execution(uk, utask, regs);
+#endif
+
+		utask->active_probe = NULL;
+		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
+			| UTRACE_ACTION_NEWSTATE;
+		utask->state = UPTASK_RUNNING;
+		if (utask->quiescing) {
+			mutex_unlock(&utask->mutex);
+			if (utask_quiesce_in_callback(utask) == 1)
+				ret |= UTRACE_ACTION_DETACH;
+		} else
+			mutex_unlock(&utask->mutex);
+
+		break;
+	default:
+		mutex_unlock(&utask->mutex);
+		goto no_interest;
+	}
+	return ret;
+
+no_interest:
+	return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * utask_quiesce_pending_sigtrap: The utask entered the quiesce callback
+ * through the signal delivery path, apparently. Check if the associated
+ * signal happened due to a uprobe hit.
+ *
+ * Called with utask->mutex and utask->uproc->mutex held.  Returns 0 if
+ * quiesce was entered with SIGTRAP pending due to a uprobe hit.
+ */
+static int utask_quiesce_pending_sigtrap(struct uprobe_task *utask)
+{
+	const struct utrace_regset_view *view;
+	const struct utrace_regset *regset;
+	struct uprobe_kimg *uk;
+	unsigned long inst_ptr;
+
+	if (utask->active_probe)
+		/* Signal must be the post-single-step trap. */
+		return 0;
+
+	view = utrace_native_view(utask->tsk);
+	regset = utrace_regset(utask->tsk, utask->engine, view, 0);
+	if (unlikely(regset == NULL))
+		return -EIO;
+
+	if ((*regset->get)(utask->tsk, regset, SLOT_IP * regset->size,
+			regset->size, &inst_ptr, NULL) != 0)
+		return -EIO;
+
+	uk = find_uprobe(utask->uproc, ARCH_BP_INST_PTR(inst_ptr), 0);
+	return (uk == NULL);
+}
+
+/*
+ * Quiesce callback: The associated process has one or more breakpoint
+ * insertions or removals pending.  If we're the last thread in this
+ * process to quiesce, do the insertion(s) and/or removal(s).
+ */
+static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
+		struct task_struct *tsk)
+{
+	struct uprobe_task *utask;
+	struct uprobe_process *uproc;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	BUG_ON(!utask);
+	if (current == utask->quiesce_master) {
+		/*
+		 * tsk was already quiescent when quiesce_all_threads()
+		 * called utrace_set_flags(), which in turn called back
+		 * here.  uproc and utask are already locked.  Do as
+		 * little as possible and get out.
+		 */
+		utask->state = UPTASK_QUIESCENT;
+		utask->uproc->n_quiescent_threads++;
+		return UTRACE_ACTION_RESUME;
+	}
+
+	/* utask_quiesce_pending_sigtrap() needs uproc to be locked. */
+	uproc = utask->uproc;
+	mutex_lock(&uproc->mutex);
+	mutex_lock(&utask->mutex);
+
+	/*
+	 * Do nothing if we entered here as a consequence of a utrace
+	 * state transition requirement to pass through quiesce (e.g.,
+	 * UTRACE_ACTION_SINGLESTEP set) or if the task entered legitimately,
+	 * but has a pending SIGTRAP corresponding to a uprobe.  (We must
+	 * let uprobe_report_signal() handle the uprobe hit and THEN
+	 * quiesce, because (a) there's a chance that we're quiescing
+	 * in order to remove that very uprobe, and (b) there's a tiny
+	 * chance that even though that uprobe isn't marked for removal
+	 * now, it may be before all threads manage to quiesce.)
+	 */
+	if (!utask->quiescing) {
+		mutex_unlock(&utask->mutex);
+		goto done;
+	}
+
+	if (utask_quiesce_pending_sigtrap(utask) == 0) {
+		utask_adjust_flags(utask, RESET_ENGINE_FLAGS,
+				UTRACE_ACTION_QUIESCE | UTRACE_EVENT(QUIESCE));
+		mutex_unlock(&utask->mutex);
+		goto done;
+	}
+
+	utask->state = UPTASK_QUIESCENT;
+	mutex_unlock(&utask->mutex);
+
+	uproc->n_quiescent_threads++;
+	check_uproc_quiesced(uproc, tsk);
+done:
+	mutex_unlock(&uproc->mutex);
+	return UTRACE_ACTION_RESUME;
+}
+
+/* Runs with uproc->mutex held. */
+static struct task_struct *find_surviving_thread(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+
+	list_for_each_entry(utask, &uproc->thread_list, list)
+		return utask->tsk;
+	return NULL;
+}
+
+/*
+ * uproc's process is exiting or exec-ing, so zap all the (now irrelevant)
+ * probepoints.  Runs with uproc->mutex held.
+ */
+void uprobe_cleanup_process(struct uprobe_process *uproc)
+{
+	int i;
+	struct uprobe_kimg *uk;
+	struct hlist_node *node, *t1;
+	struct hlist_head *head;
+	struct uprobe *u, *t2;
+
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+		head = &uproc->uprobe_table[i];
+		hlist_for_each_entry_safe(uk, node, t1, head, ut_node) {
+			down_write(&uk->rwsem);
+			if (uk->state == UPROBE_INSERTING ||
+					uk->state == UPROBE_REMOVING) {
+				/*
+				 * This task is (exec/exit)ing with
+				 * a [un]register_uprobe pending.
+				 * [un]register_uprobe will free uk
+				 */
+				uk->state = UPROBE_DISABLED;
+				list_for_each_entry_safe(u, t2,
+					       &uk->uprobe_list, list)
+					u->status = -ESRCH;
+				up_write(&uk->rwsem);
+				wake_up_all(&uk->waitq);
+			} else if (uk->state == UPROBE_BP_SET) {
+				hlist_del(&uk->ut_node);
+				uproc->nuk--;
+				list_for_each_entry_safe(u, t2,
+					       &uk->uprobe_list, list) {
+					u->status = -ESRCH;
+					u->uk = NULL;
+					list_del(&u->list);
+				}
+				up_write(&uk->rwsem);
+				kfree(uk);
+			} else {
+				/*
+				 * If uk is UPROBE_DISABLED, assume that
+				 * [un]register_uprobe() has been notified
+				 * and will free it soon.
+				 */
+				up_write(&uk->rwsem);
+			}
+		}
+	}
+}
+
+/*
+ * Exit callback: The associated task/thread is exiting.
+ */
+static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, long orig_code, long *code)
+{
+	struct uprobe_task *utask;
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	int utask_quiescing;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+
+	uk = utask->active_probe;
+	if (uk) {
+		printk(KERN_WARNING "Task died in uprobe handler:"
+			"  pid/tgid = %d/%d, probepoint = %#lx\n",
+			tsk->pid, tsk->tgid, uk->vaddr);
+		if (utask->state == UPTASK_BP_HIT)
+			up_read(&uk->rwsem);
+		mutex_unlock(&utask->mutex);
+	}
+
+	uproc = utask->uproc;
+	mutex_lock(&uproc_mutex);
+	mutex_lock(&uproc->mutex);
+	utask_quiescing = utask->quiescing;
+
+	list_del(&utask->list);
+	kfree(utask);
+
+	uproc->nthreads--;
+	if (uproc->nthreads) {
+		if (utask_quiescing)
+			/*
+			 * In case other threads are waiting for
+			 * us to quiesce...
+			 */
+			check_uproc_quiesced(uproc,
+				       find_surviving_thread(uproc));
+		mutex_unlock(&uproc->mutex);
+	} else {
+		/*
+		 * We were the last remaining thread - clean up the uprobe
+		 * remnants a-la unregister_uprobe(). We don't have to
+		 * remove the breakpoints though.
+		 */
+		uprobe_cleanup_process(uproc);
+		if (uproc->nuk <= 0)
+			uprobe_free_process(uproc);
+		else
+			/* Let [un]register_uprobe() clean up. */
+			mutex_unlock(&uproc->mutex);
+	}
+
+	mutex_unlock(&uproc_mutex);
+	return UTRACE_ACTION_DETACH;
+}
+
+/*
+ * Clone callback: The associated process spawned a thread/process
+ *
+ * NOTE: For now, we don't pass on uprobes from the parent to the
+ * child. Instead, we clear the copied breakpoints from the
+ * child's address space.
+ *
+ * TODO:
+ *	- Provide option for child to inherit uprobes.
+ */
+static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
+		struct task_struct *parent, unsigned long clone_flags,
+		struct task_struct *child)
+{
+	int len;
+	struct uprobe_process *uproc;
+	struct uprobe_task *ptask, *ctask;
+
+	ptask = rcu_dereference((struct uprobe_task *)engine->data);
+	uproc = ptask->uproc;
+
+	/*
+	 * Lock uproc so no new uprobes can be installed till all
+	 * report_clone activities are completed
+	 */
+	mutex_lock(&uproc->mutex);
+	get_task_struct(child);
+
+	if (clone_flags & CLONE_THREAD) {
+		/* New thread in the same process */
+		ctask = uprobe_add_task(child, uproc);
+		BUG_ON(!ctask);
+		if (IS_ERR(ctask)) {
+			put_task_struct(child);
+			mutex_unlock(&uproc->mutex);
+			goto fail;
+		}
+		if (ctask)
+			uproc->nthreads++;
+		/*
+		 * FIXME: Handle the case where uproc is quiescing
+		 * (assuming it's possible to clone while quiescing).
+		 */
+	} else {
+		/*
+		 * New process spawned by parent.  Remove the probepoints
+		 * in the child's text.
+		 *
+		 * It's not necessary to quiesce the child, as we are assured
+		 * by utrace that this callback happens *before* the child
+		 * gets to run userspace.
+		 *
+		 * We also hold the uproc->mutex for the parent - so no
+		 * new uprobes will be registered till we return.
+		 */
+		int i;
+		struct uprobe_kimg *uk;
+		struct hlist_node *node;
+		struct hlist_head *head;
+
+		for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+			head = &uproc->uprobe_table[i];
+			hlist_for_each_entry(uk, node, head, ut_node) {
+				down_write(&uk->rwsem);
+				len = set_orig_insn(uk, child);
+				BUG_ON(len != BP_INSN_SIZE);
+				up_write(&uk->rwsem);
+			}
+		}
+	}
+
+	put_task_struct(child);
+	mutex_unlock(&uproc->mutex);
+
+fail:
+	return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * Exec callback: The associated process called execve() or friends
+ *
+ * The new program is about to start running and so there is no
+ * possibility of a uprobe from the previous user address space
+ * to be hit.
+ *
+ * NOTE:
+ *	Ideally this process would have passed through the clone
+ *	callback, where the necessary action *should* have been
+ *	taken. However, if we still end up at this callback:
+ *		- We don't have to clear the uprobes - memory image
+ *		  will be overlaid.
+ *		- We have to free up uprobe resources associated with
+ *		  this process.
+ */
+static u32 uprobe_report_exec(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, const struct linux_binprm *bprm,
+		struct pt_regs *regs)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_task *utask;
+	int uproc_freed;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(tsk->tgid);
+	BUG_ON(uproc != utask->uproc);
+
+	uprobe_cleanup_process(uproc);
+	/* If any [un]register_uprobe is pending, it'll clean up. */
+	uproc_freed = uprobe_maybe_free_process(uproc);
+	mutex_unlock(&uproc_mutex);
+	return (uproc_freed ? UTRACE_ACTION_DETACH : UTRACE_ACTION_RESUME);
+}
+
+static const struct utrace_engine_ops uprobe_utrace_ops =
+{
+	.report_quiesce = uprobe_report_quiesce,
+	.report_signal = uprobe_report_signal,
+	.report_exit = uprobe_report_exit,
+	.report_clone = uprobe_report_clone,
+	.report_exec = uprobe_report_exec
+};
+
+#define arch_init_uprobes() 0
+
+static int __init init_uprobes(void)
+{
+	int i, err = 0;
+
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+		INIT_HLIST_HEAD(&uproc_table[i]);
+
+	p_uprobe_utrace_ops = &uprobe_utrace_ops;
+	err = arch_init_uprobes();
+	return err;
+}
+__initcall(init_uprobes);
+
+
+EXPORT_SYMBOL_GPL(register_uprobe);
+EXPORT_SYMBOL_GPL(unregister_uprobe);
_
This patch implements user-space return probes (uretprobes).  Like
kretprobes, uretprobes work by bouncing the return from a function off a
known trampoline, at which point the user-specified handler is run.

o To insert uretprobes on a process, the user must first call
init_uretprobes(), specifying a virtual address in the process's address
space that the program will not subsequently execute.  The address of
main() in a currently running program is one such candidate for the
trampoline.  init_uretprobes() establishes a breakpoint at the trampoline
address.

o register_uretprobe() then inserts a uprobe at the entry of the
specified function.  When that probe is hit, the entry handler saves a
copy of the function's return address and replaces it with the
trampoline address.

o When the function returns -- i.e., when the trampoline breakpoint is
hit -- the user-specified handler runs, the original return address is
restored, and normal program execution continues.
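
For illustration, here's a rough sketch of how a module might use these
calls.  The pid, the two addresses, and all the names below are just
placeholders that a real prober would supply; error handling is minimal:

	#include <linux/module.h>
	#include <linux/uprobes.h>

	/* Placeholder values -- the real ones come from the prober. */
	#define TARGET_PID	1234		/* pid of the probed process */
	#define MAIN_ADDR	0x08048430UL	/* e.g., address of main() */
	#define FUNC_ADDR	0x080484f0UL	/* entry of the probed function */

	static void my_ret_handler(struct uretprobe_instance *ri,
			struct pt_regs *regs)
	{
		/* Runs in the kernel each time the probed function returns. */
	}

	static struct uretprobe my_rp = {
		.u.pid		= TARGET_PID,
		.u.vaddr	= FUNC_ADDR,
		.handler	= my_ret_handler,
	};

	static int __init my_init(void)
	{
		int ret;

		/* Establish the trampoline first... */
		ret = init_uretprobes(TARGET_PID, MAIN_ADDR);
		if (ret)
			return ret;
		/* ...then set the return probe on the function of interest. */
		return register_uretprobe(&my_rp);
	}

	static void __exit my_exit(void)
	{
		unregister_uretprobe(&my_rp);
	}

	module_init(my_init);
	module_exit(my_exit);
	MODULE_LICENSE("GPL");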

---

 arch/i386/kernel/uprobes.c |   40 ++++
 include/asm-i386/uprobes.h |   10 +
 include/linux/uprobes.h    |   49 +++++
 kernel/uprobes.c           |  374 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 463 insertions(+), 10 deletions(-)

diff -puN arch/i386/kernel/uprobes.c~2-user-return-probes arch/i386/kernel/uprobes.c
--- linux-2.6.21-rc3/arch/i386/kernel/uprobes.c~2-user-return-probes	2007-03-12 13:48:50.000000000 -0700
+++ linux-2.6.21-rc3-jimk/arch/i386/kernel/uprobes.c	2007-03-12 13:50:06.000000000 -0700
@@ -113,6 +113,7 @@ fail:
 		" pid=%d, esp=%#lx\n", current->pid, esp);
 	BUG();
 }
+
 /*
  * Called after single-stepping.  uk->vaddr is the address of the
  * instruction whose first byte has been replaced by the "int 3"
@@ -184,3 +185,42 @@ void uprobe_resume_execution(struct upro
 	up_write(&tsk->mm->mmap_sem);
 }
 #endif	/* SSTEP_OUT_OF_LINE */
+
+/*
+ * Replace the return address with the trampoline address.  Returns
+ * the original return address.
+ */
+unsigned long arch_hijack_uret_addr(unsigned long trampoline_address,
+		struct pt_regs *regs)
+{
+	int nleft;
+	unsigned long orig_ret_addr;
+#define RASIZE (sizeof(unsigned long))
+
+	nleft = copy_from_user(&orig_ret_addr,
+		       (const void __user *)regs->esp, RASIZE);
+	if (unlikely(nleft != 0))
+		return 0;
+
+	if (orig_ret_addr == trampoline_address)
+		/*
+		 * There's another uretprobe on this function, and it was
+		 * processed first, so the return address has already
+		 * been hijacked.
+		 */
+		return orig_ret_addr;
+
+	nleft = copy_to_user((void __user *)regs->esp,
+		       &trampoline_address, RASIZE);
+	if (unlikely(nleft != 0)) {
+		if (nleft != RASIZE) {
+			printk(KERN_ERR "uretprobe_entry_handler: "
+					"return address partially clobbered -- "
+					"pid=%d, %%esp=%#lx, %%eip=%#lx\n",
+					current->pid, regs->esp, regs->eip);
+			BUG();
+		} /* else nothing written, so no harm */
+		return 0;
+	}
+	return orig_ret_addr;
+}
diff -puN include/asm-i386/uprobes.h~2-user-return-probes include/asm-i386/uprobes.h
--- linux-2.6.21-rc3/include/asm-i386/uprobes.h~2-user-return-probes	2007-03-12 13:48:50.000000000 -0700
+++ linux-2.6.21-rc3-jimk/include/asm-i386/uprobes.h	2007-03-12 13:50:06.000000000 -0700
@@ -60,4 +60,14 @@ static inline void arch_reset_ip_for_sst
 	regs->eip -= BP_INSN_SIZE;
 }
 
+#define ARCH_SUPPORTS_URETPROBES 1
+static inline void arch_restore_uret_addr(unsigned long ret_addr,
+		struct pt_regs *regs)
+{
+	regs->eip = ret_addr;
+}
+
+extern unsigned long arch_hijack_uret_addr(unsigned long trampoline_addr,
+		struct pt_regs *regs);
+
 #endif				/* _ASM_UPROBES_H */
diff -puN include/linux/uprobes.h~2-user-return-probes include/linux/uprobes.h
--- linux-2.6.21-rc3/include/linux/uprobes.h~2-user-return-probes	2007-03-12 13:48:50.000000000 -0700
+++ linux-2.6.21-rc3-jimk/include/linux/uprobes.h	2007-03-12 13:50:06.000000000 -0700
@@ -32,6 +32,12 @@ struct utrace_attached_engine;
 struct uprobe_kimg;
 struct uprobe;
 
+enum uprobe_type {
+	UPTY_UPROBE,
+	UPTY_URETPROBE,
+	UPTY_TRAMPOLINE
+};
+
 /*
  * This is what the user supplies us.
  */
@@ -69,6 +75,15 @@ struct uprobe {
 	pid_t tgid;
 };
 
+struct uretprobe_instance;
+typedef void (*uretprobe_handler_t)(struct uretprobe_instance *,
+		struct pt_regs *);
+
+struct uretprobe {
+	struct uprobe u;
+	uretprobe_handler_t handler;
+};
+
 #ifdef CONFIG_UPROBES
 #include <asm/uprobes.h>
 
@@ -133,6 +148,15 @@ struct uprobe_process {
 
 	/* Threads in SLEEPING state wait here to be roused. */
 	wait_queue_head_t waitq;
+
+	/*
+	 * Return-probed functions return via this trampoline.
+	 * Future implementations may use a system-wide trampoline
+	 * (e.g., on the vdso page), in which case uretprobe_trampoline_addr
+	 * might be valid even though uretprobe_trampoline is NULL.
+	 */
+	struct uprobe *uretprobe_trampoline;
+	unsigned long uretprobe_trampoline_addr;
 };
 
 /*
@@ -185,6 +209,9 @@ struct uprobe_kimg {
  * Corresponds to a thread in a probed process.
  */
 struct uprobe_task {
+	/* Lives in the global utask_table */
+	struct hlist_node hlist;
+
 	/* Lives on the thread_list for the uprobe_process */
 	struct list_head list;
 
@@ -226,6 +253,12 @@ struct uprobe_task {
 	struct mutex mutex;
 };
 
+struct uretprobe_instance {
+	struct uretprobe *rp;
+	unsigned long ret_addr;
+	struct hlist_node hlist;
+};
+
 int register_uprobe(struct uprobe *u);
 void unregister_uprobe(struct uprobe *u);
 
@@ -236,8 +269,11 @@ extern int uprobe_prepare_singlestep(str
 			struct uprobe_task *utask, struct pt_regs *regs);
 #endif
 
-#else	/* CONFIG_UPROBES */
+int register_uretprobe(struct uretprobe *rp);
+void unregister_uretprobe(struct uretprobe *rp);
+int init_uretprobes(pid_t pid, unsigned long vaddr);
 
+#else	/* CONFIG_UPROBES */
 static inline int register_uprobe(struct uprobe *u)
 {
 	return -ENOSYS;
@@ -245,5 +281,16 @@ static inline int register_uprobe(struct
 static inline void unregister_uprobe(struct uprobe *u)
 {
 }
+static inline int register_uretprobe(struct uretprobe *u)
+{
+	return -ENOSYS;
+}
+static inline void unregister_uretprobe(struct uretprobe *u)
+{
+}
+static inline int init_uretprobes(pid_t pid, unsigned long vaddr)
+{
+	return -ENOSYS;
+}
 #endif	/* CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff -puN kernel/uprobes.c~2-user-return-probes kernel/uprobes.c
--- linux-2.6.21-rc3/kernel/uprobes.c~2-user-return-probes	2007-03-12 13:48:50.000000000 -0700
+++ linux-2.6.21-rc3-jimk/kernel/uprobes.c	2007-03-12 13:50:06.000000000 -0700
@@ -36,6 +36,8 @@
 
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
 	void *buf, int len, int write);
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs);
+static void uretprobe_handle_return(struct uprobe *u, struct pt_regs *regs);
 
 /*
  * Locking hierarchy:
@@ -56,9 +58,52 @@ static struct hlist_head uproc_table[UPR
  */
 static DEFINE_MUTEX(uproc_mutex);
 
-/* p_uprobe_utrace_ops = &uprobe_utrace_ops.  Fwd refs are a pain w/o this. */
 static const struct utrace_engine_ops *p_uprobe_utrace_ops;
 
+/* Table of uprobe_tasks, hashed by task_struct pointer. */
+static struct hlist_head utask_table[UPROBE_TABLE_SIZE];
+static DEFINE_SPINLOCK(utask_table_lock);
+
+static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct uprobe_task *utask;
+	unsigned long flags;
+
+	head = &utask_table[hash_ptr(tsk, UPROBE_HASH_BITS)];
+	spin_lock_irqsave(&utask_table_lock, flags);
+	hlist_for_each_entry(utask, node, head, hlist) {
+		if (utask->tsk == tsk) {
+			spin_unlock_irqrestore(&utask_table_lock, flags);
+			return utask;
+		}
+	}
+	spin_unlock_irqrestore(&utask_table_lock, flags);
+	return NULL;
+}
+
+static void uprobe_hash_utask(struct uprobe_task *utask)
+{
+	struct hlist_head *head;
+	unsigned long flags;
+
+	INIT_HLIST_NODE(&utask->hlist);
+	head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)];
+	spin_lock_irqsave(&utask_table_lock, flags);
+	hlist_add_head(&utask->hlist, head);
+	spin_unlock_irqrestore(&utask_table_lock, flags);
+}
+
+static void uprobe_unhash_utask(struct uprobe_task *utask)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&utask_table_lock, flags);
+	hlist_del(&utask->hlist);
+	spin_unlock_irqrestore(&utask_table_lock, flags);
+}
+
 /* Runs with the uproc_mutex held.  Returns with uproc->mutex held. */
 struct uprobe_process *uprobe_find_process(pid_t tgid)
 {
@@ -348,6 +393,28 @@ static int quiesce_all_threads(struct up
 	return survivors;
 }
 
+/* Called with utask->mutex and utask->uproc->mutex held. */
+static void uprobe_free_task(struct uprobe_task *utask)
+{
+	struct uretprobe_instance *ri;
+	struct hlist_node *r1, *r2;
+
+	uprobe_unhash_utask(utask);
+	list_del(&utask->list);
+
+	/*
+	 * Trampoline uprobe stays around 'til task exits, so assume
+	 * task is exiting if any uretprobe_instances remain.
+	 */
+	hlist_for_each_entry_safe(ri, r1, r2, &utask->uretprobe_instances,
+			hlist) {
+		hlist_del(&ri->hlist);
+		kfree(ri);
+	}
+	mutex_unlock(&utask->mutex);
+	kfree(utask);
+}
+
 /* Runs with uproc_mutex and uproc->mutex held. */
 static void uprobe_free_process(struct uprobe_process *uproc)
 {
@@ -365,10 +432,12 @@ static void uprobe_free_process(struct u
 		 */
 		if (utask->engine)
 			utrace_detach(utask->tsk, utask->engine);
-		mutex_unlock(&utask->mutex);
-		kfree(utask);
+		uprobe_free_task(utask);
 	}
 
+	if (uproc->uretprobe_trampoline)
+		kfree(uproc->uretprobe_trampoline);
+
 	mutex_unlock(&uproc->mutex);	// So kfree doesn't complain
 	kfree(uproc);
 }
@@ -457,6 +526,10 @@ static struct uprobe_task *uprobe_add_ta
 			UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT));
 	INIT_LIST_HEAD(&utask->list);
 	list_add_tail(&utask->list, &uproc->thread_list);
+
+	INIT_HLIST_HEAD(&utask->uretprobe_instances);
+	uprobe_hash_utask(utask);
+
 	/*
 	 * Note that it's OK if t dies just after utrace_attach, because
 	 * with the engine in place, the appropriate report_* callback
@@ -517,6 +590,8 @@ static struct uprobe_process *uprobe_mk_
 	uproc->n_quiescent_threads = 0;
 	INIT_HLIST_NODE(&uproc->hlist);
 	uproc->tgid = p->tgid;
+	uproc->uretprobe_trampoline = NULL;
+	uproc->uretprobe_trampoline_addr = 0;
 
 	/*
 	 * Create and populate one utask per thread in this process.  We
@@ -631,8 +706,21 @@ static void purge_uprobe(struct uprobe *
 	up_write(&uk->rwsem);
 }
 
+static inline int is_uretprobe(struct uprobe *u)
+{
+	return (u->handler == uretprobe_handle_entry);
+}
+
+static inline int is_uretprobe_trampoline(struct uprobe *u)
+{
+	return (u->handler == uretprobe_handle_return);
+}
+
 /*
  * See Documentation/uprobes.txt.
+ * register_uprobe() also does most of the work for register_uretprobe()
+ * and init_uretprobes() -- including some error checking that is more
+ * convenient to do here.
  */
 int register_uprobe(struct uprobe *u)
 {
@@ -685,6 +773,18 @@ int register_uprobe(struct uprobe *u)
 		uproc_is_new = 1;
 	}
 
+	if (is_uretprobe(u) && uproc->uretprobe_trampoline_addr == 0) {
+		/*
+		 * Can't call register_uretprobe() before init_uretprobes().
+		 * (However, you can call init_uretprobes() and then
+		 * register_uretprobe() from the same handler, in which
+		 * case installation of both probes will be deferred.
+		 * So we wait 'til here to check.)
+		 */
+		ret = -ENODEV;
+		goto fail_uproc;
+	}
+
 	INIT_LIST_HEAD(&u->list);
 
 	/* See if we already have a uprobe at the vaddr. */
@@ -921,6 +1021,7 @@ static u32 uprobe_report_signal(struct u
 #endif
 	u32 ret;
 	unsigned long probept;
+	int hit_uretprobe_trampoline;
 
 	utask = rcu_dereference((struct uprobe_task *)engine->data);
 	BUG_ON(!utask);
@@ -933,6 +1034,8 @@ static u32 uprobe_report_signal(struct u
 	case UPTASK_RUNNING:
 		uproc = utask->uproc;
 		probept = arch_get_probept(regs);
+		hit_uretprobe_trampoline =
+			(probept == uproc->uretprobe_trampoline_addr);
 
 		/* Honor locking hierarchy: uproc before utask. */
 		if (!mutex_trylock(&uproc->mutex)) {
@@ -959,9 +1062,22 @@ static u32 uprobe_report_signal(struct u
 			list_for_each_entry(u, &uk->uprobe_list, list) {
 				if (u->handler)
 					u->handler(u, regs);
+
+				/*
+				 * If multiple calls to init_uretprobes()
+				 * specify the same trampoline address,
+				 * there's a tiny time window when multiple
+				 * trampoline probes could be registered
+				 * at this probepoint.  This prevents
+				 * multiple calls to uretprobe_handle_return().
+				 */
+				if (is_uretprobe_trampoline(u))
+					break;
 			}
 		}
 		up_read(&uk->rwsem);
+		if (hit_uretprobe_trampoline)
+			goto bp_done;
 
 #ifdef SS_OUT_OF_LINE
 		ret = uprobe_prepare_singlestep(uk, utask, regs);
@@ -1001,6 +1117,8 @@ static u32 uprobe_report_signal(struct u
 		uprobe_resume_execution(uk, utask, regs);
 #endif
 
+bp_done:
+		/* Note: Can come here after running uretprobe handlers */
 		utask->active_probe = NULL;
 		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
 			| UTRACE_ACTION_NEWSTATE;
@@ -1206,11 +1324,9 @@ static u32 uprobe_report_exit(struct utr
 	uproc = utask->uproc;
 	mutex_lock(&uproc_mutex);
 	mutex_lock(&uproc->mutex);
+	mutex_lock(&utask->mutex);
 	utask_quiescing = utask->quiescing;
-
-	list_del(&utask->list);
-	kfree(utask);
-
+	uprobe_free_task(utask);
 	uproc->nthreads--;
 	if (uproc->nthreads) {
 		if (utask_quiescing)
@@ -1363,14 +1479,252 @@ static const struct utrace_engine_ops up
 	.report_exec = uprobe_report_exec
 };
 
+
+#ifdef ARCH_SUPPORTS_URETPROBES
+
+/* uprobe handler called when the entry-point probe u is hit. */
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+	struct uretprobe_instance *ri;
+	unsigned long trampoline_addr;
+
+	utask = uprobe_find_utask(current);
+	BUG_ON(!utask);
+	trampoline_addr = utask->uproc->uretprobe_trampoline_addr;
+	if (!trampoline_addr)
+		return;
+	ri = (struct uretprobe_instance *)
+		kmalloc(sizeof(struct uretprobe_instance), GFP_USER);
+	if (!ri)
+		return;
+	ri->ret_addr = arch_hijack_uret_addr(trampoline_addr, regs);
+	if (likely(ri->ret_addr)) {
+		ri->rp = container_of(u, struct uretprobe, u);
+		INIT_HLIST_NODE(&ri->hlist);
+		hlist_add_head(&ri->hlist, &utask->uretprobe_instances);
+	} else
+		kfree(ri);
+}
+
+/*
+ * For each uretprobe_instance pushed onto the LIFO for the function
+ * instance that's now returning, call the handler and free the ri.
+ * Returns the original return address.
+ */
+static unsigned long uretprobe_run_handlers(struct uprobe_task *utask,
+		struct pt_regs *regs, unsigned long trampoline_addr)
+{
+	unsigned long ret_addr;
+	struct hlist_head *head = &utask->uretprobe_instances;
+	struct uretprobe_instance *ri;
+	struct hlist_node *r1, *r2;
+
+	hlist_for_each_entry_safe(ri, r1, r2, head, hlist) {
+		if (ri->rp && ri->rp->handler)
+			ri->rp->handler(ri, regs);
+		ret_addr = ri->ret_addr;
+		hlist_del(&ri->hlist);
+		kfree(ri);
+		if (ret_addr != trampoline_addr)
+			/*
+			 * This is the first ri (chronologically) pushed for
+			 * this particular instance of the probed function.
+			 */
+			return ret_addr;
+	}
+	BUG();
+	return 0;
+}
+
+/* uprobe handler called when the uretprobe trampoline is hit. */
+static void uretprobe_handle_return(struct uprobe *u, struct pt_regs *regs)
+{
+	unsigned long orig_ret_addr;
+	struct uprobe_task *utask = uprobe_find_utask(current);
+	BUG_ON(!utask);
+	orig_ret_addr = uretprobe_run_handlers(utask, regs, u->vaddr);
+	arch_restore_uret_addr(orig_ret_addr, regs);
+}
+
+int register_uretprobe(struct uretprobe *rp)
+{
+	if (!rp)
+		return -EINVAL;
+	rp->u.handler = uretprobe_handle_entry;
+	return register_uprobe(&rp->u);
+}
+
+/*
+ * rp has just been unregistered.  Its uretprobe_instances have to hang
+ * around 'til their associated instances return.  Zap ri->rp for each
+ * one to indicate unregistration.
+ *
+ * Called and returns with no locks held.
+ */
+static void zap_uretprobe_instances(struct uretprobe *rp)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_task *utask;
+
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(rp->u.tgid);
+
+	if (!uproc) {
+		mutex_unlock(&uproc_mutex);
+		return;
+	}
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		struct hlist_node *r;
+		struct uretprobe_instance *ri;
+
+		mutex_lock(&utask->mutex);
+		hlist_for_each_entry(ri, r, &utask->uretprobe_instances, hlist)
+			if (ri->rp == rp)
+				ri->rp = NULL;
+		mutex_unlock(&utask->mutex);
+	}
+	mutex_unlock(&uproc->mutex);
+	mutex_unlock(&uproc_mutex);
+}
+
+void unregister_uretprobe(struct uretprobe *rp)
+{
+	if (!rp)
+		return;
+	unregister_uprobe(&rp->u);
+	if (rp->u.status == 0)
+		zap_uretprobe_instances(rp);
+}
+
+/*
+ * NOTE: init_uretprobes(), do_init_uretprobes(), and uretprobe_set_trampoline()
+ * can all go away if we can make uprobes smart enough to define the trampoline
+ * on its own.  E.g., on i386, we could establish one global trampoline on
+ * the vdso page.
+ */
+
+/*
+ * See do_init_uretprobes, below.  We could chase trampoline->uk->uproc,
+ * but there's a tiny chance that uk and uproc have just disappeared (e.g.,
+ * because trampoline->pid exited).
+ */
+static int uretprobe_set_trampoline(struct uprobe *trampoline,
+		unsigned long *prev_trampoline)
+{
+	struct uprobe_process *uproc;
+	int result;
+
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(trampoline->tgid);
+	if (unlikely(!uproc)) {
+		mutex_unlock(&uproc_mutex);
+		return -ESRCH;
+	}
+	if (uproc->uretprobe_trampoline) {
+		result = -EEXIST;
+		*prev_trampoline = uproc->uretprobe_trampoline_addr;
+	} else {
+		uproc->uretprobe_trampoline = trampoline;
+		uproc->uretprobe_trampoline_addr = trampoline->vaddr;
+		result = 0;
+	}
+	mutex_unlock(&uproc->mutex);
+	mutex_unlock(&uproc_mutex);
+	return result;
+}
+
+/*
+ * Register the specified uprobe and make it the uretprobe_trampoline
+ * for its uprobe_process.  On success, return 0.  If we successfully
+ * registered the probe, but the uretprobe_trampoline was already set
+ * by somebody else, return -EEXIST and set prev_trampoline to the
+ * previously set address.
+ */
+static int do_init_uretprobes(struct uprobe *trampoline,
+		unsigned long *prev_trampoline)
+{
+	int result;
+
+	trampoline->handler = uretprobe_handle_return;
+	result = register_uprobe(trampoline);
+	if (result == 0) {
+		result = uretprobe_set_trampoline(trampoline,
+				prev_trampoline);
+		if (result == -EEXIST)
+			unregister_uprobe(trampoline);
+	}
+	return result;
+}
+
+/*
+ * Ensure that pid's uprobe_process has a trampoline probe at vaddr.
+ * We have to allocate our own uprobe (rather than use one passed in
+ * from the user) because the user may rmmod his module while some
+ * uretprobe instances are still outstanding.
+ */
+int init_uretprobes(pid_t pid, unsigned long vaddr)
+{
+	struct uprobe *trampoline;
+	unsigned long prev_trampoline = 0;
+	int result;
+
+	if (!vaddr)
+		/* TODO: Support uprobes choosing the address. */
+		return -EOPNOTSUPP;
+
+	trampoline = kzalloc(sizeof(struct uprobe), GFP_USER);
+	if (!trampoline)
+		return -ENOMEM;
+
+	trampoline->pid = pid;
+	trampoline->vaddr = vaddr;
+	result = do_init_uretprobes(trampoline, &prev_trampoline);
+	if (result == 0 || result == -EINPROGRESS)
+		return result;
+
+	kfree(trampoline);
+	if (result == -EEXIST && prev_trampoline == vaddr)
+		return 0;
+	return result;
+}
+
+#else	/* ARCH_SUPPORTS_URETPROBES */
+static int register_uretprobe(struct uretprobe *rp)
+{
+	return -ENOSYS;
+}
+static int init_uretprobes(pid_t pid, unsigned long vaddr)
+{
+	return -ENOSYS;
+}
+static void unregister_uretprobe(struct uretprobe *rp)
+{
+}
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs)
+{
+}
+static void uretprobe_handle_return(struct uprobe *u, struct pt_regs *regs)
+{
+}
+static int do_init_uretprobes(struct uprobe *trampoline,
+		unsigned long *prev_trampoline)
+{
+	return -ENOSYS;
+}
+#endif	/* ARCH_SUPPORTS_URETPROBES */
+
 #define arch_init_uprobes() 0
 
 static int __init init_uprobes(void)
 {
 	int i, err = 0;
 
-	for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
 		INIT_HLIST_HEAD(&uproc_table[i]);
+		INIT_HLIST_HEAD(&utask_table[i]);
+	}
 
 	p_uprobe_utrace_ops = &uprobe_utrace_ops;
 	err = arch_init_uprobes();
@@ -1378,6 +1732,8 @@ static int __init init_uprobes(void)
 }
 __initcall(init_uprobes);
 
-
 EXPORT_SYMBOL_GPL(register_uprobe);
 EXPORT_SYMBOL_GPL(unregister_uprobe);
+EXPORT_SYMBOL_GPL(register_uretprobe);
+EXPORT_SYMBOL_GPL(unregister_uretprobe);
+EXPORT_SYMBOL_GPL(init_uretprobes);
_
This patch enables uprobe and uretprobe handlers to register or
unregister probes -- including themselves.  A registration or
unregistration request issued while a probe handler is running is
detected by tracking the state of the currently running uprobe_task;
such requests are queued and carried out after the handler completes.
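
As a rough sketch (again, the address and the names below are just
placeholders), a probe handler can chain a second probe like this.
Because register_uprobe() is called from handler context, it queues the
request and returns -EINPROGRESS; the optional registration_callback
reports the eventual result:

	#include <linux/kernel.h>
	#include <linux/uprobes.h>

	#define SECOND_VADDR	0x08048600UL	/* placeholder address */

	static void second_handler(struct uprobe *u, struct pt_regs *regs)
	{
		/* Handler for the chained probe. */
	}

	static void second_reg_done(struct uprobe *u, int reg,
			enum uprobe_type type, int result)
	{
		/* Runs once the deferred registration actually completes. */
		if (result != 0)
			printk(KERN_INFO "deferred registration failed: %d\n",
				result);
	}

	static struct uprobe second_probe = {
		.vaddr			= SECOND_VADDR,
		.handler		= second_handler,
		.registration_callback	= second_reg_done,
	};

	/* Handler of an already-registered probe; chains a second probe. */
	static void first_handler(struct uprobe *u, struct pt_regs *regs)
	{
		second_probe.pid = u->pid;
		register_uprobe(&second_probe);	/* returns -EINPROGRESS */
	}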

---

 include/linux/uprobes.h |   11 +++
 kernel/uprobes.c        |  175 +++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 176 insertions(+), 10 deletions(-)

diff -puN include/linux/uprobes.h~3-defer-registration include/linux/uprobes.h
--- linux-2.6.21-rc3/include/linux/uprobes.h~3-defer-registration	2007-03-12 14:04:00.000000000 -0700
+++ linux-2.6.21-rc3-jimk/include/linux/uprobes.h	2007-03-12 14:04:32.000000000 -0700
@@ -55,6 +55,17 @@ struct uprobe {
 	/* Handler to run when the probepoint is hit */
 	void (*handler)(struct uprobe*, struct pt_regs*);
 
+	/*
+	 * This function, if non-NULL, will be called upon completion of
+	 * an ASYNCHRONOUS registration (i.e., one initiated by a uprobe
+	 * handler).  reg = 1 for register, 0 for unregister.  type
+	 * specifies the type of [un]register call (uprobe, uretprobe,
+	 * or init_uretprobes).
+	 */
+	void (*registration_callback)(struct uprobe *u, int reg,
+			enum uprobe_type type, int result);
+
+
 	/* Subsequent members are for internal use only. */
 
 	/*
diff -puN kernel/uprobes.c~3-defer-registration kernel/uprobes.c
--- linux-2.6.21-rc3/kernel/uprobes.c~3-defer-registration	2007-03-12 14:04:00.000000000 -0700
+++ linux-2.6.21-rc3-jimk/kernel/uprobes.c	2007-03-12 14:04:32.000000000 -0700
@@ -38,6 +38,8 @@ extern int access_process_vm(struct task
 	void *buf, int len, int write);
 static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs);
 static void uretprobe_handle_return(struct uprobe *u, struct pt_regs *regs);
+static int utask_quiesce_in_callback(struct uprobe_task *utask);
+static void uprobe_run_def_regs(struct list_head *drlist);
 
 /*
  * Locking hierarchy:
@@ -60,6 +62,13 @@ static DEFINE_MUTEX(uproc_mutex);
 
 static const struct utrace_engine_ops *p_uprobe_utrace_ops;
 
+struct deferred_registration {
+	struct list_head list;
+	struct uprobe *uprobe;
+	int regflag;	/* 0 - unregister, 1 - register */
+	enum uprobe_type type;
+};
+
 /* Table of uprobe_tasks, hashed by task_struct pointer. */
 static struct hlist_head utask_table[UPROBE_TABLE_SIZE];
 static DEFINE_SPINLOCK(utask_table_lock);
@@ -358,12 +367,14 @@ static void check_uproc_quiesced(struct 
  * breakpoint insertion.  Runs with uproc->mutex held.
  * Returns the number of threads that haven't died yet.
  */
-static int quiesce_all_threads(struct uprobe_process *uproc)
+static int quiesce_all_threads(struct uprobe_process *uproc,
+		struct uprobe_task **cur_utask_quiescing)
 {
 	struct uprobe_task *utask;
 	struct task_struct *survivor = NULL;	// any survivor
 	int survivors = 0;
 
+	*cur_utask_quiescing = NULL;
 	list_for_each_entry(utask, &uproc->thread_list, list) {
 		mutex_lock(&utask->mutex);
 		survivor = utask->tsk;
@@ -374,7 +385,9 @@ static int quiesce_all_threads(struct up
 			 * check utask->quiescing and quiesce when it's done.
 			 */
 			utask->quiescing = 1;
-			if (utask->state == UPTASK_RUNNING) {
+			if (utask->tsk == current)
+				*cur_utask_quiescing = utask;
+			else if (utask->state == UPTASK_RUNNING) {
 				utask->quiesce_master = current;
 				utask_adjust_flags(utask, SET_ENGINE_FLAGS,
 					UTRACE_ACTION_QUIESCE
@@ -396,11 +409,16 @@ static int quiesce_all_threads(struct up
 /* Called with utask->mutex and utask->uproc->mutex held. */
 static void uprobe_free_task(struct uprobe_task *utask)
 {
+	struct deferred_registration *dr, *d;
 	struct uretprobe_instance *ri;
 	struct hlist_node *r1, *r2;
 
 	uprobe_unhash_utask(utask);
 	list_del(&utask->list);
+	list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) {
+		list_del(&dr->list);
+		kfree(dr);
+	}
 
 	/*
 	 * Trampoline uprobe stays around 'til task exits, so assume
@@ -504,6 +522,8 @@ static struct uprobe_task *uprobe_add_ta
 	utask->quiescing = 0;
 	utask->uproc = uproc;
 	utask->active_probe = NULL;
+	INIT_HLIST_HEAD(&utask->uretprobe_instances);
+	INIT_LIST_HEAD(&utask->deferred_registrations);
 
 	engine = utrace_attach(t, UTRACE_ATTACH_CREATE, p_uprobe_utrace_ops,
 		(unsigned long)utask);
@@ -526,8 +546,6 @@ static struct uprobe_task *uprobe_add_ta
 			UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT));
 	INIT_LIST_HEAD(&utask->list);
 	list_add_tail(&utask->list, &uproc->thread_list);
-
-	INIT_HLIST_HEAD(&utask->uretprobe_instances);
 	uprobe_hash_utask(utask);
 
 	/*
@@ -716,6 +734,30 @@ static inline int is_uretprobe_trampolin
 	return (u->handler == uretprobe_handle_return);
 }
 
+/* Runs with utask->mutex held.  Returns -EINPROGRESS on success. */
+static int defer_registration(struct uprobe *u, int regflag,
+		struct uprobe_task *utask)
+{
+	struct deferred_registration *dr =
+		kmalloc(sizeof(struct deferred_registration), GFP_USER);
+
+	if (!dr)
+		return -ENOMEM;
+
+	if (is_uretprobe(u))
+		dr->type = UPTY_URETPROBE;
+	else if (is_uretprobe_trampoline(u))
+		dr->type = UPTY_TRAMPOLINE;
+	else
+		dr->type = UPTY_UPROBE;
+
+	dr->uprobe = u;
+	dr->regflag = regflag;
+	INIT_LIST_HEAD(&dr->list);
+	list_add_tail(&dr->list, &utask->deferred_registrations);
+	return -EINPROGRESS;
+}
+
 /*
  * See Documentation/uprobes.txt.
  * register_uprobe() also does most of the work for register_uretprobe()
@@ -727,6 +769,7 @@ int register_uprobe(struct uprobe *u)
 	struct task_struct *p;
 	struct uprobe_process *uproc;
 	struct uprobe_kimg *uk;
+	struct uprobe_task *cur_utask, *cur_utask_quiescing = NULL;
 	int survivors, ret = 0, uproc_is_new = 0;
 /* We should be able to access at least a bkpt-size insn at u->addr */
 #define NBYTES_TO_TEST BP_INSN_SIZE
@@ -749,6 +792,17 @@ int register_uprobe(struct uprobe *u)
 		return -ESRCH;
 	u->tgid = p->tgid;
 
+	cur_utask = uprobe_find_utask(current);
+	if (cur_utask && cur_utask->active_probe) {
+		/*
+		 * Handler running; cur_utask is locked.
+		 * Do this registration later.
+		 */
+		put_task_struct(p);
+		u->status = defer_registration(u, 1, cur_utask);
+		return u->status;
+	}
+
 	/* Exit early if vaddr is bad -- i.e., we can't even read from it. */
 	if (access_process_vm(p, u->vaddr, buf, NBYTES_TO_TEST, 0)
 			!= NBYTES_TO_TEST) {
@@ -797,6 +851,10 @@ int register_uprobe(struct uprobe *u)
 		switch (uk->state) {
 		case UPROBE_INSERTING:
 			u->status = -EBUSY;
+			if (uproc->tgid == current->tgid) {
+				cur_utask_quiescing = cur_utask;
+				BUG_ON(!cur_utask_quiescing);
+			}
 			break;
 		case UPROBE_REMOVING:
 			/* Wait!  Don't remove that bkpt after all! */
@@ -830,7 +888,7 @@ int register_uprobe(struct uprobe *u)
 		mutex_unlock(&uproc_mutex);
 	}
 	put_task_struct(p);
-	survivors = quiesce_all_threads(uproc);
+	survivors = quiesce_all_threads(uproc, &cur_utask_quiescing);
 	mutex_unlock(&uproc->mutex);
 
 	if (survivors == 0) {
@@ -839,7 +897,15 @@ int register_uprobe(struct uprobe *u)
 	}
 
 await_bkpt_insertion:
-	wait_event(uk->waitq, uk->state != UPROBE_INSERTING);
+	if (cur_utask_quiescing)
+		/*
+		 * Current task is probing its own process. Assume
+		 * register_uprobe was called from uprobe_run_def_regs.
+		 * uproc won't be freed because uk is still connected
+		 */
+		(void)utask_quiesce_in_callback(cur_utask_quiescing);
+	else
+		wait_event(uk->waitq, uk->state != UPROBE_INSERTING);
 	ret = u->status;
 	if (ret != 0)
 		purge_uprobe(u);
@@ -860,7 +926,7 @@ void unregister_uprobe(struct uprobe *u)
 {
 	struct uprobe_process *uproc;
 	struct uprobe_kimg *uk;
-	int survivors;
+	struct uprobe_task *cur_utask, *cur_utask_quiescing;
 
 	if (!u)
 		return;
@@ -876,6 +942,13 @@ void unregister_uprobe(struct uprobe *u)
 		/* Looks like register or unregister is already in progress. */
 		return;
 
+	cur_utask = uprobe_find_utask(current);
+	if (cur_utask && cur_utask->active_probe) {
+		/* Handler running; cur_utask is locked; do this later */
+		u->status = defer_registration(u, 0, cur_utask);
+		return;
+	}
+
 	/* As with unregister_kprobe, assume that u points to a valid probe. */
 	/* Grab the global mutex to ensure that uproc doesn't disappear. */
 	mutex_lock(&uproc_mutex);
@@ -901,9 +974,17 @@ void unregister_uprobe(struct uprobe *u)
 	list_add_tail(&uk->pd_node, &uproc->pending_uprobes);
 	up_write(&uk->rwsem);
 
-	survivors = quiesce_all_threads(uproc);
+	(void)quiesce_all_threads(uproc, &cur_utask_quiescing);
 	mutex_unlock(&uproc->mutex);
-	if (survivors)
+	if (cur_utask_quiescing)
+		/*
+		 * Current task is probing its own process. Assume
+		 * unregister_uprobe was called from
+		 * uprobe_run_def_regs. uproc won't be freed
+		 * because uk is still connected.
+		 */
+		(void)utask_quiesce_in_callback(cur_utask_quiescing);
+	else
 		wait_event(uk->waitq, uk->state != UPROBE_REMOVING);
 
 	down_write(&uk->rwsem);
@@ -920,7 +1001,14 @@ void unregister_uprobe(struct uprobe *u)
 
 /*
  * We are in a utrace callback for utask, and we've been asked to quiesce.
- * Somebody requested our quiescence and we hit a probepoint first.
+ * This could happen in either of the following cases, both of which
+ * occur in uprobe_report_signal() after we've finished processing a
+ * probepoint:
+ *
+ * 1) We are running uprobe_run_def_regs() on behalf of the
+ * just-executed uprobe handler(s).
+ *
+ * 2) Somebody requested our quiescence and we hit a probepoint first.
  * We'd like to just set the UTRACE_ACTION_QUIESCE and
  * UTRACE_EVENT(QUIESCE) flags and coast into quiescence.  Unfortunately,
  * it's possible to hit a probepoint again before we quiesce.  When
@@ -1003,6 +1091,8 @@ static int utask_quiesce_in_callback(str
  *		- Clean up after singlestepping
  *		- Set state = UPTASK_RUNNING
  *		- If it's time to quiesce, take appropriate action.
+ *		- If the handler(s) we ran called [un]register_uprobe(),
+ *			complete those via uprobe_run_def_regs().
  *
  *	state = ANY OTHER STATE
  *		- Not our signal, pass it on (UTRACE_ACTION_RESUME)
@@ -1022,6 +1112,7 @@ static u32 uprobe_report_signal(struct u
 	u32 ret;
 	unsigned long probept;
 	int hit_uretprobe_trampoline;
+	LIST_HEAD(def_reg_list);
 
 	utask = rcu_dereference((struct uprobe_task *)engine->data);
 	BUG_ON(!utask);
@@ -1120,6 +1211,14 @@ static u32 uprobe_report_signal(struct u
 bp_done:
 		/* Note: Can come here after running uretprobe handlers */
 		utask->active_probe = NULL;
+
+		/*
+		 * The deferred_registrations list accumulates in utask,
+	 * but utask could go away when we call uprobe_run_def_regs().
+		 * So switch the list head to a local variable.
+		 */
+		list_splice_init(&utask->deferred_registrations, &def_reg_list);
+
 		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
 			| UTRACE_ACTION_NEWSTATE;
 		utask->state = UPTASK_RUNNING;
@@ -1130,6 +1229,7 @@ bp_done:
 		} else
 			mutex_unlock(&utask->mutex);
 
+		uprobe_run_def_regs(&def_reg_list);
 		break;
 	default:
 		mutex_unlock(&utask->mutex);
@@ -1364,6 +1464,10 @@ static u32 uprobe_report_exit(struct utr
  *
  * TODO:
  *	- Provide option for child to inherit uprobes.
+ *      - Nail down procedure for establishing uprobes in a newly forked
+ *      process.  Current thinking is that user's utrace engine (not ours)
+ *      returns UTRACE_ACTION_QUIESCE from report_clone, then runs
+ *      register_u*probe from report_quiesce.
  */
 static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
 		struct task_struct *parent, unsigned long clone_flags,
@@ -1715,6 +1819,57 @@ static int do_init_uretprobes(struct upr
 }
 #endif	/* ARCH_SUPPORTS_URETPROBES */
 
+/*
+ * This section provides support for async [un]registration.  When a
+ * uprobe handler calls [un]register_uprobe, a deferred_registration
+ * object is added to that utask's queue.  After running all
+ * handlers for that probepoint, uprobe_report_signal() calls
+ * uprobe_run_def_regs().
+ *
+ * FIXME: Handle the situation where the instrumentation module is rmmod-ed
+ * while it has one or more probes on the deferred_registrations list.
+ */
+/*
+ * Run all the deferred_registrations previously queued by the current utask.
+ * Runs with no locks or mutexes held.  The current utask could disappear
+ * as the result of unregister_u*probe() called here.
+ */
+static void uprobe_run_def_regs(struct list_head *drlist)
+{
+	struct deferred_registration *dr, *d;
+
+	list_for_each_entry_safe(dr, d, drlist, list) {
+		int result = 0;
+		struct uprobe *u = dr->uprobe;
+
+		if (dr->type == UPTY_URETPROBE) {
+			struct uretprobe *rp =
+				container_of(u, struct uretprobe, u);
+			if (dr->regflag)
+				result = register_uretprobe(rp);
+			else
+				unregister_uretprobe(rp);
+		} else if (dr->type == UPTY_TRAMPOLINE) {
+			unsigned long x;
+			result = do_init_uretprobes(u, &x);
+			if (result != 0) {
+				kfree(u);
+				u = NULL;
+			}
+		} else {
+			if (dr->regflag)
+				result = register_uprobe(u);
+			else
+				unregister_uprobe(u);
+		}
+		if (u && u->registration_callback)
+			u->registration_callback(u, dr->regflag, dr->type,
+					result);
+		list_del(&dr->list);
+		kfree(dr);
+	}
+}
+
 #define arch_init_uprobes() 0
 
 static int __init init_uprobes(void)
_
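
A rough usage sketch (not part of the patch) may help reviewers see how the
deferred-registration path is meant to be used: an instrumentation module
whose handler for one probe queues the registration of a second probe in the
same process, and supplies registration_callback so it learns the result once
uprobe_run_def_regs() has run the deferred request.  The probe addresses, the
module boilerplate, and the ->pid field name are illustrative assumptions
(see Documentation/uprobes.txt for the real interface); register_uprobe(),
unregister_uprobe(), handler, vaddr, and registration_callback are as in the
code above.

/* defer_example.c -- hypothetical module; a sketch, not part of the patch */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/uprobes.h>

static int target_pid;
module_param(target_pid, int, 0);
MODULE_PARM_DESC(target_pid, "pid of the process to probe");

static struct uprobe probe1, probe2;
static int probe2_queued, probe2_registered;

static void probe2_handler(struct uprobe *u, struct pt_regs *regs)
{
	printk(KERN_INFO "probe2 hit at %#lx\n", u->vaddr);
}

/* Runs when the deferred register_uprobe(&probe2) actually completes. */
static void probe2_reg_done(struct uprobe *u, int reg,
		enum uprobe_type type, int result)
{
	if (reg && result == 0)
		probe2_registered = 1;
	printk(KERN_INFO "deferred %sregister of probe2: result %d\n",
		reg ? "" : "un", result);
}

/*
 * probe1's handler queues the registration of probe2 the first time it
 * fires.  Since a handler is running, register_uprobe() defers the work
 * and returns -EINPROGRESS; probe2_reg_done() reports the final result.
 */
static void probe1_handler(struct uprobe *u, struct pt_regs *regs)
{
	if (probe2_queued)
		return;
	probe2_queued = 1;

	probe2.pid = target_pid;	/* field name assumed per uprobes.txt */
	probe2.vaddr = 0x0804853a;	/* placeholder address */
	probe2.handler = probe2_handler;
	probe2.registration_callback = probe2_reg_done;
	(void) register_uprobe(&probe2);	/* expect -EINPROGRESS here */
}

static int __init defer_example_init(void)
{
	probe1.pid = target_pid;	/* field name assumed per uprobes.txt */
	probe1.vaddr = 0x08048500;	/* placeholder address */
	probe1.handler = probe1_handler;
	return register_uprobe(&probe1);	/* synchronous: not in a handler */
}

static void __exit defer_example_exit(void)
{
	if (probe2_registered)
		unregister_uprobe(&probe2);
	unregister_uprobe(&probe1);
}

module_init(defer_example_init);
module_exit(defer_example_exit);
MODULE_LICENSE("GPL");

Note that register_uprobe() returns -EINPROGRESS on this path, so the handler
must not assume the new probe is armed until the callback runs.  The same
mechanism lets a handler queue the removal of its own probe (a one-shot
probe) by calling unregister_uprobe() on it from within the handler.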
