This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[RFC][PATCH 1/2] uprobes: utrace-based user-space probes
- From: Jim Keniston <jkenisto at us dot ibm dot com>
- To: systemtap <systemtap at sources dot redhat dot com>
- Date: Fri, 20 Apr 2007 15:08:10 -0700
- Subject: [RFC][PATCH 1/2] uprobes: utrace-based user-space probes
Here's the latest rev of our i386 implementation of user-space
probes (uprobes), based on Roland's utrace infrastructure. Uprobes
supplements utrace and kprobes, enabling a kernel module to probe
user-space applications in much the same way that a kprobes-based
module probes the kernel. See Documentation/uprobes.txt in the patch
for details.
This patch should apply to any recent -mm kernel (i.e., any kernel
that includes utrace). Comments welcome.
The single-stepping-out-of-line implementation has gotten pretty
big, and there are several alternatives to the way we implemented
it (the vma is fixed at 1 page and allocated only for processes
actually probed), so I've split that off as a separate patch.
(Watch this space.)
I'm reimplementing uretprobes to use a slot in the vma
for its trampoline. I hope to post updated uretprobes and
deferred-registration patches in a few days.
Jim Keniston
Uprobes supplements utrace and kprobes, enabling a kernel module
to probe user-space applications in much the same way that
a kprobes-based module probes the kernel.
Uprobes enables you to dynamically break into any routine in a
user application and collect debugging and performance information
non-disruptively. You can trap at any code address, specifying a
kernel handler routine to be invoked when the breakpoint is hit.
Uprobes is layered on top of utrace.
The registration function, register_uprobe(), specifies which process
is to be probed, where the probe is to be inserted, and what handler is
to be called when the probe is hit. Refer to Documentation/uprobes.txt
in this patch for usage examples.
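
To make the registration model concrete, here is a minimal userspace
mock. It is not kernel code and not the patch's implementation; every
sim_* name is hypothetical. It only mirrors the three user-supplied
fields (pid, vaddr, handler) and the documented rule that handlers at
one probepoint run in the order in which they were registered.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock only: the sim_* names are hypothetical, not the patch's API. */
struct sim_regs {
	unsigned long eip;
};

struct sim_uprobe {
	int pid;		/* which process is probed */
	unsigned long vaddr;	/* where the probe is inserted */
	void (*handler)(struct sim_uprobe *, struct sim_regs *);
};

#define SIM_MAX_PROBES 8

static struct sim_uprobe *sim_table[SIM_MAX_PROBES];
static size_t sim_nprobes;
static int sim_nhits;

/* Sample handler: just counts hits. */
static void sim_count_handler(struct sim_uprobe *u, struct sim_regs *regs)
{
	(void)u;
	(void)regs;
	sim_nhits++;
}

/* Record the probe. Returns 0 on success, -1 if the table is full. */
static int sim_register(struct sim_uprobe *u)
{
	assert(u != NULL && u->handler != NULL);
	if (sim_nprobes == SIM_MAX_PROBES)
		return -1;
	sim_table[sim_nprobes++] = u;
	return 0;
}

/*
 * Simulate a probe hit: run every handler registered for (pid, vaddr),
 * in registration order. Returns the number of handlers run.
 */
static int sim_hit(int pid, unsigned long vaddr, struct sim_regs *regs)
{
	int nrun = 0;
	size_t i;

	for (i = 0; i < sim_nprobes; i++) {
		struct sim_uprobe *u = sim_table[i];

		if (u->pid == pid && u->vaddr == vaddr) {
			u->handler(u, regs);
			nrun++;
		}
	}
	return nrun;
}
```

The real register_uprobe() also quiesces the process and inserts the
breakpoint; the mock leaves all of that out.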
Salient points:
o Like a debugger, uprobes uses a breakpoint instruction to break into
program execution. Through utrace's signal callback, uprobes recognizes
a probe hit and runs the user-specified handler. The handler may sleep.
o Breakpoint insertion is via access_process_vm() and hence is
copy-on-write and per-process.
o As uprobes uses utrace, a unique engine exists for every thread of a
probed process. Any newly created thread inherits all the probes and
gets an engine of its own. Upon thread exit, the engine is detached.
o Currently, uprobes aren't inherited across fork()s.
o A probe registration or unregistration operation may sleep.
Using utrace, uprobes quiesces all threads in the probed process
before inserting or removing the breakpoint instruction.
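
As a rough illustration of the breakpoint handling described in the
points above, here is a userspace sketch that operates on a plain byte
buffer instead of a process's text. The sim_* names are hypothetical.
The 0xcc opcode and the 1-byte size match BREAKPOINT_INSTRUCTION and
BP_INSN_SIZE in the i386 header of this patch. As a simplification, the
sketch refuses any existing breakpoint, whereas uprobes accepts one that
it inserted itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BREAKPOINT	0xcc	/* int3, as BREAKPOINT_INSTRUCTION in the patch */
#define SIM_BP_SIZE	1	/* as BP_INSN_SIZE in the patch */

/*
 * Save the original opcode at text[off] and overwrite it with int3.
 * Returns 0 on success, or -1 if off is out of range or a breakpoint
 * instruction is already present there.
 */
static int sim_set_bp(uint8_t *text, size_t len, size_t off, uint8_t *saved)
{
	assert(text != NULL && saved != NULL);
	if (off >= len || text[off] == SIM_BREAKPOINT)
		return -1;
	*saved = text[off];
	text[off] = SIM_BREAKPOINT;
	return 0;
}

/* Put the saved opcode back, which removes the breakpoint. */
static void sim_restore_insn(uint8_t *text, size_t off, uint8_t saved)
{
	text[off] = saved;
}

/*
 * On i386 the int3 trap leaves the instruction pointer just past the
 * breakpoint, so the probepoint address is one breakpoint size back.
 */
static unsigned long sim_probept(unsigned long ip_after_trap)
{
	return ip_after_trap - SIM_BP_SIZE;
}
```

In the patch itself the write goes through access_process_vm(), which
is what makes the change copy-on-write and per-process; a private
buffer cannot show that part.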
---
Documentation/uprobes.txt | 425 +++++++++++++
arch/i386/Kconfig | 10
include/asm-i386/uprobes.h | 54 +
include/linux/uprobes.h | 242 +++++++
kernel/Makefile | 1
kernel/uprobes.c | 1370 +++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 2102 insertions(+)
diff -puN /dev/null Documentation/uprobes.txt
--- /dev/null 2007-04-20 10:18:59.502086278 -0700
+++ linux-2.6.21-rc6-jimk/Documentation/uprobes.txt 2007-04-20 09:27:22.000000000 -0700
@@ -0,0 +1,425 @@
+Title : User-Space Probes (Uprobes)
+Author : Jim Keniston <jkenisto@us.ibm.com>
+
+CONTENTS
+
+1. Concepts
+2. Architectures Supported
+3. Configuring Uprobes
+4. API Reference
+5. Uprobes Features and Limitations
+6. Interoperation with Kprobes
+7. Interoperation with Utrace
+8. Probe Overhead
+9. TODO
+10. Uprobes Team
+11. Uprobes Example
+
+1. Concepts
+
+Uprobes enables you to dynamically break into any routine in a
+user application and collect debugging and performance information
+non-disruptively. You can trap at any code address, specifying a
+kernel handler routine to be invoked when the breakpoint is hit.
+
+The registration function, register_uprobe(), specifies which
+process is to be probed, where the probe is to be inserted, and what
+handler is to be called when the probe is hit.
+
+Typically, Uprobes-based instrumentation is packaged as a kernel
+module. In the simplest case, the module's init function installs
+("registers") one or more probes, and the exit function unregisters
+them. However, probes can be registered or unregistered in response to
+other events as well. For example, you can establish Utrace callbacks
+to register and/or unregister probes when a particular process forks,
+clones a thread, execs, enters a system call, receives a signal,
+exits, etc. See Documentation/utrace.txt.
+
+1.1 How Does a Uprobe Work?
+
+When a uprobe is registered, Uprobes makes a copy of the probed
+instruction, stops the probed application, replaces the first byte(s)
+of the probed instruction with a breakpoint instruction (e.g., int3
+on i386 and x86_64), and allows the probed application to continue.
+(When inserting the breakpoint, Uprobes uses the same copy-on-write
+mechanism that ptrace uses, so that the breakpoint affects only that
+process, and not any other process running that program. This is
+true even if the probed instruction is in a shared library.)
+
+When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
+user-mode registers are saved, and a SIGTRAP signal is generated.
+Uprobes intercepts the SIGTRAP and finds the associated uprobe.
+It then executes the handler associated with the uprobe, passing the
+handler the addresses of the uprobe struct and the saved registers.
+The handler may block, but keep in mind that the probed thread remains
+stopped while your handler runs.
+
+Next, Uprobes single-steps the probed instruction and resumes execution
+of the probed process at the instruction following the probepoint.
+[Note: In the base uprobes patch, we temporarily remove the breakpoint
+instruction, insert the original opcode, single-step the instruction
+"inline", and then replace the breakpoint. This can create problems
+in a multithreaded application. For example, it opens a time window
+during which another thread can sail right past the probepoint.
+This problem is resolved in the "single-stepping out of line" patch.]
+
+1.2 The Role of Utrace
+
+When a probe is registered on a previously unprobed process,
+Uprobes establishes a tracing "engine" with Utrace (see
+Documentation/utrace.txt) for each thread (task) in the process.
+Uprobes uses the Utrace "quiesce" mechanism to stop all the threads
+prior to insertion or removal of a breakpoint. Utrace also notifies
+Uprobes of breakpoint and single-step traps and of other interesting
+events in the lifetime of the probed process, such as fork, clone,
+exec, and exit.
+
+1.3 Multithreaded Applications
+
+Uprobes supports the probing of multithreaded applications. Uprobes
+imposes no limit on the number of threads in a probed application.
+All threads in a process use the same text pages, so every probe
+in a process affects all threads; of course, each thread hits the
+probepoint (and runs the handler) independently. Multiple threads
+may run the same handler simultaneously. If you want a particular
+thread or set of threads to run a particular handler, your handler
+should check current or current->pid to determine which thread has
+hit the probepoint.
+
+When a process clones a new thread, that thread automatically shares
+all current and future probes established for that process.
+
+Keep in mind that when you register or unregister a probe, the
+breakpoint is not inserted or removed until Utrace has stopped all
+threads in the process. The register/unregister function returns
+after the breakpoint has been inserted/removed.
+
+2. Architectures Supported
+
+Uprobes is implemented on the following architectures:
+
+- i386
+- x86_64 (AMD-64, EM64T) // in progress
+- ppc64 // in progress
+// - ia64 // not started
+- s390x // in progress
+
+3. Configuring Uprobes
+
+// TODO: The patch actually puts Uprobes configuration under "Instrumentation
+// Support" with Kprobes. Need to decide which is the better place.
+
+When configuring the kernel using make menuconfig/xconfig/oldconfig,
+ensure that CONFIG_UPROBES is set to "y". Under "Process debugging
+support," select "Infrastructure for tracing and debugging user
+processes" to enable Utrace, then select "Uprobes".
+
+So that you can load and unload Uprobes-based instrumentation modules,
+make sure "Loadable module support" (CONFIG_MODULES) and "Module
+unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
+
+4. API Reference
+
+The Uprobes API includes two functions, register_uprobe() and
+unregister_uprobe(). Here are terse, mini-man-page specifications for
+these functions and the associated probe handlers that you'll write.
+See the latter half of this document for an example.
+
+4.1 register_uprobe
+
+#include <linux/uprobes.h>
+int register_uprobe(struct uprobe *u);
+
+Sets a breakpoint at virtual address u->vaddr in the process whose
+pid is u->pid. When the breakpoint is hit, Uprobes calls u->handler.
+
+register_uprobe() returns 0 on success, or a negative errno otherwise.
+
+User's handler (u->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void handler(struct uprobe *u, struct pt_regs *regs);
+
+Called with u pointing to the uprobe associated with the breakpoint,
+and regs pointing to the struct containing the registers saved when
+the breakpoint was hit.
+
+4.2 unregister_uprobe
+
+#include <linux/uprobes.h>
+void unregister_uprobe(struct uprobe *u);
+
+Removes the specified probe. unregister_uprobe() can be called
+at any time after the probe has been registered.
+
+5. Uprobes Features and Limitations
+
+The user is expected to assign values only to the following members
+of struct uprobe: pid, vaddr, and handler. Other members are reserved
+for Uprobes' use. Uprobes may produce unexpected results if you:
+- assign non-zero values to reserved members of struct uprobe;
+- change the contents of a uprobe object while it is registered; or
+- attempt to register a uprobe that is already registered.
+
+Uprobes allows any number of probes at a particular address. For a
+particular probepoint, handlers are run in the order in which they
+were registered.
+
+Any number of kernel modules may probe a particular process
+simultaneously, and a particular module may probe any number of
+processes simultaneously.
+
+Probes are shared by all threads in a process (including newly created
+threads).
+
+If a probed process exits or execs, Uprobes automatically unregisters
+all uprobes associated with that process. Subsequent attempts to
+unregister these probes will be treated as no-ops.
+
+On the other hand, if a probed memory area is removed from the
+process's virtual memory map (e.g., via dlclose(3) or munmap(2)),
+it's currently up to you to unregister the probes first.
+
+There is no way to specify that probes should be inherited across fork;
+Uprobes removes all probepoints in the newly created child process.
+See Section 7, "Interoperation with Utrace", for more information on
+this topic.
+
+On at least some architectures, Uprobes makes no attempt to verify
+that the probe address you specify actually marks the start of an
+instruction. If you get this wrong, chaos may ensue.
+
+To avoid interfering with interactive debuggers, Uprobes will refuse
+to insert a probepoint where a breakpoint instruction already exists,
+unless it was Uprobes that put it there. Some architectures may
+refuse to insert probes on other types of instructions.
+
+If you install a probe in an inline-able function, Uprobes makes
+no attempt to chase down all inline instances of the function and
+install probes there. gcc may inline a function without being asked,
+so keep this in mind if you're not seeing the probe hits you expect.
+
+A probe handler can modify the environment of the probed function
+-- e.g., by modifying data structures, or by modifying the
+contents of the pt_regs struct (which are restored to the registers
+upon return from the breakpoint). So Uprobes can be used, for example,
+to install a bug fix or to inject faults for testing. Uprobes, of
+course, has no way to distinguish the deliberately injected faults
+from the accidental ones. Don't drink and probe.
+
+When you register the first probe at a probepoint or unregister the
+last probe at a probepoint, Uprobes asks Utrace to "quiesce"
+the probed process so that Uprobes can insert or remove the breakpoint
+instruction. If the process is not already stopped, Utrace stops it.
+If the process is running an interruptible system call, this may cause
+the system call to finish early or fail with EINTR. (The PTRACE_ATTACH
+request of the ptrace system call has this same limitation.)
+
+When Uprobes establishes a probepoint on a previously unprobed page
+of text, Linux creates a new copy of the page via its copy-on-write
+mechanism. When probepoints are removed, Uprobes makes no attempt
+to consolidate identical copies of the same page. This could affect
+memory availability if you probe many, many pages in many, many
+long-running processes.
+
+6. Interoperation with Kprobes
+
+Uprobes is intended to interoperate usefully with Kprobes (see
+Documentation/kprobes.txt). For example, an instrumentation module
+can make calls to both the Kprobes API and the Uprobes API.
+
+A uprobe handler can register or unregister kprobes, jprobes,
+and kretprobes. On the other hand, a kprobe, jprobe, or kretprobe
+handler must not sleep, and therefore cannot register or unregister
+any of these types of probes. (Ideas for removing this restriction
+are welcome.)
+
+Note that the overhead of a uprobe hit is several times that of a
+kprobe hit.
+
+7. Interoperation with Utrace
+
+As mentioned in Section 1.2, Uprobes is a client of Utrace. For each
+probed thread, Uprobes establishes a Utrace engine, and registers
+callbacks for the following types of events: clone/fork, exec, exit,
+and "core-dump" signals (which include breakpoint traps). Uprobes
+establishes this engine when the process is first probed, or when
+Uprobes is notified of the thread's creation, whichever comes first.
+
+An instrumentation module can use both the Utrace and Uprobes APIs (as
+well as Kprobes). When you do this, keep the following facts in mind:
+
+- For a particular event, Utrace callbacks are called in the order in
+which the engines are established. Utrace does not currently provide
+a mechanism for altering this order.
+
+- When Uprobes learns that a probed process has forked, it removes
+the breakpoints in the child process.
+
+- When Uprobes learns that a probed process has exec-ed or exited,
+it disposes of its data structures for that process (first allowing
+any outstanding [un]registration operations to terminate).
+
+- When a probed thread hits a breakpoint or completes single-stepping
+of a probed instruction, engines with the UTRACE_EVENT(SIGNAL_CORE)
+flag set are notified. The Uprobes signal callback prevents (via
+UTRACE_ACTION_HIDE) this event from being reported to engines later
+in the list. But if your engine was established before Uprobes's,
+you will see this event.
+
+If you want to establish probes in a newly forked child, you can use
+the following procedure: // TODO: Test this.
+
+- Register a report_clone callback with Utrace. In this callback,
+the CLONE_THREAD flag distinguishes between the creation of a new
+thread vs. a new process.
+
+- In your report_clone callback, call utrace_attach() to attach to
+the child process, and set the engine's UTRACE_ACTION_QUIESCE flag.
+The child process will quiesce at a point where it is ready to
+be probed.
+
+- In your report_quiesce callback, register the desired probes.
+(Note that you cannot use the same probe object for both parent
+and child. If you want to duplicate the probepoints, you must
+create a new set of uprobe objects.)
+
+8. Probe Overhead
+
+// TODO: Adjust as other architectures are tested.
+On a typical CPU in use in 2007, a uprobe hit takes 3 to 4
+microseconds to process. Specifically, a benchmark that hits the same
+probepoint repeatedly, firing a simple handler each time, reports
+250,000 to 300,000 hits per second, depending on the architecture.
+
+Here are sample overhead figures (in usec) for different architectures.
+
+i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
+3.4 usec
+
+x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
+// TODO
+
+ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
+// TODO
+
+9. TODO
+
+a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
+programming interface for probe-based instrumentation. SystemTap
+already supports kernel probes. It could exploit Uprobes as well.
+b. Support for other architectures.
+
+10. Uprobes Team
+
+The following people have made major contributions to Uprobes:
+Jim Keniston - jkenisto@us.ibm.com
+Ananth Mavinakayanahalli - ananth@in.ibm.com
+Prasanna Panchamukhi - prasanna@in.ibm.com
+Dave Wilder - dwilder@us.ibm.com
+
+11. Uprobes Example
+
+Here's a sample kernel module showing the use of Uprobes to count the
+number of times an instruction at a particular address is executed,
+and optionally (unless verbose=0) report each time it's executed.
+----- cut here -----
+/* uprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+
+/*
+ * Usage: insmod uprobe_example.ko pid=<pid> vaddr=<address> [verbose=0]
+ * where <pid> identifies the probed process and <address> is the virtual
+ * address of the probed instruction.
+ */
+
+static int pid = 0;
+module_param(pid, int, 0);
+MODULE_PARM_DESC(pid, "pid");
+
+static int verbose = 1;
+module_param(verbose, int, 0);
+MODULE_PARM_DESC(verbose, "verbose");
+
+static long vaddr = 0;
+module_param(vaddr, long, 0);
+MODULE_PARM_DESC(vaddr, "vaddr");
+
+static int nhits;
+static struct uprobe usp;
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+ nhits++;
+ if (verbose)
+ printk(KERN_INFO "Hit #%d on probepoint at %#lx\n",
+ nhits, u->vaddr);
+}
+
+int init_module(void)
+{
+ int ret;
+ usp.pid = pid;
+ usp.vaddr = vaddr;
+ usp.handler = uprobe_handler;
+ printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
+ usp.pid, usp.vaddr);
+ ret = register_uprobe(&usp);
+ if (ret != 0) {
+ printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
+ return ret;
+ }
+ return 0;
+}
+
+void cleanup_module(void)
+{
+ printk(KERN_INFO "Unregistering uprobe on pid %d, vaddr %#lx\n",
+ usp.pid, usp.vaddr);
+ printk(KERN_INFO "Probepoint was hit %d times\n", nhits);
+ unregister_uprobe(&usp);
+}
+MODULE_LICENSE("GPL");
+----- cut here -----
+
+You can build the kernel module, uprobe_example.ko, using the following
+Makefile:
+----- cut here -----
+obj-m := uprobe_example.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+ $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+ rm -f *.mod.c *.ko *.o .*.cmd
+ rm -rf .tmp_versions
+----- cut here -----
+
+For example, if you want to run myprog and monitor its calls to myfunc(),
+you can do the following:
+
+$ make // Build the uprobe_example module.
+...
+$ nm -p myprog | awk '$3=="myfunc"'
+080484a8 T myfunc
+$ ./myprog &
+$ ps
+ PID TTY TIME CMD
+ 4367 pts/3 00:00:00 bash
+ 8156 pts/3 00:00:00 myprog
+ 8157 pts/3 00:00:00 ps
+$ su -
+...
+# insmod uprobe_example.ko pid=8156 vaddr=0x080484a8
+
+In /var/log/messages and on the console, you will see a message of the
+form "kernel: Hit #1 on probepoint at 0x80484a8" each time myfunc()
+is called. To turn off probing, remove the module:
+
+# rmmod uprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "Probepoint was hit 5 times".
diff -puN arch/i386/Kconfig~1-uprobes-base arch/i386/Kconfig
--- linux-2.6.21-rc6/arch/i386/Kconfig~1-uprobes-base 2007-04-20 09:26:24.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/Kconfig 2007-04-20 09:27:22.000000000 -0700
@@ -1231,6 +1231,16 @@ config KPROBES
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".
+config UPROBES
+ bool "User-space probes (EXPERIMENTAL)"
+ depends on UTRACE && EXPERIMENTAL && MODULES
+ help
+ Uprobes allows kernel modules to establish probepoints
+ in user applications and execute handler functions when
+ the probepoints are hit. For more information, refer to
+ Documentation/uprobes.txt.
+ If in doubt, say "N".
+
source "kernel/Kconfig.marker"
endmenu
diff -puN /dev/null include/asm-i386/uprobes.h
--- /dev/null 2007-04-20 10:18:59.502086278 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/uprobes.h 2007-04-20 09:27:22.000000000 -0700
@@ -0,0 +1,54 @@
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+/*
+ * Userspace Probes (UProbes)
+ * include/asm-i386/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+typedef u8 uprobe_opcode_t;
+#define BREAKPOINT_INSTRUCTION 0xcc
+#define BP_INSN_SIZE 1
+#define MAX_UINSN_BYTES 16
+#define SLOT_IP 12 /* instruction pointer slot from include/asm/elf.h */
+
+/* Architecture specific switch for where the IP points after a bp hit */
+#define ARCH_BP_INST_PTR(inst_ptr) (inst_ptr - BP_INSN_SIZE)
+
+struct uprobe_kimg;
+
+/* Caller prohibits probes on int3. We currently allow everything else. */
+static inline int arch_validate_probed_insn(struct uprobe_kimg *uk)
+{
+ return 0;
+}
+
+/* On i386, the int3 trap leaves eip pointing past the int3 instruction. */
+static inline unsigned long arch_get_probept(struct pt_regs *regs)
+{
+ return (unsigned long) (regs->eip - BP_INSN_SIZE);
+}
+
+static inline void arch_reset_ip_for_sstep(struct pt_regs *regs)
+{
+ regs->eip -= BP_INSN_SIZE;
+}
+
+#endif /* _ASM_UPROBES_H */
diff -puN /dev/null include/linux/uprobes.h
--- /dev/null 2007-04-20 10:18:59.502086278 -0700
+++ linux-2.6.21-rc6-jimk/include/linux/uprobes.h 2007-04-20 09:27:22.000000000 -0700
@@ -0,0 +1,242 @@
+#ifndef _LINUX_UPROBES_H
+#define _LINUX_UPROBES_H
+/*
+ * Userspace Probes (UProbes)
+ * include/linux/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/list.h>
+#include <linux/smp.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/wait.h>
+#include <linux/kref.h>
+
+struct pt_regs;
+struct task_struct;
+struct utrace_attached_engine;
+struct uprobe_kimg;
+struct uprobe;
+
+/*
+ * This is what the user supplies us.
+ */
+struct uprobe {
+ /*
+ * The pid of the probed process. Currently, this can be the
+ * thread ID (task->pid) of any active thread in the process.
+ */
+ pid_t pid;
+
+ /* location of the probe point */
+ unsigned long vaddr;
+
+ /* Handler to run when the probepoint is hit */
+ void (*handler)(struct uprobe*, struct pt_regs*);
+
+ /* Subsequent members are for internal use only. */
+
+ /*
+ * -EBUSY while we're waiting for all threads to quiesce so the
+ * associated breakpoint can be inserted or removed.
+ * 0 if the insert/remove operation has succeeded, or -errno
+ * otherwise.
+ */
+ volatile int status;
+
+ /* All uprobes with this pid and vaddr map to uk. */
+ struct uprobe_kimg *uk;
+
+ /* on uprobe_kimg's list */
+ struct list_head list;
+
+ /* This simplifies mapping uprobe to uprobe_process. */
+ pid_t tgid;
+};
+
+#ifdef CONFIG_UPROBES
+#include <asm/uprobes.h>
+
+enum uprobe_state {
+ UPROBE_INSERTING, // process quiescing prior to insertion
+ UPROBE_BP_SET, // breakpoint in place
+ UPROBE_REMOVING, // process quiescing prior to removal
+ UPROBE_DISABLED, // removal completed
+ UPROBE_FREEING // being deallocated
+};
+
+enum uprobe_task_state {
+ UPTASK_QUIESCENT,
+ UPTASK_SLEEPING, // used when task may not be able to quiesce
+ UPTASK_RUNNING,
+ UPTASK_BP_HIT,
+ UPTASK_SSTEP_AFTER_BP
+};
+
+#define UPROBE_HASH_BITS 5
+#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS)
+
+/*
+ * uprobe_process -- not a user-visible struct.
+ * A uprobe_process represents a probed process. A process can have
+ * multiple probepoints (each represented by a uprobe_kimg) and
+ * one or more threads (each represented by a uprobe_task).
+ */
+struct uprobe_process {
+ /*
+ * Unless otherwise noted, fields in uprobe_process are guarded
+ * by this mutex.
+ */
+ struct mutex mutex;
+
+ /* Table of uprobe_kimgs registered for this process */
+ /* TODO: Switch to list_head[] per Ingo. */
+ struct hlist_head uprobe_table[UPROBE_TABLE_SIZE];
+ int nuk; /* number of uprobe_kimgs */
+ /*
+ * Guards uprobe_table[], which we search every time we hit a
+ * probepoint.
+ */
+ struct rw_semaphore utable_rwsem;
+
+ /* List of uprobe_kimgs awaiting insertion or removal */
+ struct list_head pending_uprobes;
+
+ /* List of uprobe_tasks in this task group */
+ struct list_head thread_list;
+ int nthreads;
+ int n_quiescent_threads;
+
+ /* this goes on the uproc_table */
+ struct hlist_node hlist;
+
+ /*
+ * All threads (tasks) in a process share the same uprobe_process.
+ */
+ pid_t tgid;
+
+ /* Threads in SLEEPING state wait here to be roused. */
+ wait_queue_head_t waitq;
+
+ /*
+ * We won't free the uprobe_process while...
+ * - any register/unregister operations on it are in progress; or
+ * - uprobe_table[] is not empty; or
+ * - any tasks are SLEEPING in the waitq.
+ * refcount reflects this. We do NOT ref-count tasks (threads),
+ * since once the last thread has exited, the rest is academic.
+ */
+ struct kref refcount;
+};
+
+/*
+ * uprobe_kimg -- not a user-visible struct.
+ * Abstraction to store kernel's internal uprobe data.
+ * Corresponds to a probepoint, at which several uprobes can be registered.
+ */
+struct uprobe_kimg {
+ /*
+ * Object is read-locked to run handlers so that multiple threads
+ * in a process can run handlers for same probepoint simultaneously.
+ */
+ struct rw_semaphore rwsem;
+
+ /* vaddr copied from (first) uprobe */
+ unsigned long vaddr;
+
+ /* The uprobe(s) associated with this uprobe_kimg */
+ struct list_head uprobe_list;
+
+ volatile enum uprobe_state state;
+
+ /* Saved opcode (which has been replaced with breakpoint) */
+ uprobe_opcode_t opcode;
+
+ /* Saved original instruction */
+ uprobe_opcode_t insn[MAX_UINSN_BYTES / sizeof(uprobe_opcode_t)];
+
+ /* The corresponding struct uprobe_process */
+ struct uprobe_process *uproc;
+
+ /*
+ * uk goes in the uprobe_process->uprobe_table when registered --
+ * even before the breakpoint has been inserted.
+ */
+ struct hlist_node ut_node;
+
+ /*
+ * uk sits in the uprobe_process->pending_uprobes queue while
+ * awaiting insertion or removal of the breakpoint.
+ */
+ struct list_head pd_node;
+
+ /* [un]register_uprobe() waits 'til bkpt inserted/removed. */
+ wait_queue_head_t waitq;
+};
+
+/*
+ * uprobe_task -- not a user-visible struct.
+ * Corresponds to a thread in a probed process.
+ */
+struct uprobe_task {
+ /* Lives on the thread_list for the uprobe_process */
+ struct list_head list;
+
+ /* This is a back pointer to the task_struct for this task */
+ struct task_struct *tsk;
+
+ /* The utrace engine for this task */
+ struct utrace_attached_engine *engine;
+
+ /* Back pointer to the associated uprobe_process */
+ struct uprobe_process *uproc;
+
+ volatile enum uprobe_task_state state;
+
+ /*
+ * quiescing = 1 means this task has been asked to quiesce.
+ * It may not be able to comply immediately if it's hit a bkpt.
+ */
+ volatile int quiescing;
+
+ /* Saved address of copied original instruction */
+ long singlestep_addr;
+
+ /* Task currently running quiesce_all_threads() */
+ struct task_struct *quiesce_master;
+
+ /* Set before running handlers; cleared after single-stepping. */
+ struct uprobe_kimg *active_probe;
+
+ struct mutex mutex;
+};
+
+int register_uprobe(struct uprobe *u);
+void unregister_uprobe(struct uprobe *u);
+
+#else /* CONFIG_UPROBES */
+
+static inline int register_uprobe(struct uprobe *u)
+{
+ return -ENOSYS;
+}
+static inline void unregister_uprobe(struct uprobe *u)
+{
+}
+#endif /* CONFIG_UPROBES */
+#endif /* _LINUX_UPROBES_H */
diff -puN kernel/Makefile~1-uprobes-base kernel/Makefile
--- linux-2.6.21-rc6/kernel/Makefile~1-uprobes-base 2007-04-20 09:26:24.000000000 -0700
+++ linux-2.6.21-rc6-jimk/kernel/Makefile 2007-04-20 09:27:22.000000000 -0700
@@ -55,6 +55,7 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayac
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
obj-$(CONFIG_UTRACE) += utrace.o
obj-$(CONFIG_PTRACE) += ptrace.o
+obj-$(CONFIG_UPROBES) += uprobes.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -puN /dev/null kernel/uprobes.c
--- /dev/null 2007-04-20 10:18:59.502086278 -0700
+++ linux-2.6.21-rc6-jimk/kernel/uprobes.c 2007-04-20 09:27:22.000000000 -0700
@@ -0,0 +1,1370 @@
+/*
+ * Userspace Probes (UProbes)
+ * kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/rcupdate.h>
+#include <linux/err.h>
+#include <linux/kref.h>
+#include <linux/utrace.h>
+#include <linux/uprobes.h>
+#include <linux/tracehook.h>
+#include <asm/tracehook.h>
+#include <asm/errno.h>
+
+#define SET_ENGINE_FLAGS 1
+#define CLEAR_ENGINE_FLAGS 0
+
+extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
+ void *buf, int len, int write);
+
+/*
+ * Locking hierarchy:
+ * uproc_mutex
+ * uprobe_process->mutex
+ * uprobe_task->mutex
+ * uprobe_kimg->rwsem
+ * E.g., don't unconditionally grab uprobe_process->mutex while holding
+ * uprobe_task->mutex.
+ */
+
+/* Table of currently probed processes, hashed by tgid. */
+static struct hlist_head uproc_table[UPROBE_TABLE_SIZE];
+
+/*
+ * Protects uproc_table during uprobe (un)registration, initiated
+ * either by user or internally.
+ */
+static DEFINE_MUTEX(uproc_mutex);
+
+/* p_uprobe_utrace_ops = &uprobe_utrace_ops. Fwd refs are a pain w/o this. */
+static const struct utrace_engine_ops *p_uprobe_utrace_ops;
+
+static inline void uprobe_get_process(struct uprobe_process *uproc)
+{
+ kref_get(&uproc->refcount);
+}
+
+static void uprobe_release_process(struct kref *kref);
+
+static inline int uprobe_put_process(struct uprobe_process *uproc)
+{
+ return kref_put(&uproc->refcount, uprobe_release_process);
+}
+
+/* Runs with the uproc_mutex held. Returns with uproc ref-counted. */
+struct uprobe_process *uprobe_find_process(pid_t tgid)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct uprobe_process *uproc;
+
+ head = &uproc_table[hash_long(tgid, UPROBE_HASH_BITS)];
+ hlist_for_each_entry(uproc, node, head, hlist) {
+ if (uproc->tgid == tgid) {
+ uprobe_get_process(uproc);
+ return uproc;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * In the given uproc's hash table of uprobes, find the one with the
+ * specified virtual address.
+ * If lock == 0, returns with uk unlocked.
+ * If lock == 1, returns with uk read-locked.
+ * If lock == 2, returns with uk write-locked.
+ * Runs with uproc->utable_rwsem locked.
+ */
+struct uprobe_kimg *find_uprobe(struct uprobe_process *uproc,
+ unsigned long vaddr, int lock)
+{
+ struct uprobe_kimg *uk;
+ struct hlist_node *node;
+ struct hlist_head *head = &uproc->uprobe_table[hash_long(vaddr,
+ UPROBE_HASH_BITS)];
+
+ hlist_for_each_entry(uk, node, head, ut_node) {
+ if (uk->vaddr == vaddr && uk->state != UPROBE_FREEING
+ && uk->state != UPROBE_DISABLED) {
+ if (lock == 1)
+ down_read(&uk->rwsem);
+ else if (lock == 2)
+ down_write(&uk->rwsem);
+ return uk;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * set_bp: Store a breakpoint instruction at uk->vaddr.
+ * Returns BP_INSN_SIZE on success.
+ *
+ * NOTE: BREAKPOINT_INSTRUCTION on all archs is the same size as
+ * uprobe_opcode_t.
+ */
+static int set_bp(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+ uprobe_opcode_t bp_insn = BREAKPOINT_INSTRUCTION;
+ return access_process_vm(tsk, uk->vaddr, &bp_insn, BP_INSN_SIZE, 1);
+}
+
+/*
+ * set_orig_insn: For probepoint uk, replace the breakpoint instruction
+ * with the original opcode. Returns BP_INSN_SIZE on success.
+ */
+static int set_orig_insn(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+ return access_process_vm(tsk, uk->vaddr, &uk->opcode, BP_INSN_SIZE, 1);
+}
+
+static void bkpt_insertion_failed(struct uprobe_kimg *uk, const char *why)
+{
+ printk(KERN_ERR "Can't place uprobe at pid %d vaddr %#lx: %s\n",
+ uk->uproc->tgid, uk->vaddr, why);
+}
+
+/*
+ * Save a copy of the original instruction (so it can be single-stepped
+ * out of line), insert the breakpoint instruction, and wake up
+ * register_uprobe().
+ * Runs with uk write-locked.
+ */
+static void insert_bkpt(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+ struct uprobe *u;
+ long result = 0;
+ int len;
+
+ if (!tsk) {
+ /* No surviving tasks associated with uk->uproc */
+ result = -ESRCH;
+ goto out;
+ }
+
+ /*
+ * If access_process_vm() transfers fewer bytes than the maximum
+ * instruction size, assume that the probed instruction is smaller
+ * than the max and near the end of the last page of instructions.
+ * But there must be room at least for a breakpoint-size instruction.
+ */
+ len = access_process_vm(tsk, uk->vaddr, uk->insn, MAX_UINSN_BYTES, 0);
+ if (len < BP_INSN_SIZE) {
+ bkpt_insertion_failed(uk, "error reading original instruction");
+ result = -EIO;
+ goto out;
+ }
+ memcpy(&uk->opcode, uk->insn, BP_INSN_SIZE);
+ if (uk->opcode == BREAKPOINT_INSTRUCTION) {
+ bkpt_insertion_failed(uk, "bkpt already exists at that addr");
+ result = -EEXIST;
+ goto out;
+ }
+
+ if ((result = arch_validate_probed_insn(uk)) < 0) {
+ bkpt_insertion_failed(uk, "instruction type cannot be probed");
+ goto out;
+ }
+
+ len = set_bp(uk, tsk);
+ if (len < BP_INSN_SIZE) {
+ bkpt_insertion_failed(uk, "failed to insert bkpt instruction");
+ result = -EIO;
+ goto out;
+ }
+out:
+ uk->state = (result ? UPROBE_DISABLED : UPROBE_BP_SET);
+ list_for_each_entry(u, &uk->uprobe_list, list)
+ u->status = result;
+ wake_up_all(&uk->waitq);
+}
+
+/* Runs with uk write-locked. */
+static void remove_bkpt(struct uprobe_kimg *uk, struct task_struct *tsk)
+{
+ int len;
+
+ if (tsk) {
+ len = set_orig_insn(uk, tsk);
+ if (len < BP_INSN_SIZE) {
+ printk(KERN_ERR
+ "Error removing uprobe at pid %d vaddr %#lx:"
+ " can't restore original instruction\n",
+ tsk->tgid, uk->vaddr);
+ /*
+ * This shouldn't happen, since we were previously
+ * able to write the breakpoint at that address.
+ * There's not much we can do besides let the
+ * process die with a SIGTRAP the next time the
+ * breakpoint is hit.
+ */
+ }
+ }
+ /* Wake up unregister_uprobe(). */
+ uk->state = UPROBE_DISABLED;
+ wake_up_all(&uk->waitq);
+}
+
+/*
+ * Runs with all of uproc's threads quiesced and uproc->mutex held.
+ * As specified, insert or remove the breakpoint instruction for each
+ * uprobe_kimg on uproc's pending list.
+ * tsk = one of the tasks associated with uproc -- NULL if there are
+ * no surviving threads.
+ * It's OK for uproc->pending_uprobes to be empty here. It can happen
+ * if a register and an unregister are requested (by different probers)
+ * simultaneously for the same pid/vaddr.
+ * Note that the current task may be a thread in uproc, or it may be
+ * a task running [un]register_uprobe().
+ */
+static void handle_pending_uprobes(struct uprobe_process *uproc,
+ struct task_struct *tsk)
+{
+ struct uprobe_kimg *uk, *tmp;
+
+ list_for_each_entry_safe(uk, tmp, &uproc->pending_uprobes, pd_node) {
+ down_write(&uk->rwsem);
+ switch (uk->state) {
+ case UPROBE_INSERTING:
+ insert_bkpt(uk, tsk);
+ break;
+ case UPROBE_REMOVING:
+ remove_bkpt(uk, tsk);
+ break;
+ default:
+ BUG();
+ }
+ list_del(&uk->pd_node);
+ up_write(&uk->rwsem);
+ }
+}
+
+static void utask_adjust_flags(struct uprobe_task *utask, int set,
+ unsigned long flags)
+{
+ unsigned long newflags, oldflags;
+
+ newflags = oldflags = utask->engine->flags;
+
+ if (set)
+ newflags |= flags;
+ else
+ newflags &= ~flags;
+
+ if (newflags != oldflags)
+ utrace_set_flags(utask->tsk, utask->engine, newflags);
+}
+
+/* Opposite of quiesce_all_threads(). Same locking applies. */
+static void rouse_all_threads(struct uprobe_process *uproc)
+{
+ struct uprobe_task *utask;
+
+ list_for_each_entry(utask, &uproc->thread_list, list) {
+ mutex_lock(&utask->mutex);
+ if (utask->quiescing) {
+ utask->quiescing = 0;
+ if (utask->state == UPTASK_QUIESCENT) {
+ utask_adjust_flags(utask, CLEAR_ENGINE_FLAGS,
+ UTRACE_ACTION_QUIESCE |
+ UTRACE_EVENT(QUIESCE));
+ utask->state = UPTASK_RUNNING;
+ uproc->n_quiescent_threads--;
+ }
+ }
+ mutex_unlock(&utask->mutex);
+ }
+ /* Wake any threads that decided to sleep rather than quiesce. */
+ wake_up_all(&uproc->waitq);
+}
+
+/*
+ * If all of uproc's surviving threads have quiesced, do the necessary
+ * breakpoint insertions or removals and then un-quiesce everybody.
+ * tsk is a surviving thread, or NULL if there is none. Runs with
+ * uproc->mutex held.
+ */
+static void check_uproc_quiesced(struct uprobe_process *uproc,
+ struct task_struct *tsk)
+{
+ if (uproc->n_quiescent_threads >= uproc->nthreads) {
+ handle_pending_uprobes(uproc, tsk);
+ rouse_all_threads(uproc);
+ }
+}
+
+/*
+ * Quiesce all threads in the specified process -- e.g., prior to
+ * breakpoint insertion. Runs with uproc->mutex held.
+ * Returns the number of threads that haven't died yet.
+ */
+static int quiesce_all_threads(struct uprobe_process *uproc)
+{
+ struct uprobe_task *utask;
+ struct task_struct *survivor = NULL; // any survivor
+ int survivors = 0;
+
+ list_for_each_entry(utask, &uproc->thread_list, list) {
+ mutex_lock(&utask->mutex);
+ survivor = utask->tsk;
+ survivors++;
+ if (!utask->quiescing) {
+ /*
+ * If utask is currently handling a probepoint, it'll
+ * check utask->quiescing and quiesce when it's done.
+ */
+ utask->quiescing = 1;
+ if (utask->state == UPTASK_RUNNING) {
+ utask->quiesce_master = current;
+ utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+ UTRACE_ACTION_QUIESCE
+ | UTRACE_EVENT(QUIESCE));
+ utask->quiesce_master = NULL;
+ }
+ }
+ mutex_unlock(&utask->mutex);
+ }
+ /*
+ * If any task was already quiesced (in utrace's opinion) when we
+ * called utask_adjust_flags() on it, uprobe_report_quiesce() was
+ * called, but wasn't in a position to call check_uproc_quiesced().
+ */
+ check_uproc_quiesced(uproc, survivor);
+ return survivors;
+}
+
+/* Runs with uproc_mutex and uproc->mutex held. */
+static void uprobe_free_process(struct uprobe_process *uproc)
+{
+ struct uprobe_task *utask, *tmp;
+
+ if (!hlist_unhashed(&uproc->hlist))
+ hlist_del(&uproc->hlist);
+
+ list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) {
+ /* Give any last report_* callback a chance to complete. */
+ mutex_lock(&utask->mutex);
+ /*
+ * utrace_detach() is OK here (required, it seems) even if
+ * utask->tsk == current and we're in a utrace callback.
+ */
+ if (utask->engine)
+ utrace_detach(utask->tsk, utask->engine);
+ mutex_unlock(&utask->mutex);
+ kfree(utask);
+ }
+ mutex_unlock(&uproc->mutex); // So kfree doesn't complain
+ kfree(uproc);
+}
+
+/* Uproc's ref-count has dropped to zero. Free everything. */
+static void uprobe_release_process(struct kref *ref)
+{
+ struct uprobe_process *uproc = container_of(ref, struct uprobe_process,
+ refcount);
+ mutex_lock(&uproc_mutex);
+ mutex_lock(&uproc->mutex);
+ uprobe_free_process(uproc);
+ mutex_unlock(&uproc_mutex);
+}
+
+/*
+ * Allocate a uprobe_task object for t and add it to uproc's list.
+ * Called with t "got" and uproc->mutex locked. Called in one of
+ * the following cases:
+ * - before setting the first uprobe in t's process
+ * - we're in uprobe_report_clone() and t is the newly added thread
+ * Returns:
+ * - pointer to new uprobe_task on success
+ * - NULL if t dies before we can utrace_attach it
+ * - ERR_PTR(negative errno) otherwise
+ */
+static struct uprobe_task *uprobe_add_task(struct task_struct *t,
+ struct uprobe_process *uproc)
+{
+ struct uprobe_task *utask;
+ struct utrace_attached_engine *engine;
+
+ utask = (struct uprobe_task *)kzalloc(sizeof *utask, GFP_USER);
+ if (unlikely(utask == NULL))
+ return ERR_PTR(-ENOMEM);
+
+ mutex_init(&utask->mutex);
+ mutex_lock(&utask->mutex);
+ utask->tsk = t;
+ utask->state = UPTASK_RUNNING;
+ utask->quiescing = 0;
+ utask->uproc = uproc;
+ utask->active_probe = NULL;
+
+ engine = utrace_attach(t, UTRACE_ATTACH_CREATE, p_uprobe_utrace_ops,
+ utask);
+ if (IS_ERR(engine)) {
+ long err = PTR_ERR(engine);
+ printk("uprobes: utrace_attach failed, returned %ld\n", err);
+ mutex_unlock(&utask->mutex);
+ kfree(utask);
+ if (err == -ESRCH)
+ return NULL;
+ return ERR_PTR(err);
+ }
+ utask->engine = engine;
+ /*
+ * Always watch for traps, clones, execs and exits. Caller must
+ * set any other engine flags.
+ */
+ utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+ UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(EXEC) |
+ UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT));
+ INIT_LIST_HEAD(&utask->list);
+ list_add_tail(&utask->list, &uproc->thread_list);
+ /*
+ * Note that it's OK if t dies just after utrace_attach, because
+ * with the engine in place, the appropriate report_* callback
+ * should handle it after we release uproc->mutex.
+ */
+ mutex_unlock(&utask->mutex);
+ return utask;
+}
+
+/* See comment in uprobe_mk_process(). */
+static struct task_struct *find_next_thread_to_add(struct uprobe_process *uproc, struct task_struct *start)
+{
+ struct task_struct *t;
+ struct uprobe_task *utask;
+
+ read_lock(&tasklist_lock);
+ t = start;
+ do {
+ list_for_each_entry(utask, &uproc->thread_list, list) {
+ if (utask->tsk == t)
+ goto t_already_added;
+ }
+ /* Found thread/task to add. */
+ get_task_struct(t); // OK to nest if t=p.
+ read_unlock(&tasklist_lock);
+ return t;
+t_already_added:
+ t = next_thread(t);
+ } while (t != start);
+
+ read_unlock(&tasklist_lock);
+ return NULL;
+}
+
+/* Runs with uproc_mutex held; returns with uproc->mutex held. */
+static struct uprobe_process *uprobe_mk_process(struct task_struct *p)
+{
+ struct uprobe_process *uproc;
+ struct uprobe_task *utask;
+ struct task_struct *add_me;
+ int i;
+ long err;
+
+ uproc = (struct uprobe_process *)kzalloc(sizeof *uproc, GFP_USER);
+ if (unlikely(uproc == NULL))
+ return ERR_PTR(-ENOMEM);
+
+ /* Initialize fields */
+ kref_init(&uproc->refcount);
+ mutex_init(&uproc->mutex);
+ mutex_lock(&uproc->mutex);
+ init_waitqueue_head(&uproc->waitq);
+ for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+ INIT_HLIST_HEAD(&uproc->uprobe_table[i]);
+ uproc->nuk = 0;
+ init_rwsem(&uproc->utable_rwsem);
+ INIT_LIST_HEAD(&uproc->pending_uprobes);
+ INIT_LIST_HEAD(&uproc->thread_list);
+ uproc->nthreads = 0;
+ uproc->n_quiescent_threads = 0;
+ INIT_HLIST_NODE(&uproc->hlist);
+ uproc->tgid = p->tgid;
+
+ /*
+ * Create and populate one utask per thread in this process. We
+ * can't call uprobe_add_task() while holding tasklist_lock, so we:
+ * 1. Lock task list.
+ * 2. Find the next task, add_me, in this process that's not
+ * already on uproc's thread_list. (Start search at previous
+ * one found.)
+ * 3. Unlock task list.
+ * 4. uprobe_add_task(add_me, uproc)
+ * Repeat 1-4 'til we have utasks for all tasks.
+ */
+ add_me = p;
+ while ((add_me = find_next_thread_to_add(uproc, add_me)) != NULL) {
+ utask = uprobe_add_task(add_me, uproc);
+ put_task_struct(add_me);
+ if (IS_ERR(utask)) {
+ err = PTR_ERR(utask);
+ goto fail;
+ }
+ if (utask)
+ uproc->nthreads++;
+ }
+
+ if (uproc->nthreads == 0) {
+ /* All threads -- even p -- are dead. */
+ err = -ESRCH;
+ goto fail;
+ }
+ return uproc;
+
+fail:
+ uprobe_free_process(uproc);
+ return ERR_PTR(err);
+}
+
+/*
+ * Creates a uprobe_kimg and connects it to u and uproc. Runs with
+ * uproc->mutex and uproc->utable_rwsem locked. Returns with uprobe_kimg
+ * unlocked.
+ */
+static struct uprobe_kimg *uprobe_add_kimg(struct uprobe *u,
+ struct uprobe_process *uproc)
+{
+ struct uprobe_kimg *uk;
+
+ uk = (struct uprobe_kimg *)kzalloc(sizeof *uk, GFP_USER);
+ if (unlikely(uk == NULL))
+ return ERR_PTR(-ENOMEM);
+ init_rwsem(&uk->rwsem);
+ down_write(&uk->rwsem);
+ init_waitqueue_head(&uk->waitq);
+
+ /* Connect to u. */
+ INIT_LIST_HEAD(&uk->uprobe_list);
+ list_add_tail(&u->list, &uk->uprobe_list);
+ u->uk = uk;
+ u->status = -EBUSY;
+ uk->vaddr = u->vaddr;
+
+ /* Connect to uproc. */
+ uk->state = UPROBE_INSERTING;
+ uk->uproc = uproc;
+ INIT_LIST_HEAD(&uk->pd_node);
+ list_add_tail(&uk->pd_node, &uproc->pending_uprobes);
+ INIT_HLIST_NODE(&uk->ut_node);
+ hlist_add_head(&uk->ut_node,
+ &uproc->uprobe_table[hash_long(uk->vaddr, UPROBE_HASH_BITS)]);
+ up_write(&uk->rwsem);
+ uproc->nuk++;
+ uprobe_get_process(uproc);
+ return uk;
+}
+
+/*
+ * Free uk. Called with uk->rwsem write-locked, and uproc->utable_rwsem
+ * locked if necessary.
+ */
+static void uprobe_free_kimg_locked(struct uprobe_kimg *uk)
+{
+ hlist_del(&uk->ut_node);
+ uk->uproc->nuk--;
+ up_write(&uk->rwsem);
+ kfree(uk);
+}
+
+/*
+ * Called with uk write-locked. Frees uk and decrements the ref-count
+ * on uk->uproc.
+ */
+static void uprobe_free_kimg(struct uprobe_kimg *uk)
+{
+ struct uprobe_process *uproc = uk->uproc;
+
+ /* Come down through the top to preserve lock order. */
+ uk->state = UPROBE_FREEING;
+ up_write(&uk->rwsem);
+
+ down_write(&uproc->utable_rwsem);
+ down_write(&uk->rwsem); // So other CPUs have time to see UPROBE_FREEING
+ uprobe_free_kimg_locked(uk);
+ up_write(&uproc->utable_rwsem);
+ uprobe_put_process(uproc);
+}
+
+/* Note that we never free u, because the user owns that. */
+static void purge_uprobe(struct uprobe *u)
+{
+ struct uprobe_kimg *uk;
+
+ uk = u->uk;
+ down_write(&uk->rwsem);
+ list_del(&u->list);
+ u->uk = NULL;
+ if (list_empty(&uk->uprobe_list)) {
+ uprobe_free_kimg(uk);
+ return;
+ }
+ up_write(&uk->rwsem);
+}
+
+/*
+ * See Documentation/uprobes.txt.
+ */
+int register_uprobe(struct uprobe *u)
+{
+ struct task_struct *p;
+ struct uprobe_process *uproc;
+ struct uprobe_kimg *uk;
+ int survivors, ret = 0, uproc_is_new = 0;
+/* We should be able to access at least a bkpt-size insn at u->vaddr */
+/* TODO: Verify that the vma containing u->vaddr is executable. */
+#define NBYTES_TO_TEST BP_INSN_SIZE
+ char buf[NBYTES_TO_TEST];
+
+ if (!u || !u->handler)
+ return -EINVAL;
+ if (u->uk && u->status == -EBUSY)
+ /* Looks like register or unregister is already in progress. */
+ return -EAGAIN;
+ u->uk = NULL;
+
+ rcu_read_lock();
+ p = find_task_by_pid(u->pid);
+ if (p)
+ get_task_struct(p);
+ rcu_read_unlock();
+
+ if (!p)
+ return -ESRCH;
+ u->tgid = p->tgid;
+
+ /* Exit early if vaddr is bad -- i.e., we can't even read from it. */
+ if (access_process_vm(p, u->vaddr, buf, NBYTES_TO_TEST, 0)
+ != NBYTES_TO_TEST) {
+ ret = -EINVAL;
+ goto fail_tsk;
+ }
+
+ /* Get the uprobe_process for this pid, or make a new one. */
+ mutex_lock(&uproc_mutex);
+ uproc = uprobe_find_process(p->tgid);
+
+ if (uproc) {
+ mutex_unlock(&uproc_mutex);
+ mutex_lock(&uproc->mutex);
+ } else {
+ uproc = uprobe_mk_process(p);
+ if (IS_ERR(uproc)) {
+ ret = (int) PTR_ERR(uproc);
+ mutex_unlock(&uproc_mutex);
+ goto fail_tsk;
+ }
+ /* Hold uproc_mutex until we've added uproc to uproc_table. */
+ uproc_is_new = 1;
+ }
+
+ INIT_LIST_HEAD(&u->list);
+
+ /* See if we already have a uprobe at the vaddr. */
+ down_write(&uproc->utable_rwsem);
+ uk = (uproc_is_new ? NULL : find_uprobe(uproc, u->vaddr, 2));
+ if (uk) {
+ /* uk is write-locked. */
+ /* Breakpoint is already in place, or soon will be. */
+ up_write(&uproc->utable_rwsem);
+ u->uk = uk;
+ list_add_tail(&u->list, &uk->uprobe_list);
+ switch (uk->state) {
+ case UPROBE_INSERTING:
+ u->status = -EBUSY; // in progress
+ break;
+ case UPROBE_REMOVING:
+ /* Wait! Don't remove that bkpt after all! */
+ uk->state = UPROBE_BP_SET;
+ list_del(&uk->pd_node); // Remove from pending list.
+ wake_up_all(&uk->waitq);// Wake unregister_uprobe().
+ /*FALLTHROUGH*/
+ case UPROBE_BP_SET:
+ u->status = 0;
+ break;
+ default:
+ BUG();
+ }
+ up_write(&uk->rwsem);
+ mutex_unlock(&uproc->mutex);
+ put_task_struct(p);
+ if (u->status == 0) {
+ uprobe_put_process(uproc);
+ return 0;
+ }
+ goto await_bkpt_insertion;
+ } else {
+ uk = uprobe_add_kimg(u, uproc);
+ up_write(&uproc->utable_rwsem);
+ if (IS_ERR(uk)) {
+ ret = (int) PTR_ERR(uk);
+ goto fail_uproc;
+ }
+ }
+
+ if (uproc_is_new) {
+ hlist_add_head(&uproc->hlist,
+ &uproc_table[hash_long(uproc->tgid, UPROBE_HASH_BITS)]);
+ mutex_unlock(&uproc_mutex);
+ }
+ put_task_struct(p);
+ survivors = quiesce_all_threads(uproc);
+ mutex_unlock(&uproc->mutex);
+
+ if (survivors == 0) {
+ purge_uprobe(u);
+ uprobe_put_process(uproc);
+ return -ESRCH;
+ }
+
+await_bkpt_insertion:
+ wait_event(uk->waitq, uk->state != UPROBE_INSERTING);
+ ret = u->status;
+ if (ret != 0)
+ purge_uprobe(u);
+ uprobe_put_process(uproc);
+ return ret;
+
+fail_uproc:
+ if (uproc_is_new) {
+ uprobe_free_process(uproc);
+ mutex_unlock(&uproc_mutex);
+ } else
+ uprobe_put_process(uproc);
+
+fail_tsk:
+ put_task_struct(p);
+ return ret;
+}
+
+void unregister_uprobe(struct uprobe *u)
+{
+ struct uprobe_process *uproc;
+ struct uprobe_kimg *uk;
+ int survivors;
+
+ if (!u)
+ return;
+
+ if (!u->uk)
+ /*
+ * This probe was never successfully registered, or
+ * has already been unregistered.
+ */
+ return;
+
+ if (u->status == -EBUSY)
+ /* Looks like register or unregister is already in progress. */
+ return;
+
+ /* As with unregister_kprobe, assume that u points to a valid probe. */
+ uk = u->uk;
+ uproc = uk->uproc;
+ uprobe_get_process(uproc);
+ mutex_lock(&uproc->mutex);
+ down_write(&uk->rwsem);
+
+ list_del(&u->list);
+ u->uk = NULL;
+ if (!list_empty(&uk->uprobe_list)) {
+ up_write(&uk->rwsem);
+ mutex_unlock(&uproc->mutex);
+ uprobe_put_process(uproc);
+ return;
+ }
+
+ /*
+ * The last uprobe at uk's probepoint is being unregistered.
+ * Queue the breakpoint for removal.
+ */
+ uk->state = UPROBE_REMOVING;
+ list_add_tail(&uk->pd_node, &uproc->pending_uprobes);
+ up_write(&uk->rwsem);
+
+ survivors = quiesce_all_threads(uproc);
+ mutex_unlock(&uproc->mutex);
+ if (survivors)
+ wait_event(uk->waitq, uk->state != UPROBE_REMOVING);
+
+ down_write(&uk->rwsem);
+ if (likely(uk->state == UPROBE_DISABLED))
+ uprobe_free_kimg(uk);
+ else
+ /* Somebody else's register_uprobe() resurrected uk. */
+ up_write(&uk->rwsem);
+ uprobe_put_process(uproc);
+}
+
+/*
+ * utrace engine report callbacks
+ */
+
+/*
+ * We've been asked to quiesce, but we hit a probepoint first. Now
+ * we're in the report_signal callback, having handled the probepoint.
+ * We'd like to just set the UTRACE_ACTION_QUIESCE and
+ * UTRACE_EVENT(QUIESCE) flags and coast into quiescence. Unfortunately,
+ * it's possible to hit a probepoint again before we quiesce. When
+ * processing the SIGTRAP, utrace would call uprobe_report_quiesce(),
+ * which must decline to take any action so as to avoid removing the
+ * uprobe just hit. As a result, we could keep hitting breakpoints
+ * and never quiesce.
+ *
+ * So here we do essentially what we'd prefer to do in uprobe_report_quiesce().
+ * If we're the last thread to quiesce, handle_pending_uprobes() and
+ * rouse_all_threads(). Otherwise, pretend we're quiescent and sleep until
+ * the last quiescent thread handles that stuff and then wakes us.
+ *
+ * Called and returns with no mutexes held. Returns 1 if we free utask->uproc,
+ * else 0.
+ */
+static int utask_quiesce_in_callback(struct uprobe_task *utask)
+{
+ struct uprobe_process *uproc = utask->uproc;
+ enum uprobe_task_state prev_state = utask->state;
+
+ mutex_lock(&uproc->mutex);
+ if (uproc->n_quiescent_threads == uproc->nthreads-1) {
+ /* We're the last thread to "quiesce." */
+ handle_pending_uprobes(uproc, utask->tsk);
+ rouse_all_threads(uproc);
+ mutex_unlock(&uproc->mutex);
+ return 0;
+ } else {
+ mutex_lock(&utask->mutex);
+ utask->state = UPTASK_SLEEPING;
+ mutex_unlock(&utask->mutex);
+ uproc->n_quiescent_threads++;
+ mutex_unlock(&uproc->mutex);
+ /* We ref-count sleepers. */
+ uprobe_get_process(uproc);
+
+ wait_event(uproc->waitq, !utask->quiescing);
+
+ mutex_lock(&uproc->mutex);
+ mutex_lock(&utask->mutex);
+ utask->state = prev_state;
+ mutex_unlock(&utask->mutex);
+ uproc->n_quiescent_threads--;
+ mutex_unlock(&uproc->mutex);
+
+ /*
+ * If uproc's last uprobe has been unregistered, and
+ * unregister_uprobe() woke up before we did, it's up
+ * to us to free uproc.
+ */
+ return uprobe_put_process(uproc);
+ }
+}
+
+/* Prepare to single-step uk's probed instruction inline. */
+static inline void uprobe_pre_ssin(struct uprobe_task *utask,
+ struct uprobe_kimg *uk, struct pt_regs *regs)
+{
+ int len;
+ arch_reset_ip_for_sstep(regs);
+ len = set_orig_insn(uk, utask->tsk);
+ if (unlikely(len != BP_INSN_SIZE)) {
+ printk("Failed to temporarily restore original "
+ "instruction for single-stepping: "
+ "pid/tgid=%d/%d, vaddr=%#lx\n",
+ utask->tsk->pid, utask->tsk->tgid, uk->vaddr);
+ // FIXME: Locking problems?
+ do_exit(SIGSEGV);
+ }
+}
+
+/* Prepare to continue execution after single-stepping inline. */
+static inline void uprobe_post_ssin(struct uprobe_task *utask,
+ struct uprobe_kimg *uk)
+{
+
+ int len = set_bp(uk, utask->tsk);
+ if (unlikely(len != BP_INSN_SIZE)) {
+ printk("Couldn't restore bp: pid/tgid=%d/%d, addr=%#lx\n",
+ utask->tsk->pid, utask->tsk->tgid, uk->vaddr);
+ uk->state = UPROBE_DISABLED;
+ }
+}
+
+/*
+ * Signal callback:
+ *
+ * We get called here with:
+ * state = UPTASK_RUNNING => we are here due to a breakpoint hit
+ * - Figure out which probepoint, based on regs->IP
+ * - Set state = UPTASK_BP_HIT
+ * - Reset regs->IP to beginning of the insn, if necessary
+ * - Invoke handler for each uprobe at this probepoint
+ * - Set state = UPTASK_SSTEP_AFTER_BP
+ * - Set singlestep in motion (UTRACE_ACTION_SINGLESTEP)
+ *
+ * state = UPTASK_SSTEP_AFTER_BP => here after singlestepping
+ * - Validate we are here per the state machine
+ * - Clean up after singlestepping
+ * - Set state = UPTASK_RUNNING
+ * - If it's time to quiesce, take appropriate action.
+ *
+ * state = ANY OTHER STATE
+ * - Not our signal, pass it on (UTRACE_ACTION_RESUME)
+ */
+static u32 uprobe_report_signal(struct utrace_attached_engine *engine,
+ struct task_struct *tsk, struct pt_regs *regs, u32 action,
+ siginfo_t *info, const struct k_sigaction *orig_ka,
+ struct k_sigaction *return_ka)
+{
+ struct uprobe_task *utask;
+ struct uprobe_kimg *uk;
+ struct uprobe_process *uproc;
+ struct uprobe *u;
+ u32 ret;
+ unsigned long probept;
+
+ utask = rcu_dereference((struct uprobe_task *)engine->data);
+ BUG_ON(!utask);
+
+ if (action != UTRACE_SIGNAL_CORE || info->si_signo != SIGTRAP)
+ goto no_interest;
+
+ mutex_lock(&utask->mutex);
+ switch (utask->state) {
+ case UPTASK_RUNNING:
+ uproc = utask->uproc;
+ probept = arch_get_probept(regs);
+ down_read(&uproc->utable_rwsem);
+ uk = find_uprobe(uproc, probept, 1);
+ up_read(&uproc->utable_rwsem);
+ if (!uk) {
+ mutex_unlock(&utask->mutex);
+ goto no_interest;
+ }
+ utask->active_probe = uk;
+ utask->state = UPTASK_BP_HIT;
+
+ if (likely(uk->state == UPROBE_BP_SET)) {
+ list_for_each_entry(u, &uk->uprobe_list, list) {
+ if (u->handler)
+ u->handler(u, regs);
+ }
+ }
+ up_read(&uk->rwsem);
+
+ utask->state = UPTASK_SSTEP_AFTER_BP;
+ mutex_unlock(&utask->mutex);
+ uprobe_pre_ssin(utask, uk, regs);
+ /*
+ * Other engines must not see this signal, and the
+ * signal shouldn't be passed on either.
+ */
+ ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN |
+ UTRACE_ACTION_SINGLESTEP | UTRACE_ACTION_NEWSTATE;
+ break;
+ case UPTASK_SSTEP_AFTER_BP:
+ uk = utask->active_probe;
+ BUG_ON(!uk);
+ uprobe_post_ssin(utask, uk);
+
+ utask->active_probe = NULL;
+ ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
+ | UTRACE_ACTION_NEWSTATE;
+ utask->state = UPTASK_RUNNING;
+ if (utask->quiescing) {
+ mutex_unlock(&utask->mutex);
+ if (utask_quiesce_in_callback(utask) == 1)
+ ret |= UTRACE_ACTION_DETACH;
+ } else
+ mutex_unlock(&utask->mutex);
+
+ break;
+ default:
+ mutex_unlock(&utask->mutex);
+ goto no_interest;
+ }
+ return ret;
+
+no_interest:
+ return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * utask_quiesce_pending_sigtrap: The utask entered the quiesce callback
+ * through the signal delivery path, apparently. Check if the associated
+ * signal happened due to a uprobe hit.
+ *
+ * Called with utask->mutex held. Returns 1 if quiesce was entered with
+ * SIGTRAP pending due to a uprobe hit, 0 if not, -EIO if IP is unreadable.
+ */
+static int utask_quiesce_pending_sigtrap(struct uprobe_task *utask)
+{
+ const struct utrace_regset_view *view;
+ const struct utrace_regset *regset;
+ struct uprobe_kimg *uk;
+ struct uprobe_process *uproc;
+ unsigned long inst_ptr;
+
+ if (utask->active_probe)
+ /* Signal must be the post-single-step trap. */
+ return 1;
+
+ view = utrace_native_view(utask->tsk);
+ regset = utrace_regset(utask->tsk, utask->engine, view, 0);
+ if (unlikely(regset == NULL))
+ return -EIO;
+
+ if ((*regset->get)(utask->tsk, regset, SLOT_IP * regset->size,
+ regset->size, &inst_ptr, NULL) != 0)
+ return -EIO;
+
+ uproc = utask->uproc;
+ down_read(&uproc->utable_rwsem);
+ uk = find_uprobe(uproc, ARCH_BP_INST_PTR(inst_ptr), 0);
+ up_read(&uproc->utable_rwsem);
+ return (uk != NULL);
+}
+
+/*
+ * Quiesce callback: The associated process has one or more breakpoint
+ * insertions or removals pending. If we're the last thread in this
+ * process to quiesce, do the insertion(s) and/or removal(s).
+ */
+static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
+ struct task_struct *tsk)
+{
+ struct uprobe_task *utask;
+ struct uprobe_process *uproc;
+
+ utask = rcu_dereference((struct uprobe_task *)engine->data);
+ BUG_ON(!utask);
+ if (current == utask->quiesce_master) {
+ /*
+ * tsk was already quiescent when quiesce_all_threads()
+ * called utrace_set_flags(), which in turn called
+ * here. uproc and utask are already locked. Do as
+ * little as possible and get out.
+ */
+ utask->state = UPTASK_QUIESCENT;
+ utask->uproc->n_quiescent_threads++;
+ return UTRACE_ACTION_RESUME;
+ }
+
+ mutex_lock(&utask->mutex);
+ if (!utask->quiescing) {
+ mutex_unlock(&utask->mutex);
+ goto done;
+ }
+
+ /*
+ * When a thread hits a breakpoint or single-steps, utrace calls
+ * this quiesce callback before our signal callback. We must
+ * let uprobe_report_signal() handle the uprobe hit and THEN
+ * quiesce, because (a) there's a chance that we're quiescing
+ * in order to remove that very uprobe, and (b) there's a tiny
+ * chance that even though that uprobe isn't marked for removal
+ * now, it may be before all threads manage to quiesce.
+ */
+ if (utask_quiesce_pending_sigtrap(utask) == 1) {
+ utask_adjust_flags(utask, CLEAR_ENGINE_FLAGS,
+ UTRACE_ACTION_QUIESCE | UTRACE_EVENT(QUIESCE));
+ mutex_unlock(&utask->mutex);
+ goto done;
+ }
+
+ utask->state = UPTASK_QUIESCENT;
+ mutex_unlock(&utask->mutex);
+
+ uproc = utask->uproc;
+ mutex_lock(&uproc->mutex);
+ uproc->n_quiescent_threads++;
+ check_uproc_quiesced(uproc, tsk);
+ mutex_unlock(&uproc->mutex);
+done:
+ return UTRACE_ACTION_RESUME;
+}
+
+/* Find a surviving thread in uproc. Runs with uproc->mutex held. */
+static struct task_struct *find_surviving_thread(struct uprobe_process *uproc)
+{
+ struct uprobe_task *utask;
+
+ list_for_each_entry(utask, &uproc->thread_list, list)
+ return utask->tsk;
+ return NULL;
+}
+
+/*
+ * uproc's process is exiting or exec-ing, so zap all the (now irrelevant)
+ * probepoints. Runs with uproc->mutex held. Caller must ref-count
+ * uproc before calling this function, to ensure that uproc doesn't get
+ * freed in the middle of this.
+ */
+void uprobe_cleanup_process(struct uprobe_process *uproc)
+{
+ int i;
+ struct uprobe_kimg *uk;
+ struct hlist_node *node, *t1;
+ struct hlist_head *head;
+ struct uprobe *u, *t2;
+
+ for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+ head = &uproc->uprobe_table[i];
+ hlist_for_each_entry_safe(uk, node, t1, head, ut_node) {
+ down_write(&uk->rwsem);
+ if (uk->state == UPROBE_INSERTING ||
+ uk->state == UPROBE_REMOVING) {
+ /*
+ * This task is (exec/exit)ing with
+ * a [un]register_uprobe pending.
+ * [un]register_uprobe will free uk.
+ */
+ uk->state = UPROBE_DISABLED;
+ list_for_each_entry_safe(u, t2,
+ &uk->uprobe_list, list)
+ u->status = -ESRCH;
+ up_write(&uk->rwsem);
+ wake_up_all(&uk->waitq);
+ } else if (uk->state == UPROBE_BP_SET) {
+ list_for_each_entry_safe(u, t2,
+ &uk->uprobe_list, list) {
+ u->status = -ESRCH;
+ u->uk = NULL;
+ list_del(&u->list);
+ }
+ uprobe_free_kimg_locked(uk);
+ uprobe_put_process(uproc);
+ } else {
+ /*
+ * If uk is UPROBE_DISABLED, assume that
+ * [un]register_uprobe() has been notified
+ * and will free it soon.
+ */
+ up_write(&uk->rwsem);
+ }
+ }
+ }
+}
+
+/*
+ * Exit callback: The associated task/thread is exiting.
+ */
+static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
+ struct task_struct *tsk, long orig_code, long *code)
+{
+ struct uprobe_task *utask;
+ struct uprobe_process *uproc;
+ struct uprobe_kimg *uk;
+ int utask_quiescing;
+
+ utask = rcu_dereference((struct uprobe_task *)engine->data);
+
+ uk = utask->active_probe;
+ if (uk) {
+ printk(KERN_WARNING "Task died at uprobe probepoint:"
+ " pid/tgid = %d/%d, probepoint = %#lx\n",
+ tsk->pid, tsk->tgid, uk->vaddr);
+ if (utask->state == UPTASK_BP_HIT)
+ /* Running handler */
+ up_read(&uk->rwsem);
+ mutex_unlock(&utask->mutex);
+ }
+
+ uproc = utask->uproc;
+ mutex_lock(&uproc->mutex);
+ utask_quiescing = utask->quiescing;
+
+ list_del(&utask->list);
+ kfree(utask);
+
+ uproc->nthreads--;
+ if (uproc->nthreads) {
+ if (utask_quiescing)
+ /*
+ * In case other threads are waiting for
+ * us to quiesce...
+ */
+ check_uproc_quiesced(uproc,
+ find_surviving_thread(uproc));
+ mutex_unlock(&uproc->mutex);
+ } else {
+ /*
+ * We were the last remaining thread - clean up the uprobe
+ * remnants a la unregister_uprobe(). We don't have to
+ * remove the breakpoints, though.
+ */
+ uprobe_get_process(uproc);
+ uprobe_cleanup_process(uproc);
+ mutex_unlock(&uproc->mutex);
+ uprobe_put_process(uproc);
+ }
+
+ return UTRACE_ACTION_DETACH;
+}
+
+/*
+ * Clone callback: The current task has spawned a thread/process.
+ *
+ * NOTE: For now, we don't pass on uprobes from the parent to the
+ * child. Instead, we clear the parent's breakpoints from the
+ * child's address space.
+ *
+ * TODO:
+ * - Provide option for child to inherit uprobes.
+ */
+static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
+ struct task_struct *parent, unsigned long clone_flags,
+ struct task_struct *child)
+{
+ int len;
+ struct uprobe_process *uproc;
+ struct uprobe_task *ptask, *ctask;
+
+ ptask = rcu_dereference((struct uprobe_task *)engine->data);
+ uproc = ptask->uproc;
+
+ /*
+ * Lock uproc so no new uprobes can be installed until all
+ * report_clone activities are completed.
+ */
+ mutex_lock(&uproc->mutex);
+ get_task_struct(child);
+
+ if (clone_flags & CLONE_THREAD) {
+ /* New thread in the same process */
+ ctask = uprobe_add_task(child, uproc);
+ BUG_ON(!ctask);
+ if (IS_ERR(ctask)) {
+ put_task_struct(child);
+ mutex_unlock(&uproc->mutex);
+ goto fail;
+ }
+ if (ctask)
+ uproc->nthreads++;
+ /*
+ * FIXME: Handle the case where uproc is quiescing
+ * (assuming it's possible to clone while quiescing).
+ */
+ } else {
+ /*
+ * New process spawned by parent. Remove the probepoints
+ * in the child's text.
+ *
+ * It's not necessary to quiesce the child as we are assured
+ * by utrace that this callback happens *before* the child
+ * gets to run userspace.
+ *
+ * We also hold the uproc->mutex for the parent - so no
+ * new uprobes will be registered 'til we return.
+ */
+ int i;
+ struct uprobe_kimg *uk;
+ struct hlist_node *node;
+ struct hlist_head *head;
+
+ for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+ head = &uproc->uprobe_table[i];
+ hlist_for_each_entry(uk, node, head, ut_node) {
+ down_write(&uk->rwsem);
+ len = set_orig_insn(uk, child);
+ if (len != BP_INSN_SIZE) {
+ /* Ratelimit this? */
+ printk(KERN_ERR "Pid %d forked %d;"
+ " failed to remove probepoint"
+ " at %#lx in child\n",
+ parent->pid, child->pid,
+ uk->vaddr);
+ }
+ up_write(&uk->rwsem);
+ }
+ }
+ }
+
+ put_task_struct(child);
+ mutex_unlock(&uproc->mutex);
+
+fail:
+ return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * Exec callback: The associated process called execve() or friends
+ *
+ * The new program is about to start running, so there is no
+ * possibility of a uprobe from the previous user address space
+ * being hit.
+ *
+ * NOTE:
+ * Typically, this process would have passed through the clone
+ * callback, where the necessary action *should* have been
+ * taken. However, if we still end up at this callback:
+ * - We don't have to clear the uprobes - memory image
+ * will be overlaid.
+ * - We have to free up uprobe resources associated with
+ * this process.
+ */
+static u32 uprobe_report_exec(struct utrace_attached_engine *engine,
+ struct task_struct *tsk, const struct linux_binprm *bprm,
+ struct pt_regs *regs)
+{
+ struct uprobe_process *uproc;
+ struct uprobe_task *utask;
+ int uproc_freed;
+
+ utask = rcu_dereference((struct uprobe_task *)engine->data);
+ uproc = utask->uproc;
+ uprobe_get_process(uproc);
+
+ mutex_lock(&uproc->mutex);
+ uprobe_cleanup_process(uproc);
+ mutex_unlock(&uproc->mutex);
+
+ /* If any [un]register_uprobe is pending, it'll clean up. */
+ uproc_freed = uprobe_put_process(uproc);
+ return (uproc_freed ? UTRACE_ACTION_DETACH : UTRACE_ACTION_RESUME);
+}
+
+static const struct utrace_engine_ops uprobe_utrace_ops =
+{
+ .report_quiesce = uprobe_report_quiesce,
+ .report_signal = uprobe_report_signal,
+ .report_exit = uprobe_report_exit,
+ .report_clone = uprobe_report_clone,
+ .report_exec = uprobe_report_exec
+};
+
+#define arch_init_uprobes() 0
+
+static int __init init_uprobes(void)
+{
+ int i, err = 0;
+
+ for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+ INIT_HLIST_HEAD(&uproc_table[i]);
+
+ p_uprobe_utrace_ops = &uprobe_utrace_ops;
+ err = arch_init_uprobes();
+ return err;
+}
+__initcall(init_uprobes);
+
+EXPORT_SYMBOL_GPL(register_uprobe);
+EXPORT_SYMBOL_GPL(unregister_uprobe);
_