This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Patch] Fix bug of deadlock for staprun
- From: Lai Jiangshan <laijs at cn dot fujitsu dot com>
- To: systemtap at sourceware dot org
- Date: Mon, 15 Oct 2007 14:47:36 +0900
- Subject: [Patch] Fix bug of deadlock for staprun
Hi all,
This patch fixes two serious bugs, either of which can make
staprun deadlock. In both cases staprun holds open some files
that belong to stap_XXXXXX.ko and deletes the module before
closing them. The delete_module syscall then waits for staprun
to close those files, while staprun cannot do anything else
(such as closing the files) until delete_module returns. The
result is a deadlock that even root cannot clean up, because
staprun is in "uninterruptible sleep".
First bug: handle_symbols() opens control_channel
and returns without closing this fd when an error occurs. So
any error in handle_symbols() results in a deadlock.
The easiest way to reproduce this bug:
$ ulimit -HSn 4
# make staprun open control_channel successfully,
# but fail to open one more.
$ staprun -L stap_XXXXXX.ko
# deadlock
Adding the line "close_ctl_channel();" to cleanup()
fixes this bug.
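The idea behind this fix can be sketched as follows. This is only a minimal illustration, not staprun's actual code: the names close_ctl_channel(), cleanup() and control_channel come from the description above, but their bodies here are assumptions, and open_ctl_channel() is a hypothetical helper added for the example. The point is that the close is idempotent, so cleanup() can call it on every exit path, including the error path out of handle_symbols().

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for staprun's control channel fd (assumed layout, not the real one). */
static int control_channel = -1;

/* Hypothetical helper: open the channel, return 0 on success, -1 on failure. */
static int open_ctl_channel(const char *path)
{
    assert(path != NULL);
    control_channel = open(path, O_RDWR);
    return control_channel < 0 ? -1 : 0;
}

/* Idempotent close: safe to call even if the channel was never opened. */
static void close_ctl_channel(void)
{
    if (control_channel >= 0) {
        close(control_channel);
        control_channel = -1;
    }
}

/* Cleanup path: drop the fd first, so no file of the module is still held
 * by this process when the module is removed afterwards. */
static void cleanup(void)
{
    close_ctl_channel();
    /* ... module removal would follow here ... */
}
```

Calling cleanup() twice, or calling it when the open failed, is harmless in this sketch because the descriptor is reset to -1 after the first close.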
Second bug: staprun runs as a privileged process, but
it trusts the return value of stapio, an unprivileged process.
In addition, the files that belong to stap_XXXXXX.ko are owned
by the user who runs staprun.
So evil_usr (who is just a stapuser) can cause a
deadlock in this way:
step command
1 $ staprun -L stap_XXXXXX.ko
2 $ exec 3<>/sys/kernel/debug/systemtap/stap_XXXXXX/cmd
# open one of the files belonging to stap_XXXXXX.ko,
# fd 3 will be inherited by staprun, so staprun
# holds this file.
3 $ staprun -A stap_XXXXXX.ko &
4 # (very quickly) evil_usr uses the ptrace syscall to modify
# the argument of the exit syscall in stapio, making stapio
# exit with exit_code=3.
staprun then sees stapio's exit_code=3 and deletes
stap_XXXXXX.ko, which leads to a deadlock.
In this situation, staprun has too little information
to prevent evil_usr from doing this, and not enough
information to release all resources belonging to
stap_XXXXXX.ko before deleting it. The best staprun can do is
to avoid entering "uninterruptible sleep" and leave root to
clean up the mess.
So I added the O_NONBLOCK flag to the delete_module
call. I think this flag should be passed even if the bug did
not exist.
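For illustration, a non-blocking removal request can be sketched like this. The wrapper name remove_module_nonblocking() is hypothetical, and the sketch calls the raw syscall directly instead of going through staprun's do_cap(). According to the delete_module(2) manual page, with O_NONBLOCK the call returns an error immediately if the module is still in use, instead of sleeping until its reference count drops to zero.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the kernel to remove a module without blocking.
 * Returns 0 on success, or -1 with errno set (for example EPERM without
 * CAP_SYS_MODULE, or ENOENT if no such module is loaded). */
static long remove_module_nonblocking(const char *modname)
{
    assert(modname != NULL);
    return syscall(SYS_delete_module, modname, O_NONBLOCK);
}
```

A caller would check the return value and report errno, as the err() call in the patch below does, and then leave the module in place for root to remove later.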
The following patch fixes both bugs. (The patch is
against systemtap-snapshot-20071006, not this week's snapshot.)
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
diff -Nur src/runtime/staprun/staprun.c src.new/runtime/staprun/staprun.c
--- src/runtime/staprun/staprun.c 2007-08-16 02:37:21.000000000 +0900
+++ src.new/runtime/staprun/staprun.c 2007-10-15 14:07:27.000000000 +0900
@@ -123,6 +123,8 @@
if (setpriority (PRIO_PROCESS, 0, 0) < 0)
_perr("setpriority");
+
+ close_ctl_channel();
/* rc == 2 means disconnected */
if (rc == 2)
@@ -133,7 +135,7 @@
if (inserted_module || rc == 3) {
long ret;
dbug(2, "removing module %s\n", modname);
- ret = do_cap(CAP_SYS_MODULE, delete_module, modname, 0);
+ ret = do_cap(CAP_SYS_MODULE, delete_module, modname, O_NONBLOCK);
if (ret != 0)
err("Error removing module '%s': %s\n", modname, moderror(errno));
}