This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Patch] Fix bug of deadlock for staprun


hi, all

	This patch fixed two serious bugs. These two bugs
will cause staprun deadlock:
	staprun holds some files belong to stap_XXXXXX.ko
and staprun deletes this module before closing these files.
Thus syscall delete_module will wait staprun to close these
files, but staprun also must wait for the return of syscall
delete_module before doing other things(close files). It
will cause deadlock, and root can't cleanup this deadlock,
because staprun is in "Uninterruptible sleep".

	First bug: handle_symbols() opened control_channel
and exited without closing this fd when errors occur. So if
any error occurred in handle_symbols(), it will result in
deadlock.
	The most easy way to reproduce this bug:
$ ulimit -HSn 4
	# make staprun open control_channel successfully,
	# but fail to open one more.
$ staprun -L stap_XXXXXX.ko
	# deadlock
	Just add a line "close_ctl_channel();" in cleanup()
can fix this bug.

	Second bug: staprun runs as privileged process, but
it trusts an unprivileged process stapio's return value. And
the owner of files belong to stap_XXXXXX.ko is the user who
runs staprun.

	So evil_usr(he is just a stapuser) can cause a
deadlock in this way:
step command
1    $ staprun -L stap_XXXXXX.ko
2    $ exec 3<>/sys/kernel/debug/systemtap/stap_XXXXXX/cmd
	# open one of the files belong to stap_XXXXXX.ko,
	# fd 3 will be inherited by staprun, so staprun
	# holds this file.
3    $ staprun -A stap_XXXXXX.ko &
4    # (very quickly) evil_usr uses syscall ptrace to modify
    # argument of syscall exit in stapio, make stapio exit
    # with exit_code=3.

	Thus staprun will get stapio's exit_code=3, so it will
delete stap_XXXXXX.ko which lead to a deadlock.

	In this condition, staprun gets too little information
to prevent evil_usr from doing such things. And staprun has no
enough information to return back all resource belong to
stap_XXXXXX.ko before deleting it. The best way for staprun is
that staprun cannot "Uninterruptible sleep", just left root
cleanup the mess.

	So I just added a flag O_NONBLOCK when calling
delete_module. I think this flag should be added even if the
bug does not exist.

	The bugs can be fixed by the following patch.(patch
for systemtap-snapshot-20071006, not this week)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>

diff -Nur src/runtime/staprun/staprun.c src.new/runtime/staprun/staprun.c
--- src/runtime/staprun/staprun.c	2007-08-16 02:37:21.000000000 +0900
+++ src.new/runtime/staprun/staprun.c	2007-10-15 14:07:27.000000000 +0900
@@ -123,6 +123,8 @@

	if (setpriority (PRIO_PROCESS, 0, 0) < 0)
		_perr("setpriority");
+
+	close_ctl_channel();
	
	/* rc == 2 means disconnected */
	if (rc == 2)
@@ -133,7 +135,7 @@
	if (inserted_module || rc == 3) {
		long ret;
		dbug(2, "removing module %s\n", modname);
-		ret = do_cap(CAP_SYS_MODULE, delete_module, modname, 0);
+		ret = do_cap(CAP_SYS_MODULE, delete_module, modname, O_NONBLOCK);
		if (ret != 0)
			err("Error removing module '%s': %s\n", modname, moderror(errno));
	}




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]