This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

task_finder2 help

From: David Smith <dsmith at redhat dot com>
To: "systemtap at sourceware dot org" <systemtap at sourceware dot org>
Date: Tue, 23 Aug 2011 11:23:59 -0500
Subject: task_finder2 help

I've been working on replacing the task_finder's use of utrace.  You can
see this work on the dsmith/task_finder2 branch.  How does this work?
I'm using 4 tracepoints and one ftrace function hook to monitor task
fork/exit/exec and syscall entry & syscall exit.

On F15 (kernel 2.6.40-4.fc15.x86_64.debug) what's checked in works
great, all utrace tests pass.  Note that the code isn't finished, there
is a known memory leak for example, but it seems to work.

However, on rawhide (3.1.0-0.rc1.git6.1.fc17.x86_64), the kernel
complains mightily about sleeping functions called from invalid contexts.

There are a couple of classes of these bugs (after working through some
semi-minor issues):

- Use of get_task_mm()/mmput().  There are several places in
runtime/task_finder2.c where we call get_task_mm() to get a task's mm,
we do something with it (like get the task's pathname), then call
mmput() to release the mm.  (A couple of places we just check to see if
the task has a mm, in that case calling get_task_mm()/mmput() is
probably overkill.)

Looking at the source to get_task_mm() (in kernel/fork.c), you'll see
that it locks the task, increments an atomic variable, then unlocks the
task.  The problem is in mmput().  mmput() calls might_sleep()
unconditionally.  Why?  Because when the atomic reaches 0, all the mm's
data is freed.

So, how to work around this?  For now I've been just locking the task,
hoping that nothing will change the mm.  If the task is a single
threaded task, that assumption might be reasonable.  The task is stopped
anyway at the various points I probe (fork/exec/exit/syscalls).
However, if the task is part of a multi-threaded program, that
assumption probably isn't true.  Yes, this task is stopped, but another
running task in the same group could possibly change the mm.

Anyone got any ideas here?

- Memory allocation.  All memory allocation in the task_finder is done
using GFP_KERNEL (with perhaps some additional other flags).  Most of
the allocation is done to store PATH_MAX buffers, to be used when we're
looking at the pathname of a task.  Even as far back as the 2.6.9
kernel, GFP_KERNEL has been defined as (__GFP_WAIT | __GFP_IO |
__GFP_FS).  If __GFP_WAIT is set in the flags passed to kmalloc(),
kmalloc() calls might_sleep().

So, the obvious solution is to change the flags we pass from GFP_KERNEL
to (GFP_KERNEL & ~__GFP_WAIT).  That seems to avoid the BUG, but I'm
unsure of the ramifications (other than the obvious one that more memory
allocations could fail).  It could be that we should have been turning
off __GFP_WAIT even before now.

Does anyone know of pitfalls of turning off __GFP_WAIT?

Thanks for the help.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

Follow-Ups:
- Re: task_finder2 help
  - From: Josh Stone

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]