This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Making the transport layer more robust


On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Fri Aug 12 19:34:20 2011 +0200
> 
>     Remove _stp_ctl_work_timer from module transport layer.
>     
>     The _stp_ctl_work_timer would trigger every 20ms to check whether
>     there were cmd messages queued, but not announced yet and to
>     check the _stp_exit_flag was set.
>     
>     This commit makes all control messages announce themselves and
>     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
>     is still possibly delayed since the messages are just pushed on
>     a wait queue).

This has unfortunately left open an opportunity for deadlock.  The
kernel wake_up infrastructure takes a spinlock on the wait queue.  If a
probe happens to fire while that lock is held, either via a direct
probe on something called by wake_up or indirectly via NMI, then the
handler must not call anything that would attempt to take the same
lock.  But this commit triggers a wake_up on ctl prints, and commit
a85c8aff triggers the same on exit().

For example, __wake_up_common is called with the wait queue lock held,
so either of these probes will cause a deadlock:

  probe kernel.function("__wake_up_common") { warn(pp()) }

  probe kernel.function("__wake_up_common") { exit() }

This issue in general is very similar to PR2525.  We must take care not
to call any blocking code from arbitrary probe context.
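
The re-entry problem can be modelled in user space.  What follows is
only a rough sketch in Python, not systemtap or kernel code: WaitQueue,
run_direct and run_deferred are hypothetical names, and a non-blocking
threading.Lock stands in for the wait queue spinlock so that the model
reports "deadlock" where a real spinlock would spin forever.  The
deferred variant shows one possible way out (the handler only records
that a wakeup is needed and a safe context performs it later); it is
an assumption for illustration, not necessarily the fix systemtap uses.

```python
import threading


class WaitQueue:
    """Toy stand-in for a kernel wait queue guarded by a non-recursive lock."""

    def __init__(self):
        self.lock = threading.Lock()  # models the wait queue spinlock
        self.probe = None             # handler fired inside _wake_up_common
        self.wakeups = 0

    def wake_up(self):
        # A real spinlock would spin forever if the caller already holds
        # it; the model reports that case instead of hanging.
        if not self.lock.acquire(blocking=False):
            return "deadlock"
        try:
            self._wake_up_common()
            return "ok"
        finally:
            self.lock.release()

    def _wake_up_common(self):
        # Runs with the lock held, like __wake_up_common in the kernel.
        self.wakeups += 1
        if self.probe is not None:
            self.probe()


def run_direct(wq):
    """Probe handler wakes the queue itself, as a ctl print or exit() would."""
    results = []
    wq.probe = lambda: results.append(wq.wake_up())
    outer = wq.wake_up()
    wq.probe = None
    return outer, results


def run_deferred(wq):
    """Probe handler only notes that a wakeup is needed; it is done later."""
    pending = []
    wq.probe = lambda: pending.append(True)
    outer = wq.wake_up()
    wq.probe = None
    # Safe context (for example a periodic timer): the lock is free here.
    later = [wq.wake_up() for _ in pending]
    return outer, later
```

Calling run_direct(WaitQueue()) shows the nested wake_up failing to get
the lock, while run_deferred(WaitQueue()) completes both wakeups because
the second one happens only after the lock has been released.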

Thanks,
Josh

