This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Bug runtime/15982] process.end probes broken on RHEL6
- From: "jistone at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sourceware dot org
- Date: Thu, 26 Sep 2013 22:36:34 +0000
- Auto-submitted: auto-generated
- References: <bug-15982-6586 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=15982
Josh Stone <jistone at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jistone at redhat dot com
--- Comment #2 from Josh Stone <jistone at redhat dot com> ---
(In reply to David Smith from comment #1)
> 1) Let the process.end probe fire even when the module's session state
> isn't STAP_SESSION_RUNNING. This works a good bit of the time, but not 100%
> consistently.
I'm guessing this still has a race: the process.end probe has to fire before
the module has executed all the end probes and reached cleanup / unload.
Even if that doesn't race, it breaks the general idea that end probes run in
exclusion, after everything else has finished. For example, a final report of
probe activity is not so final anymore if a process.end might change data.
> 2) Switch the task_finder from using UTRACE_DEATH (Thread has died) to
> UTRACE_EXIT (Thread exit in progress). The UTRACE_EXIT event happens before
> the signal is sent to the dying thread's parent, so we won't miss the event.
This sounds fine as long as these really are paired, meaning that every
UTRACE_DEATH is preceded by a UTRACE_EXIT and every UTRACE_EXIT is followed by
a UTRACE_DEATH. That does appear to be true, as far as I can follow the
tracehook_reports in kernel/exit.c.
> In the tracepoint-based utrace replacement (for kernels without built-in
> utrace), the 'sched_process_exit' tracepoint (which we use for process.end
> probes) happens in a similar place as the UTRACE_EXIT hook, so this should
> work reasonably well.
The position of the tracepoint is one thing, but it also matters when our
"quiesce" task work actually gets run, right? That appears to happen shortly
after the tracepoint, via exit_task_work(), and still before exit_notify() is
called. OK. :)
So (2) seems clearly preferable to me.
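
As a rough illustration of the ordering argument above, here is a toy Python
model of the two exit paths. It is only a sketch: the step lists paraphrase
this thread rather than any particular kernel's kernel/exit.c, and the names
UTRACE_PATH, TRACEPOINT_PATH, PARENT_NOTIFIED and
fires_before_parent_notified are hypothetical, made up for the example.

```python
# Toy model of the exit-path ordering discussed in this thread.
# ASSUMPTION: both sequences paraphrase the discussion; they are not
# copied from any specific kernel version.

PARENT_NOTIFIED = "exit_notify (parent signalled)"

# Kernels with built-in utrace.
UTRACE_PATH = [
    "UTRACE_EXIT",       # thread exit in progress
    PARENT_NOTIFIED,     # parent learns of the exit
    "UTRACE_DEATH",      # thread has died
]

# Tracepoint-based utrace replacement.
TRACEPOINT_PATH = [
    "sched_process_exit",  # tracepoint used for process.end probes
    "exit_task_work",      # queued task work (e.g. "quiesce") runs here
    PARENT_NOTIFIED,
]


def fires_before_parent_notified(path, hook):
    """Return True if `hook` comes before the parent notification in `path`."""
    return path.index(hook) < path.index(PARENT_NOTIFIED)


if __name__ == "__main__":
    for name, path in (("utrace", UTRACE_PATH), ("tracepoint", TRACEPOINT_PATH)):
        for hook in path:
            if hook != PARENT_NOTIFIED:
                early = fires_before_parent_notified(path, hook)
                print(f"{name}: {hook} before parent notified: {early}")
```

In this model only UTRACE_DEATH lands after the parent has been signalled,
which is the window in which a process.end event could be missed.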