This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Updating from 0303 to 0505 has problems
On Thu, May 10, 2007 at 01:28:44AM -0400, Martin Hunt wrote:
On Wed, 2007-05-09 at 22:22 -0500, Quentin Barnes wrote:
The problem is the "poll warning: Interrupted system call"
diagnostic message from staprun that's throwing off expect. Below
is a patch that fixes this problem for this message. Now whether
this is a fix or a workaround I don't know. There are two possible
fixes: 1) change stap_run.exp to account for the extra diagnostic,
2) change staprun's reader_thread() not to generate the diagnostic
in the first place. I chose the former for this patch. Which
approach would be the better fix?
I think I checked in a fix to this a few days ago. Certainly that error
message no longer is in the sources. Of course, there could be new
problems now... Could you give either the current CVS version a try or a
snapshot from May 8 or later?
I tried it with the 0512 snapshot and it still has problems.
Running it with add.stp:
====================
# staprun .systemtap/cache/51/stap_51971f530865fabd17b2f297433cc8f6_473.ko
systemtap starting probe
[pressed ^C here]
#
====================
After I pressed ^C there should have been the additional messages:
====================
systemtap ending probe
systemtap test success
====================
But there was nothing. It just exited. This is now also causing
a lot of "eof" failures with the test suite since those messages
are lost.
Rerunning now with -vvv:
====================
# staprun -vvv .systemtap/cache/51/stap_51971f530865fabd17b2f297433cc8f6_473.ko
main:141 modpath=".systemtap/cache/51/stap_51971f530865fabd17b2f297433cc8f6_473.ko", modname="stap_51971f530865fabd17b2f297433cc8f6_473"
run_stp_check:191 executing /usr/local/libexec/systemtap/stp_check
init_ctl_channel:59 Opening /sys/kernel/debug/systemtap/stap_51971f530865fabd17b2f297433cc8f6_473/cmd
stp_main_loop:335 in main loop
stp_main_loop:423 STP_SYMBOLS request received
stp_main_loop:416 STP_MODULES request received
init_relayfs:104 initializing relayfs
init_relayfs:120 attempting to open /sys/kernel/debug/systemtap/stap_51971f530865fabd17b2f297433cc8f6_473/trace0
init_relayfs:120 attempting to open /sys/kernel/debug/systemtap/stap_51971f530865fabd17b2f297433cc8f6_473/trace1
init_relayfs:126 ncpus=1
init_relayfs:164 starting threads
stp_main_loop:382 probe_start() returned 0
systemtap starting probe
[pressed ^C here]
sigproc:33 sigproc 2 (Interrupt)
reader_thread:74 poll=-1 errno=4
# stp_main_loop:375 got STP_EXIT, closed=0
cleanup_and_exit:288 CLEANUP AND EXIT closed=0
close_relayfs:183 closing
close_relayfs:190 sent SIGUSR2
close_relayfs:204 done
cleanup_and_exit:301 closing control channel
cleanup_and_exit:305 removing module
sigproc:33 sigproc 17 (Child exited)
====================
I think what is going on is that the ^C forces the child in
reader_thread() to asynchronously send the STP_EXIT command making
for an early abort prior to it getting down to its read/write loop
to send the rest of the final messages first.
My guess at what you need to do is not have sigproc() actually make
the send_request(STP_EXIT, NULL, 0) call itself, but to just set
the "stop_threads" state to 1. Only after letting reader_thread()
finish up its inner while and its outer do/while loops, have it only
then send the STP_EXIT directive.
I don't know the code well enough to know how complex this is to do
this tweak with its current multi-threaded design.
Martin
Quentin