This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Making the transport layer more robust
On Tue, 2011-07-19 at 17:02 +0200, Mark Wielaard wrote:
> On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> > pr10854.exp acts strangely on rhel5, it seems fine on f14. It just sits
> > there waiting to reap staprun, which will never happen since it tries
> > to pkill it at the same time, that could be because the startup/exit of
> > staprun/stapio is much more robust now, but I don't fully understand the
> > expect spawn, catch, wait logic. Maybe it is some strange bug in the
> > rhel5 expect? Maybe I changed some expectation of staprun/stapio/module
> > interaction? Any help understanding the expect logic would be
> > appreciated.
>
> I think I narrowed this down to the following commit:
>
> commit 5c854d7ca64df766c581c9ed7ff81e04c9d1fa4d
> Author: Chris Meek <cmeek@redhat.com>
> Date: Wed Jul 13 10:31:47 2011 -0400
>
> PR12890: Renaming modules in Staprun
>
> Although it is somewhat hard to say, because it doesn't always fail. But
> I have never seen it fail before this commit.
>
> Still trying to understand the real issue and the testcase though. So
> all help appreciated.

Frank seems to have fixed it by changing the testcase as follows:

commit 49909b5572bc61c03cc80ef94f6d00dc5bbf665d
Author: Frank Ch. Eigler <fche@redhat.com>
Date: Tue Jul 19 13:52:58 2011 -0400

resolve PR12890 vs PR10854 bunfight

The PR10854 test case uses a tight loop of staprun and a nested loop
of pkills, written in a way that counts on staprun's pre-PR12890
"insert; unload; retry insert" module-handling heuristic. With this
heuristic gone (and error messages properly generated), the PR10854
test case goes woozy and hangs in the while { ... pkill ... } tcl
loop. Now we don't loop in there any more.

The test now passes on all my setups.

Cheers,
Mark
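
As a rough illustration of the hang described in the commit message above: a loop that keeps signalling a child until it disappears can spin forever if the child never goes away the way the loop expects. The sketch below is not the pr10854.exp testcase and not Frank's actual change; it is a hypothetical Python analogue that assumes a generic long-running child process, and the helper name stop_with_deadline is made up for this example. It shows one way to bound such a kill-and-reap loop with a deadline.

```python
import subprocess
import sys
import time


def stop_with_deadline(proc, timeout=5.0, poll_interval=0.1):
    """Signal proc to stop and reap it, giving up after `timeout` seconds.

    Hypothetical helper for illustration only. Returns the child's exit
    status, or None if the child had not exited by the deadline.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # poll() reaps the child if it has exited and returns its status.
        if proc.poll() is not None:
            return proc.returncode
        proc.terminate()  # ask the child to exit (SIGTERM on POSIX)
        time.sleep(poll_interval)
    # Deadline passed: report whatever state the child is in and stop looping.
    return proc.poll()


if __name__ == "__main__":
    # Stand-in for a long-running process; an assumption for this sketch.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit status:", stop_with_deadline(child))
```

The point of the deadline is only that the caller always gets control back, so a test harness can report a failure instead of hanging.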