This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Re: testsuite and hardcoded timeouts

From: William Cohen <wcohen at redhat dot com>
To: Quentin Barnes <qbarnes at urbana dot css dot mot dot com>
Cc: "Frank Ch. Eigler" <fche at redhat dot com>, David Wilder <dwilder at us dot ibm dot com>, systemtap at sources dot redhat dot com
Date: Wed, 16 May 2007 15:02:59 -0400
Subject: Re: testsuite and hardcoded timeouts
References: <20070511191420.GA12285@urbana.css.mot.com> <4644DB46.1070705@redhat.com> <46489CF5.6010705@us.ibm.com> <4648C9B1.30307@redhat.com> <4648D7C2.2050007@us.ibm.com> <20070515223546.GD25729@urbana.css.mot.com> <y0modklhfg9.fsf@ton.toronto.redhat.com> <20070516004247.GF25729@urbana.css.mot.com>

Quentin Barnes wrote:

> > Quentin Barnes <qbarnes@urbana.css.mot.com> writes:
> >
> > > [...]
> > > Ah, maybe there is some middle ground here.  Instead of putting the
> > > effort into figuring out some portable method for dynamic timeouts,
> > > just change the behavior for a timeout to be user-settable [...]


> > It can be even easier than that.  dejagnu's "timeout" tcl variable is
> > exactly the default timeout duration in seconds.


> The "timeout" variable is an expect feature.  It is already set
> in stap_run.exp and stap_run2.exp, but timeouts are also manually
> specified in various expect statements sprinkled through the
> testsuite.  Those are the ones that cause me the most headaches.
> Otherwise, tinkering with just two files would be trivial.

> > The .exp files under
> > testsuite/config or even testsuite/lib could set this global variable
> > based on the "ishost" predicate - leave it for i686, double it for
> > s390x, dedicule (!) it for arm.  Then we just need to police the test
> > cases to avoid messing with this value.
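
The per-host scaling suggested above can be sketched as follows. In the real testsuite this would be Tcl in a file under testsuite/config or testsuite/lib; the Python here only models the arithmetic. The multiplier table is an assumption: 1 for i686 and 2 for s390x as stated, and 10 for arm, reading "dedicule" as "decuple" (tenfold). The pattern list, the `scaled_timeout` name, and the fallback for unmatched hosts are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, multiplier) pairs, tried in order against the host
# triplet, loosely modeled on dejagnu's glob-style ishost matching.
HOST_MULTIPLIERS = [
    ("i686-*", 1),   # baseline: leave the timeout alone
    ("s390x-*", 2),  # double it
    ("arm*-*", 10),  # assumption: "dedicule" read as tenfold ("decuple")
]


def scaled_timeout(base_seconds: int, host_triplet: str) -> int:
    """Scale the default timeout by the first pattern matching the host triplet.

    Hosts that match no pattern keep the base timeout (an assumption).
    """
    for pattern, factor in HOST_MULTIPLIERS:
        if fnmatchcase(host_triplet, pattern):
            return base_seconds * factor
    return base_seconds


if __name__ == "__main__":
    for triplet in ("i686-pc-linux-gnu", "s390x-ibm-linux-gnu", "armv5tel-unknown-linux-gnu"):
        print(triplet, scaled_timeout(30, triplet))
```

Trying the patterns in order lets a more specific triplet be listed ahead of a general one. As the reply below argues, though, no fixed per-architecture factor can tell a slow NFS-rooted ARM board from a fast one.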


> It's not that simple.  For example, my setup is really, really slow
> because it is using an NFS-mounted root and swap with a small amount of
> RAM.  Another ARM system could easily run 5x-10x faster than mine
> with just more memory or a real hard disk.
>
> Rather than create an "ishost" rule, I suggested that what would
> probably be better is to use the MHz or BogoMIPS number from
> /proc/cpuinfo.  But even that's a heuristic because it only takes
> into account the CPU speed, not the overall system speed, which can
> be choked by I/O limitations.
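
The BogoMIPS heuristic might look roughly like this sketch. It reads the first BogoMIPS line of /proc/cpuinfo text (x86 kernels print `bogomips`, ARM kernels `BogoMIPS`, so the match is case-insensitive) and stretches a base timeout by how much slower the CPU is than a reference machine. `REFERENCE_BOGOMIPS`, the rule of never shrinking below the base value, and the function names are hypothetical choices; as noted above, this ignores I/O speed entirely.

```python
import math
import re
from typing import Optional

# Hypothetical BogoMIPS rating of the machine the hardcoded timeouts were tuned on.
REFERENCE_BOGOMIPS = 4000.0

# x86 kernels print "bogomips", ARM kernels "BogoMIPS", so match case-insensitively.
_BOGOMIPS_RE = re.compile(
    r"^bogomips\s*:\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE | re.MULTILINE
)


def parse_bogomips(cpuinfo_text: str) -> Optional[float]:
    """Return the first BogoMIPS value in /proc/cpuinfo text, or None if absent."""
    match = _BOGOMIPS_RE.search(cpuinfo_text)
    return float(match.group(1)) if match else None


def cpu_scaled_timeout(base_seconds: int, cpuinfo_text: str) -> int:
    """Stretch the timeout on CPUs slower than the reference; never shrink it."""
    bogomips = parse_bogomips(cpuinfo_text)
    if not bogomips:  # missing or zero: keep the hardcoded value
        return base_seconds
    factor = max(1.0, REFERENCE_BOGOMIPS / bogomips)
    return math.ceil(base_seconds * factor)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(cpu_scaled_timeout(30, f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```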

It seems like it would make more sense to have an environment variable from which the test timeouts are computed, and make all the tests use that value. It should be fairly simple to grep for the explicit timeout changes and fix those.
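
The environment-variable approach proposed above could be sketched like this: every test derives its timeout from its own nominal value times one user-supplied factor, so a slow board needs only a single setting. The variable name `SYSTEMTAP_TIMEOUT_FACTOR`, the default of 1, and the rejection of non-positive values are all hypothetical; in the actual testsuite the same arithmetic would live in Tcl and replace the hardcoded values in the expect statements.

```python
import math
import os
from typing import Mapping, Optional

# Hypothetical name for the single user-settable knob.
FACTOR_VAR = "SYSTEMTAP_TIMEOUT_FACTOR"


def timeout_factor(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the timeout multiplier from the environment; default to 1.0."""
    env = os.environ if env is None else env
    raw = env.get(FACTOR_VAR, "1")
    try:
        factor = float(raw)
    except ValueError:
        raise ValueError(f"{FACTOR_VAR} must be a number, got {raw!r}") from None
    if not math.isfinite(factor) or factor <= 0:
        raise ValueError(f"{FACTOR_VAR} must be a positive number, got {raw!r}")
    return factor


def effective_timeout(nominal_seconds: float, env: Optional[Mapping[str, str]] = None) -> float:
    """What each test would call instead of hardcoding its own timeout."""
    return nominal_seconds * timeout_factor(env)


if __name__ == "__main__":
    print(effective_timeout(30))
```

On a slow NFS-rooted board, one would then export something like `SYSTEMTAP_TIMEOUT_FACTOR=10` before running the testsuite, and no individual test would need editing again.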

> What I'd like to know is whether it is really necessary to have fatal
> timeouts.  How often does running the test suite truly hang up
> where the timeout feature gets it unstuck?
>
> I've found that if my system has taken too long, it's due to a bug
> and the kernel is no longer stable.  However, I don't work on the
> stap translator.  I suspect bugs in it are what cause the recoverable
> test hang-ups.

It is possible that a probe that should cause a systemtap script to exit never fires. In that case the test could hang. We really do need explicit timeouts so the run can move on; the goal is coverage across the tests. It is better to give up on a test that takes way, way too long, FAIL it, and get the rest of the test run than to get hung up on that one problem test.
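
The policy described above (bound every test, record a timeout as a FAIL, and carry on with the rest of the run) is what an expect `timeout` clause provides inside the testsuite. The Python sketch below models the same idea with `subprocess.run(..., timeout=...)`; the `run_suite` helper, the `Case` tuples, and the verdict strings are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple

# (test name, command, timeout in seconds); purely illustrative.
Case = Tuple[str, Sequence[str], float]


def run_suite(cases: List[Case]) -> Dict[str, str]:
    """Run every case under a hard timeout; a hang is recorded as a FAIL, not a stuck run."""
    results: Dict[str, str] = {}
    for name, cmd, seconds in cases:
        try:
            proc = subprocess.run(
                cmd, timeout=seconds,
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            )
            results[name] = "PASS" if proc.returncode == 0 else "FAIL"
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and waited for the child here,
            # so the loop simply moves on to the next test.
            results[name] = "FAIL (timeout)"
    return results


if __name__ == "__main__":
    demo: List[Case] = [
        ("quick", [sys.executable, "-c", "pass"], 10.0),
        ("hangs", [sys.executable, "-c", "import time; time.sleep(60)"], 1.0),
        ("after", [sys.executable, "-c", "print('still runs')"], 10.0),
    ]
    for name, verdict in run_suite(demo).items():
        print(f"{verdict}: {name}")
```

Keeping the timed-out verdict distinct from an ordinary failure makes it easy to spot which tests might simply need a longer timeout on slow hardware.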

-Will
