This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: testsuite and hardcoded timeouts


Quentin Barnes wrote:
Quentin Barnes <qbarnes@urbana.css.mot.com> writes:

[...]
Ah, maybe there is some middle ground here.  Instead of putting the
effort into figuring out some portable method for dynamic timeouts,
just change the behavior for a timeout to be user-settable [...]

It can be even easier than that. dejagnu's "timeout" tcl variable is exactly the default timeout duration in seconds.

The "timeout" variable is an expect feature. It is already set in stap_run.exp and stap_run2.exp, but timeouts are also manually specified in various expect statements sprinkled through the testsuite. Those are the ones that cause me the most headaches. Otherwise, tinkering with just two files would be trivial.
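To get a sense of how many explicit overrides there are, a small scanner can list them. This is only a rough sketch in Python, not part of the testsuite: it assumes the overrides are spelled either "set timeout N" or "-timeout N", and the helper name find_explicit_timeouts and the sample text are made up for illustration.

```python
import re

# Assumed spellings of a hardcoded timeout: "set timeout 30" or "-timeout 30".
# A bare "timeout { ... }" clause is an expect pattern, not an override.
EXPLICIT_TIMEOUT = re.compile(r"(?:\bset\s+timeout|-timeout)\s+(-?\d+)")


def find_explicit_timeouts(text):
    """Return (line_number, seconds) pairs for each hardcoded timeout found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop anything after a '#'
        for match in EXPLICIT_TIMEOUT.finditer(code):
            hits.append((lineno, int(match.group(1))))
    return hits


# Hypothetical .exp fragment, written for this example only.
sample = """\
set timeout 30
expect {
    -timeout 240
    -re {^systemtap ending probe} { pass "ok" }
    timeout { fail "timed out" }
}
"""

for lineno, seconds in find_explicit_timeouts(sample):
    print(f"line {lineno}: {seconds}s")
```

Running something like this over each .exp file would give a list of places to convert to a shared value.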

The .exp files under
testsuite/config or even testsuite/lib could set this global variable
based on the "ishost" predicate - leave it for i686, double it for
s390x, dedicule (!) it for arm.  Then we just need to police the test
cases to avoid messing with this value.

It's not that simple. For example, my setup is really, really slow because it is using NFS mounted root and swap with a small amount of RAM. Another ARM system could run easily 5x-10x faster than mine with just more memory or a real hard disk.

Rather than create an "ishost" rule, I suggested that what would
probably be better is to use the MHz or BogoMIPS number from
/proc/cpuinfo.  But even that's a heuristic because it only takes
into account the CPU speed, not the overall system speed, which can
be choked by I/O limitations.
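As a rough illustration of the BogoMIPS idea, the sketch below (Python, for illustration only) derives a timeout multiplier from /proc/cpuinfo-style text. The reference value of 4000 and the function names are assumptions, not anything from the testsuite, and as noted above the result says nothing about I/O.

```python
import re

# Hypothetical baseline: a machine at this BogoMIPS gets a multiplier of 1.
REFERENCE_BOGOMIPS = 4000.0


def bogomips_from_cpuinfo(text):
    """Return the first BogoMIPS value in /proc/cpuinfo-style text, or None."""
    match = re.search(r"^bogomips\s*:\s*(\d+(?:\.\d+)?)", text,
                      re.IGNORECASE | re.MULTILINE)
    return float(match.group(1)) if match else None


def timeout_scale(text, reference=REFERENCE_BOGOMIPS):
    """Slower CPUs get a multiplier above 1; faster ones are never shortened."""
    bogomips = bogomips_from_cpuinfo(text)
    if not bogomips:
        return 1.0  # field missing or zero: leave timeouts alone
    return max(1.0, reference / bogomips)


# Made-up cpuinfo excerpt for a slow board.
slow_board = "Processor\t: ARMv6\nBogoMIPS\t: 400.00\n"
print(timeout_scale(slow_board))
```

On a real system the text would come from reading /proc/cpuinfo; the field's capitalization differs between architectures, which is why the match is case-insensitive.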

Seems like it would make more sense to have an environment variable from which the test timeouts are computed. Make all the tests use that value. It should be fairly simple to grep for the explicit timeout changes and fix those.
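The computation itself is small. Here is a sketch in Python of the arithmetic only; the variable name STAP_TIMEOUT_SCALE and the 180 second base are hypothetical, and a real version would live in the Tcl files under testsuite/lib.

```python
import math
import os

BASE_TIMEOUT = 180  # hypothetical default, in seconds


def scaled_timeout(base=BASE_TIMEOUT, environ=os.environ):
    """Multiply a base timeout by the factor in STAP_TIMEOUT_SCALE (assumed name)."""
    raw = environ.get("STAP_TIMEOUT_SCALE", "1")
    try:
        scale = float(raw)
    except ValueError:
        scale = 1.0  # unparsable value: fall back to the default
    if not (math.isfinite(scale) and scale > 0):
        scale = 1.0  # reject zero, negative, inf and nan
    return int(round(base * scale))


# A slow NFS-rooted board might be run with STAP_TIMEOUT_SCALE=10.
print(scaled_timeout(30, {"STAP_TIMEOUT_SCALE": "10"}))
```

Every test would then ask for its timeout through this one function rather than hardcoding a number, so a slow machine needs only one setting changed.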


What I'd like to know is if it is really necessary to have fatal
timeouts.  How often does running the test suite truly hang up
where the timeout feature gets it unstuck?

I've found that if my system has taken too long, it's due to a bug
and the kernel is no longer stable.  However, I don't work on the
stap translator.  I suspect bugs in it are what cause recoverable
test hangs.

It is possible that the probe that would cause a systemtap script to exit never fires. In that case the test hangs. We really do need the explicit timeouts to move on, because we want coverage from the rest of the tests. It is better to give up on a test that is taking way, way too long, FAIL it, and get the rest of the test run than to get hung up on that one problem test.
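The policy can be sketched outside of dejagnu. The Python below is only an illustration of "FAIL the hung test and carry on", with two made-up test commands standing in for real stap runs; the 3 second limit is arbitrary.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds):
    """Run one test command; a hang is reported as a failure, not waited on."""
    try:
        result = subprocess.run(argv, timeout=seconds,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so we can move on.
        return "FAIL (timeout)"
    return "PASS" if result.returncode == 0 else "FAIL"


# Stand-ins: one command that exits at once, one that never gets its "exit probe".
tests = {
    "quick": [sys.executable, "-c", "pass"],
    "hung": [sys.executable, "-c", "import time; time.sleep(60)"],
}

results = {name: run_with_timeout(argv, 3) for name, argv in tests.items()}
for name, outcome in results.items():
    print(f"{outcome}: {name}")
```

The hung test costs at most the timeout, and the remaining tests still get run and reported.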


-Will

