This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: Controlling probe overhead
David Smith <dsmith@redhat.com> writes:
> [...] BTW, I had to rework the STP_TIMING code a very small bit to make it
> work correctly with the STP_OVERLOAD code. The STP_TIMING code was
> storing cycle counts as 32-bit values, where the STP_OVERLOAD code
> wanted 64-bit cycle counts. The STP_TIMING code now truncates down to
> 32-bits a little later than it did originally.
Note that the current code doesn't (intend to) truncate cycle counts,
just individual samples of the get_cycles() values.
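
To illustrate the distinction, here is a minimal sketch (not the actual generated code) of keeping 32-bit samples while the accumulated count stays 64-bit. The helper names sample_delta32 and add_sample are hypothetical, and the sketch assumes the counter wraps at most once between the two samples:

```c
#include <stdint.h>

/* Hypothetical helper, not the generated systemtap code: elapsed cycles
   between two samples of the cycle counter, each truncated to 32 bits. */
static uint32_t sample_delta32(uint32_t at_start, uint32_t at_end)
{
    /* Unsigned subtraction is performed modulo 2^32, so a single
       wraparound between the two samples still yields the right delta. */
    return at_end - at_start;
}

/* Hypothetical accumulator: only the samples are 32-bit; the running
   total of cycles remains a 64-bit quantity. */
static uint64_t add_sample(uint64_t total, uint32_t at_start, uint32_t at_end)
{
    return total + sample_delta32(at_start, at_end);
}
```

With this arrangement no explicit wraparound branch is needed, and the total is never truncated.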
> [...] I have one stress test (that Frank wrote) that will make a
> RHEL5 system non-responsive. The system doesn't crash - just
> decides to no longer take any input. The overload code kills the
> script in less than 3 minutes.
3 minutes is almost certainly too long for a default overload
detection interval. I would expect something on the order of a few
seconds.
> Note that I haven't implemented the new error probes you and Frank
> discussed. I'd like to get the current code in (since it is quite
> useful in its current state) before thinking about error probes.
Indeed, they are independent ideas.
> [...]
> + << " -O turn off automatic probe overload handling" << endl
IMO, there is no need for this option. Overload detection should
always be present, and tunable with the (documented?) -D parameters.
If this code depends on the STP_TIMING stuff in the probe
prologues/epilogues, then most of that code could be always on too
(with -t just controlling whether the final timing report is printed).
> - o->newline(1) << "int32_t cycles_atend = (int32_t) get_cycles ();";
> - // Handle 32-bit wraparound.
> [...]
Perhaps you could excerpt the actual generated overload/timing code
here. It looks like there may be more being done here than necessary.
> + o->newline() << "#ifndef STP_OVERLOAD_INTERVAL";
> + o->newline() << "#define STP_OVERLOAD_INTERVAL 1000000000LL";
> + o->newline() << "#endif";
> + o->newline() << "#ifndef STP_OVERLOAD_THRESHOLD";
> + o->newline() << "#define STP_OVERLOAD_THRESHOLD 500000000LL";
> + o->newline() << "#endif";
These quantities should probably depend on the processor, so that
overload intervals are measured in units of time rather than cycles.
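
As a rough sketch of what that could look like: at an assumed 2 GHz clock the defaults above (1000000000 and 500000000 cycles) correspond to 500 ms and 250 ms, and they would last twice as long on a 1 GHz processor. The helpers below are hypothetical, not existing runtime code. They assume the processor frequency is available in kHz (cycles per millisecond) and that the interval and threshold are given in milliseconds:

```c
#include <stdint.h>

/* Hypothetical: convert a duration in milliseconds into cycles.
   khz is the assumed CPU frequency in kHz, i.e. cycles per millisecond. */
static uint64_t ms_to_cycles(uint64_t ms, uint64_t khz)
{
    return ms * khz;
}

/* Hypothetical overload test: true if the cycles spent in probes during
   one interval exceed a threshold expressed in milliseconds. */
static int is_overloaded(uint64_t probe_cycles, uint64_t threshold_ms,
                         uint64_t khz)
{
    return probe_cycles > ms_to_cycles(threshold_ms, khz);
}
```

For example, ms_to_cycles(500, 2000000) gives 1000000000 cycles, matching the proposed default interval on a 2 GHz machine.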
- FChE