This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: MAXACTION exceeded error while using systemtap


On 12/8/05, Martin Hunt <hunt@redhat.com> wrote:
> On Wed, 2005-12-07 at 18:06 -0500, Frank Ch. Eigler wrote:
> > [...]
> > Or, in other words, for ensuring one type of safety, I believe runtime
> > means are sufficient with unrestricted language; you believe
> > probe-point-specific language restrictions are necessary and/or
> > sufficient (which?).
>
> Necessary, but not sufficient.
>
> I understand your desire for internal elegance.  However, from a
> systemtap user's perspective, what you propose is terrible.  You are
> proposing having functions that, depending on where they are used, may
> work some of the time, until an internal threshold (which depends on
> surrounding functions, the number of cpus, and number of elements in an
> array) is hit. And then your program terminates with an error!
>
> I believe having something like MAXACTION is necessary as a check
> against putting too much in a kprobe or infinite looping. However it
> should trigger immediately and should not be based on dynamically
> changing thresholds. And should not be user-visible at all.
>
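
For readers following the thread, here is a minimal sketch of the kind of check being discussed: a per-probe-hit statement counter that aborts the handler once a fixed budget is used up. It is not SystemTap's generated code. The names probe_ctx, count_action and probe_handler, and the value 1000, are assumptions made up for illustration. It shows why a handler that loops over an array passes or fails depending on how many elements the array holds.

```c
#include <stdio.h>

/* Hypothetical per-probe-hit budget; not SystemTap's actual default. */
#define MAXACTION 1000

struct probe_ctx {
    unsigned long actions;   /* statements counted in this probe hit */
    const char *last_error;  /* set once the budget is exhausted */
};

/* Count one statement; return 0 once the budget is exceeded. */
static int count_action(struct probe_ctx *c)
{
    if (++c->actions > MAXACTION) {
        c->last_error = "MAXACTION exceeded";
        return 0;
    }
    return 1;
}

/* Stand-in for a handler body that sums an array of n elements.
 * Returns 0 on success, -1 if the budget ran out part-way through. */
static int probe_handler(struct probe_ctx *c, const long *data,
                         unsigned long n, long *sum)
{
    unsigned long i;

    c->actions = 0;
    c->last_error = NULL;
    *sum = 0;
    for (i = 0; i < n; i++) {
        if (!count_action(c))
            return -1;       /* abort: the same code worked for smaller n */
        *sum += data[i];
    }
    return 0;
}
```

With these assumed numbers, the same handler succeeds on a 500-element array and fails on a 2000-element one, which is the data-dependent behavior Martin objects to.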

Perhaps you two should look at how your neighbor DTrace deals with
this issue. It seems to work pretty well. They recently talked about
this on their dtrace-discuss mailing list:

http://www.opensolaris.org/jive/thread.jspa?messageID=15073&#15073

BTW, MAXACTION really can't work reliably. What happens when the box is
under load, say with 4 gigabit NICs all being flooded with data, and
you are probing a function in the fast path?

Not everyone has a box that can sum 4 billion numbers in less than a minute.

There are many other reasons as well.

James Dickens
uadmin.blogspot.com

P.S. I would recommend that the members of this list subscribe to the
dtrace-discuss list on www.opensolaris.org. It's a low-traffic list,
but they do discuss issues you will face as you proceed. I can assure
you that the DTrace programmers monitor your mailing list.


> Systemtap is not a general-purpose programming environment. kprobes are
> very timing sensitive. They should do data collection and printing of
> simple scalar data. Data analysis can/should be done in other contexts.
> I see nothing wrong with documenting this and enforcing it.
>
> > Can you explain why you believe that an operation that takes the exact
> > same amount of time (dumping the data) is necessarily unsafe in a
> > kprobe and necessarily safe in a timer probe?
>
> Because timer probes can run in process context, which means they can
> sleep, can be scheduled, and can take as long as they want, whereas a
> kprobe might be in the middle of a task switch.
>
> > > Another acceptable solution would be to have a way to automatically
> > > defer printing and sorting of arrays to a more acceptable time [...]
> > > but the current printing syntax would need significant changes.
> >
> > This is worth further investigation, but of course has its own
> > complications.  These include concurrency: this would either require
> > locking the to-be-sorted/printed arrays until the printing coroutine
> > runs, or suffer the loss of coherence, or a potentially large
> > array-snapshot.
>
> The main complication I see is
> printf("my array is\n")
> print(@hist_log(foo))
> Doesn't do what you expect. Unless we defer all output...
>
> >
> > > [...]  I am in no way advocating eliminating MAXACTION. Just
> > > replacing it with a more flexible function that knows what context we
> > > are in (kprobe, timer, end probe, etc).
> >
> > OK, but you still need to justify your belief that this more flexible
> > function can safely have drastically different values in those
> > different contexts.
>
> Seriously? OK. I checked it.
>
> There is no logical reason why begin or end probes need to have any time
> limit on them, other than to prevent infinite loops, because when the
> module is loaded, it ties up resources.
>
> Currently I see timer events are implemented as kernel timers. These are
> softirqs and would have some time limits and cannot sleep of course.
>
> I took the C code from a simple script and ripped out the kernel timers
> and replaced them with work queues.  Then I slept for 10 seconds in the
> timer events. And they worked fine. And I summed all the numbers from 1
> to 10 billion (which took about 25 seconds) in every timer event and it
> worked fine.  And I did the same in probe end. And it worked fine.  And
> I tried it all at once. And it worked too.
>
> > [...]
> > Sorry, I could not interpret "something less than infinite" as an
> > endorsement of deadlines.
>
> What's the MAXACTION equivalent of "2 minutes"?  I'm guessing it's a very
> large number.
>
> Martin
>
>
>
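
Martin's suggestion above, replacing the single MAXACTION constant with a function that knows which kind of probe is running, could be sketched roughly as follows. This is only an illustration under stated assumptions: the enum, the function names and every number are hypothetical, and none of it is taken from SystemTap's sources. The ordering follows his argument: tight limits in a kprobe, looser ones in a timer probe, and for begin/end probes only a very large bound that guards against infinite loops.

```c
#include <limits.h>

/* Hypothetical probe contexts, following the ones named in the thread. */
enum probe_kind { PROBE_KPROBE, PROBE_TIMER, PROBE_BEGIN_END };

/* Return the statement budget for one probe hit in the given context.
 * All values are illustrative assumptions, not measured or real defaults. */
static unsigned long action_limit(enum probe_kind kind)
{
    switch (kind) {
    case PROBE_KPROBE:
        return 1000UL;          /* timing sensitive: keep it short */
    case PROBE_TIMER:
        return 100000UL;        /* more headroom, but still bounded */
    case PROBE_BEGIN_END:
        return ULONG_MAX / 2;   /* finite only to stop infinite loops */
    }
    return 0;                   /* unknown context: allow nothing */
}

/* Nonzero if a handler that has executed `actions` statements so far
 * is still inside the budget for its context. */
static int within_budget(enum probe_kind kind, unsigned long actions)
{
    return actions <= action_limit(kind);
}
```

Under these assumed values, a handler that has run 50000 statements would still be allowed in a timer probe but would already have been stopped in a kprobe. Whether such different limits are actually safe in each context is exactly what Frank asks Martin to justify.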

