This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: language choices for aggregation
- From: Martin Hunt <hunt at redhat dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: "systemtap at sources dot redhat dot com" <systemtap at sources dot redhat dot com>
- Date: Mon, 02 May 2005 09:34:42 -0700
- Subject: Re: language choices for aggregation
- Organization: Red Hat Inc.
- References: <20050501201652.GC15269@redhat.com>
On Sun, 2005-05-01 at 16:16 -0400, Frank Ch. Eigler wrote:
> As background, recall that dtrace aggregations are special objects
> (usually vectors) whose values track statistics of a given expression.
> These statistics can be incrementally computed without locking, making
> them more efficient than global integers even to just count events.
> It gets even better efficiency-wise when tracking averages,
> histograms, etc.
I assume the efficiency gains come from using per-CPU data and then
combining it at probe exit?
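A minimal sketch of that assumption, in Python purely for illustration. The names, the CPU count, and the event sequence are all hypothetical; this is not systemtap runtime code. Each CPU updates only its own slot, and the slots are summed once at exit:

```python
# Hypothetical illustration: one private counter slot per CPU,
# merged a single time when the probe session ends.
NUM_CPUS = 4  # assumed CPU count, for illustration only

per_cpu_count = [0] * NUM_CPUS

def on_event(cpu):
    # Each CPU writes only to its own slot, so the hot path needs no lock.
    per_cpu_count[cpu] += 1

def combine_at_exit():
    # Runs once, after probing has stopped.
    return sum(per_cpu_count)

# Simulated events arriving on CPUs 0, 1, 1, 3, 3, 3.
for cpu in (0, 1, 1, 3, 3, 3):
    on_event(cpu)

total = combine_at_exit()
```

The cost is one slot per CPU per aggregate, paid in exchange for a lock-free update.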
If the aggregation is a vector, we currently aren't going to see any
efficiency increase because the map will have to be locked. I have ideas
for improving maps in the future so this probably won't always be true.
You would never track averages directly. It is easier to track count and
sum, then compute the average at probe exit. While you're at it, you might
as well track min and max. It is cheap to track them all. That's what the
maps do now.
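As a rough sketch of that bookkeeping (Python for illustration only; the class and method names are made up here and do not reflect the actual map implementation), each new value costs a few integer operations and the average is derived only when reporting:

```python
class Stat:
    """Hypothetical running statistics record: count, sum, min, max."""

    def __init__(self):
        self.count = 0
        self.sum = 0
        self.min = None
        self.max = None

    def add(self, value):
        # Constant-time update; no average is stored.
        self.count += 1
        self.sum += value
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value

    def avg(self):
        # Computed on demand, e.g. at probe exit.
        return self.sum / self.count if self.count else 0.0


s = Stat()
for v in (5, 10, 311):
    s.add(v)
```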
So then anything that uses "<<<" is an aggregate that tracks count,
sum, min, and max. Histograms are more complex and will need to be
declared.
> (2) Or, duplicating the dtrace limit that such statistics objects must be
> global (not probe- or function-local), we could put the declaration
> portion into the "global" block:
>
> global AGGR (var [,ARGS])
> probe foo { ... var <<< expr ... }
> probe end { trace var }
>
> This would allow us to track multiple aggregates (say count, average,
> and histogram) of the same value by repetitive declarations:
>
> global count(var), sum(var), histogram(var,10,0,1000)
This seems reasonable, except that you wouldn't need to declare count
and sum. Then at probe exit you would get something like:
var: count=1234 sum=134059 avg=108.6 min=5 max=311
     0 **
    10 **********
    20 ****
    30 ********
    40 ************
   ...
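One way such a histogram could be produced, sketched in Python for illustration. The bucket width of 10, the one-star-per-sample scale, and the function names are assumptions of this sketch; the meaning of the histogram() arguments is not settled in this thread:

```python
from collections import Counter

def bucketize(values, width):
    # Map each value to the lower bound of its fixed-width bucket.
    counts = Counter()
    for v in values:
        counts[(v // width) * width] += 1
    return counts

def render(counts, per_star=1):
    # One text row per non-empty bucket, in ascending order.
    lines = []
    for lower in sorted(counts):
        stars = "*" * (counts[lower] // per_star)
        lines.append(f"{lower:>4} {stars}")
    return lines

samples = [3, 7, 12, 15, 18, 19, 25]  # made-up sample values
rows = render(bucketize(samples, 10))
print("\n".join(rows))
```

Bucket counts are plain counters, so they could be kept per-CPU and summed at exit in the same way as count and sum.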
Martin