This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.


Re: language choices for aggregation

From: Martin Hunt <hunt at redhat dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: "systemtap at sources dot redhat dot com" <systemtap at sources dot redhat dot com>
Date: Mon, 02 May 2005 09:34:42 -0700
Subject: Re: language choices for aggregation
Organization: Red Hat Inc.
References: <20050501201652.GC15269@redhat.com>

On Sun, 2005-05-01 at 16:16 -0400, Frank Ch. Eigler wrote:

> As background, recall that dtrace aggregations are special objects
> (usually vectors) whose values track statistics of a given expression.
> These statistics can be incrementally computed without locking, making
> them more efficient than global integers even to just count events.
> It gets even better efficiency-wise when tracking averages,
> histograms, etc.

I assume the efficiency gain comes from using per-cpu data and then
combining it at probe exit?
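
A minimal sketch of that assumption, written in Python purely for
illustration (the real runtime would be kernel C). Each CPU updates only
its own slot, so the hot path needs no lock, and the slots are combined
once at probe exit. The names PerCpuCounter, add, and total are
hypothetical and not part of any existing runtime.

```python
class PerCpuCounter:
    """Hypothetical per-CPU event counter; slots are merged only on read."""

    def __init__(self, ncpus):
        # One private slot per CPU: no slot is shared between writers.
        self.slots = [0] * ncpus

    def add(self, cpu, n=1):
        # Hot path: touches only this CPU's slot, so no lock is needed.
        self.slots[cpu] += n

    def total(self):
        # Cold path (probe exit): combine the per-CPU slots.
        return sum(self.slots)


counter = PerCpuCounter(ncpus=4)
for cpu in (0, 1, 1, 3, 3, 3):   # CPU on which each simulated event fired
    counter.add(cpu)
print(counter.total())
```

The merge cost is paid once, at exit, rather than on every event.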

If the aggregation is a vector, we currently aren't going to see any
efficiency increase because the map will have to be locked. I have ideas
for improving maps in the future so this probably won't always be true.

You would never track averages directly. It is easier to track count and
sum, and then compute the average at probe exit. While you're at it, you
might as well track min and max. It is cheap to track them all. That's
what the maps do now.
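
Roughly, as a Python sketch (illustrative only; the class name Stat and
its methods are hypothetical, not the actual map code). Only the four
running values are updated per event, and the average is derived when
the result is reported:

```python
class Stat:
    """Hypothetical running statistic: count, sum, min, max."""

    def __init__(self):
        self.count = 0
        self.sum = 0
        self.min = None   # None until the first value arrives
        self.max = None

    def add(self, value):
        # Per-event work: a handful of cheap updates, no division.
        self.count += 1
        self.sum += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    def avg(self):
        # Computed only on demand (e.g. at probe exit).
        return self.sum / self.count if self.count else 0.0


s = Stat()
for v in (5, 311, 20):
    s.add(v)
print(f"count={s.count} sum={s.sum} avg={s.avg():.1f} min={s.min} max={s.max}")
```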

So then anything that uses "<<<" is an aggregate that tracks count,
sum, min, and max. Histograms are more complex and will need to be
declared.

> (2) Or, duplicating the dtrace limit that such statistics objects must be
> global (not probe- or function-local), we could put the declaration
> portion into the "global" block:
> 
>     global AGGR (var [,ARGS])
>     probe foo { ... var <<< expr ... }
>     probe end { trace var }
> 
> This would allow us to track multiple aggregates (say count, average,
> and histogram) of the same value by repetitive declarations:
> 
>     global count(var), sum(var), histogram(var,10,0,1000)

This seems reasonable, except that you wouldn't need to declare count
and sum. Then at probe exit you get something like:

var: count=1234 sum=134059 avg=108.6 min=5 max=311
0  ** 
10 **********
20 ****
30 ********
40 ************
...
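
A sketch of how such a histogram could be accumulated and printed, in
Python for illustration only. It assumes the declaration arguments mean
(bucket width, low, high), which the proposal does not spell out, and it
assumes out-of-range values are clamped into the first or last bucket.
The function names bucket_counts and render are hypothetical.

```python
def bucket_counts(values, width, low, high):
    """Count integer values into fixed-width buckets covering [low, high)."""
    nbuckets = (high - low) // width
    counts = [0] * nbuckets
    for v in values:
        idx = (v - low) // width
        # Assumption: clamp out-of-range values into the end buckets.
        idx = max(0, min(nbuckets - 1, idx))
        counts[idx] += 1
    return counts


def render(counts, width, low):
    """Return one line per bucket: lower bound, then one '*' per hit."""
    return [f"{low + i * width:<2} {'*' * c}" for i, c in enumerate(counts)]


counts = bucket_counts([5, 12, 14, 18, 25], width=10, low=0, high=30)
for line in render(counts, width=10, low=0):
    print(line)
```

A real implementation would presumably scale the bar length rather than
print one star per event.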

Martin

Follow-Ups:
- Re: language choices for aggregation
  - From: Frank Ch. Eigler

References:
- language choices for aggregation
  - From: Frank Ch. Eigler
