This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.

Re: Interesting reading regarding dtrace, aggregation and buffers...

From: Martin Hunt <hunt at redhat dot com>
To: "Spirakis, Charles" <charles dot spirakis at intel dot com>
Cc: William Cohen <wcohen at redhat dot com>, "systemtap at sources dot redhat dot com" <systemtap at sources dot redhat dot com>
Date: Thu, 16 Jun 2005 14:11:19 -0700
Subject: Re: Interesting reading regarding dtrace, aggregation and buffers...
Organization: Red Hat Inc.
References: <2CB9B46A0690824693581340E23B4E10045683E1@scsmsx401.amr.corp.intel.com>

On Thu, 2005-06-16 at 13:22 -0700, Spirakis, Charles wrote:

> Is this why they use the XXX = count() syntax (and other aggregation
> specific functions)? So they can store part of the information in kernel
> space, flush when appropriate to user space, then do the final
> aggregation in user space? 

I don't think the internals have anything to do with the syntax.

> Note how they define aggregation at the top
> (which gives them this ability: f(f(x1) U f(x2) U ...) == f(x1 U x2
> U...) ).

That basically defines an aggregation.  It is data that can be collected
per-cpu and later combined.
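To illustrate the property quoted above, f(f(x1) U f(x2) U ...) == f(x1 U x2 U ...), here is a minimal user-space sketch. It assumes CPUs can be simulated as array slots, and every name in it (struct partial, record, read_total) is hypothetical, not the systemtap runtime API. Each CPU folds values into its own partial result, and a reader merges the partials later.

```c
#include <assert.h>
#include <stdint.h>

/* User-space simulation: NR_CPUS array slots stand in for real per-cpu
 * data. All names here are hypothetical, not the systemtap runtime API. */
#define NR_CPUS 4

/* Partial result of f over some subset of the inputs. */
struct partial {
    int64_t count, sum, min, max;
};

static void partial_init(struct partial *p)
{
    p->count = 0;
    p->sum = 0;
    p->min = INT64_MAX;   /* identity for min */
    p->max = INT64_MIN;   /* identity for max */
}

/* Fold one new value into a partial result. */
static void partial_add(struct partial *p, int64_t v)
{
    p->count++;
    p->sum += v;
    if (v < p->min) p->min = v;
    if (v > p->max) p->max = v;
}

/* Combine two partials: same result as aggregating the union of their inputs. */
static void partial_merge(struct partial *dst, const struct partial *src)
{
    dst->count += src->count;
    dst->sum += src->sum;
    if (src->min < dst->min) dst->min = src->min;
    if (src->max > dst->max) dst->max = src->max;
}

static struct partial per_cpu[NR_CPUS];

static void reset_all(void)
{
    for (int i = 0; i < NR_CPUS; i++)
        partial_init(&per_cpu[i]);
}

/* Hot path: a CPU only ever writes its own slot, so no lock is needed. */
static void record(int cpu, int64_t v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    partial_add(&per_cpu[cpu], v);
}

/* Read path: combine the per-cpu partials into one result. */
static struct partial read_total(void)
{
    struct partial total;
    partial_init(&total);
    for (int i = 0; i < NR_CPUS; i++)
        partial_merge(&total, &per_cpu[i]);
    return total;
}
```

An average is not kept directly because averages of averages do not combine correctly; it would be computed at read time as sum / count from the merged result.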

> As for the buffering methodology:
> http://docs.sun.com/app/docs/doc/817-6223/6mlkidlho?a=view
> 
> By default, they are per-cpu, double buffered. They do provide a lot of
> flexibility in how the buffers are managed.

When I talked about "tagged data" a while ago, it was because I wanted
to be able to send data in specific formats so that it could be
processed in user space.  That is not going to make it into the
initial release.

For basic aggregations (no keys), there is no need for user-space
storage.  For aggregated maps (which I've been calling per-cpu maps), we
could eventually use user-space storage. It wouldn't be hard: when their
internal storage gets full, just dump the aggregated stats to user space.
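The "dump when full" idea can be sketched as follows. This is a user-space simulation under stated assumptions: CPUs are array slots, a second table stands in for user-space storage, and the table sizes and function names (map_add, flush_cpu, map_read) are hypothetical. Because the stats are aggregations, flushing a CPU's entries early and merging them again later gives the same final totals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; CPU_SLOTS is tiny so that flushes actually happen. */
#define NR_CPUS    2
#define CPU_SLOTS  2
#define USER_SLOTS 64
#define KEY_LEN    16

struct entry { char key[KEY_LEN]; int64_t count, sum; };
struct table { int cap, used; struct entry *e; };

static struct entry cpu_store[NR_CPUS][CPU_SLOTS];
static struct table cpu_map[NR_CPUS];
static struct entry user_store[USER_SLOTS];          /* stands in for user space */
static struct table user_map = { USER_SLOTS, 0, user_store };

static void maps_init(void)
{
    for (int c = 0; c < NR_CPUS; c++) {
        cpu_map[c].cap = CPU_SLOTS;
        cpu_map[c].used = 0;
        cpu_map[c].e = cpu_store[c];
    }
    user_map.used = 0;
}

static struct entry *find(struct table *t, const char *key)
{
    for (int i = 0; i < t->used; i++)
        if (strcmp(t->e[i].key, key) == 0)
            return &t->e[i];
    return NULL;
}

/* Returns the existing entry, a new zeroed one, or NULL if the table is full. */
static struct entry *find_or_add(struct table *t, const char *key)
{
    struct entry *e = find(t, key);
    if (e != NULL || t->used == t->cap)
        return e;
    e = &t->e[t->used++];
    strcpy(e->key, key);              /* length checked in map_add */
    e->count = 0;
    e->sum = 0;
    return e;
}

/* Dump one CPU's aggregated stats into user-space storage, then empty it. */
static void flush_cpu(int cpu)
{
    struct table *t = &cpu_map[cpu];
    for (int i = 0; i < t->used; i++) {
        struct entry *u = find_or_add(&user_map, t->e[i].key);
        assert(u != NULL);            /* sketch assumes user space never fills */
        u->count += t->e[i].count;
        u->sum += t->e[i].sum;
    }
    t->used = 0;
}

static void map_add(int cpu, const char *key, int64_t v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(strlen(key) < KEY_LEN);
    struct entry *e = find_or_add(&cpu_map[cpu], key);
    if (e == NULL) {                  /* per-cpu storage is full */
        flush_cpu(cpu);
        e = find_or_add(&cpu_map[cpu], key);
    }
    e->count++;
    e->sum += v;
}

/* Final read: flush what is still buffered per cpu, then look in user space. */
static const struct entry *map_read(const char *key)
{
    for (int c = 0; c < NR_CPUS; c++)
        flush_cpu(c);
    return find(&user_map, key);
}
```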

We also have associative arrays (maps), which are treated like global
variables.  They can be read and modified.  They don't scale like per-cpu
maps, and they can't use user-space storage.
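For contrast, a global map can be sketched as a single table behind a single lock. This is an assumption-laden illustration, not how systemtap implements maps: the spinlock built from a C11 atomic_flag and the names map_set and map_get are hypothetical. Because values can be overwritten arbitrarily rather than only accumulated, there is nothing to merge later, so every CPU has to serialize on the same lock, which is what limits scaling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical global associative array: one table, one lock. */
#define SLOTS   32
#define KEY_LEN 16

struct gentry { char key[KEY_LEN]; int64_t value; };

static struct gentry g_table[SLOTS];
static int g_used;
static atomic_flag g_lock = ATOMIC_FLAG_INIT;

static void lock_map(void)
{
    while (atomic_flag_test_and_set(&g_lock))
        ;                              /* spin until the lock is free */
}

static void unlock_map(void)
{
    atomic_flag_clear(&g_lock);
}

/* Caller holds the lock. Returns NULL if the key is absent and create is 0,
 * or if the table is full. */
static struct gentry *slot(const char *key, int create)
{
    for (int i = 0; i < g_used; i++)
        if (strcmp(g_table[i].key, key) == 0)
            return &g_table[i];
    if (!create || g_used == SLOTS)
        return NULL;
    strcpy(g_table[g_used].key, key);
    g_table[g_used].value = 0;
    return &g_table[g_used++];
}

/* Writes replace the value, so all CPUs contend for the same lock. */
static int map_set(const char *key, int64_t v)
{
    assert(strlen(key) < KEY_LEN);
    lock_map();
    struct gentry *e = slot(key, 1);
    if (e != NULL)
        e->value = v;
    unlock_map();
    return e != NULL;
}

static int map_get(const char *key, int64_t *out)
{
    lock_map();
    struct gentry *e = slot(key, 0);
    if (e != NULL)
        *out = e->value;
    unlock_map();
    return e != NULL;
}
```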

(If you are looking, I have aggregations done and am currently
documenting them. I will check them in once that is done. Per-cpu maps
are not yet finished.)

FYI, I have only defined two aggregations, Counter and Stat. Counter is
a per-cpu counter.  It counts events more efficiently than an atomic
counter, but is not appropriate when you want to read it often.  Stat
handles counts, sums, min, max, average and histograms.
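A rough sketch of why Counter is cheap to update but expensive to read, plus one way Stat's histogram could stay combinable. This simulates CPUs as array slots; the names, the 64-byte cache-line alignment, and the log2 bucket scheme are assumptions for illustration, not the actual implementation. Incrementing touches only the local CPU's slot with a plain add, while every read has to walk all CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulation: NR_CPUS slots stand in for real per-cpu data. */
#define NR_CPUS      4
#define HIST_BUCKETS 16

/* Counter: one slot per CPU, padded to an assumed 64-byte cache line so
 * that CPUs do not share lines (false sharing). */
struct counter_slot { _Alignas(64) int64_t n; };
static struct counter_slot counter[NR_CPUS];

/* Hot path: plain increment of the local slot, no atomic operation. */
static void counter_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    counter[cpu].n++;
}

/* Reading visits every CPU's slot, which is why frequent reads are costly. */
static int64_t counter_read(void)
{
    int64_t total = 0;
    for (int i = 0; i < NR_CPUS; i++)
        total += counter[i].n;
    return total;
}

/* Stat histogram: bucket 0 holds 0, bucket b holds 2^(b-1) <= v < 2^b,
 * and the last bucket also absorbs everything larger. */
static int64_t hist[NR_CPUS][HIST_BUCKETS];

static int bucket_of(uint64_t v)
{
    int b = 0;
    while (v != 0 && b < HIST_BUCKETS - 1) {
        v >>= 1;
        b++;
    }
    return b;
}

static void stat_hist_add(int cpu, uint64_t v)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    hist[cpu][bucket_of(v)]++;
}

/* Histograms combine bucket by bucket, so they are an aggregation too. */
static void stat_hist_read(int64_t out[HIST_BUCKETS])
{
    for (int b = 0; b < HIST_BUCKETS; b++) {
        out[b] = 0;
        for (int c = 0; c < NR_CPUS; c++)
            out[b] += hist[c][b];
    }
}
```

Counts, sums, min and max follow the same pattern: each CPU keeps its own partial values, and a reader combines them.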

Martin

References:
- Interesting reading regarding dtrace, aggregation and buffers...
  - From: Spirakis, Charles