This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: my notes from the tracing workshop
- From: Elena Zannoni <elena dot zannoni at oracle dot com>
- To: Andrew Cagney <cagney at redhat dot com>
- Cc: systemtap at sourceware dot org, frysk <frysk at sourceware dot org>
- Date: Fri, 01 Feb 2008 17:05:33 -0500
- Subject: Re: my notes from the tracing workshop
- Organization: Oracle USA Inc.
- References: <47A34AA2.5070404@redhat.com>
Thanks for the notes, Andrew. Good summary.
I posted mine here, including my slides:
http://blogs.oracle.com/ezannoni/
elena
Andrew Cagney wrote:
[The slides get published next week]
Overview
The underlying goal of the workshop was to gather information on the
current state of tracing and monitoring technology, and identify areas
of potential research and development. The Canadian Government is
looking to significantly further research in this area; and is
preparing a report.
Broadly the talks had an embedded bent, which isn't surprising given
its organizational origins in the telco industry. There was a wide
level of representation though with both large system, and deeply
embedded viewpoints being presented.
The Technology
For most talks, the assumed approach was
<probe> -> <filtering> -> <recorder> -> $LOG
then on the host; or in user land:
$LOG -> <converter> -> "DB" -> <visualization>
so I'll talk to that.
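
[Illustration only, not something shown at the workshop: a minimal Python sketch of that pipeline, with each stage as a plain function. The stage names follow the diagram above; the event fields, the JSON-lines log format and the sample events are all made-up assumptions.]

```python
import json

def probe(raw_events):
    """Probe stage: yield one event dict per hit (here, from a canned list)."""
    for ts, name, value in raw_events:
        yield {"ts": ts, "name": name, "value": value}

def filtering(events, wanted):
    """Filter stage: drop uninteresting events before they reach the recorder."""
    return (e for e in events if e["name"] in wanted)

def recorder(events):
    """Recorder stage: serialize events into the log, one JSON object per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)

def converter(log_text):
    """Converter stage: parse the log back into rows for the "DB"."""
    return [json.loads(line) for line in log_text.splitlines() if line]

def visualization(rows):
    """Visualization stage: a per-name event count stands in for a real tool."""
    counts = {}
    for row in rows:
        counts[row["name"]] = counts.get(row["name"], 0) + 1
    return counts

# Target side: <probe> -> <filtering> -> <recorder> -> $LOG
raw = [(1, "sys_open", 3), (2, "sched_switch", 0), (3, "sys_open", 4)]
log = recorder(filtering(probe(raw), wanted={"sys_open"}))

# Host side: $LOG -> <converter> -> "DB" -> <visualization>
db = converter(log)
summary = visualization(db)
```

Filtering before the recorder is what keeps the log small; everything after $LOG can run on a different machine.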
Probes
There were two technology camps (modified kernel, and dynamic
probes), with the majority in the former group. Interestingly, the
embedded players strongly indicated that deploying the modified kernel
was acceptable (even advantageous) - the systems were permanently
running in flight-recorder mode so they were in a better position to
do postmortem analysis.
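
[Illustration only: flight-recorder mode can be pictured as a fixed-size buffer that always holds the most recent events, so something is there to inspect after a failure. A minimal Python sketch, assuming a hypothetical FlightRecorder class; real implementations live in the kernel and look nothing like this.]

```python
from collections import deque

class FlightRecorder:
    """Keep only the most recent `capacity` events; older ones are overwritten."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self._events = deque(maxlen=capacity)

    def record(self, event):
        self._events.append(event)

    def dump(self):
        """Return the surviving events, oldest first, for postmortem analysis."""
        return list(self._events)

rec = FlightRecorder(capacity=3)
for i in range(5):
    rec.record(f"event-{i}")
postmortem = rec.dump()
```

With a capacity of 3 and five recorded events, only the last three survive in the dump.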
The exceptions were SystemTAP and SensorPoint (Wind River) (and on the
edge, frysk). Both SystemTAP and SensorPoint take the same basic
approach. SensorPoint did have a djprobe-like mechanism working,
and nested(?) probes (where you could specify the call chain required
to trigger the probe - it worked by watching the functions and not by
looking at backtraces); finally the ability to replace code on live
systems.
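
[Illustration only: one way to read "watching the functions and not looking at backtraces" is a small state machine driven by function entry and return events. The Python sketch below is a guess at the idea, not SensorPoint's implementation; the ChainProbe class and the function names are hypothetical. It fires when the chain's functions are active in order, with other calls allowed in between.]

```python
class ChainProbe:
    """Fire when the functions in `chain` are entered in order and still active."""

    def __init__(self, chain):
        self.chain = list(chain)
        self.depth = 0          # current nesting depth of watched calls
        self.matched_at = []    # depth at which each chain element was matched
        self.hits = 0

    def on_entry(self, func):
        self.depth += 1
        nxt = len(self.matched_at)
        if nxt < len(self.chain) and func == self.chain[nxt]:
            self.matched_at.append(self.depth)
            if len(self.matched_at) == len(self.chain):
                self.hits += 1  # the whole call chain is active: trigger

    def on_return(self, func):
        # Leaving the frame that matched the latest chain element un-matches it.
        if self.matched_at and self.matched_at[-1] == self.depth:
            self.matched_at.pop()
        self.depth -= 1

probe = ChainProbe(["parse", "alloc"])
trace = [
    ("enter", "main"), ("enter", "parse"), ("enter", "alloc"),   # parse -> alloc
    ("return", "alloc"), ("return", "parse"),
    ("enter", "alloc"), ("return", "alloc"),                     # main -> alloc
    ("return", "main"),
]
for kind, func in trace:
    if kind == "enter":
        probe.on_entry(func)
    else:
        probe.on_return(func)
```

Only the call to alloc made from inside parse counts as a hit; the direct call from main does not. No stack walk is needed, at the cost of having to watch every function in the chain.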
Finally, the big and positive thing on probes was the kernel
markers being accepted. Oracle (Elena) identified that a lacking
feature was being able to query the list of possible probe points ->
embedding markers in the code (and hopefully having them documented in
situ ????) will address this. On the other hand, I picked up a few
concerns (outside of presentations): who gets to back port this (if at
all); it's an ABI, who gets to maintain it long term; and what happens
when someone refuses to accept markers in their code :-)
Filters
This is where SystemTAP and SensorPoint stood out (I think :-). Both
have the ability to filter events before pushing them to the
recorder. Using SystemTAP on the kernel markers should be a wicked
combination.
[Can I assume that, when there's a marked up kernel, SystemTAP inserts
jumps instead of traps? If fche had been giving the talk, it would
have been my question :-)]
Recorders and logs
Zzzzz.
Converters
The consistent approach was to implement some sort of converter that
could read random external file formats and load them into an internal
form.
While there seemed to be a push to standardize on log-file format, I
got the impression that it was solving the wrong problem (and others
too). Size really did matter.
"DB"
There was a strong consensus that the "internal" format of the log
data needed to be a fast, lightweight database; two vendors were using
sqlite for instance (TPTP the eclipse tool didn't but I suspect will
shortly). Wind River presented a discussion illustrating its advantages.
There were suggestions, and it appears a strong degree of consensus,
of standardizing a database format, so that it could be shared amongst
visualization tools. I think this, and the conversion tools will
gain traction. Something SystemTAP should monitor.
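
[Illustration only: what "log data in a lightweight database" might look like, sketched with Python's built-in sqlite3 module. The table layout (ts, cpu, name) and the sample events are assumptions; no schema was actually proposed in the talks.]

```python
import sqlite3

# Hypothetical converted trace events: (timestamp, cpu, event name).
events = [(1, 0, "sys_open"), (2, 1, "sched_switch"), (3, 0, "sys_open")]

conn = sqlite3.connect(":memory:")   # a real tool would use a file on disk
conn.execute("CREATE TABLE event (ts INTEGER, cpu INTEGER, name TEXT)")
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", events)
conn.commit()

# Any visualization tool that agrees on the schema can now query the same data.
per_name = conn.execute(
    "SELECT name, COUNT(*) FROM event GROUP BY name ORDER BY name"
).fetchall()

on_cpu0 = conn.execute(
    "SELECT ts, name FROM event WHERE cpu = ? ORDER BY ts", (0,)
).fetchall()
```

The appeal is that filtering, sorting and aggregation come for free from SQL instead of being reimplemented in every viewer.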
Visualization
Many visualization tools were presented (if I see another useless
full-screen snap-shot in a slide I'll scream), most built on eclipse,
but a few were not. While this is a very crowded market, there seems,
in mnsho, to still be a need for clear simple visualization tools
backed by a database.
The quote of the day, in describing eclipse, has to be "icon diarrhea".
A few of the Talks
Me / Red Hat: SystemTAP / Frysk
(I got to do both talks).
What's the status of SystemTAP on the ARM? Ditto for Frysk.
Robert Wisniewski / IBM: Performance analysis and debugging at IBM
It was as much about IBM as a few other companies Robert had worked
for; it gave a general history of logging challenges in a number of
companies. Strongly in favor of the marker approach; and set that as
a theme. Two notable ideas were non-locked logging (the in-memory log
file format handled synchronization using atomic instructions); and
sharing memory logs between user and system.
Elena Zannoni / Oracle: Tracing at Oracle
Presented the challenges with using SystemTAP in a "binary only /
clean room" environment.
Beth Tibbits / IBM: Eclipse Parallel Tools Platform
Underneath they are using a consolidating process that then, in turn,
talks to a distributed collection of gdb processes (makes you cry :-);
this basic approach is described in Bevin Brett's paper on making
ladebug HPC. There's work to generalize this, see
http://scalabletools.org/
Andrew McDermott / Wind River: Developing OS-agnostic visualization
tools.
Discussed the "DB" approach for managing all that data.
Felix Burton / Wind River: Sensorpoint Technology
Wind River's rough equivalent to SystemTAP. Uses "C" for the probes.
--
I was asked if SystemTAP is supported on ARM (fche, I have an e-mail
address if you want to contact them).