This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Notes from the systemtap BOF


All --

Enclosed are my notes from the systemtap BOF on Friday evening.  Feel
free to comment, update, etc if you feel I missed anything.

-- charles

July 22, 2005 - Systemtap Birds of a Feather (BOF) session

Initial set of open questions proposed by Vara:

Is probe overhead using breakpoint methodology acceptable?

What areas of the kernel are interesting to instrument?

What facilities are needed from Systemtap for effective instrumentation?

How do you maintain instrumentation code with kernel changes?

General request for kernel experts to help in writing instrumentation.

Any other issues?

========

Discussion that followed:

What is the difference (performance numbers) between kprobe and djprobe?
kprobe is ~1 microsecond, djprobe is ~50 nanoseconds on a 3 GHz Pentium 4.

General questions on how djprobe is implemented and its limitations.
kprobe uses an int (breakpoint) exception, djprobe uses a jump.

Example limitations to djprobe:
1) Jmps are multiple bytes, need to watch for branches to the middle of
the old code
2) Insertion in "exception areas" like copy_from_user, when emulating
the instructions that could fault.
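
To make limitation 1 concrete, here is a rough sketch (not taken from the
djprobe sources). It assumes the jump is the common 5-byte x86 near jump
(opcode 0xE9 plus a 32-bit displacement) and compares it with the 1-byte
int3 breakpoint. The helper names encode_jmp and lands_mid_patch are made
up for illustration, and addresses are kept to 32 bits for simplicity.

```c
#include <stdint.h>
#include <string.h>

#define INT3_LEN 1   /* x86 breakpoint (opcode 0xCC) is a single byte */
#define JMP_LEN  5   /* x86 near jump: opcode 0xE9 + 32-bit displacement */

/* Hypothetical helper: encode "jmp rel32" placed at address `from` so that
 * it transfers control to `to`. The displacement is measured from the end
 * of the jump instruction. */
static void encode_jmp(uint8_t out[JMP_LEN], uint32_t from, uint32_t to)
{
    int32_t rel = (int32_t)(to - (from + JMP_LEN));

    out[0] = 0xE9;
    memcpy(&out[1], &rel, sizeof rel);  /* host byte order; little-endian on x86 */
}

/* Hypothetical helper: return 1 if a branch to `target` would land inside
 * the bytes overwritten by a patch of `len` bytes at `probe`, not counting
 * the first byte (branching to the start of the patch is fine). */
static int lands_mid_patch(uint32_t probe, unsigned len, uint32_t target)
{
    return target > probe && target < probe + len;
}
```

With a 1-byte patch no target can land mid-patch, which is why the issue
only shows up for the multi-byte jump.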

see lkst.sourceforge.net for slides and details

What about markers in the kernel? Definition of a marker: a note that
something important happened here. Could be defined externally. For
example, there are ~60k printk calls in the kernel, and they can be
viewed as a form of marker. Would the kernel community accept markers?
LTT is fairly stable, so what needs to be marked is fairly well known.

see http://www.opersys.com/LTT for information about LTT.

One suggestion is a pragmatic approach. Areas that don't change don't
need markers (for example, the syscall interface is stable). Areas that
are changing should use markers (for example, the scheduler).

A request was made for a standard way to specify markers so that
different subsystems can use the same methods. Each subsystem has its
own set of hacks/tracing facility which makes it hard to debug
problems that cross subsystems. For example, tracing an I/O problem
that is affecting the VM: I/O and VM have different methods for
gathering data, which makes it hard to correlate.

Project nana uses assertions as markers. Potentially, assertions could
be taken from the source code and translated into dynamic probes
(kprobes) which can be inserted to verify the asserts on a running
system. See http://www.gnu.org/software/nana/nana.html for more details
about nana.

There is value in a C API in the kernel to provide tracepoints, as a
macro which maintainers can use. Basically, provide a tracing API. Karim
has a proposal to be discussed after the BOF (see below).

Conversation had a tendency to return to markers and the need thereof,
so it is clear that systemtap needs to look into this further.

In-kernel filtering is important to minimize the size of the data. Want
a general filtering framework (runtime). The systemtap language allows
for "if (x) then generate data" and that code is compiled into C. This
seemed "good enough" for people, at least until they can try it.
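
As an illustration of the filtering idea only (this is not what the
systemtap translator actually emits), the C side of an "if (x) then
generate data" probe might look roughly like the sketch below. The names
probe_handler, filter_pid and trace_buf are hypothetical.

```c
#include <stddef.h>

/* One logged record; the fields are made up for this sketch. */
struct event {
    int  pid;
    long bytes;
};

#define TRACE_BUF_MAX 64

static struct event trace_buf[TRACE_BUF_MAX];  /* hypothetical output buffer */
static size_t trace_len;
static int filter_pid = 42;                    /* hypothetical runtime filter */

/* Hypothetical probe body: evaluate the predicate first, and only pay for
 * generating data when it matches. Returns 1 if a record was logged. */
static int probe_handler(int pid, long bytes)
{
    if (pid != filter_pid)
        return 0;                   /* filtered out: nothing is written */
    if (trace_len == TRACE_BUF_MAX)
        return 0;                   /* buffer full: drop the record */

    trace_buf[trace_len].pid = pid;
    trace_buf[trace_len].bytes = bytes;
    trace_len++;
    return 1;
}
```

The point is that the predicate runs in the kernel, so events that do not
match never take up buffer space.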

For iotrace, need the function arguments at the point of return. Want
the ability to look at locals/function arguments when returning. With
current jretprobes, the stack is already gone when the return probe is
triggered. Want something to make it "easy" to see what the arguments
were that also handles SMP issues.
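
One possible approach, sketched here in plain user-space C as an
assumption rather than anything agreed at the BOF, is to save the
arguments in an entry probe, keyed by thread, and pick them up again in
the return probe. The names on_entry, on_return and inflight are
hypothetical. The sketch ignores locking and nested calls, which a real
SMP-safe version would have to handle.

```c
#include <stddef.h>

#define MAX_INFLIGHT 16

/* Arguments captured at function entry, keyed by thread id. */
struct saved_args {
    int  in_use;
    int  tid;
    long arg0;
    long arg1;
};

static struct saved_args inflight[MAX_INFLIGHT];

/* Hypothetical entry-probe body: remember the arguments for this thread.
 * Returns 0 if the table is full, in which case the call is not traced. */
static int on_entry(int tid, long arg0, long arg1)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!inflight[i].in_use) {
            inflight[i].in_use = 1;
            inflight[i].tid = tid;
            inflight[i].arg0 = arg0;
            inflight[i].arg1 = arg1;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical return-probe body: fetch and release what the entry probe
 * saved for this thread. Returns 0 if nothing was saved. */
static int on_return(int tid, struct saved_args *out)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (inflight[i].in_use && inflight[i].tid == tid) {
            *out = inflight[i];
            inflight[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}
```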

Potentially useful places for tapsets and probes:

tapset for sysfs

Block I/O layer information. Infrastructure to pass around data. For
example, open a file (large file) and the kernel translates it to
blocks on disk. At the bio layer that block is all you see; want the
ability to correlate that block back to the file name. Associative
arrays could be useful for this. Large reserve buffer area; size of
correlation data can be an issue.
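
A toy sketch of the correlation idea, written as ordinary C rather than
as a systemtap associative array: a small fixed-size table maps a sector
number back to the file name recorded when the file was opened. The
names remember and lookup, and the table sizes, are hypothetical. The
fixed capacity is deliberate, since it shows how the size of the
correlation data becomes the limiting factor.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLOTS        8    /* deliberately tiny */
#define NAME_MAX_LEN 32

struct slot {
    int      used;
    uint64_t sector;
    char     name[NAME_MAX_LEN];
};

static struct slot table[SLOTS];

/* Record (or update) which file a sector belongs to. Uses linear probing.
 * Returns 0 when the table is full and the mapping had to be dropped. */
static int remember(uint64_t sector, const char *name)
{
    for (unsigned probe = 0; probe < SLOTS; probe++) {
        struct slot *s = &table[(sector + probe) % SLOTS];

        if (!s->used || s->sector == sector) {
            s->used = 1;
            s->sector = sector;
            snprintf(s->name, sizeof s->name, "%s", name);
            return 1;
        }
    }
    return 0;
}

/* Map a sector seen at the block layer back to a file name, or NULL if no
 * mapping was recorded. Entries are never deleted, so an empty slot ends
 * the probe sequence. */
static const char *lookup(uint64_t sector)
{
    for (unsigned probe = 0; probe < SLOTS; probe++) {
        const struct slot *s = &table[(sector + probe) % SLOTS];

        if (!s->used)
            return NULL;
        if (s->sector == sector)
            return s->name;
    }
    return NULL;
}
```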

Could use LTTV to display different kinds of data using XML file for
description of data.

==========
Discussion of markers which happened after the systemtap BOF. Presented
by Karim Yaghmour.

Consider a file, for example, linux/kernel/sched.c

At the top of the file, a declaration, one per EVENT_ID per file:
ev_trace_declare(EVENT_ID, EVENT_TAG, EVENT_NR, EVENT_DEFAULT,
                 PARAM(TYPE), ...)

EVENT_ID -  unique identifier for a "group" of events
EVENT_TAG - name/string
EVENT_NR - maximum number of fields to be passed in
EVENT_DEFAULT - on or off

Actual usage:
ev_trace(...)

Concern was that splitting declaration from usage may still be an issue.

Question, can it get down to a single MACRO?

GEN_TRACE(uniqueid, param1, param2, ...)

or N macros for N arguments:
GEN_TRACE1(uniqueid, param1)
GEN_TRACE2(uniqueid, param1, param2)

The unique id could be a pointer to a unique string made up of
file/line/function name or something else (potentially generated inside
the macro itself).
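
A minimal sketch of the "N macros for N arguments" variant, with the
unique id generated inside the macro from the file name and line number.
This is only one way it could look. TRACE_SITE, trace_record and last_rec
are hypothetical names, and the macros here drop the explicit uniqueid
parameter because the id is built internally.

```c
#include <string.h>

/* Two-level stringification so that __LINE__ is expanded before '#'. */
#define TRACE_STR2(x) #x
#define TRACE_STR(x)  TRACE_STR2(x)

/* "file:line" built at compile time. __func__ cannot be pasted in the same
 * way because it is a variable, not a string literal. */
#define TRACE_SITE    __FILE__ ":" TRACE_STR(__LINE__)

/* Hypothetical sink: a real one would write to a trace buffer. Here it
 * just remembers the most recent record. */
struct trace_rec {
    const char *site;
    int         nparams;
    long        params[2];
};

static struct trace_rec last_rec;

static void trace_record(const char *site, int n, long p1, long p2)
{
    last_rec.site = site;
    last_rec.nparams = n;
    last_rec.params[0] = p1;
    last_rec.params[1] = p2;
}

/* One macro per argument count, as suggested in the discussion. */
#define GEN_TRACE1(p1)     trace_record(TRACE_SITE, 1, (long)(p1), 0L)
#define GEN_TRACE2(p1, p2) trace_record(TRACE_SITE, 2, (long)(p1), (long)(p2))
```

Because the id is a string literal, each call site gets a distinct
pointer without the maintainer having to invent a name.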

Next steps:

Talk to compiler people and see if variable arg macro can be done where
you can do something (like typeof()) to each parameter.

Can the "typeof()" primitive from gcc help with indicating the types of
the rest of the params? Type information is useful to automate viewing
of the data (like lttv).
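
For what it is worth, one way to get a type name per parameter in
standard C is the C11 _Generic selection, which postdates these notes and
was not part of the discussion. It is shown here purely as an assumption
about how the typeof() idea could be realized. TYPE_NAME, PARAM_DESC and
struct param_desc are hypothetical names.

```c
#include <string.h>

/* Map the static type of an expression to a printable name. Types not
 * listed fall through to "unknown". */
#define TYPE_NAME(x) _Generic((x),              \
        int:            "int",                  \
        long:           "long",                 \
        unsigned long:  "unsigned long",        \
        char *:         "char *",               \
        const char *:   "const char *",         \
        default:        "unknown")

/* Descriptor a viewer could use to decode one traced parameter. */
struct param_desc {
    const char *type;   /* type name chosen at compile time */
    const char *expr;   /* source text of the argument */
};

/* Build a descriptor initializer for one macro parameter. */
#define PARAM_DESC(x) { TYPE_NAME(x), #x }
```

A per-parameter descriptor like this is the sort of information a viewer
would need in order to decode the data automatically.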

How do you enable/disable behavior on a per file basis?

How do you enable/disable at runtime? Can systemtap be used to access
the marker information and apply probes at the corresponding spot and
then dump values? Seems likely given systemtap will be reading dwarf
symbolic information to do something similar.

What should the macro (or macros?) actually look like? Should it
generate code? Or just add information to a special section in the
resulting binary (a .marker section, similar to a .bss or .text section)?
Potentially, marker information is a lot like debug information. Could
be stripped/split from the shipped binaries as long as there is a way to
get the information when necessary (like dwarf symbolic information).
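
As a proof-of-concept sketch of the "special section" option, assuming
GCC and GNU ld on an ELF target: each MARK() emits a descriptor and
places a pointer to it in a section named "markers". For section names
that are valid C identifiers, GNU ld provides __start_<name> and
__stop_<name> symbols, so the markers can be enumerated at runtime. MARK,
struct marker and the example functions are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct marker {
    const char *name;
    const char *file;
    int         line;
};

/* Emit a descriptor, plus a pointer to it in the "markers" section.
 * Storing pointers instead of structs avoids depending on how the
 * compiler pads or aligns the structs inside the section. */
#define MARK(tag)                                                       \
    do {                                                                \
        static const struct marker m_ = { tag, __FILE__, __LINE__ };    \
        static const struct marker *const p_                            \
            __attribute__((section("markers"), used)) = &m_;            \
    } while (0)

/* Section bounds supplied by GNU ld (an assumption about the toolchain). */
extern const struct marker *const __start_markers[];
extern const struct marker *const __stop_markers[];

static size_t marker_count(void)
{
    return (size_t)(__stop_markers - __start_markers);
}

static const struct marker *find_marker(const char *name)
{
    const struct marker *const *p;

    for (p = __start_markers; p < __stop_markers; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}

/* Two made-up instrumented functions. */
static void schedule_example(void) { MARK("sched_switch"); }
static void wakeup_example(void)   { MARK("sched_wakeup"); }
```

Note that the markers generate no code at the call site; all the
information lives in data that a tool could read, or that could be split
out of the shipped binary.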

Need to talk to the compiler people to see how this can work.

Could implement much of this (macro, generation of descriptor for
lttv, etc) for "proof of concept" without requiring kernel changes.

Discussion to continue on systemtap mailing list.

