This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Discussion at Linux Foundation Japan Symposium


On Mon, Jan 12, 2009 at 02:04:01PM -0500, Jason Baron wrote:
> 
> We have been actively looking at and adding tracepoints to the lttng
> kernel tree via the ltt-dev list to support Systemtap. These tracepoints are
> being added at "key" kernel points in the fs, vm, scheduler, and other
> subsystems. Unfortunately, we just realized that these tracepoints are not
> going to be proposed for a merge until lttng is proposed for merge. Systemtap
> can not be held up by this.

Huh?  Last I checked Systemtap didn't support tracepoints at all.  Did
I miss something?

And what do you mean by "adding tracepoints to support
Systemtap"?  Do you mean that these would help you write better
tapsets, for when Systemtap could support tracepoints?

Also, the tracepoints won't necessarily be held up for merge until
lttng is proposed for merge.  What is necessary is a way to access
those tracepoints without needing some big, hairy, complex, userspace
package (whether it is called Lttng or Systemtap), since said packages
often are written with massive distro-dependencies, or are written in
C++ so kernel developers have a hard time customizing/fixing them to
meet their needs, and so on.

So what Linus Torvalds and other senior kernel developers proposed at
the Kernel Summit was a simple debugfs/proc interface which would
allow individual activation of a tracepoint/marker, and which would
dump out the data collected by that marker as a simple text file
accessed via a pseudo-filesystem.  This would be the "in-mainline
user" of the markers/tracepoints, and would guarantee that tracepoints
could be made *useful* by kernel developers using grep, awk, and perl
of that text file.  Simple filtering for bandwidth reasons might be
done via debugfs knobs, only for the 99.9% common cases.
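
To give a feel for the kind of post-processing such a text file would
allow, here is a minimal sketch.  It assumes a made-up line layout of
"<sec>.<msec>:<event>: key value key value ..."; the layout, the sample
lines, and the helper names parse_line() and select() are all
hypothetical, since the real output format had not been settled.

```python
import re

# Hypothetical line layout: "<sec>.<msec>:<event>: key value key value ..."
LINE_RE = re.compile(r"^(\d+)\.(\d+):(\w+):\s*(.*)$")


def parse_line(line):
    """Return (timestamp_ms, event, fields), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    sec, msec, event, rest = m.groups()
    tokens = rest.split()
    # Pair up alternating "key value" tokens; an odd trailing token is dropped.
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    return int(sec) * 1000 + int(msec), event, fields


def select(lines, event, **wanted):
    """Yield parsed records for one event whose fields equal the given values."""
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec[1] != event:
            continue
        if all(rec[2].get(k) == v for k, v in wanted.items()):
            yield rec


# Invented sample data, standing in for the contents of the text file.
sample = [
    "1231787041.005:ext4_sync_file: dev sda1 datasync 0 ino 12 parent 2",
    "1231787041.250:ext4_sync_file: dev sdb1 datasync 1 ino 40 parent 7",
    "not a trace line",
]

for ts, event, fields in select(sample, "ext4_sync_file", dev="sda1"):
    print(ts, event, fields["ino"])
```

The same selection is a one-liner in grep or awk; the point is only
that a plain-text stream needs nothing beyond standard tools to be
useful.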

> Therefore, I was thinking of proposing 100+ tracepoints that are
> currently in the lttng tree (and not upstream, but many have already
> been reviewed upstream), on lkml.

Linus has basically said at the Kernel Summit that he was not going to
accept new markers until there was a way to make sure that they could
actually be made *useful* for real, live kernel maintainers, via this
simple text interface.  There were some questions about whether text
or a compressed binary would be used to ship the log to userspace, but
in the latter case, a simple .c file shipped with the kernel sources
in the examples directory must be all that would be necessary to
generate the text stream that would then be processed via grep/awk ---
not a massive out-of-tree C++ program.

I was able to sneak in some markers for ext4, but that's primarily
because it was maintainer's discretion and ext4 isn't in widespread
use and is in late-development stage, so Linus doesn't pay as close
attention.  :-) However, if you are going to try to get 100+
tracepoints into the core kernel, that *will* draw notice, and the
first question people will ask is "what's the in-tree consumer of
tracepoints".

I pinged Steven Rostedt at Red Hat, and he indicated that this was
still on his todo list.  So my recommendation to you would be to reach
out to Steven Rostedt, and see if you can help with trying to get the
"simple text output" enhancements to ftrace completed so it can get
merged into mainline.  There has already been general approval of that
game plan at the Kernel Summit, so this is basically a question of
"Show Me The Code".  Once this is done, getting the tracepoints you
want into the kernel should be relatively straightforward.

> If we also propose Systemtap specific set of 
> markers to interface, with these tracepoints, then Systemtap will work out of
> the box with no debuginfo, no gcc changes, and be effective immediately to 
> filter ext4 debug information.

That assumes that SystemTap can access tracepoints, but I assume
that's a Small Matter of Programming.  :-)

> Longer term, we can look at merging markers into tracepoints, having
> Systemtap directly interface with tracepoints, and merging
> utrace/probes. This proposal makes Systemtap immediately more useful on 
> upstream kernels, while longer term issues are addressed. thoughts?

Markers are probably just going to disappear.  Most of the markers
that were in the core kernel have already disappeared; all that's left
is a handful of markers in arch-specific code, the KVM subsystem, and
the ext4 subsystem.

I don't think you'll be able to get the tracepoints into the main
kernel until we get a way to access tracepoints directly via debugfs
and to get exported output files, via either a simple .c file
reading a binary log output, or simply using "cat
/debugfs/..../foo.txt", but I suspect that can happen very quickly.

I'll admit that once this is done, I may end up using the plain text
output mechanism far more often than SystemTap, since it will be more
convenient than creating stap scripts like this:

probe kernel.mark("ext4_sync_file")
{
	# Zero-pad the millisecond part so that e.g. 5 ms prints as ".005".
	t = gettimeofday_ms();
	printf("%d.%03d:ext4_sync_file: dev %s datasync %d ino %d parent %d\n",
		      t / 1000, t % 1000, $arg1, $arg2, $arg3, $arg4);
}

... but maybe that's OK; the SystemTap developers will probably get
far fewer annoying complaints from kernel developers about how
SystemTap doesn't work with 2.6.29-rc1, or how "xmlto pdf" or
elfutils doesn't work exactly the same (or not at all) on
non-RedHat distributions, etc.

						- Ted

