This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH] Linux Kernel Markers
- From: Martin Bligh <mbligh at google dot com>
- To: Vara Prasad <prasadav at us dot ibm dot com>
- Cc: prasanna at in dot ibm dot com, Andrew Morton <akpm at osdl dot org>, "Frank Ch. Eigler" <fche at redhat dot com>, Ingo Molnar <mingo at elte dot hu>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>, Paul Mundt <lethal at linux-sh dot org>, linux-kernel <linux-kernel at vger dot kernel dot org>, Jes Sorensen <jes at sgi dot com>, Tom Zanussi <zanussi at us dot ibm dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, Michel Dagenais <michel dot dagenais at polymtl dot ca>, Christoph Hellwig <hch at infradead dot org>, Greg Kroah-Hartman <gregkh at suse dot de>, Thomas Gleixner <tglx at linutronix dot de>, William Cohen <wcohen at redhat dot com>, ltt-dev at shafik dot org, systemtap at sources dot redhat dot com, Alan Cox <alan at lxorguk dot ukuu dot org dot uk>
- Date: Tue, 19 Sep 2006 12:26:32 -0700
- Subject: Re: [PATCH] Linux Kernel Markers
- References: <20060918234502.GA197@Krystal> <20060919081124.GA30394@elte.hu> <451008AC.6030006@google.com> <20060919154612.GU3951@redhat.com> <4510151B.5070304@google.com> <20060919093935.4ddcefc3.akpm@osdl.org> <45101DBA.7000901@google.com> <20060919063821.GB23836@in.ibm.com> <45102641.7000101@google.com> <4510413F.2030200@us.ibm.com>
Vara Prasad wrote:
Martin Bligh wrote:
[...]
Depends what we're trying to fix. I was trying to fix two things:
1. Flexibility - kprobes seem unable to access all local variables etc
easily, and go anywhere inside the function. Plus keeping low overhead
for doing things like keeping counters in a function (see previous
example I mentioned for counting pages in shrink_list).
Using tools like systemtap, one can consult DWARF information, put
probes in the middle of a function, and access local variables as well;
that is not the real problem. The issue here is that the compiler doesn't
seem to generate the required DWARF information in some cases due to
optimizations.
It seems difficult to separate those two from each other. If the
subsystem you're relying on doesn't work, then ....
The other related problem is that, even when debug information exists,
the way to specify the breakpoint location is by line number, which is not
maintainable; having a marker solves this problem as well. Your proposal
still doesn't remove the need for markers, if I understood correctly.
It could, but I think we're better off with the markers, yes.
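To make the maintainability point concrete, here is a minimal user-space sketch of a named marker. It is not the API from the patch under discussion: `marker`, `probes`, the marker name `shrink_list_scan`, and the toy `shrink_list` body are all hypothetical. The point it illustrates is only that a probe attached by name keeps working when surrounding lines move, whereas a line-number breakpoint would not.

```python
# Hypothetical names throughout; this only models the idea of a named marker.
probes = {}  # marker name -> handler; empty while nothing is attached


def marker(name, **args):
    """Call the handler attached to `name`, if any; otherwise do nothing."""
    handler = probes.get(name)
    if handler is not None:
        handler(**args)


def shrink_list(pages):
    """Toy stand-in for a function we want to count pages in."""
    freed = 0
    for page in pages:
        # The probe point is identified by name, not by source line number.
        marker("shrink_list_scan", page=page)
        if page % 2 == 0:  # placeholder for the real reclaim decision
            freed += 1
    return freed


# Attach a counter-style probe and run the instrumented function.
scanned = []
probes["shrink_list_scan"] = lambda page: scanned.append(page)
freed = shrink_list([1, 2, 3, 4])
```

With no handler attached, the marker costs one dictionary lookup in this model; a real kernel marker would aim for something far cheaper.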
2. Overhead of the int3, which was allegedly 1000 cycles or so, though
faster after Ingo had played with it, it's still significant.
The reason Kprobes uses the breakpoint instruction, as pointed out by
Prasanna, is that it is atomic on most platforms. We are already working
on an improved idea using a jump instruction, with which the overhead is
less than 100 cycles on modern CPUs, but it has some limitations and
issues related to preemption and SMP.
You can get a glimpse of some of the issues here
http://sourceware.org/ml/systemtap/2006-q3/msg00507.html
http://sourceware.org/ml/systemtap/2005-q4/msg00117.html
For more details, search for djprobe in the systemtap mailing list
(sorry, I am not able to find a few threads that summarize all the issues).
"This djprobe is NOT a replacement of kprobes. Djprobe and kprobes
have complementary qualities. (ex: djprobe's overhead is low, and
kprobes can be inserted in anywhere.)". Hmm. that seems problematic.
From what I was describing for function replacement, we could do an NMI
IPI to everyone, and lock them in there whilst we insert the probe, but
it's a bit sucky.
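As a rough illustration of the "lock everyone in while we patch" idea, the sketch below uses threads to stand in for CPUs. Everything here is an assumption made for illustration: `patch_with_all_cpus_parked` is a hypothetical name, a `threading.Barrier` stands in for the NMI IPI rendezvous, and `apply_patch` is whatever code modification the caller wants to perform.

```python
import threading


def patch_with_all_cpus_parked(n_cpus, apply_patch):
    """Park n_cpus worker threads, run apply_patch(), then release them."""
    parked = threading.Barrier(n_cpus + 1)  # every "CPU" plus the patcher
    released = threading.Event()

    def cpu():
        parked.wait()    # stands in for entering the NMI handler
        released.wait()  # held here until the patch is complete

    threads = [threading.Thread(target=cpu) for _ in range(n_cpus)]
    for t in threads:
        t.start()

    parked.wait()        # returns only once every "CPU" has checked in
    apply_patch()        # no "CPU" is running patched code at this point
    released.set()
    for t in threads:
        t.join()


patched = []
patch_with_all_cpus_parked(4, lambda: patched.append(True))
```

The cost is visible even in this toy: every CPU sits idle for the whole duration of the patch.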
Here is the algorithm djprobes uses to
        IA
        |
[-2][-1][0][1][2][3][4][5][6][7]
        [ins1][ins2][ ins3  ]
        [<-      DCR      ->]
           [<- JTPR ->]
ins1: 1st Instruction
ins2: 2nd Instruction
ins3: 3rd Instruction
IA: Insertion Address
JTPR: Jump Target Prohibition Region
DCR: Detoured Code Region
The replacement procedure of djprobes is the following (I have
simplified the actual steps djprobes uses, for readability):
(1) copying instruction(s) in DCR
(2) putting break point instruction at IA
(3) make sure no CPU still has the instructions being replaced in its
cache, to avoid a jump into the middle of the jmp instruction
(4) replacing original instruction(s) with jump instruction
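The four steps above can be sketched as a user-space simulation on a byte buffer. This is an illustrative model under stated assumptions, not djprobes' actual code: `insert_detour` and `wait_for_cpus` are hypothetical names, the opcodes are the x86 `int3` (0xCC) and near `jmp rel32` (0xE9), and the ordering inside step (4) (displacement first, opcode last) is one plausible reading of the simplified description.

```python
import struct

INT3 = 0xCC       # x86 one-byte breakpoint opcode
JMP_REL32 = 0xE9  # x86 near jump opcode, followed by a 32-bit displacement
JMP_LEN = 5       # opcode byte plus four displacement bytes


def insert_detour(text, ia, insn_lengths, target, wait_for_cpus=lambda: None):
    """Patch a 5-byte jump at offset `ia` in `text`; return the saved DCR bytes."""
    # The DCR consists of whole instructions and must cover the 5-byte jump.
    dcr_len = 0
    for n in insn_lengths:
        if dcr_len >= JMP_LEN:
            break
        dcr_len += n
    if dcr_len < JMP_LEN:
        raise ValueError("instructions too short to hold a jump")

    # (1) copy the instruction(s) in the DCR
    saved = bytes(text[ia:ia + dcr_len])

    # (2) put a breakpoint at IA (a single-byte store)
    text[ia] = INT3

    # (3) placeholder: wait until no CPU can still be inside the region
    wait_for_cpus()

    # (4) write the displacement, then replace the breakpoint with the jump opcode
    rel = target - (ia + JMP_LEN)
    text[ia + 1:ia + JMP_LEN] = struct.pack("<i", rel)
    text[ia] = JMP_REL32
    return saved


# Layout from the diagram: ins1 (2 bytes), ins2 (2 bytes), ins3 (3 bytes),
# with IA at offset 2 of this illustrative buffer.
text = bytearray([0x90, 0x55, 0x89, 0xE5, 0x31, 0xC0, 0x83, 0xEC, 0x10, 0xC3])
saved = insert_detour(text, ia=2, insn_lengths=[2, 2, 3], target=0x100)
```

Here the DCR is 7 bytes because the jump needs 5 and the third instruction cannot be split; the two trailing bytes of ins3 are left in place but are never executed once the detour returns past the DCR.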
As you can see from the above, your suggestion is very similar to
djprobes; hence I believe all the issues related to djprobes will be
valid for yours as well.
The hooking seems very similar, yes, perhaps I can be lazy and just
steal djprobes for this. The difference is that if we just replace the
whole function, we can just shove arbitrary changes into functions, and
do whatever we please. Plus we don't have to worry about locating
internal variables, etc.
M.