This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH -tip v3&10 07/18] x86: Add text_poke_smp for SMP cross modifying code
- From: Mathieu Desnoyers <mathieu dot desnoyers at efficios dot com>
- To: Masami Hiramatsu <mhiramat at redhat dot com>
- Cc: Ingo Molnar <mingo at elte dot hu>, Frederic Weisbecker <fweisbec at gmail dot com>, Ananth N Mavinakayanahalli <ananth at in dot ibm dot com>, lkml <linux-kernel at vger dot kernel dot org>, systemtap <systemtap at sources dot redhat dot com>, DLE <dle-develop at lists dot sourceforge dot net>, Jim Keniston <jkenisto at us dot ibm dot com>, Srikar Dronamraju <srikar at linux dot vnet dot ibm dot com>, Christoph Hellwig <hch at infradead dot org>, Steven Rostedt <rostedt at goodmis dot org>, "H. Peter Anvin" <hpa at zytor dot com>, Anders Kaseorg <andersk at ksplice dot com>, Tim Abbott <tabbott at ksplice dot com>, Andi Kleen <andi at firstfloor dot org>, Jason Baron <jbaron at redhat dot com>
- Date: Tue, 2 Mar 2010 19:48:15 -0500
- Subject: Re: [PATCH -tip v3&10 07/18] x86: Add text_poke_smp for SMP cross modifying code
- References: <20100225133342.6725.26971.stgit@localhost6.localdomain6> <20100225133438.6725.80273.stgit@localhost6.localdomain6> <20100225153305.GC12635@Krystal> <4B8745AC.2070702@redhat.com>
* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu (mhiramat@redhat.com) wrote:
> [...]
> >> +
> >> +/*
> >> + * Cross-modifying kernel text with stop_machine().
> >> + * This code originally comes from immediate value.
> >> + */
> >> +static atomic_t stop_machine_first;
> >> +static int wrote_text;
> >> +
> >> +struct text_poke_params {
> >> +	void *addr;
> >> +	const void *opcode;
> >> +	size_t len;
> >> +};
> >> +
> >> +static int __kprobes stop_machine_text_poke(void *data)
> >> +{
> >> +	struct text_poke_params *tpp = data;
> >> +
> >> +	if (atomic_dec_and_test(&stop_machine_first)) {
> >> +		text_poke(tpp->addr, tpp->opcode, tpp->len);
> >> +		smp_wmb(); /* Make sure other cpus see that this has run */
> >> +		wrote_text = 1;
> >> +	} else {
> >> +		while (!wrote_text)
> >> +			smp_rmb();
> >> +		sync_core();
> >
> > Hrm, there is a problem in there. The last loop, when wrote_text becomes
> > true, does not perform any smp_mb(), so you end up in a situation where
> > cpus in the "else" branch may never issue any memory barrier. I'd rather
> > do:
>
> Hmm, so how about this? :)
> ---
> } else {
> 	do {
> 		smp_rmb();
> 	} while (!wrote_text);
> 	sync_core();
> }
> ---
>
The orderings we are looking for here are:
Write-side: smp_wmb() orders the text_poke() stores before the store to wrote_text.
Read-side: order the wrote_text load before subsequent execution of the modified
instructions.
Here again, strictly speaking, the wrote_text load is not ordered with respect to
the following instructions. So maybe it's fine on x86-TSO specifically, but I would
not count on this kind of synchronization to work in the general case.
Given the very small expected performance impact of this code path, I would
recommend using the more solid/generic alternative below. If there is really a
gain to be had by creating this weird wait loop with strange memory barrier
semantics, fine; otherwise I'd be reluctant to accept your proposals as
obviously correct.
If you really, really want to go down the route of proving the correctness of
your memory barrier usage, I can recommend looking at the memory barrier formal
verification framework I did as part of my thesis. But, really, in this case,
the performance gain is just not there, so there is no point in spending time
trying to prove this.
Thanks,
Mathieu
> >
> > +static volatile int wrote_text;
> >
> > ...
> >
> > +static int __kprobes stop_machine_text_poke(void *data)
> > +{
> > +	struct text_poke_params *tpp = data;
> > +
> > +	if (atomic_dec_and_test(&stop_machine_first)) {
> > +		text_poke(tpp->addr, tpp->opcode, tpp->len);
> > +		smp_wmb(); /* order text_poke stores before store to wrote_text */
> > +		wrote_text = 1;
> > +	} else {
> > +		while (!wrote_text)
> > +			cpu_relax();
> > +		smp_mb(); /* order wrote_text load before following execution */
> > +	}
> >
> > If you don't like the "volatile int" definition of wrote_text, then we
> > should probably use the ACCESS_ONCE() macro instead.
>
> hm, yeah, volatile will be required.
>
> Thank you,
>
>
> --
> Masami Hiramatsu
> e-mail: mhiramat@redhat.com
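For reference, the ACCESS_ONCE() construct mentioned in the exchange above can be sketched as a volatile cast (this is an illustration in the style of the kernel macro, assuming the GNU C __typeof__ extension). It forces the compiler to emit a fresh load of wrote_text on every iteration instead of hoisting the load out of the loop:

```c
/* Sketch of ACCESS_ONCE() as a volatile cast (GNU C __typeof__ assumed);
 * each use compiles to a real memory load rather than a cached value. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int wrote_text;

/* Busy-wait whose load cannot be hoisted out of the loop by the compiler. */
static void wait_for_wrote_text(void)
{
	while (!ACCESS_ONCE(wrote_text))
		;	/* a real kernel loop would call cpu_relax() here */
}
```

Note this only constrains the compiler; it provides no CPU-level ordering, which is why the loop above is still paired with an explicit barrier.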
--
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com