This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Fixed PR13146 by not allowing memory allocations to sleep (Was: [SCM] systemtap: system-wide probe/trace tool branch, master, updated. release-1.6-151-g8e794e9)

From: David Smith <dsmith at redhat dot com>
To: Mark Wielaard <mjw at redhat dot com>
Cc: Jim Keniston <jkenisto at linux dot vnet dot ibm dot com>, Josh Stone <jistone at redhat dot com>, systemtap at sourceware dot org
Date: Wed, 05 Oct 2011 08:43:54 -0500
Subject: Re: Fixed PR13146 by not allowing memory allocations to sleep (Was: [SCM] systemtap: system-wide probe/trace tool branch, master, updated. release-1.6-151-g8e794e9)
References: <20110901143940.13672.qmail@sourceware.org> <1317135124.3361.22.camel@springer.wildebeest.org> <4E82481E.8060502@redhat.com> <1317164453.3979.9.camel@localhost> <1317211902.3438.14.camel@springer.wildebeest.org>

On 09/28/2011 07:11 AM, Mark Wielaard wrote:

> On Tue, 2011-09-27 at 16:00 -0700, Jim Keniston wrote:
>> On Tue, 2011-09-27 at 15:03 -0700, Josh Stone wrote:
>>> On 09/27/2011 07:52 AM, Mark Wielaard wrote:
>>>> I suspect that is caused because the context is one of the larger
>>>> allocations at systemtap_module_init time. And because we are no longer
>>>> allowed to GFP_WAIT it is more likely to fail now. Would it be
>>>> possible/make sense to allow GFP_WAIT for allocations made from context
>>>> like systemtap_module_init() that may sleep because they are made from
>>>> user context?
>>>
>>> I agree, those contexts which can sleep, should.  Not only does this
>>> make it more likely we'll get the memory we want, but also makes us
>>> better citizens with the rest of the kernel.
>>>
>> I haven't seen this explicitly mentioned wrt this thread or PR13146, but
>> uprobes and uretprobe handlers (which are called from the utrace
>> report_signal callback) can sleep.
> 
> OK, I have not gone through all allocation points yet. But I did
> introduce _gfp variants for the current _stp alloc.c functions. And use
> them with GFP_KERNEL from module_init for the setup of the context, maps
> and stats now. Since that is where I saw the spurious failures most
> often and because that was easiest to proof "safe" (but please do double
> check the commit, in case I did something stupid). Tested against
> 2.6.35.14-96.fc14.x86_64 and 2.6.9-101.ELsmp i686.
> 
> commit f816f9be4cdf8e4dcac0b11fad8924ab389449ee
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Wed Sep 28 13:18:18 2011 +0200
> 
>     Introduce gfp_mask variants of _stp allocation functions.
>     
>     * runtime/alloc.c: Add _stp_kmalloc_gfp(), _stp_kzalloc_gfp() and
>       _stp_kmalloc_node_gfp() which take a gfp_mask in case you really
>       know what you are doing.
>     * translate.cxx (emit_module_init): Use with GFP_KERNEL for context
>       alloc.
>     * runtime/map.c: Use for init and new functions called at
>       module_init time.
>     * runtime/stat.c: Likewise.
> 
> We could now use that in other places if we really know it is safe.
> 
> Note that_stp_alloc_percpu() calls __alloc_percpu() which may sleep and
> always uses GFP_KERNEL. I audited the places where we call it and they
> all seem safe, but please double check current practice and be careful
> when calling _stp_alloc_percpu() from random contexts.

Mark, thanks for finding and fixing this.  I started seeing some of
those messages right before I left for vacation and didn't have time to
look into it.

> BTW. Why is our default gfp_mask:
> #define STP_ALLOC_FLAGS ((GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN) \
>                          & ~__GFP_WAIT)
> And not just GFP_ATOMIC or GFP_NOWAIT?


Good question.  I added the '& ~__GFP_WAIT' part in the commit that
started this, and I was unsure of the logic of the rest.  I didn't have
time to look too much in the history of those flags and how we used
them.  It is possible the rest of those flags were done that way for
reasons that made sense back in the early 2.6 kernel days.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]