This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.
Re: futex and soft lockup
Actually, I realized you must be right in pointing toward the kernel. :-)
I was blinded by the fact that the older glibc didn't have the problem, and forgot about the basic concepts. I also didn't realize how active kernel development is in the futex area. I am trying some of the latest kernel futex fixes and they seem promising.
The newer glibc might just be triggering a different kernel path, or simply be faster (and so more likely to trigger some race condition).
Thanks for the tip.
-gong
----- Original Message ----
From: Américo Wang <xiyou.wangcong@gmail.com>
To: Gong Cheng <chengg11@yahoo.com>
Cc: libc-help@sourceware.org
Sent: Tuesday, October 20, 2009 2:35:16 AM
Subject: Re: futex and soft lockup
On Tue, Oct 20, 2009 at 4:55 AM, Gong Cheng <chengg11@yahoo.com> wrote:
> Hi,
> I am running glibc-2.5-34.x86_64.rpm (for CentOS) on top of a 2.6.31 (tried 2.6.30 too) kernel, and I am consistently seeing system soft lockups like the following:
>
> BUG: soft lockup - CPU#0 stuck for 61s! [<my program>:3068]
> <snip>
> Call Trace:
> [<ffffffff8130e8d6>] ? _spin_lock+0x16/0x40
> [<ffffffff8105fe85>] ? futex_wait_setup+0x75/0x100
> [<ffffffff81060109>] ? futex_wait+0xf9/0x270
> [<ffffffff8108c80b>] ? zone_statistics+0x5b/0x90
> [<ffffffff810619fb>] ? do_futex+0xbb/0xcb0
> [<ffffffff81082f98>] ? ____pagevec_lru_add+0x138/0x150
> [<ffffffff810317ac>] ? update_curr+0x6c/0xc0
> [<ffffffff810831b1>] ? __lru_cache_add+0x71/0xb0
> [<ffffffff81083204>] ? lru_cache_add_lru+0x14/0x30
> [<ffffffff8130eda1>] ? _spin_unlock+0x11/0x40
> [<ffffffff8108f0de>] ? do_wp_page+0x28e/0x7b0
> [<ffffffff81090e3a>] ? handle_mm_fault+0x59a/0x7c0
> [<ffffffff8130ea12>] ? _spin_lock_irqsave+0x22/0x50
> [<ffffffff8130ee63>] ? _spin_unlock_irqrestore+0x13/0x40
> [<ffffffff81062680>] ? sys_futex+0x90/0x150
> [<ffffffff81029417>] ? do_page_fault+0x187/0x2d0
> [<ffffffff8100bceb>] ? system_call_fastpath+0x16/0x1b
>
> Previously, when running glibc-2.5.18, I didn't have this problem. In fact, if I switch back to 2.5.18 while keeping everything else the same, the problem immediately stops.
>
> My program uses pthread and futex extensively. If I run the program in single-threaded mode, then I don't have the issue.
>
> I am aware I am not providing a lot of information here, but just want to quickly check if this issue is known to anyone here?
> Also in general, is it a bad idea to combine 2.5-34 glibc with the latest kernel?
>
> I'd appreciate any tips on this issue!
This looks more like a kernel problem. :)
The kernel is not supposed to hit a 'soft lockup' no matter how you use futex in
user-space. Would you mind trying the latest git kernel with glibc-2.5-34?
Thanks.
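
For anyone who wants to exercise the same kernel path (sys_futex -> do_futex -> futex_wait) when comparing kernels, here is a minimal sketch of a multi-threaded futex stress routine. It is not the reporter's program, which was never posted. The names futex_lock, futex_unlock, futex_stress and MAX_THREADS are made up for illustration, and the locking scheme is the three-state mutex described in Ulrich Drepper's "Futexes Are Tricky", not what glibc's pthread code does internally. It assumes Linux, C11 atomics, and that atomic_int has the same layout as a plain int.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_THREADS 64  /* hypothetical cap, only to keep the sketch simple */

/* glibc has no futex() wrapper, so call the system call directly. */
static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Lock word states: 0 = free, 1 = locked, 2 = locked, maybe with waiters. */
static void futex_lock(atomic_int *f)
{
    int c = 0;
    if (atomic_compare_exchange_strong(f, &c, 1))
        return;                       /* uncontended: no system call needed */
    if (c != 2)
        c = atomic_exchange(f, 2);    /* announce that there is a waiter */
    while (c != 0) {
        futex(f, FUTEX_WAIT, 2);      /* sleeps only while *f is still 2 */
        c = atomic_exchange(f, 2);
    }
}

static void futex_unlock(atomic_int *f)
{
    if (atomic_fetch_sub(f, 1) != 1) {  /* state was 2: someone may sleep */
        atomic_store(f, 0);
        futex(f, FUTEX_WAKE, 1);
    }
}

struct worker_arg {
    atomic_int *lock;
    long *counter;
    long iters;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        futex_lock(a->lock);
        (*a->counter)++;              /* protected by the futex lock */
        futex_unlock(a->lock);
    }
    return NULL;
}

/* Runs nthreads workers that each take the lock iters times.
 * Returns the final counter; with a working lock it is nthreads * iters. */
long futex_stress(int nthreads, long iters)
{
    pthread_t tids[MAX_THREADS];
    atomic_int lock = 0;
    long counter = 0;
    struct worker_arg arg = { &lock, &counter, iters };
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling futex_stress(4, 1000000) from a small main() and building with -pthread gives a rough contention test. A hang or a wrong count in user space would point at the lock code itself, while a 'soft lockup' message in the kernel log would point back at the kernel, as suggested above. Whether this load is enough to reproduce the reported lockup is not known.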