This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



pthread_cond_wait() hangs when cancelled (sometimes)


Hi,

I've stumbled upon a problem that occurs when a thread calls
pthread_cond_wait() and I then cancel the thread that is waiting on the
condition. In many cases this causes the waiting thread to hang in
__pthread_disable_asynccancel().

I've searched for similar issues but haven't been able to find
anything, so I'm hoping someone can help me debug this and shed some
light on whether I'm doing something wrong or whether this is a bug in
glibc.

My system:
- An embedded AMD-Fusion platform (dual-core)
- Linux 3.2.9-rt16 kernel
- OSELAS cross toolchain 2011.11.3
  (platform-i686-unknown-linux-gnu-gcc-4.6.2-glibc-2.14.1-binutils-2.21.1a),
  with the "Fix exception table for i386 pthread_cond_wait" patch applied
  (http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=55a051c985c3e7965a2f5dd5f762ac2737adae01)

I've been trying to create a small self-contained example, but haven't
been able to reproduce it (yet).

What happens is the following:
I have a number of threads waiting in pthread_cond_wait(). All threads,
mutexes and condition variables use default attributes and parameters; no
PTHREAD_PRIO_INHERIT, SCHED_FIFO or similar is used in the threads in
question:

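    size_t indexAtStart = data.getIndex();
    // Unlock the mutex if the thread is cancelled while waiting
    pthread_cleanup_push((void(*)(void*))pthread_mutex_unlock, (void *)&lock);
    pthread_mutex_lock(&lock);
    while(indexAtStart == data.getIndex()) {
        // Cancellation point; the loop also guards against spurious wakeups
        pthread_cond_wait(&cond, &lock);
    }
    pthread_cleanup_pop(1);  // pop and run the handler (unlocks the mutex)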
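For anyone trying to reproduce this, here is a minimal standalone sketch of the waiter side. It assumes a hypothetical Shared struct in place of data, lock and cond, and differs from the snippet above in two deliberate ways: the index is read under the lock, and the cleanup handler is a small function with the void(void*) signature that pthread_cleanup_push() expects, instead of a cast of pthread_mutex_unlock (calling a function through an incompatible pointer type is undefined behavior in C and C++, even if it usually works in practice on i386).

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the original 'data', 'lock' and 'cond'.
// In this sketch 'index' is only read or written with 'lock' held.
struct Shared {
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    std::size_t index = 0;
};

// Cleanup handler with the void(void*) signature pthread_cleanup_push()
// expects, so pthread_mutex_unlock() does not need to be cast.
static void unlock_mutex(void* arg) {
    pthread_mutex_unlock(static_cast<pthread_mutex_t*>(arg));
}

// Blocks until s->index changes. If the thread is cancelled inside
// pthread_cond_wait(), the mutex is reacquired before the cleanup
// handler runs, and unlock_mutex() then releases it.
void wait_for_new_data(Shared* s) {
    pthread_mutex_lock(&s->lock);
    std::size_t indexAtStart = s->index;          // read under the lock
    pthread_cleanup_push(unlock_mutex, &s->lock);
    while (indexAtStart == s->index) {            // guards against spurious wakeups
        pthread_cond_wait(&s->cond, &s->lock);    // a cancellation point
    }
    pthread_cleanup_pop(1);                       // pops and runs unlock_mutex()
}
```

Cancelling a thread blocked in wait_for_new_data() and then joining it should yield PTHREAD_CANCELED, with the mutex released again.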

Data comes in at 500 Hz, and for each new sample a separate data-collector
thread does the following:

    data.add(newData);  //also increments index
    pthread_mutex_lock(&lock);
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
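The data-collector side could be sketched as follows, again assuming a hypothetical Shared struct holding the mutex, the condition variable and an index counter. Unlike the snippet above, the index is incremented while holding the mutex; if data.getIndex() is not otherwise synchronized, reading it under the lock in the waiters while writing it outside the lock here would be a data race. The 500 Hz rate is approximated with clock_nanosleep() on absolute CLOCK_MONOTONIC deadlines; produce() and add_ns() are illustrative names.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical shared state: 'index' is only touched with 'lock' held.
struct Shared {
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    std::size_t index = 0;
};

// Advances an absolute deadline by 'ns' nanoseconds.
static void add_ns(timespec& t, long ns) {
    t.tv_nsec += ns;
    while (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        ++t.tv_sec;
    }
}

// Publishes 'samples' updates at a fixed period (2 ms, i.e. 500 Hz, by default).
// Absolute deadlines keep the loop's own run time from making the rate drift.
void produce(Shared* s, int samples, long period_ns = 2000000L) {
    timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < samples; ++i) {
        add_ns(next, period_ns);
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {
            // Interrupted by a signal: sleep again until the same deadline.
        }
        pthread_mutex_lock(&s->lock);
        ++s->index;                        // stands in for data.add(newData)
        pthread_cond_broadcast(&s->cond);  // wake every waiter
        pthread_mutex_unlock(&s->lock);
    }
}
```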

This works fine under normal circumstances, but when I cancel one of the
waiting threads, pthread_cancel() returns successfully, yet in about 20%
of cases the cancelled thread stays stuck in pthread_cond_wait(). Stack
trace:

    #0  0xb7d8b2f7 in __pthread_disable_asynccancel (oldtype=<value optimized out>) at cancellation.c:97
    #1  0xb7d88886 in pthread_cond_wait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:176
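To tell a hung cancellation apart from a merely slow one in a test program, a helper along these lines could cancel a thread and wait for it with a timeout. It relies on pthread_timedjoin_np(), a GNU extension in glibc whose deadline is an absolute CLOCK_REALTIME time; the helper name cancel_and_join_within() is hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // pthread_timedjoin_np() is a GNU extension
#endif
#include <pthread.h>
#include <time.h>
#include <cassert>
#include <cerrno>

// Cancels 't' and waits at most 'timeout_ms' for it to finish.
// Returns true if the thread was joined and reported PTHREAD_CANCELED.
// Returns false if it did not finish in time (for example, stuck as in the
// trace above); in that case the thread is left running and unjoined.
bool cancel_and_join_within(pthread_t t, long timeout_ms) {
    if (pthread_cancel(t) != 0) return false;
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  // pthread_timedjoin_np() uses CLOCK_REALTIME
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_nsec -= 1000000000L;
        ++deadline.tv_sec;
    }
    void* result = nullptr;
    int rc = pthread_timedjoin_np(t, &result, &deadline);  // ETIMEDOUT on timeout
    return rc == 0 && result == PTHREAD_CANCELED;
}
```

Calling it repeatedly against freshly started waiter threads would give a rough failure rate to compare with the ~20% seen here.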

Looking at cancellation.c:97, the thread seems to be stuck in
lll_futex_wait(), waiting for the cancelhandling variable to have
CANCELED_BIT set (only CANCELING_BIT seems to be set). As far as I've been
able to find, the only place this flag is set is at the very end of
pthread_cancel(), where cancelhandling is updated with both CANCELING_BIT
and CANCELED_BIT. I tried putting a breakpoint on pthread_cancel() in gdb,
and it seemed to skip the line in question, but since gdb seems to be
using the assembly version of the file, I'm not sure how much weight to
put on that.

I'm about to try a newer version of the toolchain, but in the meantime,
if anyone has any idea what might be going on, I'd be very happy to hear
it. Thanks in advance!

Best regards,
Simon Falsig

