This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] [BZ 14417] Document condvar synchronization

From: Torvald Riegel <triegel at redhat dot com>
To: Jakub Jelinek <jakub at redhat dot com>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>
Date: Wed, 26 Sep 2012 18:35:13 +0200
Subject: Re: [RFC] [BZ 14417] Document condvar synchronization
References: <1348230886.3374.3162.camel@triegel.csb> <20120926155534.GN26330@sunsite.ms.mff.cuni.cz>

On Wed, 2012-09-26 at 17:55 +0200, Jakub Jelinek wrote:
> On Fri, Sep 21, 2012 at 02:34:46PM +0200, Torvald Riegel wrote:
> > +   In turn, this allows the implementation to not have to distinguish between
> > +   real and spurious wake-ups for FW, nor to have to maintain classes of
> > +   waiting threads based on which S1 the respective W1s happened before.  If
> > +   a signal is available (which can be determined based on the three shared
> > +   counters), the first waiting thread able to wake-up (spuriously or
> > +   otherwise) and acquire __lock will simply consume the signal.
> > +   Example execution (A, and B are waiting threads, A:W1 happens-before S1
> > +   happens-before A:W2): A:W1, A:FW, S1, B:W1, B:FW, B:FW returns (spurious),
> > +   B:W3, A:FW returns (real), A:W2.  B:FW can return spuriously due to a
> > +   third waiting thread executing W1 between B:W1 and B:FW, for example.
> 
> How?  B:W3 shouldn't happen, as B's seq will be __wakeup_seq after S1                                                                              
> incremented it, and there are no further S1 events, so seq == __wakeup_seq                                                                         
> and thus it should loop (i.e. B:W2, not B:W3).                                                                                                     
>                                                                                                                                                    
> Perhaps A, B, C, D waiting threads, E signalling threads holding mutex when                                                                        
> calling S1, A:W1 -> B:W1 -> E:S1 -> C:W1 -> D:W1 -> E:S1 ?                                                                                         
> A:W1, A:FW, B:W1, B:FW, E:S1, futex wake signals say A, C:W1, C:FW, D:W1,
> D:FW,                                                                    
> E:S1, futex wake signals say C, D:FW returns (spurious, e.g. it didn't sleep                                                                       
> in the kernel yet and futex returned because __futex changed), D:W3 (D's seq                                                                       
> is 1, __wakeup_seq 2), C:FW returns (real), C:W3 (C's seq is 1, __wakeup_seq                                                                       
> 2, __woken_seq at the start 1), A:FW returns (real), A:W2 (A's seq is 0,                                                                           
> __wakeup_seq 2, but __woken_seq is 2 at this point).                                                                                               
> So that would be what the BZ is about, one later pthread_cond_signal                                                                               
> awakening more than one "new" thread if there were enough                                                                                          
> pthread_cond_signal calls that should have awaken "old" threads.                                                                                   

Right, my counter-example was incorrect.  We don't need B in your
example though, but we do need two waiters between the two signals and
the second signal, so that we get enough "interference" on the counters
too (I always had a second waiter in mind for the spurious wakeup, but
forgot about the counters when writing the example).

Irrespective of the counter example, the root cause remains the same:
We don't track waiter--signal relationships and instead just count the
number of signals, and we don't make a distinction between spurious
futex wake-ups and real wake-ups.  To adhere to the stronger guarantees
and respect happens-before in both directions (so, not just the lower
bound currently required in the POSIX spec), we need some kind of
ordering of waiters or waiter--signaler grouping.

> If Austin group says that the glibc behavior is not kosher, we can always
> drop all traces of __woken_seq and live with spurious wakeups, but I hope
> we don't need to do that.

If we go that route, we can probably drop most of the counters and just
depend on the futex queues (which respect happens-before).

However, what this means in turn is that we will expose more spurious
wake-ups to programs.  Correct programs must be able to handle that, but
it's unclear to me in how many incorrect programs this will trigger
errors.  Second, I have no numbers on which performance degradation this
might mean for correct programs (there's at least one more lock
acquisition on the way to detecting a spurious wake-up).  I hope that
Austin Group will provide more background information on these
trade-offs when it clarifies the spec.

I haven't thought this through fully, but we might also be able to avoid
the spurious wake-ups and still respect happens-before in both
directions.  This will involve more bookkeeping though, I believe; I
can't yet say anything about the performance difference this might make.

Note that C++11's specification of cond vars is also not 100% clear; an
LWG issue has been created (I don't have the number yet), and I hope
that this will be discussed at the upcoming meeting.  The question is
whether the total order that C++11 requires for the cond var operations
(see 30.5.4) is supposed to be consistent with happens-before.  For some
people I talked to, clarifying this would be close to just an editorial
change, for others, it's more controversial.

Torvald

Follow-Ups:
- Re: [RFC] [BZ 14417] Document condvar synchronization
  - From: Siddhesh Poyarekar

References:
- [RFC] [BZ 14417] Document condvar synchronization
  - From: Torvald Riegel
- Re: [RFC] [BZ 14417] Document condvar synchronization
  - From: Jakub Jelinek

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]