This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

__lll_mutex_unlock broken on IA64 with nptl?


AFAICT, the implementation of __lll_mutex_unlock and hence pthread_mutex_unlock is currently broken on IA64.  It's missing a release memory barrier.  Hence memory operations which logically occur while the lock is held may not become visible to other threads until after the lock is released, and hence effectively occur outside the lock.

The problem is in the ia64 version of either lowlevellock.h or atomic.h.

Pthread_mutex_unlock invokes __lll_mutex_unlock to release the low level lock.  That in turn invokes atomic_exchange (atomic.h), which invokes the intrinsic __sync_lock_test_and_set_si.  It appears that __sync_lock_test_and_set_si is undocumented and not meant to be used directly, but it is used to implement __sync_lock_test_and_set, which is documented to provide only acquire semantics.  (It uses the IA64 xchg instruction, which provides only acquire semantics.)

I have no idea what the intended memory barrier semantics for atomic_exchange are.  There should probably be acquire and release variants.  If it is intended to provide a release or full barrier, then the ia64 implementation in atomic.h is incorrect.  Otherwise the ia64 version of lowlevellock.h is incorrect.

(I think a cleaner way to do all of this is to explicitly provide atomic.h like operations which pair an atomic update with a memory barrier.  This makes the interfaces explicit, and avoids (potentially expensive) redundant barriers in the generated code.  I think the __lll_mutex_lock code currently contains a redundant barrier on ia64.  The atomic_ops.h file in http://www.hpl.hp.com/personal/Hans_Boehm/qprof/qprof-0.2.tar.gz takes that approach.)

I have been seeing strange, very intermittent, failures of my garbage collect test with nptl, which may be attributable to this.

I'm not sure what the optimal instruction sequence for atomic_exchange_with_release_barrier should look like.  My guess would be that it should be a load followed by a cmpxchg4.rel and a test.  David Mosberger's guess was an mf instruction (full barrier) followed by the current code.  Probably both options are similar in cost.

If someone tells me what the intended semantics of atomic_exchange are, I can make a first cut at the patch, though I would appreciate help with testing, since I don't currently have a glibc tree.

Thanks.

Hans


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]