This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC] Lock elision implementation guidelines


On Fri, 2013-02-22 at 15:02 +0100, Ondřej Bílka wrote:
> On Wed, Feb 20, 2013 at 09:42:56PM +0100, Torvald Riegel wrote:
> > On Wed, 2013-02-20 at 11:58 -0500, Rich Felker wrote:
> > > On Wed, Feb 20, 2013 at 12:21:35PM +0100, Torvald Riegel wrote: 
> > > > While it's true that the lock acquisition latency for locks that are
> > > > already in the cache has been substantially decreased in the past, once
> > > > you use the same locks concurrently with other threads, you get cache
> > > > misses, and these are costly.  Typically, when you use locks, you use
> > > > them because you need to synchronize and there can be concurrent actions
> > > > by other threads.  You can hope that those other threads are not too far
> > > > away from yourself in terms of the memory hierarchy (i.e., hope for
> > > > locality), but that's not simple.  But we won't get less concurrency.
> > > 
> > > First of all, 99% of users of glibc are using it for LAMP stacks or
> > > desktop-oriented Linux distributions. High-end computational
> > > parallelism is in itself a 1% niche. This is the basis for my claim
> > > that lock elision is a "pet feature".
> > 
> > We'll likely get even more parallelism in hardware in the future, so
> > likely also more parallelism in programs, and this usually results in
> > more synchronization too.  In fact, the "high-end computational
> > parallelism" programs (I assume this means well-optimized programs) will
> > likely often have less synchronization than the average parallel or
> > concurrent program, I believe.  So the less optimized the
> > synchronization in a program is, the more important lock elision
> > becomes, I think.
> > 
> When you optimize programs, the first thing you look for is which locks
> can be eliminated by using atomic primitives.

I suppose you mean the atomic operations provided by current hardware.
But those have obvious limits: either they operate on a single memory
location, which restricts where you can use them, or, once you need
atomicity across multiple memory locations, building custom concurrent
code for that quickly becomes much more complicated.
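
To make the contrast concrete, here is a minimal C11 sketch (hit(),
transfer() and the two balances are made-up names for illustration): a
single-location update maps directly onto one atomic instruction, while
keeping two locations consistent with each other already pushes you to a
lock (or a transaction).

/* Minimal illustration of the single-location limit; all names here are
   hypothetical. */
#include <stdatomic.h>
#include <pthread.h>

/* Single location: one atomic instruction is enough. */
static atomic_long counter;

void hit(void)
{
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

/* Two locations that must change together: no single hardware atomic op
   covers both, so the usual answer is a lock (or a transaction). */
static long balance_a, balance_b;
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

void transfer(long amount)
{
    pthread_mutex_lock(&balance_lock);
    balance_a -= amount;
    balance_b += amount;
    pthread_mutex_unlock(&balance_lock);
}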

Hence people use abstractions such as locks.  Avoiding locks, or making
them coarser-grained, doesn't necessarily improve performance.  In
essence, you have to use locking properly, and you have to use the right
kind of locks.  The same goes for any other mechanism, be it
single-location atomic ops or transactions.
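
For illustration only, here is a rough sketch of what eliding a trivial
spinlock with x86 RTM could look like.  This is not the proposed glibc
implementation: elided_lock/elided_unlock and the lock layout are
invented for the example, and it needs GCC built with RTM support
(compile with -mrtm).

/* Sketch of lock elision over a trivial spinlock; names and layout are
   hypothetical, not the glibc design. */
#include <immintrin.h>
#include <stdatomic.h>

typedef struct { atomic_int locked; } elided_lock_t;

void elided_lock(elided_lock_t *l)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Speculative path: only *read* the lock word.  It is now in our
           transactional read set, so a real acquisition by another thread
           aborts us. */
        if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0)
            return;                 /* run the critical section elided */
        _xabort(0xff);              /* lock is busy: go to the fallback */
    }
    /* Fallback path: acquire the lock for real. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;
}

void elided_unlock(elided_lock_t *l)
{
    if (_xtest())
        _xend();                    /* commit the elided critical section */
    else
        atomic_store_explicit(&l->locked, 0, memory_order_release);
}

The point of the pattern is that the elided path never writes the lock
word, so non-conflicting critical sections can commit in parallel, while
any thread that takes the fallback path aborts the speculating threads
and correctness falls back to ordinary locking.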

> Optimized programs would benefit
> more from a better-optimized compare-and-swap.

I don't think current CAS is a problem; the best-case latency is already
pretty low compared to past implementations (on x86 at least).  But the
hardware implementation can't avoid the cache miss costs, so it's up to
programmers to synchronize in an efficient way.
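
As a concrete (hypothetical) example, take a plain C11 CAS retry loop;
push_head and struct node are made-up names.  The CAS instruction itself
is cheap when the cache line is already local, but every failed retry
under contention means the line just bounced to another core, and that
transfer, not the instruction, is where the cost is.

/* Lock-free stack push using a C11 CAS loop; names are hypothetical. */
#include <stdatomic.h>

struct node { struct node *next; };
static _Atomic(struct node *) head;

void push_head(struct node *n)
{
    struct node *old = atomic_load_explicit(&head, memory_order_relaxed);
    do {
        n->next = old;
        /* On failure, old is reloaded with the current head and we retry;
           each failure under contention implies a cache-line transfer. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &head, &old, n,
                 memory_order_release, memory_order_relaxed));
}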

Torvald

