Re: [RFC] Lock elision implementation guidelines


> Focus on PNT, allow E
> 
>     Enable LE conservatively. Provide tuning knobs that are not part of the
>     ABI.
>     Pro: Hopefully little harm done by LE implementation deficiencies (need
>     to be conservative with semantics, but performance tuning doesn't need to
>     be perfect right away).
>     Con: No stable way to tune LE until a later time. 


The attached program suffers from a massive performance loss
caused by lock elision.  It runs two threads; the first one
repeatedly executes a big workload, while the second one just
increments a counter in a tight loop.  Two mutexes are used.

(The test was run on a zEC12 with the elision patches ported to z
architecture.)

thread 1:
---------

for (...)
  lock A
  lock B
  increment counter 1
  unlock A
  increment counter 3
  handle big workload
  unlock B

thread 2:
---------

forever
  lock A
  increment counter 2
  unlock A

(For completeness, there might be a third thread that runs at a very
low frequency and uses lock B to query counter 3.  That thread need
not be very responsive; that is, lock B is not superfluous.)

In a real-life test, the runtime of the program with elision is
almost identical to the runtime without elision, using plain locks.
But in the same amount of time I get 1247 million increments of
counter 2 without elision and just 660 million increments _with_
elision.

The explanation for this is that lock conflicts eventually make
mutex A switch to a real lock instead of an elided one.  Thread 1
then elides lock B.  From this point on I'm not entirely sure what
happens, as I'm new to the nptl code: either thread 2 waits on
lock A until thread 1 releases lock B, or thread 2's attempt to
lock A aborts the transaction in thread 1.
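
To illustrate, here is a rough sketch of the usual
elide-or-fall-back pattern (all helper names and TX_STARTED are
made up for illustration; this is not the actual nptl code):

#define TX_STARTED 0			/* hypothetical */

struct elidable_mutex;			/* opaque for this sketch */
int tx_begin(void);			/* start a hardware transaction */
void tx_abort(void);			/* abort the current transaction */
int lock_word(struct elidable_mutex *m);/* read the lock word */
int real_lock(struct elidable_mutex *m);/* normal, non-transactional lock */

int lock_maybe_elided(struct elidable_mutex *m)
{
	if (tx_begin() == TX_STARTED)
	{
		if (lock_word(m) == 0)
		{
			/* Lock looks free: run the critical section
			   speculatively; the lock word is now part of
			   the transaction's read set.  */
			return 0;
		}
		/* The lock really is held: give up.  */
		tx_abort();
	}
	/* Reached after any abort.  While thread 1 elides B, all of
	   its stores are transactional, including the "real" unlock
	   of A, so A's lock word stays in thread 1's write set until
	   the commit; thread 2's attempt to take A then either waits
	   for that commit or conflicts with the transaction and
	   aborts it.  */
	return real_lock(m);
}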

The point is:  For certain program logic, lock elision can be
very bad for performance.

P.S.:  The test was run with the default tuning values; I think it
switches to real locks after three failed transactions, and back to
transactions after three real locks.  I'm quite sure that, given
_any_ set of tuning parameters, I can write you a test program
that performs similarly badly with elision.
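
For illustration, one plausible reading of that adaptive scheme
looks roughly like this (a sketch only; the field, helper and
threshold names are made up, and the real glibc code differs in
detail):

#define MAX_ABORTS 3	/* failed transactions before falling back */
#define SKIP_COUNT 3	/* real locks before trying elision again */

struct elidable_mutex
{
	int lock_word;
	int skip_count;	/* > 0: take the real lock, do not elide */
};

int try_elide(struct elidable_mutex *m);	/* hypothetical TM attempt */
int real_lock(struct elidable_mutex *m);	/* hypothetical plain lock */

void adaptive_lock(struct elidable_mutex *m)
{
	if (m->skip_count == 0)
	{
		int aborts;

		for (aborts = 0; aborts < MAX_ABORTS; aborts++)
		{
			if (try_elide(m))
				return;
		}
		/* Three aborts in a row: use the real lock for a while.  */
		m->skip_count = SKIP_COUNT;
	}
	else
	{
		/* Count down real locks until elision is retried.  */
		m->skip_count--;
	}
	real_lock(m);
}

Whatever the exact thresholds, the two threads above can keep such
a scheme permanently in its bad case.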

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
/* ---------------------------- included header files ---------------------- */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* ---------------------------- local definitions -------------------------- */

#define NUM_PASSES 100000	/* outer loop iterations in thread 1 */
#define NUM_PASSES_2 100000	/* busy loop length (the "big workload") */

/* ---------------------------- local macros ------------------------------- */

/* ---------------------------- imports ------------------------------------ */

/* ---------------------------- included code files ------------------------ */

/* ---------------------------- local types -------------------------------- */

/* Keep the mutexes on separate cache lines.  */
__attribute__ ((aligned(256))) pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
__attribute__ ((aligned(256))) pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;
pthread_barrier_t barrier;

long c1 = 0;		/* counter 1: incremented under m1 and m2 */
long c1b = 0;		/* counter 3: incremented under m2 only */
long c2 = 0;		/* counter 2: incremented under m1 */
int t1_done = 0;	/* set by thread 1 when it has finished */
volatile long j;	/* volatile so the busy loop is not optimized away */

/* ---------------------------- forward declarations ----------------------- */

/* ---------------------------- local variables ---------------------------- */

/* ---------------------------- exported variables (globals) --------------- */

/* ---------------------------- local functions ---------------------------- */

static void *_thread_do_work_1(void *ptr)
{
	long i;

	pthread_barrier_wait(&barrier);
	for (i = 0; i < NUM_PASSES; )
	{
		pthread_mutex_lock(&m1);	/* lock A */
		pthread_mutex_lock(&m2);	/* lock B */
		c1++;
		pthread_mutex_unlock(&m1);	/* unlock A */
		c1b++;
		i++;
		{
			/* The "big workload", still under lock B.  */
			for (j = 0; j < NUM_PASSES_2; )
			{
				j++;
			}
		}
		pthread_mutex_unlock(&m2);	/* unlock B */
	}
	t1_done = 1;
	pthread_barrier_wait(&barrier);
	pthread_exit((void *)0);
}

static void *_thread_do_work_2(void *ptr)
{
	pthread_barrier_wait(&barrier);
	while (t1_done == 0)
	{
		pthread_mutex_lock(&m1);	/* lock A */
		c2++;
		pthread_mutex_unlock(&m1);	/* unlock A */
	}
	pthread_barrier_wait(&barrier);
	pthread_exit((void *)0);
}

/* ---------------------------- interface functions ------------------------ */

int main(int argc, char **argv)
{
	int rc;

	rc = pthread_barrier_init(&barrier, NULL, 2);
	if (rc != 0)
	{
		/* pthread functions return the error number rather than
		   setting errno, so strerror() is used instead of perror().  */
		fprintf(stderr, "pthread_barrier_init: %s\n", strerror(rc));
		exit(rc);
	}
	{
		pthread_t tid[2];
		int i;

		pthread_create(&tid[0], NULL, _thread_do_work_1, NULL);
		pthread_create(&tid[1], NULL, _thread_do_work_2, NULL);
		for (i = 0; i < 2; i++)
		{
			void *retval;

			pthread_join(tid[i], &retval);
		}
	}
	printf("c1 %ld/%ld, c2 %ld, j %ld\n", c1, c1b, c2, j);

	return 0;
}
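
For reference, I build and run the test with something like this
(the file name and compiler flags are only an example; what matters
is linking against the elision-enabled glibc):

	gcc -O2 -pthread elision-test.c -o elision-test
	./elision-test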
