This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [libc-alpha] Race condition between nss/dlopen/thread


On Wed, 13 Mar 2002, H . J . Lu wrote:

> Date: Wed, 13 Mar 2002 12:51:08 -0800
> From: H . J . Lu <hjl@lucon.org>
> To: GNU C Library <libc-alpha@sources.redhat.com>
> Subject: [libc-alpha] Race condition between nss/dlopen/thread
> 
> There is a race condition between nss/dlopen/thread. The problem is
> 
> 1. Thread A has called dlopen, and the global static initialiser inside
> the shared object that's being dlopened is trying to call getservbyname
> It's holding the dlopen lock, and trying to acquire the NSS lock.
> 2. Thread B also calls getservbyname, so we're holding the NSS lock 
> (which thread A is trying to acquire), and NSS is trying to dlopen a
> shared library. 
> 
> Now. We're deadlocked, because we're trying to acquire the dlopen lock,
> which is held by thread A. This is obviously quite sensitive to timing.
> I am enclosing a testcase here.
> 
> Since NSS also does dlopen, I think it should share locks with dlopen.

I've resolved countless deadlocks like this over my career.
Consolidating unrelated locks isn't the only answer in cases like this,
where two execution paths acquire locks in opposite orders. It may not
be desirable to conflate NSS and dlopen just because of a concurrency
concern.

One solution is prevention. Simply do not acquire both locks, only one
at a time. For instance, NSS could arrange to have its lock released
while it calls out to dlopen. Likewise, dlopen could perhaps arrange to
release its lock when calling out to the library's initialization routine.

A design and coding policy of giving up local locks when calling out
into a foreign subsystem works very well in avoiding deadlocks, and it
is reasonably easy to inspect code for breaches of that policy.

I've applied this approach successfully in the implementation of a fairly
sophisticated proprietary communication protocol stack. In this software,
as you might expect, there are several logical layers. Threads can
traverse the layers in either direction, and there is shared data at
each level. Moreover, threads can call down, and then back up again
via callbacks. Such software abounds with opportunities for deadlock
due to recursion or opposite lock acquisition orders. It became clear
early in the implementation that every layer would have to release its
lock or locks when calling up or down.

Of course, unlocking means that your shared state has to be sane prior to
making the call, and you have to reexamine that state when you reacquire
that lock. If there is the danger that someone may destroy an object, you
may have to use some tricks, like reference counting, to carry the object
from one critical region to the next. For example, this is done in the
timer_settime() function (linuxthreads/sysdeps/pthread/timer_settime.c),
which potentially races against timer_delete() due to briefly giving up
the mutex.

