This is the mail archive of the libc-hacker@sourceware.cygnus.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



Re: New pthreads library


> Using the pointer [for pthread_t] is only possible if there is a fixed
> number of descriptors available and they are all in a preallocated array.

Do you say this based on the presumption that one must be able to easily
tell whether a given pointer is a valid thread ID or not?  

> Just like the old implementations of stdio.  Since I assume that for the
> Hurd there will be no limits you cannot use pointers for the reasons you
> cited: some function must be able to recognize invalid thread handles.

1003.1-1996 is not entirely consistent on the requirements for this--or at
least the specification is quite subtle.  It is subtle enough that Thomas
and I previously thought parts of it might have been unintentional, and
certainly subtle enough that I think it deserves full exposition of the
precise constraints before concluding on the implementation ramifications.

pthread_kill, pthread_detach, and pthread_join are required to return ESRCH
if the condition occurs that the thread ID is invalid.

The other calls where ESRCH is mentioned (and relates to a thread ID rather
than a process ID) are specified to return ESRCH if the condition is detected.
I believe this weaker specification allows those calls to crash when given
an invalid thread ID.

The threads introduction (section 16.1 p 333) says:

	A conforming implementation is free to reuse a thread ID after the
	thread terminates if it was created with the `detachstate'
	attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or
	pthread_join() has been called for that thread.  If a thread is
	detached, its thread ID is invalid for use as an argument in a call
	to pthread_detach() or pthread_join().

A relevant note in the specification of pthread_kill (3.3.10.2 p 92):

	As in kill(), if SIG is zero, error checking is performed but no
	signal is actually sent.

I take this to indicate the intent that pthread_kill can be used as a
thread liveness check (as kill can be used as a PID liveness check).

I believe those are all the directly relevant citations from the standard.

I think it is safe to presume from general principles that picking a
pthread_t value from thin air (i.e. not the result of any prior
pthread_create call in the same process) has undefined behavior, or at
least unspecified behavior.  (Section 16.1 explicitly specifies using a
pthread_t gotten from another process to be unspecified.)  As a practical
matter, an implementation probably needs to detect and reject zero (which
any application's uninitialized pthread_t variables are likely to be set
to--though such an application is buggy by the spec).  So aside from that,
the issue is thread IDs of terminated threads.  Let's examine the cases.

The thread ID of a terminated, undetached, unjoined thread is still valid.
Its data structure sits around until it is detached or joined.

The thread ID of a terminated, detached thread is invalid but subject to
reuse.  Likewise the thread ID of a terminated, joined thread.
pthread_kill, pthread_detach, and pthread_join are required to return ESRCH
if the condition occurs that "No thread could be found corresponding to
that specified by the given thread ID."  So these are allowed either to
succeed by operating on a new thread that has reused the terminated
thread's ID, or to fail with ESRCH.

Personally, I think pthread_kill is the only one of these where there is a
good argument for requiring robust detection of thread IDs for terminated,
detached threads.  But those three are what's specified, and anyway just
one call is enough to require arranging things to make robust detection
possible.

I see only one way to meet the spec while using a data structure pointer
for pthread_t.  That is never to free the data structures, only reuse them
after the thread has terminated and been joined or detached.  This has the
obvious cost of consuming (presumably mostly untouched) virtual memory
proportional to the maximum number of threads the process has ever had, no
matter how few threads remain live.

> I really would suggest going with an index-based thread handle and
> using the indirection.  Maybe even with some kind of generation number.

That seems reasonable enough to me.  The main downside to this is the need
to resize the indirection table as the number of threads grows.

> > Thanks for explaining what THREAD_GETMEM is for.  I didn't quite get
> > why this `uglification' of the code was necessary.  Of course I never
> > looked at the SPARC code.  I will certainly convert my code to do
> > this, but I may postpone this until the code has stabilized a bit.

I don't understand why the uglifying macro is necessary at all.  With a
global register variable in scope and an inline `thread_self' function that
just returns its value, the compiler does the right thing for:

	struct thread_internal *self = thread_self ();
	blah = self->foobar;

i.e., it just optimizes out the SELF variable completely and replaces it
with the global register. 

For the x86 segment register trick, you can do a very similar thing just
with another special declaration.  Either:

	extern struct thread_internal *_self asm("%gs:0");
	inline struct thread_internal *thread_self() { return _self; }

or:

	extern struct thread_internal _self asm("%gs:0");
	inline struct thread_internal *thread_self() { return &_self; }

depending on the exact flavor of segment register trick you're using.

> > A while back, when you were hinting at a rewrite of the threads
> > library, you talked about a paper describing the new Mach 3.0 cthreads
> > library.  I belive the paper you were talking about is `Randall
> > W. Dean, Using Continuations to Build a User-Level Threads Library'.
> > I've read the paper and it seems like a good approach to me.
> 
> This is one approach but it's not the only way to do this.

There are many designs that have been researched.  Most of the things we
have discussed here in the past grow out of the "scheduler activations"
concept.  The Utah OSDI'99 paper (see
http://www.cs.utah.edu/projects/flux/papers/index.html) discusses issues
somewhat related to this, and its bibliography contains some citations
about scheduler activations and related work (including the Mach
implementation of scheduler activations, I believe).

> What this does mean is that there must be two kinds of thread
> descriptors: kernel and user.  The API only exposes the user threads
> but the library must know about both.  

All of this is internal implementation structure that will be completely
hidden from the user API, and almost completely hidden from the user ABI.
There are only a few things in the user API that may be important to inline
for speed such that the user ABI might depend on any thread data structure
at all; these are thread-specific data and pthread_self, and probably no
others.  Only a small data structure (that can be a prefix of the full
internal thread descriptor) containing the information these inlined calls
need becomes a part of the user ABI.  The full data structure used to
describe threads can change and evolve for implementation convenience,
without affecting the ABI at all.

Since this structure layout is a purely internal, source-level decision, the
"kernel thread part" can be either a separate structure or just a portion
of the full thread data structure, according to the convenience of the
implementation.  For a particular implementation/configuration that is
always 1:1, there is no reason not to include everything in a single data
structure allocated in one chunk.

> And ideally the only machine dependent thing is the kernel thread part.

Hmm.  On the contrary, I would think that the OS-specific kernel interface
part would be mostly machine-independent (since the OS takes care of
abstracting the state it maintains).  Conversely, some implementation
models do preemptive thread context switching purely in user mode (e.g. the
scheduler activations model); so the "user part" is then responsible for
the highly machine-dependent details of saving a preempted user thread's
state and restoring previously saved context.

What I would like to see is the machine-dependent but OS-independent parts
well isolated so we can reuse (at least parts of) the machine-dependent
thread switching code in different OS implementations.  Something akin to
the Solaris getcontext/setcontext interface might be nice.

> I'd like to see already at this point a kernel_thread_t separate from
> the user pthread_t.  The pthread_t objects are created directly by the
> pthread_create calls; the kernel_thread_t objects are created as a
> reaction to a request to run a pthread_t.  I.e., for now a kernel
> thread is always created.  This can then be changed very easily later.

Again, I think this is something that should not be fixed for all versions
of the implementation.  The division of the abstractions into conceptually
distinct data structures is certainly good.  But it can easily be a
decision made at source level via sysdeps typedefs/macros whether to
include a "kernel thread" data structure directly in the pthread data
structure or to use a pointer to a separately allocated structure.

