This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Patch to correct the refrence count of fork handler in child process
- From: suzuki <suzuki at in dot ibm dot com>
- To: libc-alpha at sources dot redhat dot com
- Cc: sjmunroe at us dot ibm dot com, Sripathi Kodi <sripathik at in dot ibm dot com>, kodatisu at in dot ibm dot com
- Date: Fri, 23 Feb 2007 15:46:25 -0800
- Subject: Patch to correct the refrence count of fork handler in child process
Hi,
While debugging a hang in one of our applications I came across the
following bug in NPTL library.
The hang occurs in a child process spawn by a multi-threaded
application. The application has been linked to a library which has
registered "pthread_atfork()" handlers. The child processes hang
indefinitely at __unregister_atfork() while exiting. These hanging
processes have only one thread of execution, but they were waiting for
the fork handler count to become 0, from a number which was greater than 2.
eg:
[root@c108bc2b01 ~]# strace -p 16603
Process 16603 attached - interrupt to quit
futex(0x800730, FUTEX_WAIT, 8, NULL <unfinished ...>
Process 16603 detached
[root@c108bc2b01 ~]# strace -p 16605
Process 16605 attached - interrupt to quit
futex(0x800730, FUTEX_WAIT, 7, NULL <unfinished ...>
This happens with the following scenario :
When mulitple threads in a process call fork () (simultaneously), they
might end up in bumping up the fork handler ref. counts to a value > 2
(depending on the number of threads and the timings). The child process
forked thus would have an abnormal fork handler refcount which would
never drop back to 0 and thus wait indefinitely while doing an
__unregister_atfork().
We have verified the attached patch, where we reset the fork-handler
refcounts to 1 for the child process. It is safe to do that, since we
are the only thread in the process.
--- libc.orig/nptl/sysdeps/unix/sysv/linux/fork.c 2003-12-20
15:37:13.000000000 -0800
+++ libc/nptl/sysdeps/unix/sysv/linux/fork.c 2007-02-23
12:24:02.000000000 -0800
@@ -167,8 +167,12 @@
allp->handler->child_handler ();
/* Note that we do not have to wake any possible waiter.
- This is the only thread in the new process. */
- --allp->handler->refcntr;
+ This is the only thread in the new process.
+ The count may have been bumped up by other threads doing
+ a fork. We reset it to 1, to avoid waiting for non-existing
+ thread(s) to release the count.
+ */
+ allp->handler->refcntr = 1;
Comments are more than welcome !
If it looks correct, could you please push this upstream ?
Thanks,
Suzuki K P
Linux Technology Center,
IBM
nptl/sysdeps/unix/sysv/linux/fork.c : __libc_fork()
Reset the refcntr for the fork-handlers in the child process to 1.
The refcntr may have been bumped up by the *OTHER* thread(s) doing a fork() in the parent process. This would leave the child with an incorrect refcntr for the fork-handlers leading us to wait for non-existing threads to drop the refcntr while doing __unregister_atfork(). Since there is only one thread in the child process, we reset it to 1, to reflect the actual count.
Index: libc/nptl/sysdeps/unix/sysv/linux/fork.c
===================================================================
--- libc.orig/nptl/sysdeps/unix/sysv/linux/fork.c 2003-12-20 15:37:13.000000000 -0800
+++ libc/nptl/sysdeps/unix/sysv/linux/fork.c 2007-02-23 12:25:45.000000000 -0800
@@ -167,8 +167,12 @@
allp->handler->child_handler ();
/* Note that we do not have to wake any possible waiter.
- This is the only thread in the new process. */
- --allp->handler->refcntr;
+ This is the only thread in the new process.
+ The count may have been bumped up by the other threads in
+ the parent, doing a fork. We reset it to 1, to avoid waiting
+ for non-existing thread(s) to release the count.
+ */
+ allp->handler->refcntr = 1;
/* XXX We could at this point look through the object pool
and mark all objects not on the __fork_handlers list as