This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: posix/tst-waitid.c possible race condition on Linux
- From: Olivier Langlois <olivier at trillion01 dot com>
- To: Roland McGrath <roland at hack dot frob dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Thu, 13 Jun 2013 22:59:35 -0400
- Subject: Re: posix/tst-waitid.c possible race condition on Linux
- References: <1371011970 dot 749 dot 30 dot camel at Wailaba2> <20130613000330 dot 2D1732C07F at topped-with-meat dot com>
Roland,
your patch is very good.
I especially like the distinction of the different instances of
CHECK_SIGCHLD. I have lost a lot of time examining the first 'stopped'
instance while thinking it was the one failing.
I can confirm empirically and analytically that your patch do fix the
issue.
For the curious, I went up to linux kernel/signal.c and kernel/exit.c to
understand the race.
wait4() syscall does the following:
1. Scan children process descriptors to find signal->flags ==
SIGNAL_STOP_STOPPED
2. If none found, wait in a queue and sleep.
When SIGSTOP is delivered, the child process execute do_signal_stop()
and:
1. Set signal->flags = SIGNAL_STOP_STOPPED
2. Notify parent by generating SIGCHLD and notify waiting parent in a
waitqueue.
These 2 steps are generally executed together but if waitpid() is
executed between #1 and #2 in the parent, the test will not block inside
waitpid() and miss the signal.
I have improved your patch by using sigtimedwait() and removing a couple
of sleep() calls. I'm just gonna read my e-mails in case Rich has been
faster than me. If not, I'll post my improvement over your proposal.
Greetings,
Olivier
On Wed, 2013-06-12 at 17:03 -0700, Roland McGrath wrote:
> I think the test was racy. A change that will probably fix it is on the
> branch roland/tst-waitid and the patch is below, but I haven't been able to
> test it yet.
>
>
> Thanks,
> Roland
>
>
> 2013-06-12 Roland McGrath <roland@hack.frob.com>
>
> * posix/tst-waitid.c (do_test): Distinguish different instances of
> stopped/continued in CHECK_SIGCHLD uses. Insert a delay between
> sending SIGSTOP and calling waitpid to ensure SIGCHLD gets delivered
> before entering the kernel for waitpid.
>
> --- a/posix/tst-waitid.c
> +++ b/posix/tst-waitid.c
> @@ -145,7 +145,7 @@ do_test (int argc, char *argv[])
> /* Give the child a chance to stop. */
> sleep (3);
>
> - CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
> + CHECK_SIGCHLD ("stopped (before waitid)", CLD_STOPPED, SIGSTOP);
>
> /* Now try a wait that should not succeed. */
> siginfo_t info;
> @@ -227,7 +227,7 @@ do_test (int argc, char *argv[])
> expecting_sigchld = 0;
> }
> else
> - CHECK_SIGCHLD ("continued", CLD_CONTINUED, SIGCONT);
> + CHECK_SIGCHLD ("continued (before waitid)", CLD_CONTINUED, SIGCONT);
>
> info.si_signo = 0; /* A successful call sets it to SIGCHLD. */
> info.si_pid = -1;
> @@ -336,6 +336,13 @@ do_test (int argc, char *argv[])
> printf ("kill (%d, SIGSTOP): %m\n", pid);
> RETURN (EXIT_FAILURE);
> }
> +
> + /* Give the child a chance to stop. The waitpid call below will block
> + until it has stopped, but if we are real quick and enter the waitpid
> + system call before the SIGCHLD has been generated, then it will be
> + discarded and never delivered. */
> + sleep (3);
> +
> pid_t wpid = waitpid (pid, &fail, WUNTRACED);
> if (wpid < 0)
> {
> @@ -354,7 +361,7 @@ do_test (int argc, char *argv[])
> printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
> RETURN (EXIT_FAILURE);
> }
> - CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
> + CHECK_SIGCHLD ("stopped (after waitpid)", CLD_STOPPED, SIGSTOP);
>
> expecting_sigchld = 1;
> if (kill (pid, SIGCONT) != 0)
> @@ -372,7 +379,7 @@ do_test (int argc, char *argv[])
> expecting_sigchld = 0;
> }
> else
> - CHECK_SIGCHLD ("continued", CLD_CONTINUED, SIGCONT);
> + CHECK_SIGCHLD ("continued (before waitpid)", CLD_CONTINUED, SIGCONT);
>
> wpid = waitpid (pid, &fail, WCONTINUED);
> if (wpid < 0)
>