This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: posix/tst-waitid.c possible race condition on Linux


Roland,

your patch is very good.

I especially like the distinction of the different instances of
CHECK_SIGCHLD. I have lost a lot of time examining the first 'stopped'
instance while thinking it was the one failing.

I can confirm empirically and analytically that your patch do fix the
issue.

For the curious, I went up to linux kernel/signal.c and kernel/exit.c to
understand the race.

wait4() syscall does the following:

1. Scan children process descriptors to find signal->flags ==
SIGNAL_STOP_STOPPED
2. If none found, wait in a queue and sleep.

When SIGSTOP is delivered, the child process execute do_signal_stop()
and:

1. Set signal->flags = SIGNAL_STOP_STOPPED
2. Notify parent by generating SIGCHLD and notify waiting parent in a
waitqueue.

These 2 steps are generally executed together but if waitpid() is
executed between #1 and #2 in the parent, the test will not block inside
waitpid() and miss the signal.

I have improved your patch by using sigtimedwait() and removing a couple
of sleep() calls. I'm just gonna read my e-mails in case Rich has been
faster than me. If not, I'll post my improvement over your proposal.

Greetings,
Olivier


On Wed, 2013-06-12 at 17:03 -0700, Roland McGrath wrote:
> I think the test was racy.  A change that will probably fix it is on the
> branch roland/tst-waitid and the patch is below, but I haven't been able to
> test it yet.
> 
> 
> Thanks,
> Roland
> 
> 
> 2013-06-12  Roland McGrath  <roland@hack.frob.com>
> 
> 	* posix/tst-waitid.c (do_test): Distinguish different instances of
> 	stopped/continued in CHECK_SIGCHLD uses.  Insert a delay between
> 	sending SIGSTOP and calling waitpid to ensure SIGCHLD gets delivered
> 	before entering the kernel for waitpid.
> 
> --- a/posix/tst-waitid.c
> +++ b/posix/tst-waitid.c
> @@ -145,7 +145,7 @@ do_test (int argc, char *argv[])
>    /* Give the child a chance to stop.  */
>    sleep (3);
>  
> -  CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
> +  CHECK_SIGCHLD ("stopped (before waitid)", CLD_STOPPED, SIGSTOP);
>  
>    /* Now try a wait that should not succeed.  */
>    siginfo_t info;
> @@ -227,7 +227,7 @@ do_test (int argc, char *argv[])
>        expecting_sigchld = 0;
>      }
>    else
> -    CHECK_SIGCHLD ("continued", CLD_CONTINUED, SIGCONT);
> +    CHECK_SIGCHLD ("continued (before waitid)", CLD_CONTINUED, SIGCONT);
>  
>    info.si_signo = 0;		/* A successful call sets it to SIGCHLD.  */
>    info.si_pid = -1;
> @@ -336,6 +336,13 @@ do_test (int argc, char *argv[])
>        printf ("kill (%d, SIGSTOP): %m\n", pid);
>        RETURN (EXIT_FAILURE);
>      }
> +
> +  /* Give the child a chance to stop.  The waitpid call below will block
> +     until it has stopped, but if we are real quick and enter the waitpid
> +     system call before the SIGCHLD has been generated, then it will be
> +     discarded and never delivered.  */
> +  sleep (3);
> +
>    pid_t wpid = waitpid (pid, &fail, WUNTRACED);
>    if (wpid < 0)
>      {
> @@ -354,7 +361,7 @@ do_test (int argc, char *argv[])
>        printf ("waitpid WUNTRACED on stopped: status %x\n", fail);
>        RETURN (EXIT_FAILURE);
>      }
> -  CHECK_SIGCHLD ("stopped", CLD_STOPPED, SIGSTOP);
> +  CHECK_SIGCHLD ("stopped (after waitpid)", CLD_STOPPED, SIGSTOP);
>  
>    expecting_sigchld = 1;
>    if (kill (pid, SIGCONT) != 0)
> @@ -372,7 +379,7 @@ do_test (int argc, char *argv[])
>        expecting_sigchld = 0;
>      }
>    else
> -    CHECK_SIGCHLD ("continued", CLD_CONTINUED, SIGCONT);
> +    CHECK_SIGCHLD ("continued (before waitpid)", CLD_CONTINUED, SIGCONT);
>  
>    wpid = waitpid (pid, &fail, WCONTINUED);
>    if (wpid < 0)
> 



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]