This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Problem with zombie processes


On Mon, Feb 20, 2017 at 11:54 AM, Mark Geisert wrote:
>> So my guess was that Cygwin might try to hold on to a handle to a
>> child process at least until it's been explicitly wait()ed.  But that
>> does not seem to be the case after all.
>
>
> You might have missed a subtlety in what I said above.  The Python
> interpreter itself is calling wait4() to reap your child process.  Cygwin
> has told Python one of its children has died.  You won't get the chance to
> wait() for it yourself.  Cygwin *does* have a handle to the process, but it
> gets closed as part of Python calling wait4().

To be clear, wait4() is not called from Python until the script
explicitly calls p.wait().  In other words, when I run this step by
step (e.g. in gdb), I don't see a wait4() call until the point where
the script explicitly calls wait().  I don't see any reason Python
would do this behind the scenes.

>> Anyways, I think it would be nicer if /proc returned at least partial
>> information on zombie processes, rather than an error.  I have a patch
>> to this effect for /proc/<pid>/stat, and will add a few more as well.
>> To me /proc/<pid>/stat was the most important because that's the
>> easiest way to check the process's state in the first place!  Now I
>> also have to catch EINVAL as well and assume that means a zombie
>> process.
>
>
> The file /proc/<pid>/stat is there until Cygwin finishes cleanup of the
> child due to Python having wait()ed for it.  When you run your test script,
> pay attention to the process state character in those cases where you
> successfully read the stat file.  It's often S (stopped, I think) or R
> (running) but I also see Z (zombie) sometimes.  Your script is in a race
> with Cygwin, and you cannot guarantee you'll see a killed process's state
> before Cygwin cleans it up.
>
> One way around this *might* be to install a SIGCHLD handler in your Python
> script.  If that's possible, that should tell you when your child exits.

Perhaps the Python script is a red herring.  I just wrote it to
demonstrate the problem.  It is strange that the behavior changes
depending on where I send stdout, but you're likely right that it just
comes down to subtle timing differences.  Here's a C program that
demonstrates the same issue more reliably.  Interestingly, it works
when I run it under strace (probably just because of the strace
overhead) but not when I run it normally.

My point in all this is I'm confused why Cygwin would give up its
handles to the Windows process before wait() has been called.

(In fact, it's pretty confusing to have fopen() fail with EINVAL,
which according to [1] should only happen if the mode string is
invalid.)
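
For reference, the workaround described above (catch EINVAL and assume a zombie) might be sketched as follows. This assumes the Linux-style stat layout "pid (comm) state ...", and it treats EINVAL as meaning "zombie" purely on the basis of the behavior observed in this thread, not anything documented. The function names parse_stat_state() and proc_state_or_zombie() are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: extract the state character from one line of
 * /proc/<pid>/stat.  The command name is wrapped in parentheses and may
 * itself contain spaces or ')', so search for the LAST ')' and take the
 * first non-space character after it.  Returns '\0' if not found. */
char parse_stat_state(const char *line) {
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return '\0';
    p++;
    while (*p == ' ')
        p++;
    return *p;
}

/* Hypothetical helper: return the process state, mapping the EINVAL
 * failure seen in this thread to 'Z'.  ASSUMPTION: EINVAL from fopen()
 * means the process is a zombie; that is an observation, not a
 * documented guarantee.  Returns '\0' for any other failure. */
char proc_state_or_zombie(pid_t pid) {
    char path[64];
    char buf[512];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return errno == EINVAL ? 'Z' : '\0';

    /* fread() does not terminate the buffer, so do it by hand. */
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_stat_state(buf);
}
```

Searching from the last ')' rather than splitting on spaces keeps the parser correct for command names such as "a b) c".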

Thanks,
Erik

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <signal.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <errno.h>


int do_parent(pid_t);
void do_child(int);


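int main(void) {
    int devnul;
    pid_t pid;

    devnul = open("/dev/null", O_WRONLY);
    if (devnul < 0) {
        perror("open /dev/null");
        return 1;
    }
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid) {
        /* Parent */
        return do_parent(pid);
    }
    /* Child: do_child() does not return */
    do_child(devnul);
    return 1;
}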


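int do_parent(pid_t child_pid) {
    FILE *f;
    char buf[120];
    char filename[32];
    size_t n;

    printf("child pid: %d\n", (int)child_pid);
    sleep(5);
    printf("sending SIGKILL\n");
    kill(child_pid, SIGKILL);
    /* snprintf cannot overrun the buffer, whatever the pid's width */
    snprintf(filename, sizeof(filename), "/proc/%d/stat", (int)child_pid);
    printf("reading %s\n", filename);
    f = fopen(filename, "r");
    if (f == NULL) {
        printf("fopen error [%d]: %s\n", errno, strerror(errno));
        if (!access(filename, R_OK)) {
            printf("but the file exists and is readable\n");
        }
    } else {
        /* fread does not add a terminator, so leave room for one */
        n = fread(buf, sizeof(char), sizeof(buf) - 1, f);
        buf[n] = '\0';
        /* Never pass file contents as the format string */
        printf("%s", buf);
        fclose(f);
    }
    /* Reap the child; exit status 0 if the wait succeeded */
    return wait4(child_pid, NULL, 0, NULL) == child_pid ? 0 : 1;
}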


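void do_child(int out) {
    /* execv() requires a NULL-terminated argument vector */
    char *argv[2];

    argv[0] = "/usr/bin/yes";
    argv[1] = NULL;
    dup2(out, 1);
    execv(argv[0], argv);
    /* Only reached if execv() failed */
    perror("execv");
    _exit(127);
}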
