This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [PATCH v2] GDBserver crashes when killing a multi-thread process


On 07/13/2015 05:07 PM, Yao Qi wrote:

> Hi Pedro,
> do you still remember why you added this assert?  It wasn't
> mentioned in the mail 
> https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html
> 

Simply because getting here was supposed to indicate
something went wrong elsewhere, but at the time I didn't consider
that the child could die while ptrace-stopped.

> I am looking at a GDBserver internal error on x86_64 when I run
> gdb.threads/thread-unwindonsignal.exp with GDBserver,
> 
> continue^M
> Continuing.^M
> warning: Remote failure reply: E.No unwaited-for children left.^M
> PC register is not available^M
> (gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit
> Remote debugging from host 127.0.0.1^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> monitor exit^M
> Killing process(es): 30694^M
> (gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A problem internal to GDBserver has been detected.^M
> kill_wait_lwp: Assertion `res > 0' failed.
> 
> After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html
> GDBserver starts to swallow errors if the LWP is gone.  Then, when
> GDBserver kills a nonexistent LWP, the assert is triggered.
> 

Looks like I forgot to push the rest of that series:

 https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html

What do you think of that one?

> Why don't we implement kill_wait_lwp like its counterpart in GDB,
> linux-nat.c:kill_wait_callback?  We can loop and assert like this
> patch below.  (Note that this patch fixes the internal error, but
> the FAIL is still there.)
> 

Seems to me it's not 100% correct to waitpid the pid one more time
after we've already reaped it, because there's a minuscule chance
another process that we're debugging could clone a new lwp that reuses
the PID of the one we've just killed/reaped, and then another iteration
could collect the initial SIGSTOP of the wrong LWP and we'd kill it:

-> kill (pid1, SIGKILL);
<- waitpid (pid1) returns pid1/WSIGNALLED
-> on iteration1: new pid1 clone lwp is spawned
-> ret==pid1, continue iterating
-> kill (pid1, SIGKILL); // killing wrong process
<- waitpid (pid1) returns either SIGSTOP or WSIGNALLED
...
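
To make the concern concrete, here is a minimal sketch of a kill-and-wait
helper that stops touching the PID as soon as waitpid has reported a
non-stopped status for it once.  This is not GDBserver's kill_wait_lwp:
the name kill_and_reap_once is hypothetical, and the sketch assumes an
ordinary child process rather than a ptraced clone LWP (so no __WCLONE
handling).

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not GDBserver code.  Kill PID and reap it exactly
   once.  Returns 0 and stores the wait status in *STATUS if the child
   was reaped, or -1 with errno set (e.g. ECHILD if it is already gone).  */
int
kill_and_reap_once (pid_t pid, int *status)
{
  pid_t res;

  for (;;)
    {
      /* A failure with ESRCH only means the process no longer exists;
         waitpid below then reports ECHILD.  */
      kill (pid, SIGKILL);

      /* Retry only when the call was interrupted by a signal.  */
      do
        res = waitpid (pid, status, 0);
      while (res == -1 && errno == EINTR);

      /* A stop report means the process is still alive, so PID cannot
         have been reused yet and another iteration is safe.  */
      if (res == pid && WIFSTOPPED (*status))
        continue;

      break;
    }

  /* Once waitpid has returned PID with an exited/signalled status, the
     kernel may hand the same number to a new task, so neither kill nor
     waitpid is issued for it again.  */
  return res == pid ? 0 : -1;
}
```

The point of the design is that the loop condition depends on the wait
status, not merely on waitpid returning the PID: iterating after a
WIFSIGNALED/WIFEXITED report is exactly the window in which a reused PID
could be killed by mistake.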

Thanks,
Pedro Alves

