This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.


Ping ?
The patch discussed here is already a first improvement
(which makes running tests under valgrind more comfortable).
Some more factorisation might be done with the 4 other
places where children are killed, but these seems more risky.

So, maybe this patch could be pushed as a first low risk
step towards the good direction ?

Thanks

Philippe


On Sun, 2018-11-11 at 18:24 +0100, Philippe Waroquiers wrote:
> On Sat, 2018-11-10 at 18:29 +0000, Pedro Alves wrote:
> > There's some long and ugly history around PTRACE_KILL vs SIGKILL.
> > See gdb/linux-nat.c kill_wait_one_lwp and kill_one_lwp, and the comments
> > within those functions.  I don't know whether the kernels those were for
> > are relevant any more.  Likely using git blame and finding the corresponding
> > patch posts would shed some more light.  Whatever we do, it'd be nice to
> > make gdb/linux-nat.c use the unified code as well.
> 
> I have done some digging in the source code and searched some
> more info using git blame.
> => I found 7 different linux logics related to kill(SIGKILL)
> and ptrace(PTRACE_KILL).
> I found only one identification of kernels giving problems.
> ChangeLogs/commit msg were not really giving much
> background information : these are telling what is being changed,
> not why it is being changed.
> linux-low.c has the most interesting comments explaining
> the kill related problems.
> 
> Below is some updated text with the found info (sorry, you
> will have to re-read some parts already in the RFA).
> 
> I have to admit that a 'linux-ptrace.c only factorization'
> as suggested in the RFA (at least as a first step, till we have
> validated with it that a simpler kill logic is ok on various platforms)
> is somewhat less frightening me than reducing all 7 different
> logics to a single one.
> (there is a table at the end of this mail that summarises all
> 7 logics).
> 
> I must say that all this is killing me :).
> Maybe I would feel better if I would just have added my own
> local new 8th kill logic where I need it :).
> 
> Philippe
> 
> 
> nat/linux-ptrace.c has 3 different logics to kill a child process.
> So, this patch factorizes killing a child in the function kill_child.
> 
> The 3 different logics are:
> * linux_ptrace_test_ret_to_nx is calling both kill (child, SIGKILL)
>   and ptrace (PTRACE_KILL, child, ...), and then is calling once
>   waitpid.
> * linux_check_ptrace_features is calling ptrace (PTRACE_KILL, child, ...)
>   + my_waitpid in a loop, as long as the waitpid status was WIFSTOPPED.
> * linux_test_for_tracefork is calling once ptrace (PTRACE_KILL, child, ...)
>   + my_waitpid.
> 
> The linux ptrace documentation indicates that PTRACE_KILL is deprecated,
> and tells to not use it, as it might return success but not kill the tracee.
> The documentation indicates to send SIGKILL directly.
> 
> I suspect that linux_ptrace_test_ret_to_nx calls both kill and ptrace just
> to be sure ...
> git blame indicates that the call to kill(SIGKILL) was added in addition to the
> pre-existing ptrace PTRACE_KILL. The commit log also indicates to ignore
> PTRACE_KILL errors. But there is no explanation why adding the SIGKILL call
> was needed, and why PTRACE_KILL was kept.
> 
> I suspect that linux_check_ptrace_features calls ptrace in a loop
> to bypass the PTRACE_KILL limitation.
> git blame indicates that this function was heavily based on
> linux-nat.c:linux_test_for_tracefork (which was moved to linux-ptrace.c,
> see next paragraph). I found no log/comments about the killing subtilities.
> 
> linux_test_for_tracefork seems to not handle the PTRACE_KILL
> limitation/has not been updated like linux_check_ptrace_features
> (which has a loop).
> 
> Also, 2 of the 3 logics are calling my_waitpid, which seems better,
> as this is protecting the waitpid syscall against EINTR.
> 
> We also find a bunch of other kill logics in gdbserver/linux-low.c and
> linux-nat.c and linux-fork.c.
> 
> gdbserver/linux-low.c has linux_kill_one_lwp, that has an extensive
> comment explaining why it uses both PTRACE_KILLME and SIGKILL
> Note however that the gdbserver code does the SIGKILL using
> 'syscall (__NR_tkill', not using 'kill (... SIGKILL).
> gdbserver/linux-low.c does a loop calling both PTRACE_KILLME and SIGKILL,
> but has a comment telling that the loop is most likely unnecessary.
> 
> Also, gdbservers/linux-low.c handles the case of waiting for "clone" children,
> indicating that __WALL depends on SIGCHLD, and killing a stopped process
> does not generate this signal.
> It also indicates it supports old kernels not providing clone(CLONE_THREAD),
> by sending a SIGKILL for each thread (not only one for the process).
> There is also a comment telling that only PTRACE_KILL was used for years,
> and that paranoia tells to do both SIGKILL and PTRACE_KILL, in case
> PTRACE_KILL might work better on old kernels (the comment says it is dubious
> there is such an old kernel, but explains that this is the paranoia).
> linux-low.c also has a special logic to kill the thread that has its
> lwpid == pid, because killing the parent before the children can
> cause zombies on kernels at least 2.6.0-test7 ... 2.6.8-rc4.
> 
> linux-nat.c first uses SIGKILL, and then PTRACE_KILL. A comment (from 2011)
> is telling that PTRACE_KILL may resume the inferior (so the need for SIGKILL).
> Another comment (also 2011) tells that some kernels ignore even SIGKILL
> for processes under ptrace (so the reason for keeping the PTRACE_KILL).
> Note that linux-nat.c also loops (like linux-low.c), but indicates
> the loop is needed as the linux kernel sometimes fails to kill a thread
> after PTRACE_KILL.
> To the contrary of linux-low.c, linux-nat.c uses waitpid (__WALL) rather
> than do 2 successive waitpid calls like linux-low.c.
> linux-nat.c also stops all threads before killing them, with a comment
> explaining that this is needed to have PTRACE_KILL working.
> 
> linux-fork.c has also its own duplicated kill logics.
> One uses (only) SIGKILL (with a comment telling this always works,
> contrary to PTRACE_KILL).
> The other logic (related to killing checkpoints) only uses PTRACE_KILL.
> This logic does not call waitpid, instead it does an inferior call
> to have waitpid called by an inferior (no idea why, must be something
> special related to checkpoint implementation).
> 
> Below is a table summarising what has been found.
> loop&stopcond is '-' if the logic at 'where' does not loop, otherwise
> it details when the loop continues. SIGKILL and PTRACE_KILL column contains
> a YES if the logic uses them, otherwise '-' indicates
> absence of call. A 'V' indicates the return status is verified,
> otherwise errors are ignored. TK indicates SIGKILL is sent with __NR_tkill,
> not with the usual kill(SIGKILL).
> 
> waitpid gives some details about how the waitpid is done:
>   'MY' indicates my_waitpid is called (to protect against EINTR),
>   otherwise, it is a direct call to waitpid.
>   Note that the return status of waitpid is not systematically checked,
>   or is not checked the same way or errors are not necessarily reported.
>   Flags are 0, or the specifically used flags. 2 set of flags separated
>   by a , means there are 2 calls to waitpid.
>   
> 
> where                        loop&cond        SIGKILL PTRACE_KILL waitpid
> --------------------         -------------    ------- ----------- -------
> linux-ptrace.c
>  linux_ptrace_test_ret_to_nx -                YES     YES         0
>  linux_check_ptrace_features WIFSTOPPED       -       YES V       MY 0
>  linux_test_for_tracefork    -                -       YES V       MY 0
>  
> linux-low.c                  WIFSTOPPED       YES TK  YES         MY 0,__WCLONE
> 
> linux-nat.c                  waitpid == pid   YES TK  YES         MY __WALL
> 
> linux-fork.c
>  linux_fork_killall          WIFSTOPPED       YES     -           0
>  delete_checkpoint_command   -                -       YES V       (infcall) 0
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]