This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCHv2] gdb: Don't drop SIGSTOP during stop_all_threads
On 06/14/2018 08:46 PM, Andrew Burgess wrote:
> This is an update based on Pedro's feedback:
>
Thank you.
> So, in this version the preload library is a little more complex. I
> now try to emulate the kernels sending of SIGCHLD after a short period
> of time in order to make it appear like the child thread status really
> has just arrived from the kernel later than it really did. This works
> better than I had hoped, with this change in place not only does the
> test now pass on gdbserver, but with the preload library I can (in a
> very simple test) use GDB as normal.
Awesome.
I found a few more small nits, but this is OK with those fixed.
> +#
> +# GDB would then trigger a call to stop_all_threads which would
> +# continue to wait for all of the outstanding threads to stop, when
> +# the outstanding stop events finally arrived GDB would then
> +# (incorrectly) discard the stop event, resume the thread, and
> +# continue to wait for the thread to stop.... which it now never
> +# would.
> +#
> +# In order to try and expose this issue reliably, this test preloads a
> +# library that intercepts waitpid calls. All waitpid calls targeting
> +# pid -1 with the WNOHANG flag are rate limited so that only 1 per
> +# second can complete. Additional calls are forced to return 0
> +# indicating no event waiting. This is enough to trigger the bug
> +# during the attach phase, however, it breaks the rest of GDB's
> +# behaviour, so we can't do more than attach with this library in
> +# place.
Does this last sentence ("breaks the rest of GDB") need updating?
> + If the kernel is slow in either delivering the signal, or making the
> + result available to the waitpid call then GDB will enter a sigsuspend
> + call in order to wait for the inferior threads to change state, this is
> + signalled to GDB with a SIGCHLD.
> +
> + A bug in GDB meant that in some cases we would deadlock during this
> + process. This was rarely seen as the kernel is usually quick at
> + delivering signals and making the results available to waitpid, so quick
> + that GDB would gather the statuses from all inferior threads in the
> + original pass.
> +
> + The idea in this library is to rate limit calls to waitpid (where pid is
> + -1 and the WNOHANG option is set) so that only 1 per second can return
> + an answer. Any additional calls will report that no threads are
> + currently ready. This should match the behaviour we see on a slow
> + kernel.
> +
> + However, given that usually when using this library, the kernel does
> + have the waitpid result ready this means that the kernel will never send
> + GDB a SIGCHLD. This means that when GDB enters sigsuspend it will block
> + forever. Alternatively, if GDB enters it's polling loop the lack of
s/is's/its/
> + SIGCHLD means that we will never see an event on the child threads. To
> + resolve these problems the library intercepts calls to sigsuspend and
> + forces the call to exit if there is a pending waitpid result. Also,
> + when we know that there's a waitpid result that we've ignored, we create
> + a new thread which, after a short delay, will send GDB a SIGCHLD. */
> +
> +
> +/* Cache the result of a waitpid call that has not been reported back to
> + GDB yet. We only ever cache a single result. Once we have a result
> + cached then later calls to waitpid with the WNOHANG option will return a
> + result of 0. */
> +
> +struct
> +{
> + /* Flag to indicate when we have a result cached. */
> + int cached_p;
> +
> + /* The cached result fields from a waitpid call. */
> + pid_t pid;
> + int wstatus;
> +} cached_wait_status;
"static" to avoid interposition issues.
> +
> +/* Lock to hold when modifying SIGNAL_THREAD_ACTIVE_P. */
> +
> +pthread_mutex_t thread_creation_lock_obj = PTHREAD_MUTEX_INITIALIZER;
Ditto.
> +#define thread_creation_lock (&thread_creation_lock_obj)
> +
> +/* This flag is only modified while holding the THREAD_CREATION_LOCK mutex.
> + When this flag is true then there is a signal thread alive that will be
> + sending a SIGCHLD at some point in the future. */
> +
> +int signal_thread_active_p;
Ditto.
> +
> +/* When we last allowed a waitpid to complete. */
> +
> +static struct timeval last_waitpid_time = { 0, 0 };
> +
> +/* The number of seconds that must elapse between calls to waitpid where
> + the pid is -1 and the WNOHANG option is set. If calls occur faster than
> + this then we force a result of 0 to be returned from waitpid. */
> +
> +#define WAITPID_MIN_TIME (1)
> +
> +/* Return true (non-zero) if we should skip this call to waitpid, or false
> + (zero) if this waitpid call should be handled with a call to the "real"
> + waitpid library. Allows 1 waitpid call per second. */
s/waitpid library/waitpid function/ ?
> +
> +static int
> +should_skip_waitpid (void)
> +{
> + struct timeval *tv = &last_waitpid_time;
> + if (tv->tv_sec == 0)
> + {
> + if (gettimeofday (tv, NULL) < 0)
> + error ("error: gettimeofday failed\n");
> + return 0; /* Don't skip. */
> + }
> + else
> + {
> + struct timeval new_tv;
> +
> + if (gettimeofday (&new_tv, NULL) < 0)
> + error ("error: gettimeofday failed\n");
> +
> + if ((new_tv.tv_sec - tv->tv_sec) < WAITPID_MIN_TIME)
> + return 1; /* Skip. */
> +
> + *tv = new_tv;
> + }
> +
> + /* Don't skip. */
> + return 0;
> +}
> +
> +/* Perform a real waitpid call. */
> +
> +static pid_t
> +real_waitpid (pid_t pid, int *wstatus, int options)
> +{
> + typedef pid_t (*fptr_t) (pid_t, int *, int);
> + static fptr_t real_func = NULL;
> +
> + if (real_func == NULL)
> + {
> + real_func = dlsym (RTLD_NEXT, "waitpid");
> + if (real_func == NULL)
> + error ("error: failed to find real waitpid\n");
> + }
> +
> + return (*real_func) (pid, wstatus, options);
> +}
> +
> +/* Thread worked created when we cached a waitpid result. Delays for
s/worked/worker/
s/cached/cache/
> + a short period of time and then sends SIGCHLD to the GDB process. This
> + should trigger GDB to call waitpid again, at which point we will make
> + the cached waitpid result available. */
> +
> +static void*
> +send_sigchld_thread (void *arg)
> +{
> + /* Delay one second longer than WAITPID_MIN_TIME so that there can be no
> + chance that a call to SHOULD_SKIP_WAITPID will return true one the
s/one/once/
Thanks,
Pedro Alves