This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.
Re: [RFC]: Ugly thread step situation
On Tue, Sep 14, 2004 at 05:44:47PM -0400, jjohnstn wrote:
> I recently tracked down a problem with gdb on RHEL3 Linux regarding
> stepping threads. What happens is that in some instances, lin-lwp.c is
> asked to step the thread of interest. We then wait on all threads. Due
> to some form of race condition, the wait does not get back the trap from
> the stepped thread. If we have a number of waiting events (e.g. thread
> create events, other breakpoints), lin-lwp picks one of them.
Could you explain this bit a little more? What comes back instead for
the thread that was stepping? Do we stop it with a SIGSTOP?
Is there a testcase?
> Now it gets interesting. infrun.c thinks the current thread is being
> stepped and isn't ready for a breakpoint coming back. On x86, it
> miscalculates the pc value (for a breakpoint it should back up by 1; for
> a step it doesn't have to). We end up pointing at an invalid pc (we
> didn't back up by 1) and everything falls apart from there.
>
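The pc adjustment described above can be sketched as follows. This is only an illustration, not code from infrun.c: the names `stop_kind`, `X86_BREAKPOINT_LEN`, and `adjusted_stop_pc` are hypothetical, and the sketch assumes the one-byte x86 breakpoint instruction, which leaves the reported pc one byte past the breakpoint address.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; these are not GDB's types.  */
enum stop_kind { STOP_SINGLE_STEP, STOP_BREAKPOINT };

/* Assumption: the x86 breakpoint instruction is one byte long, so a
   breakpoint trap is reported with the pc one byte past its address.  */
#define X86_BREAKPOINT_LEN 1

/* Return the pc the debugger should treat as the stop location.  */
uint64_t
adjusted_stop_pc (uint64_t reported_pc, enum stop_kind kind)
{
  if (kind == STOP_BREAKPOINT)
    return reported_pc - X86_BREAKPOINT_LEN;  /* back up onto the breakpoint */
  return reported_pc;                         /* a step needs no adjustment */
}
```

If an event that is really a breakpoint hit in another thread is classified as `STOP_SINGLE_STEP`, the pc is left one byte past the breakpoint address, which is the invalid pc the quoted message describes.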
> To fix this quickly, I added the accompanying patch to lin-lwp.c. What it
> does is ensure that we wait on any currently stepping lwp. In truth, this
> isn't as bad as it sounds. The lin-lwp code later on is set up to pick
> the stepping lwp over all other events. This just keeps the scenario
> above from occurring.
>
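The idea of waiting specifically on a stepping LWP can be sketched roughly as below. This is not the patch itself: `struct lwp`, `find_stepping_lwp`, and `pid_to_wait_for` are simplified, hypothetical stand-ins, and the sketch only assumes the waitpid convention that a pid of -1 means "any child".

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry in the LWP list.  */
struct lwp
{
  int pid;            /* LWP id to pass to the wait call */
  int stepping;       /* nonzero if a single-step was requested */
  struct lwp *next;
};

/* Return the first LWP marked as stepping, or NULL if there is none.  */
struct lwp *
find_stepping_lwp (struct lwp *list)
{
  struct lwp *lp;

  for (lp = list; lp != NULL; lp = lp->next)
    if (lp->stepping)
      return lp;
  return NULL;
}

/* Choose the pid to wait on: the stepping LWP if one exists,
   otherwise -1, meaning any child (as with waitpid).  */
int
pid_to_wait_for (struct lwp *list)
{
  struct lwp *lp = find_stepping_lwp (list);

  return lp != NULL ? lp->pid : -1;
}
```

With this choice, events pending on other LWPs cannot be reported ahead of the step's trap. They remain pending and can be handled after the step completes.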
> Obviously, this doesn't solve everything. Perhaps the decrement of the pc
> needs to be done once we have established whether the thread has changed
> underneath us. We also could use a hook to run the lwp list and find out
> if the current lwp was stepping or encountered a breakpoint.
>
> Anyway, if the consensus is that the patch is helpful in the short-term, I
> am more than happy to check it in.
>
> -- Jeff J.
>
> 2004-09-14  Jeff Johnston  <jjohnstn@redhat.com>
>
> 	* lin-lwp.c (find_singlestep_lwp_callback): New static function.
> 	(lin_lwp_wait): Change code to specifically wait on any LWP
> 	that is currently stepping.
This sounds sort of like a problem I debugged on MIPS and hppa, but
never managed to reproduce. I had tabled the patch until I had more
time to look at it - always a mistake.
The same patch may help here. Could you tell me what resume_ptid is
before the call to target_resume, in resume? The call in which we
request the single-step, I mean.
--
Daniel Jacobowitz