This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Multi-threaded single-step vs. breakpoint problems (prepare_to_proceed)


Hello,

we've been seeing a number of problems relating to the
interaction of stepping and breakpoints in a multi-
threaded environment.

The following simple example test case shows a number of
those issues; it has two threads that execute completely
independent of each other (and a blocked main thread):

#include <pthread.h>

int global_a;

void thread1_sub (int a)
{
  global_a = a;
}

void *thread1 (void *x)
{
  int i;
  for (i = 0; ; i++)
    thread1_sub (i);
}

void thread2_sub (int a)
{
  global_a = a;
}

void *thread2 (void *x)
{
  int i;
  for (i = 0; ; i++)
    thread2_sub (i);
}

int main (void)
{
  pthread_t t1, t2;

  pthread_create (&t1, NULL, thread1, NULL);
  pthread_create (&t2, NULL, thread2, NULL);

  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
}


The main problem is related to switching to a different thread
at the user-interface level, and then attempting to step.

(gdb) break thread2_sub
Breakpoint 1 at 0x8048427: file bug.c, line 20.
(gdb) run
Starting program: /home/uweigand/fsf/thread-bug/bug
[Thread debugging using libthread_db enabled]
[New Thread 0xb7fe76c0 (LWP 2592)]
[New Thread 0xb7fe6bb0 (LWP 2595)]
[New Thread 0xb75e5bb0 (LWP 2596)]
[Switching to Thread 0xb75e5bb0 (LWP 2596)]

Breakpoint 1, thread2_sub (a=0) at bug.c:20
20        global_a = a;
(gdb) info threads
* 3 Thread 0xb75e5bb0 (LWP 2596)  thread2_sub (a=0) at bug.c:20
  2 Thread 0xb7fe6bb0 (LWP 2595)  0x0804841a in thread1 (x=0x0) at bug.c:15
  1 Thread 0xb7fe76c0 (LWP 2592)  0x008437a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) thread 2
[Switching to thread 2 (Thread 0xb7fe6bb0 (LWP 2595))]#0  0x0804841a in thread1 (x=0x0) at bug.c:15
15          thread1_sub (i);
(gdb) s
thread2_sub (a=0) at bug.c:21
21      }
(gdb) info thread
* 3 Thread 0xb75e5bb0 (LWP 2596)  thread2_sub (a=0) at bug.c:21
  2 Thread 0xb7fe6bb0 (LWP 2595)  0x080483fe in thread1_sub (a=717244) at bug.c:8
  1 Thread 0xb7fe76c0 (LWP 2592)  0x008437a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2

Note how even though we wanted to continue by stepping thread 2,
we're actually back in thread 3.  (Note that this is not caused by
hitting the break point a second time, it happens even if the
breakpoint is removed before the step.  The same also happens if
we don't have any breakpoint, but just interrupt with Ctrl-C.)


Trying to track down these issues, I've run into problems
understanding some (quite old) infrun code; I'd appreciate
any insights from those who know the history better than I do.

The problem is definitely caused by what appears to be deliberate
code in prepare_to_proceed.  The comment at the place the function
is called says:

  /* In a multi-threaded task we may select another thread
     and then continue or step.

     But if the old thread was stopped at a breakpoint, it
     will immediately cause another breakpoint stop without
     any execution (i.e. it will report a breakpoint hit
     incorrectly).  So we must step over it first.

     prepare_to_proceed checks the current thread against the thread
     that reported the most recent event.  If a step-over is required
     it returns TRUE and sets the current thread to the old thread. */

which is true as far as it goes.  However, are are a number of
problems in this approach:

  /* Get the last target status returned by target_wait().  */
  get_last_target_status (&wait_ptid, &wait_status);

  /* Make sure we were stopped either at a breakpoint, or because
     of a Ctrl-C.  */
  if (wait_status.kind != TARGET_WAITKIND_STOPPED
      || (wait_status.value.sig != TARGET_SIGNAL_TRAP
          && wait_status.value.sig != TARGET_SIGNAL_INT))
    {
      return 0;
    }
> Well, *why* check for Ctrl-C here?  This doesn't appear to
> have anything to do with the caller's intent.

  if (!ptid_equal (wait_ptid, minus_one_ptid)
      && !ptid_equal (inferior_ptid, wait_ptid))
    {
      /* Switched over from WAIT_PID.  */
      CORE_ADDR wait_pc = read_pc_pid (wait_ptid);

      if (wait_pc != read_pc ())
> This check compares whether the newly selected thread is
> stopped at the same location as the thread that stopped.
> What is the significance of that?
        {
          /* Switch back to WAIT_PID thread.  */
          inferior_ptid = wait_ptid;
> At this point it switches back unconditionally, even if there
> actually is no breakpoint at the current location at all (e.g.
> because it was one-shot or has been removed).  This function
> will then return 0, but the thread was switched anyway ...

          /* FIXME: This stuff came from switch_to_thread() in
             thread.c (which should probably be a public function).  */
          reinit_frame_cache ();
          registers_changed ();
          stop_pc = wait_pc;
> Also, it doesn context_switch here.  No idea if that can cause
> problems later on ...
        }

      /* We return 1 to indicate that there is a breakpoint here,
         so we need to step over it before continuing to avoid
         hitting it straight away. */
      if (breakpoint_here_p (wait_pc))
        return 1;
> Just an insignificant detail: the first thing the caller does
> after prepare_to_proceed return 1 is to repeat that same check.
    }

  return 0;


However, the most glaring problem appears to be: even if this
function were fixed so as to switch threads only when really
necessary to solve the step-over-breakpoint problem, someone
should really switch *back* to the thread the user is actually
interested in after the stepping-over is done.  I cannot see
any point in infrun code where this is even attempted ...


Looking for the history of this piece of code, the last
significant change (except for some multi-arch rework and
merging multiple implementations into one) appears to be
this patch that adds the Ctrl-C handling I was wondering
about above:
http://sourceware.org/ml/gdb-patches/2001-05/msg00419.html

Unfortunately the patch doesn't provide any rationale why
switching back to another thread is a good thing to do
here, and the only substantial follow-up doesn't help me
much further either:
"I see what you are trying to do, and I think it is a good idea."
http://sourceware.org/ml/gdb-patches/2001-05/msg00432.html


Now, as far as I can see, the expected user behaviour in
this case would be to just continue stepping the thread
the user selected.

This would imply:
- prepare_to_proceed never switches threads just due to Ctrl-C
- prepare_to_proceed only switched threads if the previous
  thread would indeed run immediately into a breakpoint
- if prepare_to_proceed does switch threads, the next time the
  target stops we switch back to the thread the user selected
  and continue stepping there

Note that the only part of infrun that attempts to do anything
like that third action is the stepping_past_singlestep_breakpoint
logic; it switches back to "saved_singlestep_ptid".  I think
something along those lines needs to be done for the stepping-
past-breakpoint-in-other-thread case as well.


Before I start working on a fix along those lines, I'd appreciate
any feedback.  Am I overlooking something here?

Thanks,
Ulrich



-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]