This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: Possible regression on gdb.multi/multi-arch-exec.exp
- From: Pedro Alves <palves at redhat dot com>
- To: Sergio Durigan Junior <sergiodj at redhat dot com>
- Cc: gdb-patches at sourceware dot org
- Date: Thu, 28 Jun 2018 13:09:25 +0100
- Subject: Re: Possible regression on gdb.multi/multi-arch-exec.exp
- References: <20180607180704.3991-1-palves@redhat.com> <87in649jtd.fsf@redhat.com>
On 06/27/2018 07:16 PM, Sergio Durigan Junior wrote:
> On Thursday, June 07 2018, Pedro Alves wrote:
>
>> This is more preparation bits for multi-target support.
>
> Hi Pedro,
>
> While preparing a new Fedora GDB rawhide release, I noticed a regression
> related to this commit. The curious thing is that I am only able to
> reproduce the regression on a Fedora Rawhide system; it doesn't happen
> on my Fedora 27 machine (initially I thought it might be related to GCC,
> but testing against GCC HEAD on my Fedora 27 machine also did not
> trigger the regression).
>
> The test failing is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:
>
> (gdb) break all_started
> Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
> (gdb) run
> Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff7476700 (LWP 1354)]
>
> Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
> 42 }
> (gdb) delete breakpoints
> Delete all breakpoints? (y or n) y
> (gdb) info breakpoints
> No breakpoints or watchpoints.
> (gdb) break main
> Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
> #0 all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
> 42 }
> (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
> set follow-exec-mode new
> (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
> continue
> Continuing.
> [Thread 0x7ffff7476700 (LWP 1354) exited]
> process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
> [New inferior 2 (process 0)]
> [New process 1350]
> ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)
>
>
> I spent some time investigating this, and here's what I've learned so
> far:
>
> 1) When infrun.c:handle_inferior_event_1 is called and deals with
> TARGET_WAITKIND_EXECD (around line 5275), it does:
>
> ...
>     case TARGET_WAITKIND_EXECD:
>       if (debug_infrun)
>         fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");
>
>       /* Note we can't read registers yet (the stop_pc), because we
>          don't yet know the inferior's post-exec architecture.
>          'stop_pc' is explicitly read below instead. */
>       switch_to_thread_no_regs (ecs->event_thread);
>
>       /* Do whatever is necessary to the parent branch of the vfork. */
>       handle_vfork_child_exec_or_exit (1);
>
>       /* This causes the eventpoints and symbol table to be reset.
>          Must do this now, before trying to determine whether to
>          stop. */
>       follow_exec (inferior_ptid, ecs->ws.value.execd_pathname); // <---- #1
>
>       stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
> ...
>
> 2) When follow_exec is called (#1 above), it does:
>
> ...
>   /* The target reports the exec event to the main thread, even if
>      some other thread does the exec, and even if the main thread was
>      stopped or already gone. We may still have non-leader threads of
>      the process on our list. E.g., on targets that don't have thread
>      exit events (like remote); or on native Linux in non-stop mode if
>      there were only two threads in the inferior and the non-leader
>      one is the one that execs (and nothing forces an update of the
>      thread list up to here). When debugging remotely, it's best to
>      avoid extra traffic, when possible, so avoid syncing the thread
>      list with the target, and instead go ahead and delete all threads
>      of the process but one that reported the event. Note this must
>      be done before calling update_breakpoints_after_exec, as
>      otherwise clearing the threads' resources would reference stale
>      thread breakpoints -- it may have been one of these threads that
>      stepped across the exec. We could just clear their stepping
>      states, but as long as we're iterating, might as well delete
>      them. Deleting them now rather than at the next user-visible
>      stop provides a nicer sequence of events for user and MI
>      notifications. */
>   ALL_THREADS_SAFE (th, tmp)
>     if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
>       delete_thread (th);
> ...
>
> On my Fedora Rawhide box, delete_thread is being called to delete the
> same thread as ecs->event_thread. On my Fedora 27 machine, it deletes a
> different thread.
>
> 3) Back to handle_inferior_event_1, when #2 is called, ecs->event_thread
> points to an invalid object, which triggers the assertion.
>
>
> I haven't progressed much further (other things to wrap up), but I
> decided to get the ball rolling already. If you need access to a Fedora
> Rawhide VM, please let me know and I can provide this to you.

I think the "gdb: Eliminate the 'stop_pc' global" patch
(<https://sourceware.org/ml/gdb-patches/2018-06/msg00524.html>)
will fix this, because it moves the stop_pc assignment to after
the point where ecs->event_thread is refreshed:

> @@ -5289,16 +5294,18 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>           stop. */
>        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);
>
> -      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> -
>        /* In follow_exec we may have deleted the original thread and
>           created a new one. Make sure that the event thread is the
>           execd thread for that case (this is a nop otherwise). */
>        ecs->event_thread = inferior_thread ();
>
> +      ecs->event_thread->suspend.stop_pc
> +        = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> +
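
To make the ordering issue concrete, here is a minimal, self-contained
C++ sketch. It is a toy model, not GDB code: toy_thread, toy_inferior,
toy_follow_exec and toy_handle_exec are all hypothetical names invented
for illustration. The only assumption, taken from the analysis quoted
above, is that the exec handling may delete the thread object the
caller still points at. The sketch re-fetches the pointer before
dereferencing it, which is the ordering the hunk above establishes.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Toy stand-ins for illustration only; these are NOT GDB's real types.
struct toy_thread
{
  int lwp;            // kernel thread id
  unsigned long pc;   // program counter we would read from the regcache
};

struct toy_inferior
{
  std::list<std::unique_ptr<toy_thread>> threads;
  toy_thread *current = nullptr;   // plays the role of inferior_thread ()
};

// Hypothetical model of what the exec handling may do: destroy every
// thread object of the process (possibly including the one the caller
// holds a pointer to) and create a fresh one for the new program.
static void
toy_follow_exec (toy_inferior &inf, unsigned long entry_pc)
{
  int lwp = inf.current->lwp;
  inf.threads.clear ();   // any previously saved toy_thread * now dangles
  inf.threads.push_back
    (std::unique_ptr<toy_thread> (new toy_thread {lwp, entry_pc}));
  inf.current = inf.threads.back ().get ();
}

// Handle the exec event in the safe order: refresh the event thread
// pointer first, and only then read the PC through it.
static unsigned long
toy_handle_exec (toy_inferior &inf, toy_thread *&event_thread,
                 unsigned long entry_pc)
{
  toy_follow_exec (inf, entry_pc);

  // Reading event_thread->pc here, before the refresh below, would be
  // a use-after-free in this model.
  event_thread = inf.current;

  return event_thread->pc;
}
```

In this model, starting with a single thread whose LWP is 1350 and
calling toy_handle_exec (inf, event_thread, 0x400862) leaves
event_thread equal to inf.current and returns 0x400862.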

Thanks,
Pedro Alves