This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Re: Possible regression on gdb.multi/multi-arch-exec.exp

From: Pedro Alves <palves at redhat dot com>
To: Sergio Durigan Junior <sergiodj at redhat dot com>
Cc: gdb-patches at sourceware dot org
Date: Thu, 28 Jun 2018 13:09:25 +0100
Subject: Re: Possible regression on gdb.multi/multi-arch-exec.exp
References: <20180607180704.3991-1-palves@redhat.com> <87in649jtd.fsf@redhat.com>

On 06/27/2018 07:16 PM, Sergio Durigan Junior wrote:
> On Thursday, June 07 2018, Pedro Alves wrote:
> 
>> This is more preparation bits for multi-target support.
> 
> Hi Pedro,
> 
> While preparing a new Fedora GDB rawhide release, I noticed a regression
> related to this commit.  The curious thing is that I am only able to
> reproduce the regression on a Fedora Rawhide system; it doesn't happen
> on my Fedora 27 machine (initially I thought it might be related to GCC,
> but testing against GCC HEAD on my Fedora 27 machine also did not
> trigger the regression).
> 
> The test failing is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:
> 
>   (gdb) break all_started
>   Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
>   (gdb) run 
>   Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec 
>   [Thread debugging using libthread_db enabled]
>   Using host libthread_db library "/lib64/libthread_db.so.1".
>   [New Thread 0x7ffff7476700 (LWP 1354)]
> 
>   Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
>   42      }
>   (gdb) delete breakpoints
>   Delete all breakpoints? (y or n) y
>   (gdb) info breakpoints
>   No breakpoints or watchpoints.
>   (gdb) break main
>   Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
>   (gdb) thread 1
>   [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
>   #0  all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
>   42      }
>   (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
>   set follow-exec-mode new
>   (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
>   continue
>   Continuing.
>   [Thread 0x7ffff7476700 (LWP 1354) exited]
>   process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
>   [New inferior 2 (process 0)]
>   [New process 1350]
>   ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
>   A problem internal to GDB has been detected,
>   further debugging may prove unreliable.
>   Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)
> 
> 
> I spent some time investigating this, and here's what I've learned so
> far:
> 
> 1) When infrun.c:handle_inferior_event_1 is called and deals with
> TARGET_WAITKIND_EXECD (around line 5275), it does:
> 
>     ...
>     case TARGET_WAITKIND_EXECD:
>       if (debug_infrun)
>         fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");
> 
>       /* Note we can't read registers yet (the stop_pc), because we
> 	 don't yet know the inferior's post-exec architecture.
> 	 'stop_pc' is explicitly read below instead.  */
>       switch_to_thread_no_regs (ecs->event_thread);
> 
>       /* Do whatever is necessary to the parent branch of the vfork.  */
>       handle_vfork_child_exec_or_exit (1);
> 
>       /* This causes the eventpoints and symbol table to be reset.
>          Must do this now, before trying to determine whether to
>          stop.  */
>       follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);   // <---- #1
> 
>       stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
>       ...
> 
> 2) When follow_exec is called (#1 above), it does:
> 
>   ...
>   /* The target reports the exec event to the main thread, even if
>      some other thread does the exec, and even if the main thread was
>      stopped or already gone.  We may still have non-leader threads of
>      the process on our list.  E.g., on targets that don't have thread
>      exit events (like remote); or on native Linux in non-stop mode if
>      there were only two threads in the inferior and the non-leader
>      one is the one that execs (and nothing forces an update of the
>      thread list up to here).  When debugging remotely, it's best to
>      avoid extra traffic, when possible, so avoid syncing the thread
>      list with the target, and instead go ahead and delete all threads
>      of the process but one that reported the event.  Note this must
>      be done before calling update_breakpoints_after_exec, as
>      otherwise clearing the threads' resources would reference stale
>      thread breakpoints -- it may have been one of these threads that
>      stepped across the exec.  We could just clear their stepping
>      states, but as long as we're iterating, might as well delete
>      them.  Deleting them now rather than at the next user-visible
>      stop provides a nicer sequence of events for user and MI
>      notifications.  */
>   ALL_THREADS_SAFE (th, tmp)
>     if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
>       delete_thread (th);
>   ...
> 
> On my Fedora Rawhide box, delete_thread is being called to delete the
> same thread as ecs->event_thread.  On my Fedora 27 machine, it deletes a
> different thread.
> 
> 3) Back to handle_inferior_event_1, when #2 is called, ecs->event_thread
> points to an invalid object, which triggers the assertion.
> 
> 
> I haven't progressed much further (other things to wrap up), but I
> decided to get the ball rolling already.  If you need access to a Fedora
> Rawhide VM, please let me know and I can provide this to you.

I think the "gdb: Eliminate the 'stop_pc' global" patch
(<https://sourceware.org/ml/gdb-patches/2018-06/msg00524.html>)
will fix this, because it moves the stop_pc assignment to
after the point where ecs->event_thread is refreshed:

> @@ -5289,16 +5294,18 @@ Cannot fill $_exitsignal with the correct signal number.\n"));
>           stop.  */
>        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname);
>  
> -      stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread));
> -
>        /* In follow_exec we may have deleted the original thread and
>  	 created a new one.  Make sure that the event thread is the
>  	 execd thread for that case (this is a nop otherwise).  */
>        ecs->event_thread = inferior_thread ();
>  
> +      ecs->event_thread->suspend.stop_pc
> +	= regcache_read_pc (get_thread_regcache (ecs->event_thread));
> +
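
To make the ordering issue concrete, here is a minimal standalone sketch of the same pattern. It is a toy model, not GDB code: ToyThread, ToyInferior, toy_follow_exec and toy_handle_execd are all hypothetical names, and the toy always replaces the thread object across the exec, whereas in GDB that only happens in some cases. The point is only that the event-thread pointer has to be refreshed before anything dereferences it.

```cpp
// Toy model only: none of these types or functions are GDB's real ones.
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>

struct ToyThread
{
  int lwp;            // thread id within the process
  std::uint64_t pc;   // pretend program counter
};

struct ToyInferior
{
  std::list<std::unique_ptr<ToyThread>> threads;
  ToyThread *current = nullptr;   // stands in for inferior_thread ()
};

// Hypothetical stand-in for follow_exec: every pre-exec thread object
// is destroyed and a fresh one is created for the post-exec program,
// which becomes the current thread.
void
toy_follow_exec (ToyInferior &inf, int lwp, std::uint64_t entry_pc)
{
  inf.threads.clear ();           // old ToyThread objects are freed here
  inf.threads.push_back
    (std::make_unique<ToyThread> (ToyThread {lwp, entry_pc}));
  inf.current = inf.threads.back ().get ();
}

// Hypothetical stand-in for the TARGET_WAITKIND_EXECD case, with the
// steps in the reordered (safe) sequence.
std::uint64_t
toy_handle_execd (ToyInferior &inf, ToyThread *event_thread,
                  std::uint64_t entry_pc)
{
  int lwp = event_thread->lwp;    // copy what we need before the exec
  toy_follow_exec (inf, lwp, entry_pc);

  // event_thread may now dangle; refresh it before any dereference.
  event_thread = inf.current;

  return event_thread->pc;        // safe: points at the live object
}
```

Reading event_thread->pc before the refresh line would touch a freed object in this toy, which is the same shape of problem as reading the PC through the stale ecs->event_thread.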

Thanks,
Pedro Alves

Follow-Ups:
- [pushed] Fix follow-exec regression / crash (Re: Possible regression on gdb.multi/multi-arch-exec.exp)
  - From: Pedro Alves

References:
- [PATCH] Use thread_info and inferior pointers more throughout
  - From: Pedro Alves
- Possible regression on gdb.multi/multi-arch-exec.exp (was: Re: [PATCH] Use thread_info and inferior pointers more throughout)
  - From: Sergio Durigan Junior
