This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFA 1/2][master+8.3] (Windows) fix thr != nullptr assert failure in delete_thread_1


Hello,

We have observed that GDB would randomly trip the the following
assertion failure when debugging on Windows. When allowing
the program to run until the inferior exits, we occasionally see:

     (gdb) cont
     Continuing.
     [Thread 48192.0xd100 exited with code 1]
     [Thread 48192.0x10ad8 exited with code 1]
     [Thread 48192.0x36e28 exited with code 0]
     [Thread 48192.0x52be4 exited with code 0]
     [Thread 48192.0x5aa40 exited with code 0]
     ../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_inf
o*, bool): Assertion `thr != nullptr' failed.

Running the same scenario with some additional traces enabled...

    (gdb) set verbose
    (gdb) set debugevents

... allows us to understand what the issue is. To understand, we need
to first look at the events received when starting the program, and
in particular which threads got created how. First, we get a
CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8:

    gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT)

Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events,
one of them being for tid=0x4010c:

    gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT)
Fast forward a bit of debugging, and we do a "cont" as above,
at which point the programs reaches the end, and the system reports
"exit" events. The first interesting one is the following:

    gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT)

This is reporting a thread-exit event for a thread whose tid
is the TID of what we call the "main thread". That's the thread
that was created when we received the CREATE_PROCESS_DEBUG_EVENT
notification, and whose TID is actually stored in a global variable
called main_thread_id. This is not something we expected, as
the assumption we made was that the main thread would exit last,
and we would be notified of it via an EXIT_PROCESS_DEBUG_EVENT.
But apparently, this is not always true, at least on Windows Server
2012 and 2016 where this issue has been observed happening randomly.

The consequence of the above notification is that we call
windows_delete_thread for that thread, which removes it from
our list of known threads.

And a little bit later, then we then get the EXIT_PROCESS_DEBUG_EVENT,
and we can see that the associated tid is not the main_thread_id,
but rather the tid of one of the threads that was created during
the lifetime of the program, in this case tid=0x4010c:

    gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT)

And the debug trace printed right after shows why we're crashing:

    [Deleting Thread 317536.0x442a8]

We are trying to delete the thread whose tid=0x442a8, which is
the main_thread_id! As we have already deleted that thread before,
the search for it returns a nullptr, which then trips the assertion
check in delete_thread_1.

This commit fixes the immediate issue; but it still ignores the open
question of what to do with the main_thread_id global, particularly
after that thread has been removed from our list of threads. The bigger
question is kept separate from this patch, because this smaller patch
seems to work well enough on its own in practice, and make the patch
less complex, thus a possible candidate for backporting to a release
branch (thinking of the gdb-8.3-branch, in this case).

Another change that this commit makes is re-instate the thread create/
exit notifications for what we thought was the main thread. We added
some filtering recently to not show those notifications. Otherwise,
with what we've discovered, it is possible that the list of threads
reported by the "new thread" notifications does not match the list
of threads from the "thread exit" notifications.

gdb/ChangeLog:

	* windows-nat.c (windows_add_thread): Remove parameter "main_thread_p".
	Adjust function accordingly.  Update all callers.
	(windows_delete_thread): Likewise.
	(get_windows_debug_event) <EXIT_PROCESS_DEBUG_EVENT>: Use
        current_event.dwThreadId instead of main_thread_id.

Tested on both x86-windows and x86_64-windows using AdaCore's testsuite.
The portion of the patch that uses current_event.dwThreadId instead of
main_thread_id has undergood about a couple of weeks of nightly testing
on various Windows versions.

OK to master?
OK for 8.3?

Thanks!

---
 gdb/windows-nat.c | 58 ++++++++++++++++++++++++-------------------------------
 1 file changed, 25 insertions(+), 33 deletions(-)

diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index 5009418..039eb13 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -430,12 +430,10 @@ thread_rec (DWORD id, int get_context)
 
    PTID is the ptid of the thread to be added.
    H is its Windows handle.
-   TLB is its thread local base.
-   MAIN_THREAD_P should be true if the thread to be added is
-   the main thread, false otherwise.  */
+   TLB is its thread local base.  */
 
 static windows_thread_info *
-windows_add_thread (ptid_t ptid, HANDLE h, void *tlb, bool main_thread_p)
+windows_add_thread (ptid_t ptid, HANDLE h, void *tlb)
 {
   windows_thread_info *th;
   DWORD id;
@@ -456,13 +454,18 @@ windows_add_thread (ptid_t ptid, HANDLE h, void *tlb, bool main_thread_p)
 
   /* Add this new thread to the list of threads.
 
-     To be consistent with what's done on other platforms, we add
-     the main thread silently (in reality, this thread is really
-     more of a process to the user than a thread).  */
-  if (main_thread_p)
-    add_thread_silent (ptid);
-  else
-    add_thread (ptid);
+     On many platforms, we add the main program's thread silently.
+     But unfortunately, it does not seem to identify such thread
+     on Windows. We used to think that the main program's thread
+     is the thread reported by the CREATE_PROCESS_DEBUG_EVENT, but
+     we've seen since then that this thread sometimes exits before
+     other threads in the program -- when that happens, what we see
+     is that the CREATE_PROCESS_DEBUG_EVENT thread gets reported
+     as exited via an EXIT_THREAD_DEBUG_EVENT, and another thread
+     gets used in the EXIT_PROCESS_DEBUG_EVENT notification instead.
+
+     So, we just add all threads verbosely.  */
+  add_thread (ptid);
 
   /* Set the debug registers for the new thread if they are used.  */
   if (debug_registers_used)
@@ -503,12 +506,10 @@ windows_init_thread_list (void)
 /* Delete a thread from the list of threads.
 
    PTID is the ptid of the thread to be deleted.
-   EXIT_CODE is the thread's exit code.
-   MAIN_THREAD_P should be true if the thread to be deleted is
-   the main thread, false otherwise.  */
+   EXIT_CODE is the thread's exit code.  */
 
 static void
-windows_delete_thread (ptid_t ptid, DWORD exit_code, bool main_thread_p)
+windows_delete_thread (ptid_t ptid, DWORD exit_code)
 {
   windows_thread_info *th;
   DWORD id;
@@ -517,16 +518,11 @@ windows_delete_thread (ptid_t ptid, DWORD exit_code, bool main_thread_p)
 
   id = ptid.tid ();
 
-  /* Emit a notification about the thread being deleted.
-
-     Note that no notification was printed when the main thread
-     was created, and thus, unless in verbose mode, we should be
-     symetrical, and avoid that notification for the main thread
-     here as well.  */
+  /* Emit a notification about the thread being deleted.  */
 
   if (info_verbose)
     printf_unfiltered ("[Deleting %s]\n", target_pid_to_str (ptid).c_str ());
-  else if (print_thread_events && !main_thread_p)
+  else if (print_thread_events)
     printf_unfiltered (_("[%s exited with code %u]\n"),
 		       target_pid_to_str (ptid).c_str (),
 		       (unsigned) exit_code);
@@ -1411,8 +1407,7 @@ fake_create_process (void)
     = windows_add_thread (ptid_t (current_event.dwProcessId, 0,
 				  current_event.dwThreadId),
 			  current_event.u.CreateThread.hThread,
-			  current_event.u.CreateThread.lpThreadLocalBase,
-			  true /* main_thread_p */);
+			  current_event.u.CreateThread.lpThreadLocalBase);
   return main_thread_id;
 }
 
@@ -1584,8 +1579,7 @@ get_windows_debug_event (struct target_ops *ops,
       th = windows_add_thread
         (ptid_t (current_event.dwProcessId, 0, current_event.dwThreadId),
 	 current_event.u.CreateThread.hThread,
-	 current_event.u.CreateThread.lpThreadLocalBase,
-	 false /* main_thread_p */);
+	 current_event.u.CreateThread.lpThreadLocalBase);
 
       break;
 
@@ -1596,8 +1590,7 @@ get_windows_debug_event (struct target_ops *ops,
 		     "EXIT_THREAD_DEBUG_EVENT"));
       windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
 				     current_event.dwThreadId),
-			     current_event.u.ExitThread.dwExitCode,
-			     false /* main_thread_p */);
+			     current_event.u.ExitThread.dwExitCode);
       th = &dummy_thread_info;
       break;
 
@@ -1617,8 +1610,7 @@ get_windows_debug_event (struct target_ops *ops,
         (ptid_t (current_event.dwProcessId, 0,
 		 current_event.dwThreadId),
 	 current_event.u.CreateProcessInfo.hThread,
-	 current_event.u.CreateProcessInfo.lpThreadLocalBase,
-	 true /* main_thread_p */);
+	 current_event.u.CreateProcessInfo.lpThreadLocalBase);
       thread_id = current_event.dwThreadId;
       break;
 
@@ -1637,11 +1629,11 @@ get_windows_debug_event (struct target_ops *ops,
       else if (saw_create == 1)
 	{
 	  windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
-					 main_thread_id),
-				 0, true /* main_thread_p */);
+					 current_event.dwThreadId),
+				 0);
 	  ourstatus->kind = TARGET_WAITKIND_EXITED;
 	  ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
-	  thread_id = main_thread_id;
+	  thread_id = current_event.dwThreadId;
 	}
       break;
 
-- 
2.1.4


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]