This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH 3/3] Fix GDB crashes with remote async/non-stop.


I see occasional GDB crashes in e.g., the mi-nonstop-exit.exp test,
when using the remote target, caused by GDB referencing an already
free'd structure.  This happens right after the remote connection is
closed.  This is now exposed by -lmcheck.  We just got "lucky" and
usually not crash without it.  Valgrind shows:

Breakpoint 1, main () at ../../../src/gdb/testsuite/gdb.mi/non-stop-exit.c:42
42        for (i = 0; i < NTHREADS; ++i)
(gdb) c
Continuing.
[Inferior 1 (process 25123) exited normally]
(gdb) ==25115== Invalid read of size 8
==25115==    at 0x6B592E: serial_is_async_p (serial.c:543)
==25115==    by 0x488C7F: reschedule (ser-base.c:69)
==25115==    by 0x488EBD: fd_event (ser-base.c:162)
==25115==    by 0x5D7401: handle_file_event (event-loop.c:827)
==25115==    by 0x5D68C7: process_event (event-loop.c:401)
==25115==    by 0x5D698E: gdb_do_one_event (event-loop.c:465)
==25115==    by 0x5D69DF: start_event_loop (event-loop.c:490)
==25115==    by 0x5D848A: cli_command_loop (event-top.c:182)
==25115==    by 0x5CF355: current_interp_command_loop (interps.c:321)
==25115==    by 0x5CFD15: captured_command_loop (main.c:226)
==25115==    by 0x5CE37E: catch_errors (exceptions.c:546)
==25115==    by 0x5D10E6: captured_main (main.c:1001)
==25115==  Address 0xbe86f68 is 8 bytes inside a block of size 8,296 free'd
==25115==    at 0x4A0662E: free (vg_replace_malloc.c:366)
==25115==    by 0x6FABC8: xfree (common-utils.c:107)
==25115==    by 0x6B539C: do_serial_close (serial.c:365)
==25115==    by 0x6B53C1: serial_close (serial.c:371)
==25115==    by 0x4A8A4D: remote_close (remote.c:3007)
==25115==    by 0x5FD685: target_close (target.c:3752)
==25115==    by 0x5F8EC3: unpush_target (target.c:1042)
==25115==    by 0x4B0F23: remote_mourn_1 (remote.c:7595)
==25115==    by 0x4B0F09: remote_mourn (remote.c:7588)
==25115==    by 0x5FBF15: target_mourn_inferior (target.c:2796)
==25115==    by 0x5B9CFA: handle_inferior_event (infrun.c:3422)
==25115==    by 0x5B89F3: fetch_inferior_event (infrun.c:2815)
==25115==
==25115== Invalid read of size 8
==25115==    at 0x6B5942: serial_is_async_p (serial.c:543)
==25115==    by 0x488C7F: reschedule (ser-base.c:69)
==25115==    by 0x488EBD: fd_event (ser-base.c:162)
==25115==    by 0x5D7401: handle_file_event (event-loop.c:827)
==25115==    by 0x5D68C7: process_event (event-loop.c:401)
==25115==    by 0x5D698E: gdb_do_one_event (event-loop.c:465)
==25115==    by 0x5D69DF: start_event_loop (event-loop.c:490)
==25115==    by 0x5D848A: cli_command_loop (event-top.c:182)
==25115==    by 0x5CF355: current_interp_command_loop (interps.c:321)
==25115==    by 0x5CFD15: captured_command_loop (main.c:226)
==25115==    by 0x5CE37E: catch_errors (exceptions.c:546)
==25115==    by 0x5D10E6: captured_main (main.c:1001)
==25115==  Address 0xbe88fc0 is 8,288 bytes inside a block of size 8,296 free'd
==25115==    at 0x4A0662E: free (vg_replace_malloc.c:366)
==25115==    by 0x6FABC8: xfree (common-utils.c:107)
==25115==    by 0x6B539C: do_serial_close (serial.c:365)
==25115==    by 0x6B53C1: serial_close (serial.c:371)
==25115==    by 0x4A8A4D: remote_close (remote.c:3007)
==25115==    by 0x5FD685: target_close (target.c:3752)
==25115==    by 0x5F8EC3: unpush_target (target.c:1042)
==25115==    by 0x4B0F23: remote_mourn_1 (remote.c:7595)
==25115==    by 0x4B0F09: remote_mourn (remote.c:7588)
==25115==    by 0x5FBF15: target_mourn_inferior (target.c:2796)
==25115==    by 0x5B9CFA: handle_inferior_event (infrun.c:3422)
==25115==    by 0x5B89F3: fetch_inferior_event (infrun.c:2815)
==25115==

This triggers ERRORS (caused by GDB crashes) in the non-stop tests,
which use async mode, and triggers much more often when running the
whole testsuite in async mode.  The problem is

(gdb) bt
#0  serial_close (scb=0xe78e70) at ../../src/gdb/serial.c:371
#1  0x00000000004a8a4e in remote_close (quitting=0) at ../../src/gdb/remote.c:3007
#2  0x00000000005fd686 in target_close (targ=0xc20380, quitting=0) at ../../src/gdb/target.c:3752
#3  0x00000000005f8ec4 in unpush_target (t=0xc20380) at ../../src/gdb/target.c:1042
#4  0x00000000004b0f24 in remote_mourn_1 (target=0xc20380) at ../../src/gdb/remote.c:7595
#5  0x00000000004b0f0a in remote_mourn (ops=0xc20380) at ../../src/gdb/remote.c:7588
#6  0x00000000005fbf16 in target_mourn_inferior () at ../../src/gdb/target.c:2796
#7  0x00000000005b9cfb in handle_inferior_event (ecs=0x7fffffffd600) at ../../src/gdb/infrun.c:3422
#8  0x00000000005b89f4 in fetch_inferior_event (client_data=0x0) at ../../src/gdb/infrun.c:2815
#9  0x00000000005d99c6 in fetch_inferior_event_wrapper (client_data=0x0) at ../../src/gdb/inf-loop.c:145
#10 0x00000000005ce37f in catch_errors (func=0x5d99ae <fetch_inferior_event_wrapper>, func_args=0x0, errstring=0x877e98 "", mask=6) at ../../src/gdb/exceptions.c:546
#11 0x00000000005d9734 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at ../../src/gdb/inf-loop.c:53
#12 0x00000000004b8652 in remote_async_serial_handler (scb=0xe78e70, context=0x0) at ../../src/gdb/remote.c:11081
#13 0x0000000000488eb2 in fd_event (error=0, context=0xe78e70) at ../../src/gdb/ser-base.c:161

that the serial is closed and freed from within the serial's
async_handler callback, called from fd_event (frame #13), but after
calling the handler, in fd_event still uses serial pointer:

  scb->async_handler (scb, scb->async_context);
  reschedule (scb);

The reschedule needs to run after the handler.  So the easiest way to
fix this I could think of is to add reference counting to the serial
object, so we can control when it is released, and avoid the
reschedule if it is already closed.

That's what the patch below does.  Tested x86_64 Fedora 16, gdbserver
sync and async modes.

The old reference counting kept the serial open until the reference
count reached 0.  Now, a serial_close always closes the serial, while
the serial object is kept live (in C terms) until the reference count
reaches 0.  IOW the "close" operation is decoupled from the object's
release.

2012-06-08  Pedro Alves  <palves@redhat.com>

	* ser-base.c (run_async_handler_and_reschedule): New.
	(fd_event, push_event): Use it.
	* serial.c (serial_open, serial_fdopen_ops): Set the initial
	reference count to 1.
	(do_serial_close): Set the bufp field to NULL.  Use serial_unref
	instead of xfree.
	(serial_is_open, serial_ref, serial_unref): New.
	* serial.h (serial_open): Adjust comment.
	(serial_is_open): Declare.
	(serial_close): Adjust comment.
	(serial_ref, serial_unref) Declare.
	(struct serial): New field 'refcnt'.
---
 gdb/ser-base.c |   30 +++++++++++++++++++++++++-----
 gdb/serial.c   |   27 ++++++++++++++++++++++++++-
 gdb/serial.h   |   27 ++++++++++++++++++++++-----
 3 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/gdb/ser-base.c b/gdb/ser-base.c
index b4399d0..2f12dfc 100644
--- a/gdb/ser-base.c
+++ b/gdb/ser-base.c
@@ -123,6 +123,29 @@ reschedule (struct serial *scb)
     }
 }
 
+/* Run the SCB's async handle, and reschedule, if the handler doesn't
+   close SCB.  */
+
+static void
+run_async_handler_and_reschedule (struct serial *scb)
+{
+  int is_open;
+
+  /* Take a reference, so a serial_close call within the handler
+     doesn't make SCB a dangling pointer.  */
+  serial_ref (scb);
+
+  /* Run the handler.  */
+  scb->async_handler (scb, scb->async_context);
+
+  is_open = serial_is_open (scb);
+  serial_unref (scb);
+
+  /* Get ready for more, if not already closed.  */
+  if (is_open)
+    reschedule (scb);
+}
+
 /* FD_EVENT: This is scheduled when the input FIFO is empty (and there
    is no pending error).  As soon as data arrives, it is read into the
    input FIFO and the client notified.  The client should then drain
@@ -158,8 +181,7 @@ fd_event (int error, void *context)
 	  scb->bufcnt = SERIAL_ERROR;
 	}
     }
-  scb->async_handler (scb, scb->async_context);
-  reschedule (scb);
+  run_async_handler_and_reschedule (scb);
 }
 
 /* PUSH_EVENT: The input FIFO is non-empty (or there is a pending
@@ -173,9 +195,7 @@ push_event (void *context)
   struct serial *scb = context;
 
   scb->async_state = NOTHING_SCHEDULED; /* Timers are one-off */
-  scb->async_handler (scb, scb->async_context);
-  /* re-schedule */
-  reschedule (scb);
+  run_async_handler_and_reschedule (scb);
 }
 
 /* Wait for input on scb, with timeout seconds.  Returns 0 on success,
diff --git a/gdb/serial.c b/gdb/serial.c
index c1f331d..1140feb 100644
--- a/gdb/serial.c
+++ b/gdb/serial.c
@@ -196,6 +196,7 @@ serial_open (const char *name)
   scb->bufcnt = 0;
   scb->bufp = scb->buf;
   scb->error_fd = -1;
+  scb->refcnt = 1;
 
   /* `...->open (...)' would get expanded by the open(2) syscall macro.  */
   if ((*scb->ops->open) (scb, open_name))
@@ -245,6 +246,7 @@ serial_fdopen_ops (const int fd, struct serial_ops *ops)
   scb->bufcnt = 0;
   scb->bufp = scb->buf;
   scb->error_fd = -1;
+  scb->refcnt = 1;
 
   scb->name = NULL;
   scb->debug_p = 0;
@@ -291,7 +293,10 @@ do_serial_close (struct serial *scb, int really_close)
   if (scb->name)
     xfree (scb->name);
 
-  xfree (scb);
+  /* For serial_is_open.  */
+  scb->bufp = NULL;
+
+  serial_unref (scb);
 }
 
 void
@@ -307,6 +312,26 @@ serial_un_fdopen (struct serial *scb)
 }
 
 int
+serial_is_open (struct serial *scb)
+{
+  return scb->bufp != NULL;
+}
+
+void
+serial_ref (struct serial *scb)
+{
+  scb->refcnt++;
+}
+
+void
+serial_unref (struct serial *scb)
+{
+  --scb->refcnt;
+  if (scb->refcnt == 0)
+    xfree (scb);
+}
+
+int
 serial_readchar (struct serial *scb, int timeout)
 {
   int ch;
diff --git a/gdb/serial.h b/gdb/serial.h
index 8ba8ae6..187ed03 100644
--- a/gdb/serial.h
+++ b/gdb/serial.h
@@ -37,13 +37,18 @@ typedef void *serial_ttystate;
 struct serial;
 
 /* Try to open NAME.  Returns a new `struct serial *' on success, NULL
-   on failure.  Note that some open calls can block and, if possible, 
-   should be  written to be non-blocking, with calls to ui_look_hook 
-   so they can be cancelled.  An async interface for open could be
-   added to GDB if necessary.  */
+   on failure.  The new serial object has a reference count of 1.
+   Note that some open calls can block and, if possible, should be
+   written to be non-blocking, with calls to ui_look_hook so they can
+   be cancelled.  An async interface for open could be added to GDB if
+   necessary.  */
 
 extern struct serial *serial_open (const char *name);
 
+/* Returns true if SCB is open.  */
+
+extern int serial_is_open (struct serial *scb);
+
 /* Find an already opened serial stream using a file handle.  */
 
 extern struct serial *serial_for_fd (int fd);
@@ -52,10 +57,18 @@ extern struct serial *serial_for_fd (int fd);
 
 extern struct serial *serial_fdopen (const int fd);
 
-/* Push out all buffers, close the device and destroy SCB.  */
+/* Push out all buffers, close the device and unref SCB.  */
 
 extern void serial_close (struct serial *scb);
 
+/* Increment reference count of SCB.  */
+
+extern void serial_ref (struct serial *scb);
+
+/* Decrement reference count of SCB.  */
+
+extern void serial_unref (struct serial *scb);
+
 /* Create a pipe, and put the read end in files[0], and the write end
    in filde[1].  Returns 0 for success, negative value for error (in
    which case errno contains the error).  */
@@ -213,6 +226,10 @@ extern int serial_debug_p (struct serial *scb);
 
 struct serial
   {
+    /* serial objects are ref counted (but not the underlying
+       connection, just the object's lifetime in memory).  */
+    int refcnt;
+
     int fd;			/* File descriptor */
     /* File descriptor for a separate error stream that should be
        immediately forwarded to gdb_stderr.  This may be -1.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]