This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



[RFA] Wrong inner_frame sanity check with signal frame and -fstack-check


Hello,

Last year, a customer reported a problem on x86-linux where the debugger
was unable to unwind past a signal handler. We never managed to receive
a source-based reproducer, but they were able to provide a core file
that we could use to identify the source of the problem.  Basically,
this is what happens:

  - One of their Ada tasks is running out of stack space.
  - The program was compiled with -fstack-check, so a signal is raised
    and the signal handler kicks in. The signal handler aborts
    the program execution and dumps a core file.

When trying to get the backtrace for that task, they got:

        (gdb) bt
        #0  0xffffe410 in __kernel_vsyscall ()
        #1  0x0088237b in waitpid () from /lib/libc.so.6
        #2  0x00829bbf in do_system () from /lib/libc.so.6
        [...]
        #20 0x0a8c4a7f in system.interrupt_management.notify_exception ()
        #21 <signal handler called>
 !!!->  Backtrace stopped: previous frame inner to this frame

I filed the following notes about our frames:

> The error is produced by the debugger because it detected that the
> "stack_addr" (in this case the CFA) of frame 21 is "inner" to the
> stack_addr of frame 20.  If you look at the progression of stack_addr
> for each frame starting at frame 18, you see:
> 
>         18: 0xf780d600
>         19: 0xf780d620
>         20: 0xf780d650
>         21: 0xf77ff000  (???)
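
To make the comparison concrete, here is a minimal sketch of the check
applied to the four addresses above. It is not GDB's code: the names
stack_addr_inner and first_inner_violation are made up for illustration,
and it assumes a stack that grows toward lower addresses, as on x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "inner than" test on a target whose
   stack grows toward lower addresses: LHS is inner to RHS when it is
   numerically smaller.  */
static bool
stack_addr_inner (uint32_t lhs, uint32_t rhs)
{
  return lhs < rhs;
}

/* ADDRS[i] is the stack address of a frame and ADDRS[i + 1] that of
   its caller.  Return the index of the first frame whose address is
   inner to that of the frame it called, or -1 if there is none.  */
static int
first_inner_violation (const uint32_t *addrs, int count)
{
  for (int i = 1; i < count; i++)
    if (stack_addr_inner (addrs[i], addrs[i - 1]))
      return i;
  return -1;
}
```

Given the addresses of frames 18 to 21 in that order, the function
returns 3, which is frame 21: 0xf77ff000 is lower than 0xf780d650.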

Upon further investigation, I verified that the addresses were
correct. The sanity check tripped because AdaCore changed the
stack-checking mechanism on this platform.

The new implementation runs the signal handler on an alternate
stack. Olivier Hainque explained the reason for switching to
this new model: ``we switched to this model not so long ago on x86-linux
to allow a more efficient stack-checking implementation with probes''.

As a result, the fact that the ``previous frame [is] inner to this frame''
is normal, and unwinding should continue.  I didn't see any other way
but to skip this sanity check when this frame is not a NORMAL frame,
and to update the associated documentation accordingly.

2009-03-12  Joel Brobecker  <brobecker@adacore.com>

        * frame.c (get_prev_frame_1): Do not perform the inner_frame
        sanity check if this_frame is not NORMAL.
        (frame_id_inner): Update the description of this function.

This only relaxes a sanity check, so we don't expect any change
in behavior except in the case above, which is the only case we
know of that trips this check.  Nonetheless, this has been running
in our tree since mid-December 2008, and I tested it again on
amd64-linux: no regressions.

Any objection to me checking this patch in?

Thanks,
-- 
Joel
diff --git a/gdb/frame.c b/gdb/frame.c
index dfd6b3d..1d1856e 100644
--- a/gdb/frame.c
+++ b/gdb/frame.c
@@ -376,23 +376,29 @@ frame_id_eq (struct frame_id l, struct frame_id r)
    to sigaltstack).
 
    However, it can be used as safety net to discover invalid frame
-   IDs in certain circumstances.
+   IDs in certain circumstances. Assuming that NEXT is the immediate
+   inner frame to THIS and that NEXT and THIS are both NORMAL frames:
 
-   * If frame NEXT is the immediate inner frame to THIS, and NEXT
-     is a NORMAL frame, then the stack address of NEXT must be
-     inner-than-or-equal to the stack address of THIS.
+   * The stack address of NEXT must be inner-than-or-equal to the stack
+     address of THIS.
 
      Therefore, if frame_id_inner (THIS, NEXT) holds, some unwind
      error has occurred.
 
-   * If frame NEXT is the immediate inner frame to THIS, and NEXT
-     is a NORMAL frame, and NEXT and THIS have different stack
-     addresses, no other frame in the frame chain may have a stack
-     address in between.
+   * If NEXT and THIS have different stack addresses, no other frame
+     in the frame chain may have a stack address in between.
 
      Therefore, if frame_id_inner (TEST, THIS) holds, but
      frame_id_inner (TEST, NEXT) does not hold, TEST cannot refer
-     to a valid frame in the frame chain.   */
+     to a valid frame in the frame chain.
+
+   The sanity checks above cannot be performed when a SIGTRAMP frame
+   is involved, because signal handlers might be executed on a different
+   stack than the stack used by the routine that caused the signal
+   to be raised.  This can happen for instance when a thread exceeds
+   its maximum stack size. In this case, certain compilers implement
+   a stack overflow strategy that causes the handler to be run on a
+   different stack.  */
 
 static int
 frame_id_inner (struct gdbarch *gdbarch, struct frame_id l, struct frame_id r)
@@ -1274,9 +1280,10 @@ get_prev_frame_1 (struct frame_info *this_frame)
 
   /* Check that this frame's ID isn't inner to (younger, below, next)
      the next frame.  This happens when a frame unwind goes backwards.
-     This check is valid only if the next frame is NORMAL.  See the
-     comment at frame_id_inner for details.  */
-  if (this_frame->next->unwind->type == NORMAL_FRAME
+     This check is valid only if this frame and the next frame are NORMAL.
+     See the comment at frame_id_inner for details.  */
+  if (get_frame_type (this_frame) == NORMAL_FRAME
+      && this_frame->next->unwind->type == NORMAL_FRAME
       && frame_id_inner (get_frame_arch (this_frame->next), this_id,
 			 get_frame_id (this_frame->next)))
     {
