This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFA] Tru64 stack unwinding fix


This problem happens on both Tru64 4.0f and 5.1/5.1A.

To reproduce the problem, you can use any Ada program using tasks,
and try to perform a task switch:
<<
(gdb) info tasks
  ID       TID P-ID Pri Stack  % State                  Name
   1 140025400    0  30  Unknown Child Termination Wait main_task
*  2 140049000    1  30   48K 10 Accepting RV with 3    maitre_d
   3 140049c00    1  30   48K  7 Waiting on entry call  dijkstra
   4 140050000    1  30   48K  7 Waiting on entry call  stroustrup
   5 140054800    1  30   48K  7 Waiting on entry call  anderson
   6 140055400    1  30   48K  7 Runnable               ichbiah
   7 140061000    1  30   48K  7 Waiting on entry call  taft
(gdb) task 4
warning: Hit heuristic-fence-post without finding         <<<---  OOPS!
warning: enclosing function for address 0x1400433e0
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
[Switching to task 4]
#0  0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
>>

The real problem is not the task switching operation per se, but
rather the fact that GDB fails to unwind the stack correctly. This
problem also appears if we try to use "bt" after the task switch:
<<
(gdb) bt
#0  0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1  0x3ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
#2  0x3ff80560c30 in __dspDispatch () from /usr/shlib/libpthread.so
warning: Hit heuristic-fence-post without finding     <<<---   Same oops!
warning: enclosing function for address 0x1400433e0
>>

I'm sorry the test case involves Ada, but that is the only way
I found to reproduce it at the time when I looked at this problem.

The problem happens on all non-active tasks, but otherwise GDB works
fine with the active one. So why does it always fail on the non-active
tasks, and no in the active one? The answer lies in the fact that all
non-active threads call the same procedure to perform a context switch,
and end up being stopped at the same code location, somewhere inside
__hstTransferRegisters. This means that the the stack trace for all
non-current tasks always starts like this:

 #0  0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
 #1  0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
 #2  0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
 #3  0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so

GDB is able to unwind up to __dspTransferContext, and then always fails.
I found that on alpha, GDB uses a heuristic algorithm to compute the
stack (that is actually not surprising given the warning message): For
each frame, it actually computes the "proc_desc" by reading the
instructions in the function prologue.

I later found that GDB was computing an incorrect return address for 
frame #2. Using this incorrect return address, GDB naturally could not
locate the function which called __dspTransferContext, and therefore
reported an error. The answer to this failure lied in the
computation of the proc_desc of frame #1, but we need to look at its
code to understand why:

0x3ff805af250 <__osTransferContext>:    ldah    gp,16321(t12)
0x3ff805af254 <__osTransferContext+4>:  unop
0x3ff805af258 <__osTransferContext+8>:  lda     gp,-3312(gp)
0x3ff805af25c <__osTransferContext+12>: unop
0x3ff805af260 <__osTransferContext+16>: lda     sp,-64(sp)
0x3ff805af264 <__osTransferContext+20>: stq     ra,0(sp)
0x3ff805af268 <__osTransferContext+24>: stq     s0,8(sp)
0x3ff805af26c <__osTransferContext+28>: stq     s1,16(sp)
0x3ff805af270 <__osTransferContext+32>: stq     s2,24(sp)
0x3ff805af274 <__osTransferContext+36>: stq     s3,32(sp)
0x3ff805af278 <__osTransferContext+40>: stq     s4,40(sp)
0x3ff805af27c <__osTransferContext+44>: unop
0x3ff805af280 <__osTransferContext+48>: stq     fp,48(sp)
0x3ff805af284 <__osTransferContext+52>: mov     sp,fp
[...]
0x3ff805af304 <__osTransferContext+180>:        lda     sp,-336(sp)
[...]

As we can see with the lda instruction at +16, the frame size is 64
bytes.  And instruction at +20 shows that the return address is saved
at 0(sp). At first, GDB gets the correct frame size, and finds that
ra is at 0(sp) and therefore computes the adress where ra is saved by
adding 0 to the current value of sp. This is where things start to go
wrong.

I have also pasted one very important instruction at +180, which expands
the size of the current frame (typical of a frame which does some
alloca)! So 0(sp) does not point to the saved $ra anymore, Ouch!
Fortunately, the instruction at +52 shows that $sp has been saved into
$fp, so $ra (and all other saved registers) can be retrieved using $fp
as the base address instead of $sp. This is the main point of my change:
when the $fp register is used in the frame, then use it, rather than
using $sp.

Another error in the current code (somewhat related really) is that the
heuristic algorithm was accumulating all stack allocations detected in
the function being inspected. In our case, it found two "lda sp,
nnn(sp)" instructions, one at +16, and one at +180, and therefore
considered the size of the frame to be 64+336=400 bytes. Oups. I also
fixed the code to only take into account the first stack allocation to
get the frame size.

With both fixes, the backtrace started working in this case as well:
#0  0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1  0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2  0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3  0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so
#4  0x3ff805a01f4 in __cvWaitPrim () from /usr/shlib/libpthread.so
#5  0x3ff8059da1c in __pthread_cond_wait () from /usr/shlib/libpthread.so
#6  0x12003dedc in system.tasking.entry_calls.wait_for_completion ()
    at s-taenca.adb:6
#7  0x120047b10 in system.tasking.rendezvous.call_synchronous ()
    at s-tasren.adb:6
#8  0x120043acc in system.tasking.rendezvous.call_simple () at s-tasren.adb:6
#9  0x12002f824 in phil.philosopher (<_task>=0x140042ee0) at phil.adb:49
#10 0x120049b40 in system.tasking.stages.task_wrapper () at s-tassta.adb:6
#11 0x3ff805bca7c in __thdBase () from /usr/shlib/libpthread.so

Joy!

Here is the ChangeLog:
2002-06-14  Joel Brobecker  <brobecker@gnat.com>
        * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
        current frame using only the first stack size adjustment. All
        subsequent size adjustments are not considered to be part of
        the "static" part of the current frame.
        Compute the address of the saved registers relative to the
        Frame Pointer ($fp) instead of the Stack Pointer if $fp is
        in use in this frame.

Ok to commit?

Thanks,
-- 
Joel

Attachment: alpha-tdep.c.diff
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]