This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: Possible regression on PPC64 testsuite with native-{extended-}gdbserver


On 05/05/2016 02:11 AM, Sergio Durigan Junior wrote:

> As I said, this problem doesn't happen when we're not testing gdbserver
> configurations.
> 
> I haven't investigated the problem further, and it may very well be
> something unrelated to this patch (notice that, although the failure
> happens several times, it's not deterministic), but I decided it was
> a good thing to raise awareness.

So far, this looks unrelated to the patch in question to me.

Testing on gcc112 on the compile farm (POWER8/PPC64, Fedora 21), I
do see the testsuite hanging too.  The whole testsuite runs, but then
a couple of tests hang forever, it seems.  ps shows:

palves    72739  72012  0 10:31 pts/49   00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/process-dies-while-handling-bp gdb.threads/process-dies-while-handling-bp.exp --target_board=native-gdbserver
palves   157333 156965  0 10:30 pts/49   00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.base/cond-expr gdb.base/cond-expr.exp --target_board=native-gdbserver

So indeed one of them is gdb.threads/process-dies-while-handling-bp.exp.

Attaching to the gdbserver process, we see it stuck here:

(gdb) bt
#0  0x00003fff96b5ddf4 in sigsuspend () from /lib64/libc.so.6
#1  0x0000000010049fa0 in linux_wait_for_event_filtered (wait_ptid=..., filter_ptid=..., wstatp=0x3fffc5d76608, options=1073741824)
    at ../../../src/gdb/gdbserver/linux-low.c:2709
#2  0x000000001004d208 in wait_for_sigstop () at ../../../src/gdb/gdbserver/linux-low.c:3904
#3  0x000000001004d97c in stop_all_lwps (suspend=0, except=0x0) at ../../../src/gdb/gdbserver/linux-low.c:4041
#4  0x00000000100466d8 in linux_kill (pid=82121) at ../../../src/gdb/gdbserver/linux-low.c:1345
#5  0x0000000010021024 in kill_inferior (pid=82121) at ../../../src/gdb/gdbserver/target.c:326
#6  0x000000001001baf4 in detach_or_kill_inferior_callback (entry=0x100335f45c0) at ../../../src/gdb/gdbserver/server.c:3388
#7  0x0000000010008f24 in for_each_inferior (list=0x100b79e8 <all_processes>, action=0x1001ba54 <detach_or_kill_inferior_callback(inferior_list_entry*)>)
    at ../../../src/gdb/gdbserver/inferiors.c:55
#8  0x000000001001bdcc in detach_or_kill_for_exit () at ../../../src/gdb/gdbserver/server.c:3449
#9  0x000000001001be2c in detach_or_kill_for_exit_cleanup (ignore=0x0) at ../../../src/gdb/gdbserver/server.c:3463
#10 0x0000000010040814 in do_my_cleanups (pmy_chain=0x100b0490 <cleanup_chain>, old_chain=0x10075310 <sentinel_cleanup>)
    at ../../../src/gdb/gdbserver/../common/cleanups.c:154
#11 0x0000000010040918 in do_cleanups (old_chain=0x10075310 <sentinel_cleanup>) at ../../../src/gdb/gdbserver/../common/cleanups.c:176
#12 0x000000001004144c in throw_exception_cxx (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:289
#13 0x0000000010041584 in throw_exception (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:317
#14 0x0000000010041778 in throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?")
    at ../../../src/gdb/gdbserver/../common/common-exceptions.c:373
#15 0x0000000010041810 in throw_vquit (fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:385
#16 0x00000000100418dc in throw_quit (fmt=0x1006d820 "Quit") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:404
#17 0x000000001001cf04 in captured_main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3790
#18 0x000000001001cf94 in main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3804
(gdb) 

So gdbserver was quitting and trying to kill all its child
processes along with it, and then it hung.  Process 82121, the
one gdbserver was trying to kill, is actually dead already:

 [palves@gcc2-power8 src]$ cat /proc/82121/status  | grep State
 State:  Z (zombie)

That we mishandle the case of the process dying unexpectedly is
already known, and it manifests in several different ways, so
that's not very surprising.  Exercising that case is exactly the
point of that test in the first place.
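
The zombie check above can be scripted when several stale PIDs need
inspecting.  A minimal sketch in Python, assuming the Linux
/proc/<pid>/status format shown above; proc_state and is_zombie are
hypothetical helper names, not part of GDB or the testsuite:

```python
from typing import Optional


def proc_state(status_text: str) -> Optional[str]:
    """Return the one-letter state code from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            # fields looks like ["State:", "Z", "(zombie)"]
            return fields[1] if len(fields) > 1 else None
    return None


def is_zombie(pid: int) -> bool:
    """True if the process exists and is in state Z (exited, not yet reaped)."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return proc_state(f.read()) == "Z"
    except OSError:
        # No such process, or /proc is unavailable on this system.
        return False
```

A process in state Z has already exited and is only waiting to be
reaped, which is consistent with wait_for_sigstop never returning in
the backtrace above.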

What _is_ surprising is that the testsuite framework doesn't
time out eventually...

I attached to the corresponding gdb process, and I see
that we're stuck in a loop sending "monitor exit" to gdbserver,
in rcmd:

      if (getpkt_sane (&rs->buf, &rs->buf_size, 0) == -1)
        {
          /* Timeout.  Continue to (try to) read responses.
             This is better than stopping with an error, assuming the stub
             is still executing the (long) monitor command.
             If needed, the user can interrupt gdb using C-c, obtaining
             an effect similar to stop on timeout.  */
          continue;
        }

I mean, getpkt_sane is constantly timing out.  We can step through the code
and see that:

(gdb) finish
Run till exit from #0  do_ser_base_readchar (scb=0x1002eb57cc0, timeout=1) at ../../src/gdb/ser-base.c:345
0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>) at ../../src/gdb/ser-base.c:424
424           ch = do_readchar (scb, timeout);
Value returned is $1 = -2
(gdb) finish
Run till exit from #0  0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>)
    at ../../src/gdb/ser-base.c:424
0x0000000010068a2c in ser_base_readchar (scb=0x1002eb57cc0, timeout=2) at ../../src/gdb/ser-base.c:451
451       return generic_readchar (scb, timeout, do_ser_base_readchar);
Value returned is $2 = -2
(gdb) finish
...
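
The shape of the rcmd loop quoted above, and one way such a loop could
be bounded, can be sketched in Python.  This is not GDB's code:
read_reply, TIMEOUT and max_timeouts are hypothetical names used only
for illustration.

```python
from typing import Callable, Optional

TIMEOUT = -1  # stand-in for the -1 that getpkt_sane returns on timeout


def wait_for_reply(read_reply: Callable[[], object],
                   max_timeouts: Optional[int] = None):
    """Poll read_reply until it returns something other than TIMEOUT.

    With max_timeouts=None this mirrors the 'continue' in the excerpt and
    never gives up; with a number it raises after that many timeouts.
    """
    timeouts = 0
    while True:
        reply = read_reply()
        if reply != TIMEOUT:
            return reply
        timeouts += 1
        if max_timeouts is not None and timeouts >= max_timeouts:
            raise TimeoutError(f"no reply after {timeouts} timeouts")
```

With the default max_timeouts=None and a peer that never answers, the
call spins forever, which matches what the gdb session above shows.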


While this code is debatable too, I think that expect/runtest/dejagnu
itself should be timing out, and then force-killing gdb anyway.
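
What "timing out and then force-killing" amounts to, independent of
DejaGnu, is sketched below in Python.  It only illustrates the expected
behaviour and is not how runtest is implemented; run_with_deadline is
a hypothetical helper.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_with_deadline(cmd: List[str],
                      seconds: float) -> Tuple[bool, Optional[int]]:
    """Run cmd; if it outlives the deadline, kill it and reap it.

    Returns (timed_out, returncode).
    """
    proc = subprocess.Popen(cmd)
    try:
        return False, proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; the child cannot block or ignore it
        return True, proc.wait()  # reap, so no zombie is left behind


if __name__ == "__main__":
    # A child that would otherwise sleep far past the deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(hung, 0.5))
```

The point of the sketch is that the watchdog does not depend on the
child cooperating: however the child is stuck, the deadline still fires.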

The still-running log shows:

$ tail testsuite/outputs/gdb.threads/process-dies-while-handling-bp/gdb.log
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: set breakpoint that evals false
continue &
Continuing.
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: continue &
KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
Remote debugging from host 127.0.0.1
gdbserver: reading register 0: No such process
Killing process(es): 82121
monitor exit
Ignoring packet error, continuing...
$ 

So all is self-consistent.

I just don't understand why dejagnu doesn't time out.

I think the "monitor exit" is the one from
gdb/testsuite/lib/gdbserver-support.exp:gdb_exit.



The other hung test is "gdb.base/cond-expr.exp".  This one's more
mysterious.  Attaching to the gdb in question, I really see nothing.  gdb
is not debugging anything, and is just waiting for input.

However, from:

$ tail testsuite/outputs/gdb.base/cond-expr/gdb.log 
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
/home/palves/gdb/build/gdb/testsuite/../../gdb/gdb version  7.11.50.20160505-git -nw -nx -data-directory /home/palves/gdb/build/gdb/testsuite/../data-directory  -ex "set auto-connect-native-target off"

runtest completed at Thu May  5 10:31:20 2016
$ 

we see that that test did complete.  So I can't really explain that...

So in sum, we may have gdb or gdbserver bugs, but the framework
should be timing out and coping anyway.  Why isn't it?  I have no
clue atm.

Thanks,
Pedro Alves

