This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64) (was: Re: [PATCH] Don't truncate the history file when history size is unlimited)


On Fri, Jul 24, 2015 at 10:03 AM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Thu, Jul 23, 2015 at 3:33 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
>> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior
>> <sergiodj@redhat.com> wrote:
>>> On Tuesday, June 16 2015, Patrick Palka wrote:
>>>
>>>> We still do not handle "set history size unlimited" correctly.  In
>>>> particular, after writing to the history file, we truncate the history
>>>> even if it is unlimited.
>>>>
>>>> This patch makes sure that we do not call history_truncate_file() if the
>>>> history is not stifled (i.e. if it's unlimited).  This bug causes the
>>>> history file to be truncated to zero on exit when one has "set history
>>>> size unlimited" in their gdbinit file.  Although this code exists in GDB
>>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in
>>>> GDB 7.9 (PR gdb/17820).
>>>
>>> Hey Patrick,
>>>
>>> Looking at the BuildBot logs today, I found that this new test is
>>> failing occasionally on native-extended-gdbserver testing.  Take a look
>>> at the following build:
>>>
>>>   <http://gdb-build.sergiodj.net/builders/Debian-x86_64-native-extended-gdbserver-m64/builds/1429>
>>>
>>> You can see that gdb.base/gdbinit-history.exp failed:
>>>
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands
>>>   PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands
>>>
>>> The gdb.log is here:
>>>
>>>   <http://gdb-build.sergiodj.net/cgit/Debian-x86_64-native-extended-gdbserver-m64/.git/plain/gdb.log?id=2abe37b834f73838c68e1f843bdd612cef4a2ae3>
>>>
>>> I haven't really investigated to determine what's going on here, but let
>>> me know if you need any help with this.
>>
>> Thanks for the heads up.
>>
>> When doing gdb_exit followed by gdb_start, in the output log sometimes
>> we have (this is printed shortly before the first FAIL)
>>
>> (gdb) ...
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> spawn ...
>> GNU gdb (GDB) 7.10.50.20150723-cvs
>> ...
>>
>> Other times we have (this is printed shortly before the second FAIL)
>>
>> (gdb) ...
>> Remote debugging from host 127.0.0.1
>> monitor exit
>> (gdb) spawn ...
>> GNU gdb (GDB) 7.10.50.20150723-cvs
>> ...
>>
>> The literal difference being the "(gdb) " prompt printed before the
>> "spawn" message.  In the first case (where the "(gdb) " prefix is not
>> there) the history file does not seem to be written/appended to.  In
>> the second case (when the "(gdb) " prefix is there) the history file
>> is properly written/appended to (but it still FAILs because we're
>> missing the command history from before the first case).  So the race,
>> if there is one, may have something to do with whether or not the
>> "(gdb) " prompt gets printed after doing "monitor exit".  Or maybe
>> not.  I'll do more analysis later.
>
> After further analysis I don't think there is any correlation between
> whether "(gdb) spawn ..." or "spawn ..." gets printed and whether a
> race between "gdb_exit" and "gdb_start" occurs.  In fact, I don't
> think there even is a race between gdb_exit and gdb_start.  If I add
> "after 100" (i.e. Tcl's way of sleeping 100ms) in the middle of such
> sequences (i.e. gdb_exit; after 100; gdb_start) in
> gdbinit-history.exp, I can still reproduce the intermittent FAILs.
>
> So it may be the case that sometimes gdb_exit does not kill the GDB
> process properly, and as a result the process doesn't get a chance to
> save to the history file.  And it only happens with
> extended-gdbserver, not with gdbserver or non-gdbserver.  A unique
> code path taken only by the extended-gdbserver target is the code in
> gdbserver-support.exp:gdb_exit guarded by "if {[info exists
> gdb_spawn_id] && [info exists server_spawn_id]}", and then the
> close_gdbserver proc that follows.  However if I just outright delete
> that code (which should make the exit logic nearly identical with the
> gdbserver target) I can still trigger the intermittent FAILs...  Maybe
> it's an issue in dejagnu?  Or could it be an obscure bug in GDB??

One thing I noticed while repeatedly running the test with
--target_board=native-extended-gdbserver --verbose --verbose is that
it sometimes takes a few seconds to kill the GDB process.  The output
log pauses at

Quitting /scratchpad/binutils-gdb-build/gdb/testsuite/../../gdb/gdb
-nw -nx -data-directory
/scratchpad/binutils-gdb-build/gdb/testsuite/../data-directory  -ex
"set auto-connect-native-target off"
Closing the remote shell exp8
doing kill, pid is 16925
pid is 16925
<PAUSE>

for 5 or more seconds.  I only see this happen with
native-extended-gdbserver.  I suppose it's waiting for the following
command, which dejagnu spawns in its standard_close proc, to finish
sleeping:

    sh -c exec > /dev/null 2>&1 && (kill -2 -16925 || kill -2 16925) && sleep 5 && (kill -16925 || kill 16925) && sleep 5 && (kill -9 -16925 || kill -9 16925) &
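
To illustrate why a chain of that shape would account for a pause of
at least 5 seconds, here is a minimal Python sketch (POSIX only).  It is
not dejagnu's code: escalate, timed_escalation and CHILD_CODE are
hypothetical names, the child is a stand-in for GDB, and the delay is
shortened so it runs quickly.  The point is that once the first kill
succeeds the chain always sleeps once, even if the target dies
immediately, and it only stops when a later kill finds nothing to
signal.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for the GDB process: it restores default SIGINT
# handling, reports readiness, then idles until it is signalled.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGINT, signal.SIG_DFL)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

# Same order as the quoted command: kill -2, plain kill, kill -9.
SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGKILL)


def escalate(child, delay):
    """Send each signal in turn, sleeping `delay` seconds between them.

    Stops as soon as the child is found to have exited, which mirrors
    the `&&` chain ending when a `kill` fails.  Returns the signals
    that were actually sent.
    """
    sent = []
    for index, sig in enumerate(SIGNALS):
        if child.poll() is not None:  # child already gone: end the chain
            break
        child.send_signal(sig)
        sent.append(sig)
        if index < len(SIGNALS) - 1:  # no sleep after the final signal
            time.sleep(delay)
    return sent


def timed_escalation(delay):
    """Run the escalation against a fresh child and time it."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    child.stdout.readline()  # wait until the child reports "ready"
    start = time.monotonic()
    sent = escalate(child, delay)
    elapsed = time.monotonic() - start
    child.wait()
    child.stdout.close()
    return sent, elapsed


if __name__ == "__main__":
    sent, elapsed = timed_escalation(delay=0.5)
    print([sig.name for sig in sent], round(elapsed, 1))
```

In this sketch the child dies on the first signal, yet the call still
takes one full delay to return.  With the 5 second sleeps in the real
command, that would be consistent with the pause described above,
assuming the test run really is blocked on that command.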

