This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Re: [PATCH/7.10 2/2] gdbserver: Fix non-stop / fork / step-over issues


On 7/31/2015 12:02 PM, Pedro Alves wrote:
> On 07/31/2015 07:04 PM, Don Breazeal wrote:
>> On 7/31/2015 10:03 AM, Pedro Alves wrote:
>>> Ref: https://sourceware.org/ml/gdb-patches/2015-07/msg00868.html
>>>
>>> This adds a test that has a multithreaded program have several threads
>>> continuously fork, while another thread continuously steps over a
>>> breakpoint.
>>
>> Wow.
>>
> 
> If gdb survives these stress tests, it can hold up to anything.  :-)
> 
>>>  - The test runs with both "set detach-on-fork" on and off.  When off,
>>>    it exercises the case of GDB detaching the fork child explicitly.
>>>    When on, it exercises the case of gdb resuming the child
>>>    explicitly.  In the "off" case, gdb seems to become exponentially
>>>    slower as new inferiors are created.  This is _very_ noticeable:
>>>    with only 100 inferiors gdb is already crawling, which makes the
>>>    test take quite a while to run.  For that reason, I've disabled the
>>>    "off" variant for now.
>>
>> Bummer.  I was going to ask whether this use-case justifies disabling
>> the feature completely, 
> 
> Note that this, being a stress test, may not be representative of a
> real workload.  I'm assuming most real use cases won't be
> so demanding.
> 
>> but since the whole follow-fork mechanism is of
>> limited usefulness without exec events, the question is likely moot
>> anyway.
> 
> Yeah.  There are use cases with fork alone, but combined with exec it
> is much more useful.  I'll take a look at your exec patches soon; I'm
> very much looking forward to having that in.
> 
>>
>> Do you have any thoughts about whether this slowdown is caused by the
>> fork event machinery or by some more general gdbserver multiple
>> inferior problem?
> 
> Not sure.
> 
> The number of forks live at a given time in the test is constant
> -- each thread forks and waits for the child to exit before it forks
> again.  But if you run the test, you see that the first
> few inferiors are created quickly, and then, as the inferior number
> grows, new inferiors are added more and more slowly.
> I'd suspect the problem to be on the gdb side.  But the test
> fails on native, so it's not easy to get gdbserver out of
> the picture for a quick check.
> 
> It feels like some data structures are leaking, but are
> still reachable, and then a bunch of linear walks end up costing
> more and more.  I once added the prune_inferiors call at the end
> of normal_stop to handle a slowdown like this.  It feels like
> something similar to that.
> 
> With detach "on" alone, it takes under 2 seconds against gdbserver
> for me.
> 
> If I remove the breakpoint from the test, and reenable both detach on/off,
> it finishes in around 10-20 seconds.  That's still a lot slower
> than "detach on" alone, but gdb has to insert/remove breakpoints in the
> child and load its symbols (well, it could avoid that, given the
> child is a clone of the parent, but we're not there yet), so
> not entirely unexpected.
> 
> But pristine, with both detach on/off, it takes almost 2 minutes
> here.  (And each thread only spawns 10 forks; my first attempt
> was shooting for 100 :-) )
> 
> I also suspected all the thread stopping/restarting gdbserver does,
> both to step over breakpoints and to insert/remove breakpoints.
> But then again, with detach on there are 12 threads, and with detach
> off, at most 22.  So that'd be odd.  Unless the data structure
> leaks are on gdbserver's side.  But then I'd think that tests
> like attach-many-short-lived-threads.exp or non-stop-fair-events.exp
> would have already exposed something like that.
> 
>>
>> Are you planning to look at the slowdown?  
> 
> Nope, at least not in the immediate future.
> 
>> Can I help out?  I have an
>> interest in having detach-on-fork 'off' enabled.  :-S
> 
> That'd be much appreciated.  :-)  At least identifying the
> culprit would be very nice.  I too would love for our
> multi-process support to be rock solid.
> 

Hi Pedro,
I spent some time looking at this, and I found at least one of the
culprits affecting performance.  Without going through the details of
how I arrived at this conclusion, if I insert

    gdb_test_no_output "set sysroot /"

just before the call to runto_main, it cuts the wall clock time by at
least half.  Running just the 'detach-on-fork=off' case, the time went
from 41 secs to 20 secs on one system, and from 1:21 to 0:27 and from
1:50 to 0:41 on another.  Successive runs without set sysroot resulted
in successively decreasing run times, presumably due to filesystem
caching.

I ran strace -cw to collect wall clock time (strace 4.9 and above
support '-w' for wall time), and saw this:

Without set sysroot /:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.90   14.620339           4   3666141       202 ptrace
 25.21   14.229421          81    175135        57 select
 14.42    8.139715          13    641874         7 write
 10.65    6.012699           4   1397576    670469 read
  7.52    4.245209           4   1205014       104 wait4
  4.90    2.765111           3    847985           rt_sigprocmask

With set sysroot /:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.91    6.885008         148     46665        43 select
 21.59    4.516311           4   1158530       202 ptrace
 11.15    2.332491          13    184229         2 write
  9.07    1.897401           4    422122    203552 read
  6.77    1.415918          42     34076        53 open
  6.27    1.312490           3    378702       103 wait4
  4.00    0.835731           3    262195           rt_sigprocmask

The number of calls and the times for each case varied from run to run,
but the relative proportions stayed reasonably similar.  I'm not sure
why the unmodified case makes so many more ptrace calls, but it was not
an anomaly; I saw this in multiple runs.

Note that I used the original version of the test that you posted, not
the update on your branch.  Also, I didn't make the set sysroot command
conditional on running with a remote or gdbserver target, since it was
just an experiment.
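
For what it's worth, if this were to go in for real, the sysroot
setting could be limited to remote/gdbserver runs with something along
these lines (untested sketch; it assumes the board file advertises
gdb_protocol, as the gdbserver boards do):

    # Untested sketch.  Bypass the default "target:" sysroot when the
    # board speaks a remote protocol, so gdb reads binaries from the
    # local filesystem instead of fetching them over the remote protocol.
    if { [target_info exists gdb_protocol]
         && [target_info gdb_protocol] != "" } {
        gdb_test_no_output "set sysroot /"
    }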

Do you think there is more to the slowdown than this?  As you said
above, detach-on-fork 'off' is going to take longer than 'on'.  It may
be a little while before I can get back to this, so I thought I'd share
what I found. Let me know if you think this change will be sufficient.

thanks
--Don

