This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Fw: systemtap application to find applications doing polling


* William Cohen <wcohen@redhat.com> [2009-02-04 12:08:07]:

> Vaidyanathan Srinivasan wrote:
> > * Maneesh Soni <maneesh@in.ibm.com> [2009-01-29 22:16:43]:
> > 
> >> Is this something useful for energy management?
> > 
> > Hi Maneesh,
> > 
> > This would be useful for energy management as Ulrich has noted in his
> > blog.  The wakeup rate is reported by PowerTop using
> > /proc/timer_stats, where even device driver timers and in-kernel
> > offenders are identified.
> > 
> > Once the userspace application is identified, then further details on
> > the type of polling loops and syscall and library APIs will definitely
> > help optimise the user applications.
> > 
> > Will's script will help identify the types of polling loops and the top
> > offenders at run time in a userspace application.
> >  
> >> ----- Forwarded message from William Cohen <wcohen@redhat.com> -----
> >>
> >> Date: Wed, 28 Jan 2009 11:52:10 -0500
> >> From: William Cohen <wcohen@redhat.com>
> >> To: SystemTAP <systemtap@sources.redhat.com>
> >> CC: Ulrich Drepper <drepper@redhat.com>
> >> Subject: systemtap application to find applications doing polling
> >>
> >> Hi All,
> >>
> >> Uli Drepper mentions in a blog entry the need to "avoid unnecessary wakeups" and
> >> that a systemtap script to monitor this would be useful:
> >>
> >> http://udrepper.livejournal.com/19041.html
> >>
> >> I talked with Uli about developing a script that identifies the processes that
> >> are doing a lot of polling.  The attached script, timeout.stp, monitors the
> >> poll, epoll_wait, select, futex, nanosleep, and timer (it_real_fn) calls. The poll
> >> and epoll calls are only recorded if the timeout value is greater than zero. The
> >> resulting output is displayed in a top-like format for the top twenty processes,
> >> with the entries ordered from most problem calls to fewest. The columns give the
> >> count of each type. The output ends up like the following:
> >>
> >>   uid |   poll  select   epoll  itimer   futex nanosle  signal| process
> >>  2628 |      0     364       0       0       0       0       0| Xorg
> >>  3586 |     21       0       0       0     179       0       0| thunderbird-bin
> >>  3575 |     41       0       0       0       0      20       0| xchat
> >>  3454 |      0      60       0       0       0       0       0| emacs
> >>  3325 |     43       0       0       0       0       0       0| gnome-terminal
> >>  3082 |     11       0       0       0       0       0       0| gnome-panel
> >>  3068 |      7       0       0       0       0       0       0| metacity
> >>  3181 |      6       0       0       0       0       0       0| wnck-applet
> >>  3119 |      0       5       0       0       0       0       0| httpd
> >>  2135 |      4       0       0       0       0       0       0| hald
> >>  2307 |      4       0       0       0       0       0       0| NetworkManager
> >>  2362 |      4       0       0       0       0       0       0| setroubleshootd
> >>  2530 |      0       0       0       0       0       4       0| cups-polld
> >>  3084 |      3       0       0       0       0       0       0| nautilus
> >>  3616 |      0       0       0       0       3       0       0| firefox
> >>  3060 |      2       0       0       0       0       0       0| gnome-settings-
> >>  2304 |      2       0       0       0       0       0       0| hald-addon-stor
> >>     0 |      0       0       0       1       0       0       0| swapper
> >>
> >> I plan to check this into the systemtap.examples directory in the next day or so.
> >> Just looking to see if people have additional suggestions.
> >>
> >> -Will
> > 
> > The output information and format look good. I have the following
> > comments and suggestions:
> > 
> > * Display the observation interval in the output and provide options
> >   for, say, 1s or 10s sampling.
> 
> It is possible to have an optional argument in a systemtap script, as
> para-callgraph.stp does:
> 
> http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=blob_plain;f=testsuite/systemtap.examples/general/para-callgraph.stp;hb=HEAD
> 
> > * At low wakeup rates, does the systemtap script itself add to the
> >   wakeups?
> 
> No effort is made to filter the impact of the systemtap code out of the
> output. I don't see the effect in the output of timeout.stp, but I can
> see some effect in powertop:
> 
>   41.8% (1000.0)           staprun : __mod_timer (__stp_time_timer_callback)
>   41.8% (1000.0)             udevd : __mod_timer (__stp_time_timer_callback)
>    4.2% (100.0)           staprun : __mod_timer (__utt_wakeup_timer)
>    4.2% ( 99.8)           staprun : queue_delayed_work (delayed_work_timer_fn)
> 
> This makes me wonder if there is some way to reduce staprun's effect.

This wakeup rate is very high, which implies that we should use the
stap script only for per-application wakeup tracing and should
not try to profile the overall system.

There is definitely some opportunity here for stap to reduce wakeups :)
But what is causing udevd to wake up so often?
 
> > * Do these values closely match PowerTop's?
> 
> Powertop shows a rate, while the current timeout script shows a total
> accumulation. If the timeout script is adjusted to print every 10 seconds and
> clear out the data, then a more direct comparison can be made. I made that
> change and looked at the output. There appear to be some differences in what
> each is measuring. Powertop reads /proc/timer_stats; I need to check how that
> differs from what timeout.stp is probing.

The overall wakeup rate shown by powertop is averaged over nr_cpus.  The
per-application/thread wakeup count is accurate as far as I have
determined from experiments, and I have also compared it against
/proc/interrupts (LOC is the local timer interrupt count).
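
To compare against PowerTop, the "print every 10 seconds and clear out the data" change turns accumulated totals into wakeups per second. Below is a minimal Python sketch of that bookkeeping, assuming the per-process counts have already been collected (in timeout.stp they would come from the probes). IntervalSampler and overall_rate_per_cpu are hypothetical names, the counts are made up, and the per-CPU averaging follows the statement above rather than PowerTop's source.

```python
from collections import Counter


class IntervalSampler:
    """Accumulate per-process wakeup counts, then report them as a rate.

    Hypothetical helper: models "print every N seconds and clear the data",
    so totals become wakeups/second, the unit PowerTop reports.
    """

    def __init__(self, interval_s: float) -> None:
        if interval_s <= 0:
            raise ValueError("interval must be positive")
        self.interval_s = interval_s
        self.counts = Counter()

    def record(self, process: str, n: int = 1) -> None:
        # Called once per observed poll/select/futex/... event (or in bulk).
        self.counts[process] += n

    def flush(self) -> dict:
        # Convert totals to per-second rates, then reset for the next interval.
        rates = {p: c / self.interval_s for p, c in self.counts.items()}
        self.counts.clear()
        return rates


def overall_rate_per_cpu(rates: dict, nr_cpus: int) -> float:
    # System-wide wakeups/second averaged over nr_cpus (assumed to match
    # how the overall figure is described above; verify against PowerTop).
    return sum(rates.values()) / nr_cpus


sampler = IntervalSampler(interval_s=10)
sampler.record("Xorg", 364)   # hypothetical counts for one 10 s window
sampler.record("emacs", 60)
rates = sampler.flush()
print(rates)  # {'Xorg': 36.4, 'emacs': 6.0}
print(overall_rate_per_cpu(rates, nr_cpus=2))
```

Clearing the counters on every flush keeps each printed interval independent, which is what makes the numbers comparable with a rate-based tool.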

> > * Can we aggregate these values for a group of PIDs (possibly by
> >   parent pid or tgid) so that we can easily collect results for a
> >   complete application stack?  I have tried doing this by manually
> >   adding up wakeups for a group of PIDs.
> 
> There have been examples with PID filters that limit the scope to some
> subset of PIDs and their children. Put the PID and any children into an
> associative array, and then check the associative array before doing the
> probe operation.

Yeah, this should be easy with stap.
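
The PID-group filter described above (track the PID and its children, check membership before counting) can be modelled outside systemtap. Below is a minimal Python sketch, assuming an ordered stream of fork/exit/wakeup events. The tuple format and aggregate_group are hypothetical; in a real stap script the set would be an associative array updated from process-creation and exit probes.

```python
from collections import Counter


def aggregate_group(events, root_pid):
    """Sum wakeup counts per call type for root_pid and its descendants.

    events is an ordered stream of hypothetical tuples:
      ("fork", parent_pid, child_pid)
      ("exit", pid)
      ("wakeup", pid, call_type)   e.g. call_type = "poll"
    """
    members = {root_pid}  # stands in for the stap associative array
    totals = Counter()
    for event in events:
        kind = event[0]
        if kind == "fork":
            _, parent, child = event
            if parent in members:      # children of tracked PIDs are tracked too
                members.add(child)
        elif kind == "exit":
            members.discard(event[1])  # a reused PID must not be counted later
        elif kind == "wakeup":
            _, pid, call = event
            if pid in members:         # filter before doing any counting work
                totals[call] += 1
    return totals


stream = [
    ("wakeup", 100, "poll"),
    ("fork", 100, 101),
    ("wakeup", 101, "futex"),
    ("fork", 101, 102),
    ("wakeup", 102, "poll"),
    ("wakeup", 200, "select"),   # unrelated process: ignored
    ("exit", 101),
    ("wakeup", 101, "poll"),     # PID 101 reused after exit: ignored
]
print(aggregate_group(stream, root_pid=100))  # Counter({'poll': 2, 'futex': 1})
```

Adding descendants at fork time, rather than walking the process tree on each event, keeps the per-event check to a single set lookup.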
 
> > * Another wishlist item would be the ability to add probes at various
> >   locations in libraries and move closer to userspace code.
> 
> There has been some work on userspace probing for systemtap. It isn't in a
> packaged distro yet, but there should be one for Fedora coming out soon.
> However, this needs utrace in the kernel.

Looking forward to this feature.  This will bring statistics and
tracing closer to libraries where there may be better scope for
optimisations.

Thanks,
Vaidy

