This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Making the transport layer more robust


Hi,

On Fri, Aug 12, 2011 at 07:43:24PM +0200, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard <mjw@redhat.com>
> Date:   Fri Aug 12 19:34:20 2011 +0200
> 
>     Remove _stp_ctl_work_timer from module transport layer.
>     
>     The _stp_ctl_work_timer would trigger every 20ms to check whether
>     there were cmd messages queued, but not announced yet and to
>     check the _stp_exit_flag was set.
>     
>     This commit makes all control messages announce themselves and
>     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
>     is still possibly delayed since the messages are just pushed on
>     a wait queue).

And with the timer out of the way it wasn't too hard to add poll
support to the command channel so that we can use a sleeping select
on the channel instead of busy-polling in stapio.
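
For readers unfamiliar with the pattern, here is a minimal sketch of what
busy-polling a file descriptor generally looks like. It is a generic
illustration, not the stapio source: the function name busy_poll_read and
its parameters are made up for this example. The point is that every idle
iteration costs several syscalls (fcntl, read, nanosleep) even when no
data ever arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try a non-blocking read, sleep, repeat.
 * Returns the read() result as soon as read() succeeds (including 0 on
 * EOF) or fails with a real error. Returns -1 with errno == EAGAIN if
 * max_tries iterations pass without any data. */
static ssize_t busy_poll_read(int fd, void *buf, size_t len,
                              int max_tries, long sleep_ns)
{
    struct timespec ts;
    ssize_t n;
    int flags, i;

    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);
    ts.tv_sec = 0;
    ts.tv_nsec = sleep_ns;

    for (i = 0; i < max_tries; i++) {
        /* Two fcntl calls per iteration just to make sure the
         * descriptor is in non-blocking mode. */
        flags = fcntl(fd, F_GETFL);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;

        n = read(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;

        /* Nothing there yet: sleep a bit and try again. */
        nanosleep(&ts, NULL);
    }
    errno = EAGAIN;
    return -1;
}
```

A sleeping select removes the whole loop: the process is only woken up
when the descriptor becomes readable or a signal arrives.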

commit a9e19b380f9814630018e79b8cafa3c675dd182c
Author: Mark Wielaard <mjw@redhat.com>
Date:   Sun Aug 14 23:07:46 2011 +0200

    Implement and use select to wait for cmd channel data.
    
    Add a poll implementation to runtime/transport/control.c
    (_stp_ctl_poll_cmd) based on the _stp_ctl_ready_q wait queue.
    Check whether select is supported in runtime/staprun/mainloop.c
    (stp_main_loop) and use pselect with a sigmask that includes
    SIGURG to get EINTR notifications whenever an interruptable
    event occurred.
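
As a rough userspace illustration of the pselect idea (this is a sketch
under my own assumptions, not the code in mainloop.c): keep SIGURG blocked
during normal execution and hand pselect a mask in which it is unblocked,
so that the signal can only be delivered while sleeping in pselect and
then shows up as EINTR. The names on_sigurg, setup_sigurg and
read_when_ready are hypothetical. Note that SIGURG is ignored by default,
so a handler has to be installed for it to interrupt anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

/* Empty handler; its only job is to make SIGURG interrupt pselect(). */
static void on_sigurg(int sig) { (void)sig; }

/* Install the handler and block SIGURG for normal execution.
 * On success *wait_mask holds the previous signal mask with SIGURG
 * removed, i.e. the mask to pass to pselect(). */
static int setup_sigurg(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = on_sigurg;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGURG, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGURG);
    if (sigprocmask(SIG_BLOCK, &block, wait_mask) < 0)
        return -1;
    sigdelset(wait_mask, SIGURG);
    return 0;
}

/* Sleep until fd is readable, then read from it.
 * Returns the read() result, or -1 with errno == EINTR if a signal
 * arrived while waiting (errno is set by pselect in that case). */
static ssize_t read_when_ready(int fd, void *buf, size_t len,
                               const sigset_t *wait_mask)
{
    fd_set rd;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    /* No timeout: a single syscall that sleeps until something happens. */
    if (pselect(fd + 1, &rd, NULL, NULL, NULL, wait_mask) < 0)
        return -1;
    return read(fd, buf, len);
}
```

Because the mask swap and the wait happen atomically inside pselect, a
signal raised just before the call is not lost; it stays pending and
interrupts the call immediately.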

I am not seeing any regressions with this, but the signal code
in runtime/staprun/mainloop.c is pretty, uhm, creative, so some
extra review and testing would certainly be appreciated.

This has a nice effect on stapio's own syscall footprint during probing.
With stap 1.6:
$ stap -e 'global scs;
  probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10' 
scs["read"]=0x5b
scs["fcntl"]=0x52
scs["ppoll"]=0x32
scs["nanosleep"]=0x28
scs["execve"]=0x5
scs["kill"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1

With stap from git trunk:
$ stap -e 'global scs;
  probe syscall.* { if (execname() == "stapio") scs[name]++ }' -c 'sleep 10'
scs["read"]=0x34
scs["ppoll"]=0x32
scs["execve"]=0x5
scs["fcntl"]=0x4
scs["kill"]=0x1
scs["pselect6"]=0x1
scs["sigreturn"]=0x1
scs["rt_sigaction"]=0x1
scs["rt_sigprocmask"]=0x1
scs["wait4"]=0x1
scs["write"]=0x1

So in this example one pselect6 replaces ~39 reads, ~78 fcntls and
40 nanosleeps (the counts above are printed in hexadecimal). The
remaining reads and (timing out) ppolls come from the relay channel.
I haven't investigated yet whether those can be eliminated too.
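
Since the counts are printed in hexadecimal, here is a small sketch that
works out the per-syscall difference between the two runs. The table is
copied from the output above; the struct and the function name
saved_calls are just made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Syscall counts for stapio, copied from the two runs above. */
struct sc_count {
    const char *name;
    int before;   /* stap 1.6 */
    int after;    /* git trunk */
};

static const struct sc_count counts[] = {
    { "read",      0x5b, 0x34 },
    { "fcntl",     0x52, 0x04 },
    { "nanosleep", 0x28, 0x00 },
    { "ppoll",     0x32, 0x32 },
    { "pselect6",  0x00, 0x01 },
};

/* Number of calls saved for one syscall; negative means calls added. */
static int saved_calls(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof counts / sizeof counts[0]; i++)
        if (strcmp(counts[i].name, name) == 0)
            return counts[i].before - counts[i].after;
    assert(!"syscall name not in table");
    return 0;
}
```

This gives 39 for read, 78 for fcntl, 40 for nanosleep, 0 for ppoll and
-1 for pselect6 (the one new call).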

Cheers,

Mark

