This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC/PATCH] TCP and Unix socket accept queue time examples


William Cohen <wcohen@redhat.com> wrote:
> On 12/07/2012 08:13 PM, Eric Wong wrote:
> > +++ b/testsuite/systemtap.examples/network/tcp_acceptq_time.stp
> > @@ -0,0 +1,17 @@
> > +#! /usr/bin/env stap
> > +
> > +/* wrap-around just in case */
> 
> what does the comment above mean?
> 
> > +global tcp_acceptq_start%
> 
> What is the '%' doing at the end of tcp_acceptq_start?

That makes the array wrap-around when it hits MAPMAXENTRIES
(in case I have bugs or the kernel changes).  Perhaps it ends up
hiding bugs and its a bad idea...

> > +
> > +probe kernel.function("sk_acceptq_added") {
> > +	tcp_acceptq_start[$sk] = cpu_clock_us(0)
> > +}
> 
> Why use cpu_clock_us(0) from processor 0? On multiprocessor systems this can cause an interprocessor interrupt.
> Why not use gettimeofday_us() instead?

I wanted a monotonic clock in case system time changed during
measurement.  I stuck to using processor 0 because the documentation
stated there could be drift between CPUs.

gettimeofday_* is probably fine as most production systems have fairly
stable time and some users are stuck on older, distro-provided versions
of SystemTap without the newish cpu_clock_* functions.

> > +probe kernel.function("sk_acceptq_removed") {
> > +	started = tcp_acceptq_start[$sk]
> > +	if (started) {
> > +		delete tcp_acceptq_start[$sk]
> > +		diff = cpu_clock_us(0) - started
> > +		printf("%s[%d] %ld\n", execname(), pid(), diff)
> > +	}
> > +}
> 
> The sk_acceptq_added increments backlog and sk_accepts_removed decrements, so there could be more than thing listening.  So this is only going to count the time from the last sk_acceptq_added to the first sk_acceptq_removed.  Also it sk_acceptq_remove isn't clearing out tcp_acceptq_start global array so eventually that will fill up. Which tiome is most useful:
> 
> -from earliest sk_acceptq_added to earliest sk_acceptq_removed
> -from latest sk_acceptq_added to earliest sk_acceptq_removed
> -from earliest sk_acceptq_added to latest sk_acceptq_removed
> -from latest sk_acceptq_added to latest sk_acceptq_removed

Good catch!  I based that code off hints I got from fche earlier:
y0m38zrro7u.fsf@fche.csb">http://mid.gmane.org/y0m38zrro7u.fsf@fche.csb

Since I want to measure the time of every single accepted socket,
I should probably look into reqsk_queue_* functions instead and
ignore the sk_acceptq_* functions.

> > diff --git a/testsuite/systemtap.examples/network/unix_acceptq_time.meta b/testsuite/systemtap.examples/network/unix_acceptq_time.meta
> > @@ -0,0 +1,35 @@
> > +#! /usr/bin/env stap
> > +
> > +/* wrap-around just in case there are more unix_create1() callers */
> > +global unix_acceptq_start%
> 
> What is the '%' at the end of the line for?

Same reasoning as the TCP one above.

> > +
> > +/*
> > + * Linux 3.6.x only has two unix_create1() callers:
> > + * - unix_stream_connect() - we care about this
> > + * - unix_create()         - we do not care about this
> > + */
> > +probe kernel.function("unix_create1").return {
> > +	unix_acceptq_start[$return] = cpu_clock_us(0)
> > +}
> 
> unix_create1() is a static function in the kernel that appears to only be called by other functions in the file.  This currently works, but this function could be inlined by the compiler and then the function().return probe point won't be available.

Ah, OK.  See below...

> Probably should use gettimeofday_us() rather than the cpu_clock_us(0) which could cause an interprocessor interrupt on smp machines.
> > +
> > +/*
> > + * we only care about unix_create1() called from unix_stream_connect(),
> > + * so avoid wasting space here.
> > + * Maybe future (or past) kernels had more unix_create1() callers,
> > + * wraparound for the unix_acceptq_start array will prevent us from hitting limits
> > + */
> > +probe kernel.function("unix_create").return {
> > +	delete unix_acceptq_start[$return]
> > +}
> 
> Would it be possible to watch the unix_stream_connect() directly and avoid probing unix_create() and unix_create1()?

I tried probing unix_stream_connect() at first, but was unable to figure
out how to get the newsk variable, so I fell back to probing
unix_create1().

I haven't looked harder, yet, but if anybody has suggestions or hints I'd
like to hear them :)

I think my original (unpublished) version of this probed
__skb_queue_tail, even.

Thank you very much for the comments, I'll start working on an updated
patch.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]