This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Systemtap userspace performance


Hi,

I'm evaluating the performance of systemtap userspace instrumentation
(with utrace) and I'd like to make sure the procedure I use is the right
one to get the best results.

I'm particularly interested in the scalability of the solution to
multi-core and multi-processor systems.

My test program creates threads whose only purpose is to call a
function that in turn calls, 10 million times, another function designed
only to be instrumented (it does nothing but is not inlined). The
important parts of the program are at the bottom of this message.

Right now I'm only working with 1 thread. My baseline (running the
program without instrumentation) gives an elapsed time of 0:00.27 (0.27
seconds).
I'm on Fedora 14 with a 2.6.35.10-74.fc14.x86_64 kernel with
systemtap-1.3-3.fc14.x86_64 on a Core 2 Duo P8600 with 4G of RAM.
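
For reference, the baseline could also be timed from inside the process
rather than with an external timer. This is only a minimal sketch under
my own assumptions: empty_target() and time_calls() are hypothetical
names that are not part of the benchmark, and the noinline attribute and
empty asm statement are GCC/Clang-specific ways to keep the call from
being optimized away.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the function to be instrumented: noinline
 * keeps the call in place, and the empty asm statement stops the
 * compiler from discarding the otherwise empty body. */
__attribute__((noinline)) void empty_target(unsigned int v)
{
    __asm__ volatile("" : : "r"(v) : "memory");
}

/* Calls empty_target() n times and returns the elapsed wall-clock
 * time in seconds, measured with the monotonic clock. */
double time_calls(long n)
{
    struct timespec start, end;
    long i;

    assert(n >= 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < n; i++)
        empty_target(42);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_calls(10000000) from main() and printing the result would
give a figure comparable to the baseline above. Older glibc versions may
need -lrt at link time for clock_gettime().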

I first instrumented the application using the following script:

probe process("./benchmark").function("single_trace")
{
	printf("%d : %d\n", gettimeofday_ns(), $v);
}

It works and gives me the result I expect, but running it in flight
recorder mode without outputting anything (just stap test.stp -F) gave
me an elapsed time of 0:51.82.

I then came across embedded markers (the dtrace-style probe points) as
another way to instrument an application.
So I added the required code (following the documentation) and updated
my probe to:

probe process("./benchmark").mark("single_trace")
{
	printf("%d : %d\n", gettimeofday_ns(), $arg1);
}

Now I get an elapsed time of 0:27.60, which is better but is still a
huge overhead (especially considering that I'm not writing anything to
disk; everything is recorded in the buffer).

I measured that gettimeofday_ns() is responsible for 8.4 seconds in
each test.
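
To put these timings in per-event terms, here is a minimal sketch of the
arithmetic: subtract the baseline from the instrumented run and divide by
the 10 million calls. The helper names per_event_overhead_us() and
print_overhead_report() are hypothetical and exist only for this
illustration; the numbers are the ones quoted in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Extra cost per probe hit, in microseconds:
 * (instrumented time - baseline time) / number of events. */
double per_event_overhead_us(double instrumented_s, double baseline_s,
                             long events)
{
    assert(events > 0);
    return (instrumented_s - baseline_s) / (double)events * 1e6;
}

/* Prints the per-event figures for the timings quoted in this message. */
void print_overhead_report(void)
{
    const long events = 10000000L;  /* NR_EVENTS, with a single thread */
    const double baseline = 0.27;   /* seconds, uninstrumented run */

    printf(".function probe:   %.3f us/event\n",
           per_event_overhead_us(51.82, baseline, events));
    printf(".mark probe:       %.3f us/event\n",
           per_event_overhead_us(27.60, baseline, events));
    /* The 8.4 s is already a difference, so no baseline is subtracted. */
    printf("gettimeofday_ns(): %.3f us/call\n",
           per_event_overhead_us(8.4, 0.0, events));
}
```

That works out to roughly 5.2 microseconds per event for the .function
probe, 2.7 for the .mark probe, and 0.84 for each gettimeofday_ns() call.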

So is there a way to get better results, or is this already the optimal
way to instrument an application with userspace systemtap?

Thanks,

Julien



#define NR_EVENTS 10000000

void single_trace(unsigned int v)
{
    TRACE(TRACEPOINT_BENCHMARK_SINGLE_TRACE(v));
}

void do_trace(void)
{
    long i;

    for (i = 0; i < NR_EVENTS; i++)
        single_trace(42);
}

void *thr1(void *arg)
{
    do_trace();
    return ((void*)1);
}

int main(int argc, char **argv)
{
[...snip the init stuff...]
    for (i = 0; i < nr_threads; i++) {
        err = pthread_create(&tid[i], NULL, thr1, NULL);
        if (err != 0)
            exit(1);
    }

    for (i = 0; i < nr_threads; i++) {
        err = pthread_join(tid[i], &tret);
        if (err != 0)
            exit(1);
    }
    return 0;
}

