This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Re: whitelist for safe-mode probes (or just a better blacklist?)

From: Li Guanglei <guanglei at cn dot ibm dot com>
To: Li Guanglei <guanglei at cn dot ibm dot com>
Cc: Vara Prasad <prasadav at us dot ibm dot com>, Martin Hunt <hunt at redhat dot com>, "Frank Ch. Eigler" <fche at redhat dot com>, systemtap at sources dot redhat dot com
Date: Fri, 22 Sep 2006 23:55:32 +0800
Subject: Re: whitelist for safe-mode probes (or just a better blacklist?)
Organization: IBM CSTL
References: <1158683336.10983.27.camel@dragon> <y0mu032d0zd.fsf@ton.toronto.redhat.com> <1158766960.6220.13.camel@dragon> <45116AE2.2060003@us.ibm.com> <4513B0C9.6000905@cn.ibm.com>

Li Guanglei wrote:

Hi, I ran: stap -e 'probe kernel.function("*") {}' -p2 -v | grep "kernel.function" | wc -l, and it reports that 10827 functions would be probed.

As suggested, we would divide all the functions into groups. The number of groups can't be too large, since each group must be tested for long enough, so each group will contain quite a lot of functions (perhaps ~1000). What if one of the groups crashes the kernel? In most cases we can't tell which functions caused the problem, so we would have to narrow the scope step by step, gradually moving the functions in that group into the whitelist, which is a lot of work. The worst case is that every group crashes the kernel.
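To make the cost of that narrowing concrete, here is a minimal Python sketch of the grouping and of a bisection over a failing group. It is only an illustration, not part of the testsuite: split_into_groups, find_one_culprit and the crashes callback are hypothetical names, and the sketch assumes that the group as a whole crashes, that the crash is deterministic, and that crashes(subset) can somehow report the result of a full test run (which in practice spans a reboot).

```python
from typing import Callable, List, Sequence


def split_into_groups(functions: Sequence[str], group_size: int) -> List[List[str]]:
    """Cut the probe list into consecutive groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [list(functions[i:i + group_size])
            for i in range(0, len(functions), group_size)]


def find_one_culprit(group: Sequence[str],
                     crashes: Callable[[Sequence[str]], bool]) -> str:
    """Bisect a group that is known to crash down to one offending function.

    Assumption: crashes(subset) is deterministic and the full group crashes.
    If the first half does not crash, the culprit is taken to be in the
    second half, so only one test run is needed per step.
    """
    candidates = list(group)
    if not candidates:
        raise ValueError("group must not be empty")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        if crashes(first):
            candidates = first
        else:
            candidates = candidates[half:]
    return candidates[0]
```

Under these assumptions a group of ~1000 functions needs on the order of ten test runs to isolate one culprit. Functions that crash only occasionally, or several bad functions in one group, would need repeated runs, which is where the real effort goes.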

Also, a group that passes the tests does not prove that every function in it is safe. Some functions may never have been triggered during the tests, or may have been triggered only a few times without hitting the fatal condition. If one day we find that probing the whole whitelist crashes the kernel, we will have to take pains to find out which entry in the whitelist is at fault. And finding a suitable testcase that triggers all the probes is a hard task.

So after thinking about this topic, I don't expect the whole job to be easy. We may end up finding that we spent too much time building the whitelist.

Just my random thoughts.

- Guanglei

We could slightly modify "all_kernel_functions.exp" so that it periodically prints statistics on which probes have been triggered into a local file:

set systemtap_script {
    # Per-function hit counter
    global stat
    probe %s {
        stat[probefunc()] <<< 1
    }
    probe begin {
        log("systemtap starting probe")
    }
    # Dump the counts every 10 seconds and again when the script ends
    probe timer.ms(10000), end {
        log("systemtap ending probe")
        foreach (func in stat)
            printf("%%d  %%s\n", @count(stat[func]), func)
    }
}
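
Once that output is saved to a file, it could be post-processed along the following lines. This is a rough Python sketch rather than anything in the testsuite: parse_hit_counts and never_hit are hypothetical helpers, and the sketch assumes each statistics line has the "count  function" shape printed by the script above, with the log() lines mixed in between dumps. Since the counters are never reset, the largest value seen per function is kept.

```python
import re
from typing import Dict, Iterable, List

# Matches the "<count>  <function>" lines printed by the script's printf.
_STAT_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s*$")


def parse_hit_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Return the highest hit count seen per function.

    Lines that do not look like a statistics line (for example the
    log() messages) are skipped.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        match = _STAT_LINE.match(line)
        if match is None:
            continue
        hits, func = int(match.group(1)), match.group(2)
        counts[func] = max(hits, counts.get(func, 0))
    return counts


def never_hit(probed: Iterable[str], counts: Dict[str, int]) -> List[str]:
    """List the probed functions that never showed up in the statistics."""
    return [func for func in probed if counts.get(func, 0) == 0]
```

The never_hit list is the part that a passing group tells us nothing about.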

We could also record which groups have passed the test and which group is currently being tested. An init script would run the testcase right after the system boots, so each time the system comes back up, the tests resume where they left off. Even for a group that failed the test, we can refer to the statistics and consider moving the functions that were triggered many times (>> 1) into the safe list.
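
A minimal sketch of that bookkeeping, in Python, might look like the following. Everything here is an assumption for illustration: the JSON state file, its layout, and the helper names start_next_group and mark_passed are made up. The idea is that a group still marked "in progress" at boot time never finished, so it is counted as failed.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Read the state file, or start fresh if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"passed": [], "failed": [], "in_progress": None}


def save_state(path: Path, state: dict) -> None:
    """Write the state and force it to disk before a test may crash the box."""
    tmp = path.with_suffix(".tmp")
    with open(tmp, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)


def start_next_group(path: Path, total_groups: int) -> Optional[int]:
    """Called from the init script at boot; returns the group to test next.

    A group left "in progress" by the previous boot is recorded as failed.
    Returns None when every group has been tried.
    """
    state = load_state(path)
    if state["in_progress"] is not None:
        state["failed"].append(state["in_progress"])
        state["in_progress"] = None
    done = set(state["passed"]) | set(state["failed"])
    for group in range(total_groups):
        if group not in done:
            state["in_progress"] = group
            break
    save_state(path, state)
    return state["in_progress"]


def mark_passed(path: Path) -> None:
    """Called when a group's test run finished without crashing."""
    state = load_state(path)
    if state["in_progress"] is not None:
        state["passed"].append(state["in_progress"])
        state["in_progress"] = None
    save_state(path, state)
```

The state is synced to disk before each run because a crash may otherwise lose the record of which group was being tested.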

Another machine could simply ping the test machine and, if there is no response, send a command to reboot it, so that the testing can keep running while we are sleeping. :)
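
The monitoring side could be as small as the following Python sketch. It is only a sketch under assumptions: ping_once relies on the Linux iputils "ping -c 1 -W <seconds>" options, watch and its parameters are hypothetical, and the reboot action is left as a callback because how the test machine is power-cycled (remote power switch, management controller, ...) depends on the setup.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; True if the host answered.

    Assumes the Linux iputils ping options -c (count) and -W (timeout).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(is_alive: Callable[[], bool],
          reboot: Callable[[], None],
          max_misses: int = 3,
          interval_s: float = 30.0,
          rounds: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the test machine and reboot it after max_misses failed checks in a row.

    rounds=None polls forever; a number limits the loop (handy for testing).
    Returns how many reboots were triggered.
    """
    misses = 0
    reboots = 0
    done = 0
    while rounds is None or done < rounds:
        if is_alive():
            misses = 0
        else:
            misses += 1
            if misses >= max_misses:
                reboot()
                reboots += 1
                misses = 0
        done += 1
        sleep(interval_s)
    return reboots
```

It would be started with something like watch(lambda: ping_once(host), reboot_command), where host and reboot_command are whatever the local setup provides. Requiring several missed pings in a row avoids rebooting a machine that is merely slow to answer under heavy probe load.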

- Guanglei

References:
- whitelist for safe-mode probes (or just a better blacklist?)
  - From: Martin Hunt
- Re: whitelist for safe-mode probes (or just a better blacklist?)
  - From: Frank Ch. Eigler
- Re: whitelist for safe-mode probes (or just a better blacklist?)
  - From: Martin Hunt
- Re: whitelist for safe-mode probes (or just a better blacklist?)
  - From: Vara Prasad
- Re: whitelist for safe-mode probes (or just a better blacklist?)
  - From: Li Guanglei
