This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: return probe not executed on SMP system


Hi,
After looking deeper into this issue, I found that the problem is in stpd: it unloads the systemtap module a little too early, before the return probes have had a chance to fire their handlers.


If you have the systemtap source tree, try this tempory_fix.patch. I will file a Bugzilla report tomorrow.

Hien.



Guang Lei Li wrote:

Hi,

I ran into some difficulties with return probes on a multi-processor system (Power5, 4 CPUs).

This is the stap script I used:

global counter

function info()
%{
struct task_struct *cur = current;
_stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid, cur->thread_info->cpu);
%}


probe kernel.function("sys_read")
{
 if(pid() == target())
 {
   counter--
   info()
   log("pid:".string(pid())." target:".string(target())."entry")
 }
}

probe kernel.function("sys_read").return
{
 if(pid() == target())
 {
   counter++
   info()
   log("pid:".string(pid())." target:".string(target())."return")
 }
}

probe begin
{
 counter=100
}

probe end
{
 log("counter: ".string(counter))
}

Then I ran:
 stap -g a.stp -c "ls > a"

The output:

root:/root/temp>stap -g b.stp -c "ls > a"

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry
counter: 95

It seems that the return probe did not fire at all.
I tried the same script on a uni-processor x86 system, and it worked fine.

I also wrote a simple C application that opens a file and reads some data from it. I ran it with:
stap -g b.stp -c "./a.out"
It gave output like:


...
|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry
....

|3881|3881|3|0|pid:3881 target:3881entry

|3881|3881|3|0|pid:3881 target:3881return
counter: 33

You can see that some return probes are still not executed at all (if all of them were executed, the counter would be 100).
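
The counter arithmetic in the script can be made explicit with a small sketch. It assumes only what the script shows: the begin probe sets the counter to 100, each matching sys_read entry decrements it, and each matching return increments it. The helper name missed_returns is hypothetical and is not part of systemtap.

```c
/* Hypothetical helper: number of sys_read entries whose return
 * handler never ran, under the script's counter logic
 * (begin: counter = 100, entry: counter--, return: counter++).
 * final = initial - entries + returns, so the shortfall
 * initial - final equals entries - returns. */
static int missed_returns(int initial, int final_counter)
{
    return initial - final_counter;
}
```

Called with the two reported results, missed_returns(100, 95) gives 5 for the "ls > a" run, matching the five entry lines with no return line, and missed_returns(100, 33) gives 67 for the ./a.out run. This counts only the imbalance; it cannot tell which individual calls lost their return handler.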

Could anybody give me a hint about this problem?

Best Regards,

Li Guanglei




--- src.old/runtime/stpd/librelay.c	2005-10-19 12:35:35.000000000 -0700
+++ src-20051029/runtime/stpd/librelay.c	2005-11-03 17:06:51.000000000 -0800
@@ -729,11 +729,16 @@
 		case STP_START: 
 		{
 			struct transport_start *t = (struct transport_start *)data;
+			unsigned int mywait= 0xffffffff;
 			dbug("probe_start() returned %d\n", t->pid);
+
 			if (t->pid < 0)
 				cleanup_and_exit(0);
 			else if (target_cmd)
 				kill (target_pid, SIGUSR1);
+				while(mywait> 0) {
+					mywait--;
+				}
 			break;
 		}
 		default:
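
The patch above delays stpd with an empty countdown from 0xffffffff. How long that takes depends on CPU speed, and an optimizing compiler may remove a loop that has no side effects. Note also that, since the else-if has no braces, the loop is not part of that branch despite its indentation. A time-based delay is one possible alternative for experimenting; the sketch below uses POSIX nanosleep. The helper name settle_delay_ms and any delay value passed to it are assumptions for illustration, not part of stpd and not the fix that was eventually adopted.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose nanosleep() under strict modes */
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical helper, not part of stpd: sleep for roughly `ms`
 * milliseconds, resuming the sleep if a signal interrupts it.
 * Returns 0 on success, -1 on an unexpected nanosleep() failure. */
static int settle_delay_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;          /* unexpected failure */
        req = rem;              /* continue with the remaining time */
    }
    return 0;
}
```

Like the busy loop, this only widens the window for pending return handlers to run; it does not synchronize with them, so it remains a workaround and not a fix for the early unload.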
