This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
systemtap PR10171, itrace crashes
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- To: utrace-devel at redhat dot com, systemtap at sources dot redhat dot com
- Date: Sat, 21 Nov 2009 14:47:47 -0500
- Subject: systemtap PR10171, itrace crashes
Hi -
itrace is a utrace engine included in the systemtap runtime that
activates UTRACE_SINGLESTEP / BLOCKSTEP on targeted userspace
processes. It seems to work OK on ppc64, but crashes lots on recent
fedora (inc. rawhide) level utrace on x86/x86-64. (Lately I've been
unable to boot a utrace/master branch version of the kernel to try it
against.)
The systemtap code is at git://sourceware.org/git/systemtap.git,
master branch for the near-original code, pr10653-itrace-crashes
branch for a partial rewrite. The key file is runtime/itrace.c.
tapsets-itrace.cxx generates code that calls into this for new or
dying threads. The crash is now intermittent (an "improvement"
over a near-100% crash on rawhide x86 machines), and is triggered
with a systemtap testsuite run thusly:
% configure ... ; make; make install # build systemtap
% sudo make installcheck RUNTESTFLAGS=itrace.exp
While hacking up my near-rewritten form of runtime/itrace.c, I came
across several cases where utrace made calls on our engine seemingly
long after a UTRACE_DETACH was to take effect. This included cases
where a signal or other reporting callback had already returned
UTRACE_DETACH, or after a debugging task seemingly completed a
utrace_control(DETACH).
So I added hack after hack, using engine->data as an indication that
someone else has already requested a DETACH, and to keep echoing that
back to utrace until it "takes". (Maybe it's a bug in x86
single-stepping logic that didn't appear on ppc. Dunno. We've also
had to disable itrace on ia64 for suspicious crashes that may also be
the same problem. sourceware.org/PR10171 and /PR10653.)
Could you guys take a look at old/new runtime/itrace.c, to see what
the code's trying to do with utrace? Do you have a
utrace/master-level kernel handy to try it on? I don't understand why
any of these hacks are supposed to be necessary.
A typical kernel crash log follows (3-cpu virtual machine). Sometimes
this triggers somewhat *after* the systemtap session has completed, up
to and including unloading of the runtime/itrace.c-containing module.
[ 634.676957] stap_198c84f334a07f5b8157be1fd1be6bec_1461: systemtap: 1.0/0.143, base: fa97b000, memory: 16305+20172+1344+13600 data+text+ctx+net, probes: 5
[ 634.926718]
[ 634.926719] =============================================
[ 634.927113] [ INFO: possible recursive locking detected ]
[ 634.927113] 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 634.927113] ---------------------------------------------
[ 634.927113] ls_5066/5705 is trying to acquire lock:
[ 634.927113] (&sighand->siglock){......}, at: [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[ 634.927113]
[ 634.927113] but task is already holding lock:
[ 634.927113] (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d
[ 634.927113]
[ 634.927113] other info that might help us debug this:
[ 634.927113] 1 lock held by ls_5066/5705:
[ 634.927113] #0: (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d
[ 634.927113]
[ 634.927113] stack backtrace:
[ 634.927113] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 634.927113] Call Trace:
[ 634.927113] [<c07b3339>] ? printk+0x14/0x1b
[ 634.927113] [<c04690df>] __lock_acquire+0x833/0xb6f
[ 634.927113] [<c048407d>] ? utrace_get_signal+0x50b/0x53d
[ 634.927113] [<c04694ae>] lock_acquire+0x93/0xb1
[ 634.927113] [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344
[ 634.927113] [<c07b578b>] _spin_lock_irq+0x2e/0x5e
[ 634.927113] [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344
[ 634.927113] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[ 634.927113] [<c04028dc>] do_signal+0x5f/0x703
[ 634.927113] [<c044e0b3>] ? force_sig_info+0xa2/0xac
[ 634.927113] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[ 634.927113] [<c040acee>] ? send_sigtrap+0x6d/0x74
[ 634.927113] [<c0402fa7>] do_notify_resume+0x27/0x77
[ 634.927113] [<c0403718>] work_notifysig+0x13/0x1b
[ 639.001030] BUG: spinlock lockup on CPU#3, ls_5066/5705, ec094704 (Not tainted)
[ 818.430111] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 818.433097] Call Trace:
[ 818.435096] [<c07b3339>] ? printk+0x14/0x1b
[ 818.437097] [<c05d451a>] _raw_spin_lock+0x102/0x128
[ 818.439096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 818.441097] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[ 818.541097] [<c04028dc>] do_signal+0x5f/0x703
[ 818.543096] [<c044e0b3>] ? force_sig_info+0xa2/0xac
[ 818.545097] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[ 818.547096] [<c040acee>] ? send_sigtrap+0x6d/0x74
[ 818.549097] [<c0402fa7>] do_notify_resume+0x27/0x77
[ 818.551312] [<c0403718>] work_notifysig+0x13/0x1b
[ 818.553097] sending NMI to all CPUs:
[ 818.555096] NMI backtrace for cpu 3
[ 818.555096]
[ 818.555096] Pid: 5705, comm: ls_5066 Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1)
[ 818.555096] EIP: 0060:[<c0418cb1>] EFLAGS: 00210046 CPU: 3
[ 818.555096] EIP is at __default_send_IPI_dest_field+0x5f/0x64
[ 818.555096] EAX: 00000c00 EBX: 0f000000 ECX: 00000800 EDX: fffff000
[ 818.555096] ESI: 00000002 EDI: 00000c00 EBP: eb601e08 ESP: eb601dfc
[ 818.555096] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 818.555096] CR0: 8005003b CR2: 00ac3eac CR3: 2a44d000 CR4: 000006d0
[ 818.555096] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 818.555096] DR6: ffff4ff0 DR7: 00000400
[ 818.555096] Call Trace:
[ 818.555096] [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac
[ 818.555096] [<c0418ad3>] default_send_IPI_all+0x27/0x68
[ 818.555096] [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54
[ 818.555096] [<c05d451f>] _raw_spin_lock+0x107/0x128
[ 818.555096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 818.555096] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[ 818.555096] [<c04028dc>] do_signal+0x5f/0x703
[ 818.555096] [<c044e0b3>] ? force_sig_info+0xa2/0xac
[ 818.555096] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[ 818.555096] [<c040acee>] ? send_sigtrap+0x6d/0x74
[ 818.555096] [<c0402fa7>] do_notify_resume+0x27/0x77
[ 818.555096] [<c0403718>] work_notifysig+0x13/0x1b
[ 818.555096] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 818.555096] Call Trace:
[ 818.555096] [<c0402656>] ? show_regs+0x24/0x2a
[ 818.555096] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[ 818.555096] [<c07b619f>] do_nmi+0xa8/0x289
[ 818.555096] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[ 818.555096] [<c041007b>] ? do_machine_check+0x683/0x6ec
[ 818.555096] [<c04100e0>] ? do_machine_check+0x6e8/0x6ec
[ 818.555096] [<c0418cb1>] ? __default_send_IPI_dest_field+0x5f/0x64
[ 818.555096] [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac
[ 818.555096] [<c0418ad3>] default_send_IPI_all+0x27/0x68
[ 818.555096] [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54
[ 818.555096] [<c05d451f>] _raw_spin_lock+0x107/0x128
[ 818.555096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 818.555096] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[ 818.555096] [<c04028dc>] do_signal+0x5f/0x703
[ 818.555096] [<c044e0b3>] ? force_sig_info+0xa2/0xac
[ 818.555096] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[ 818.555096] [<c040acee>] ? send_sigtrap+0x6d/0x74
[ 818.555096] [<c0402fa7>] do_notify_resume+0x27/0x77
[ 818.555096] [<c0403718>] work_notifysig+0x13/0x1b
[ 818.556078] NMI backtrace for cpu 2
[ 818.556078]
[ 818.556078] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1)
[ 818.556078] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 2
[ 818.556078] EIP is at native_safe_halt+0xa/0xc
[ 818.556078] EAX: c0409645 EBX: c0a3301c ECX: f6d02ba0 EDX: 00000000
[ 818.556078] ESI: 00000002 EDI: 00000000 EBP: f6cfff78 ESP: f6cfff78
[ 818.556078] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 818.556078] CR0: 8005003b CR2: b76be000 CR3: 27f2e000 CR4: 000006d0
[ 818.556078] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 818.556078] DR6: ffff0ff0 DR7: 00000400
[ 818.556078] Call Trace:
[ 818.556078] [<c040964a>] default_idle+0x44/0x6e
[ 818.556078] [<c04026f7>] cpu_idle+0x9b/0xb4
[ 818.556078] [<c07afcd0>] start_secondary+0x204/0x242
[ 818.556078] Pid: 0, comm: swapper Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 818.556078] Call Trace:
[ 818.556078] [<c0402656>] ? show_regs+0x24/0x2a
[ 818.556078] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[ 818.556078] [<c07b619f>] do_nmi+0xa8/0x289
[ 818.556078] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[ 818.556078] [<c0409645>] ? default_idle+0x3f/0x6e
[ 818.556078] [<c0409645>] ? default_idle+0x3f/0x6e
[ 818.556078] [<c04201dc>] ? native_safe_halt+0xa/0xc
[ 818.556078] [<c040964a>] default_idle+0x44/0x6e
[ 818.556078] [<c04026f7>] cpu_idle+0x9b/0xb4
[ 818.556078] [<c07afcd0>] start_secondary+0x204/0x242
[ 634.927053] NMI backtrace for cpu 1
[ 634.927053]
[ 634.927053] Pid: 5698, comm: stapio Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1)
[ 634.927053] EIP: 0060:[<c05d0ad5>] EFLAGS: 00200096 CPU: 1
[ 634.927053] EIP is at delay_tsc+0x1f/0x66
[ 634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e
[ 634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c
[ 634.927053] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0
[ 634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 634.927053] DR6: ffff0ff0 DR7: 00000400
[ 634.927053] Call Trace:
[ 634.927053] [<c05d0a58>] __delay+0xe/0x10
[ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75
[ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8
[ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16
[ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4
[ 634.927053] [<c04e5b71>] sys_write+0x40/0x62
[ 634.927053] [<c0403610>] syscall_call+0x7/0xb
[ 634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e
[ 634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c
[ 634.927053] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0
[ 634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 634.927053] DR6: ffff0ff0 DR7: 00000400
[ 634.927053] Call Trace:
[ 634.927053] [<c05d0a58>] __delay+0xe/0x10
[ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75
[ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8
[ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16
[ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4
[ 634.927053] [<c04e5b71>] sys_write+0x40/0x62
[ 634.927053] [<c0403610>] syscall_call+0x7/0xb
[ 634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[ 634.927053] Call Trace:
[ 634.927053] [<c0402656>] ? show_regs+0x24/0x2a
[ 634.927053] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[ 634.927053] [<c07b619f>] do_nmi+0xa8/0x289
[ 634.927053] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[ 634.927053] [<c046007b>] ? ktime_get_ts+0x33/0xa2
[ 634.927053] [<c05d0ad5>] ? delay_tsc+0x1f/0x66
[ 634.927053] [<c05d0a58>] __delay+0xe/0x10
[ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75
[ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8
[ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16
[ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4
[ 634.927053] [<c04e5b71>] sys_write+0x40/0x62
[ 634.927053] [<c0403610>] syscall_call+0x7/0xb
[ 819.261347] NMI backtrace for cpu 0
[ 819.261347]
[ 819.261347] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1)
[ 819.261347] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 0
[ 819.261347] EIP is at native_safe_hal