This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Forw: [gmane.linux.kernel.utrace] utrace plans

From: fche at redhat dot com (Frank Ch. Eigler)
To: systemtap at sources dot redhat dot com
Date: Wed, 14 May 2008 14:27:43 -0400
Subject: Forw: [gmane.linux.kernel.utrace] utrace plans

This will have implications for utrace-event probes and uprobes.

--- Begin Message ---

From: Roland McGrath <roland at redhat dot com>
To: utrace-devel at redhat dot com
Date: Tue, 13 May 2008 21:29:04 -0700 (PDT)
Subject: utrace plans
Approved: news@gmane.org
Envelope-to: glku-utrace-devel@gmane.org
Newsgroups: gmane.linux.kernel.utrace
Xref: news.gmane.org gmane.linux.kernel.utrace:237

I'm going to work on revamping the utrace interface a fair bit before we
try to submit upstream again.  Everything here is subject to change and I
welcome all feedback.  Many details remain to be ironed out.

I'm trying to address (at least) these items from the TODO list (and other
past experience):

* soft quiescence
* siglock contention
* utrace_inject_signal vs multiple engines

One key reorganization is at the heart of handling all those issues.
That is, we'll change the nature of quiescence in the interface and get
rid of the "action state" flags.  Specifically, this means:

* NOREAP is gone.  This was only intended to be used by ptrace; new
  ptrace integration will use its own internal special cases for this.
  We don't intend to offer a feature for interfering with the normal
  parent SIGCHLD/wait behavior.

* single/block step is resume action only, not persistent state

* UTRACE_ACTION_QUIESCE is replaced by UTRACE_STOP, see below

* utrace_set_flags becomes utrace_set_events, flags is pure event bitmask

* utrace_inject_signal is gone.
  It's replaced by the new use of report_signal, see below.

Before I get into the details of the new interface ideas, the other
major thing to mention about the new implementation tack is ptrace.
In contrast to the past work, I am not ripping out ptrace and redoing
it purely on top of utrace.  We'll take a piecemeal approach.  I think
this will be the best way to get upstream buy-in for utrace as an
experimental config option and start to get some merging to happen before
everything is cooked.

What I have right now is a version where utrace is enabled alongside
the old ptrace implementation.  The hooks into core code are cleaned
up via tracehook.h, but ptrace still works the same old way.  With
this, they both work, but they don't play well together on the same
thread (one doesn't understand when the other is stopped, etc).  After
getting the revamped utrace interface on its feet, the next job will
be to find an uncomplicated way to make ptrace use utrace (or
cooperate well with utrace) for its thread stops.

As an incidental interface change, we'll nix the VFORK_DONE event.
This was only there because ptrace has it.  You can just catch CLONE
events with the CLONE_VFORK flag, and then wait for the SYSCALL_EXIT
or QUIESCE event afterwards.
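
A minimal sketch of that replacement, assuming an engine keeps a small
per-thread tracker.  The names vfork_tracker, tracker_on_clone and
tracker_on_exit_or_quiesce are hypothetical and not part of utrace;
MODEL_CLONE_VFORK repeats the Linux CLONE_VFORK value so the sketch
builds outside the kernel.

```c
#include <stdbool.h>

/* Same value as CLONE_VFORK in the Linux clone ABI. */
#define MODEL_CLONE_VFORK 0x00004000UL

/* Hypothetical per-thread state kept by an engine. */
struct vfork_tracker {
	bool awaiting_done;	/* a vfork-style clone was seen, parent not yet released */
};

/* Call from the CLONE event callback with the clone flags. */
static void tracker_on_clone(struct vfork_tracker *t, unsigned long clone_flags)
{
	if (clone_flags & MODEL_CLONE_VFORK)
		t->awaiting_done = true;
}

/*
 * Call from the following SYSCALL_EXIT or QUIESCE callback.
 * Returns true exactly once per vfork: the point where the old
 * VFORK_DONE event would have been reported.
 */
static bool tracker_on_exit_or_quiesce(struct vfork_tracker *t)
{
	bool done = t->awaiting_done;

	t->awaiting_done = false;
	return done;
}
```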


Now, the crux of the new interface style.  As I said above,
utrace_set_flags will now be utrace_set_events and only affect the set
of events you want to make callbacks.  The effect you got by using
UTRACE_ACTION_QUIESCE in utrace_set_flags will now come via the new
call utrace_control (might get a better name).  utrace_control takes
task, engine, and:

	/*
	 * The order of these is important.  When there is more than one engine,
	 * each supplies its choice and the smallest value prevails.
	 */
	enum utrace_resume_action {
		UTRACE_STOP,
		UTRACE_REPORT,
		UTRACE_INTERRUPT,
		UTRACE_SINGLESTEP,
		UTRACE_BLOCKSTEP,
		UTRACE_RESUME,
		UTRACE_DETACH,
	};

This same enum is encoded in the return value from event reports.  In
either utrace_control (asynchronous calls from another thread) or as
part of a callback return value, the options mean the same.
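
The "smallest value prevails" rule can be sketched as a plain minimum
over the engines' choices.  The enum below copies the ordering listed
above; combine_actions is a hypothetical helper for illustration, not
a proposed utrace function, and starting from UTRACE_RESUME is an
assumption about what happens when no engine asks for anything.

```c
/* Order copied from the plan; smaller values take precedence. */
enum utrace_resume_action {
	UTRACE_STOP,
	UTRACE_REPORT,
	UTRACE_INTERRUPT,
	UTRACE_SINGLESTEP,
	UTRACE_BLOCKSTEP,
	UTRACE_RESUME,
	UTRACE_DETACH,
};

/*
 * Hypothetical helper: fold every engine's choice into the prevailing
 * action by keeping the smallest enum value.  The starting point is
 * UTRACE_RESUME (an assumption), so an engine that only detaches does
 * not change what the thread does.
 */
static enum utrace_resume_action
combine_actions(const enum utrace_resume_action *choices, int n)
{
	enum utrace_resume_action prevailing = UTRACE_RESUME;
	int i;

	for (i = 0; i < n; i++)
		if (choices[i] < prevailing)
			prevailing = choices[i];
	return prevailing;
}
```

With choices of RESUME, BLOCKSTEP and SINGLESTEP the result is
SINGLESTEP; any engine choosing STOP makes the result STOP.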

* UTRACE_STOP

This is the only kind of persistent state an engine can give a thread
in the new plan.  It requests the thread stop in TASK_TRACED and stay
that way until your engine no longer wants it stopped (and no other
engine does either) or it gets SIGKILL (or equivalent).  We used to
call this quiescence (to be precise, soft quiescence).  By itself, an
asynchronous UTRACE_STOP does not do anything to syscalls.

When you use utrace_control or return from a callback using any
utrace_resume_action other than UTRACE_STOP, then your engine no
longer wants the thread stopped.  If no engines want it stopped,
it will wake up and run or take other events.

When you call utrace_control with UTRACE_STOP, it gives a return value
to indicate whether the thread is already stopped.  If it's not
already stopped, then the effect is similar to UTRACE_REPORT.  Your
report_quiesce callback will be called soon.  Its return value can use
UTRACE_STOP to keep the thread stopped until you use utrace_control,
or your callback can do what it needs to do and let the thread resume.
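
A minimal sketch of the "stays stopped until no engine wants it
stopped" rule, assuming one bookkeeping bit per engine.  The struct
and function names are hypothetical; the real implementation may
track this quite differently, and SIGKILL is not modeled.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping: bit i is set while engine i wants the thread stopped. */
struct stop_model {
	unsigned int stop_wanted;
};

/*
 * Record an engine's latest choice, whether it came from utrace_control
 * or from a callback return value.  Only UTRACE_STOP keeps the bit set;
 * any other resume action clears it.
 */
static void model_apply(struct stop_model *m, unsigned int engine, bool is_stop)
{
	if (is_stop)
		m->stop_wanted |= 1u << engine;
	else
		m->stop_wanted &= ~(1u << engine);
}

/* The thread may wake up only when no engine still wants it stopped. */
static bool model_may_run(const struct stop_model *m)
{
	return m->stop_wanted == 0;
}
```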

* UTRACE_REPORT

This is the softest quiescence: it just requests that the thread get
into a place to make a report_quiesce ASAP.  It interrupts user-mode
running on another CPU, but does not affect syscalls in progress.
(It does not cause signal_pending() to become true.)

If this is used in a report_signal callback's return value, or in the
last report_quiesce before returning to user mode, it just becomes
UTRACE_RESUME (we won't loop doing report_quiesce callbacks).

* UTRACE_INTERRUPT

This asks for old-fashioned "hard" quiescence: that is, it interrupts
syscalls or other work in progress just like receiving a SIGSTOP would.
(It causes signal_pending() to become true.)  When any pending syscall
has finished being interrupted, then the thread will make a
report_signal or report_quiesce callback.

* UTRACE_SINGLESTEP, UTRACE_BLOCKSTEP

You want to resume and step.  You can use this in the return value from
any event callback, but that doesn't mean it necessarily happens.  If
another engine kept it stopped, or another event happened before it got
back to resuming in user mode, then it's only the (smallest) resume
action of the final report_signal/report_quiesce that takes effect.
So, you need to be getting UTRACE_EVENT(QUIESCE) and have that callback
return UTRACE_SINGLESTEP to be sure.

Note, if you used BLOCKSTEP but another engine used SINGLESTEP, then
that means it does single-step.

* UTRACE_RESUME

You want to let the thread run normally.  This should be part of every
callback return value when you aren't doing anything else.

* UTRACE_DETACH

You want to detach your engine.  (This implicitly lets the thread run
if it was stopped.)  The only reason utrace_detach got replaced with
utrace_control(UTRACE_DETACH) is that the utrace_control argument is
naturally the same as the utrace_resume_action in callback return values.


Now about the callbacks.  Every callback will have an "action"
argument, similar to what report_signal had before.  In all callbacks,
some bits of this will be a utrace_resume_action value.  This gives the
prevailing choice of other engines so far.  Your engine can use this in
deciding what value to return.  Your engine can't override other
engines' choices as such.  The smallest-value action (in the order
above) will prevail.

In report_signal (and maybe later other callbacks), other bits in that
argument say something specific to that event, like the disposition of
the signal.  That argument is in the same format as the return value.
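
One way such an "action" word could be laid out, as a sketch only.
The plan does not fix a bit layout: the 4-bit mask, the MODEL_ names
and the disposition values below are all assumptions made for
illustration.

```c
/* Assumed layout: low 4 bits hold the resume action, higher bits are event-specific. */
#define MODEL_RESUME_MASK 0x0fu

enum model_resume_action {
	MODEL_STOP,
	MODEL_REPORT,
	MODEL_INTERRUPT,
	MODEL_SINGLESTEP,
	MODEL_BLOCKSTEP,
	MODEL_RESUME,
	MODEL_DETACH,
};

/* Hypothetical signal dispositions, placed above the resume bits. */
enum model_signal_action {
	MODEL_SIGNAL_NONE    = 0x00,
	MODEL_SIGNAL_DELIVER = 0x10,
	MODEL_SIGNAL_IGNORE  = 0x20,
};

/* Extract the prevailing resume action from an action word. */
static unsigned int model_resume_of(unsigned int action)
{
	return action & MODEL_RESUME_MASK;
}

/* Extract the event-specific bits (here, the signal disposition). */
static unsigned int model_signal_of(unsigned int action)
{
	return action & ~MODEL_RESUME_MASK;
}

/* Build a return value in the same format as the argument. */
static unsigned int model_make_return(unsigned int signal_bits,
				      enum model_resume_action resume)
{
	return (signal_bits & ~MODEL_RESUME_MASK) | (unsigned int)resume;
}
```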

When your engine requests UTRACE_EVENT(QUIESCE), your report_quiesce
callback will be made before the callback for another event you've
requested.  It gets an argument that gives the UTRACE_EVENT(X) bit
that's the event happening now.  If you hadn't requested it before, you
can use utrace_set_events to enable the report_X callback whose
arguments carry the details of the event; it will be made right after
your report_quiesce function returns.  The event argument can be 0,
meaning there is nothing special happening but you (or someone) used
UTRACE_REPORT or UTRACE_STOP.

When someone has used UTRACE_INTERRUPT, or we've gotten into
get_signal_to_deliver() for whatever reason, things change a little.  If
you have requested UTRACE_EVENT(QUIESCE) and your ops->report_signal is
!= NULL, then you'll get a report_signal callback even if you have not
requested any of the UTRACE_EVENT_SIGNAL_ALL events.  This is in lieu of
a report_quiesce with a signal event bit; you just get this special
report_signal callback instead.  This callback is made before dequeuing
any signal, and starts with a special "NONE" disposition.  This gives
engines an opportunity to inject a signal disposition.  There is no
queuing--if the last engine changes the disposition, its choice wins;
but if any engine is still requesting STOP, then it won't happen.  When
all engines are letting the thread resume after that callback, then it
will dequeue any real signal that's pending and make a report_signal
callback if the right signal event was requested.

This is how we replace utrace_inject_signal.  When an engine in a
different callback, or asynchronously, wants to inject a signal, it must
use UTRACE_INTERRUPT and then use its report_signal callback to deliver
the details it wants.  This keeps us out of the business of queuing and
deciding how multiple engines' injections relate to each other.  It's
reduced to the problem of multiple engines' resolution of a callback,
which is a problem we already have anyway.
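
The resolution described above can be sketched as a single pass over
the engines' answers to the special report_signal callback.  Every
name here (inj_choice, inj_result, inj_resolve, the INJ_ values) is
hypothetical; the sketch assumes engines are consulted in a fixed
order and that each one either leaves the disposition alone or
replaces it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dispositions; INJ_NONE is the special starting value. */
enum inj_disposition { INJ_NONE, INJ_DELIVER, INJ_IGNORE };

/* What one engine's report_signal callback decided. */
struct inj_choice {
	bool changes_disposition;	/* engine wants to inject something */
	enum inj_disposition disposition;
	bool wants_stop;		/* engine returned UTRACE_STOP */
};

struct inj_result {
	bool stopped;			/* some engine still wants STOP: nothing happens yet */
	enum inj_disposition disposition;
};

/* No queuing: the last engine to change the disposition wins. */
static struct inj_result inj_resolve(const struct inj_choice *c, size_t n)
{
	struct inj_result r = { false, INJ_NONE };
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].changes_disposition)
			r.disposition = c[i].disposition;
		if (c[i].wants_stop)
			r.stopped = true;
	}
	return r;
}
```

A caller would act on r.disposition only when r.stopped is false.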

The way this scheme addresses both siglock contention and soft
quiescence is the same one: REPORT sets TIF_NOTIFY_RESUME (new) and not
TIF_SIGPENDING.  If there are no signals (and no UTRACE_INTERRUPT), then
we don't take the siglock just to report our quiescence.  We just get to
report_quiesce via tracehook_notify_resume, and the thread can do
whatever it's going to do.  If UTRACE_STOP is returned from a callback
and we need to get all the way to TASK_TRACED, then we will still need
the siglock for that transition.  Compared to the fact that they're
blocking anyway, contention from that doesn't seem like such an issue.
But we can look into the siglock situation for stopping if it really is
a problem in the future.
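
A toy model of that flag choice, to make the siglock reasoning
concrete.  The MODEL_TIF_ bit values and both function names are
hypothetical, and treating UTRACE_INTERRUPT as setting only the
signal-pending flag is an assumption; the plan says only that REPORT
avoids TIF_SIGPENDING.

```c
#include <stdbool.h>

/* Stand-ins for the thread-info flags; bit values are made up. */
#define MODEL_TIF_NOTIFY_RESUME 0x1u
#define MODEL_TIF_SIGPENDING    0x2u

enum nudge { NUDGE_REPORT, NUDGE_INTERRUPT };

/* Which flag is set on the target thread for each kind of request. */
static unsigned int nudge_flags(enum nudge kind)
{
	return kind == NUDGE_INTERRUPT ? MODEL_TIF_SIGPENDING
				       : MODEL_TIF_NOTIFY_RESUME;
}

/*
 * The siglock is needed on the signal path, or when a callback ends in
 * UTRACE_STOP and the thread must go all the way to TASK_TRACED.
 */
static bool needs_siglock(unsigned int flags, bool callback_returned_stop)
{
	return (flags & MODEL_TIF_SIGPENDING) != 0 || callback_returned_stop;
}
```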


I think that covers the major ideas.  As I said, all subject to change.
Several details remain to be hashed out clearly, especially some about
the report_signal callback.  I'm going to be working in the next few days
on getting something going according to this rough plan.  Please bring up
anything you think important.


Thanks,
Roland

--- End Message ---
