This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Proposed systemtap access to perfmon hardware


William Cohen wrote:
Maynard Johnson wrote:

William Cohen wrote:

[snip]

perfmon_create_context:long ()

The perfmon_create_context command sets up the performance monitoring
hardware for the allocated contexts and starts the counters running.
If successful, the function will return zero. If the operation is
unsuccessful, an error code will be returned. This function should
only be used in probe begin. (FIXME list error code returned.)


I'm confused about the relationship between this function and perfmon_start_counter, since starting the counters is mentioned in both. Could you explain at what point this function is invoked and what the purpose of the context is? I'm not really familiar with the perfmon2 interface, but just on the face of it, your context doesn't seem like a one-to-one fit with the way contexts are used in perfmon2. In perfmon2, a context is created first, which is then passed in to the calls for setting up events, thereby associating those events with the context. Then 'start' uses the context to set up the PMU for all requested events and begin the counting.


Yes, perfmon2 has contexts that set all the performance monitoring hardware registers. The perfmon2 start and stop operations control the entire context.
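To make the context semantics concrete, here is a minimal simulated model, assuming a design where events are attached to a context while it is stopped and a single start/stop acts on all of them. It touches no hardware, and every name (sim_context, sim_ctx_add_event, sim_ctx_start, and so on) is hypothetical; none of it is the perfmon2 or libpfm API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, purely simulated model of a perfmon2-style context.
 * None of these names belong to the real perfmon2 or libpfm interfaces. */
#define MAX_EVENTS 4

struct sim_context {
    const char *events[MAX_EVENTS]; /* events attached to this context */
    uint64_t counts[MAX_EVENTS];    /* simulated counter values */
    int nevents;
    int running;                    /* one flag: start/stop act on the whole context */
};

void sim_ctx_init(struct sim_context *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
}

/* Attach an event; returns a handle, or -1 on error.
 * Events may only be added while the context is stopped. */
int sim_ctx_add_event(struct sim_context *ctx, const char *name)
{
    if (ctx->running || ctx->nevents >= MAX_EVENTS)
        return -1;
    ctx->events[ctx->nevents] = name;
    ctx->counts[ctx->nevents] = 0;
    return ctx->nevents++;
}

/* Start or stop every counter in the context at the same moment. */
void sim_ctx_start(struct sim_context *ctx) { ctx->running = 1; }
void sim_ctx_stop(struct sim_context *ctx)  { ctx->running = 0; }

/* Simulate 'n' occurrences of every attached event; counts only while running. */
void sim_ctx_tick(struct sim_context *ctx, uint64_t n)
{
    if (!ctx->running)
        return;
    for (int i = 0; i < ctx->nevents; i++)
        ctx->counts[i] += n;
}

/* Read the current value of one counter by handle. */
uint64_t sim_ctx_read(const struct sim_context *ctx, int handle)
{
    assert(handle >= 0 && handle < ctx->nevents);
    return ctx->counts[handle];
}
```

Because start and stop apply to the context as a whole, all attached counters cover exactly the same interval, which is the property that per-counter start/stop would give up.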

Based on the feedback from the earlier proposal email, I revised it to use something like:

probe perfmon.event("blah") ...

All the probes using the perfmon hardware would be collected together for the perfmon_create_context.
This is good.
The individual start and stop operations would be allowed.
This is not so good. Besides the fact that it may be difficult (or impossible) to do, I don't see it being all that useful. But then, I'm a tool developer, not a performance analyst, so I could be missing the point.

It is an open question what the counters' default state is: do they start running by default, or do they have to be explicitly started?
If they are started by default, where exactly do they start running? At the beginning of the begin probe? At the end of the begin probe?


[snip]


perfmon_start_counter:long (event_handle:long)

The event_handle passed in indicates which counter to start. The
function returns the current counter value as a 64-bit long. The
return value is undefined for an invalid event_handle.


I think individually starting counters is problematic at a couple of different levels. On some architectures (like PowerPC64), you don't have fine-grained control over each counter. Also, one usually wants all counters to begin counting at the same time. Maybe I'm misinterpreting what the intention of this function is.


I was thinking there are cases where one would want to start and stop individual sampling and interval counting. Yes, starting and stopping counters on some architectures can be a problem. I was thinking of cheating: rather than actually starting and stopping the counters, turn on and off the bits that enable counting in user and kernel space. This would be done by finding which bits to twiddle in the control register.
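The bit-twiddling idea can be sketched as pure value manipulation, assuming each counter has its own control register whose user/kernel enable bits sit at bit 16 (USR) and bit 17 (OS), as in the x86 PERFEVTSEL layout. The helper names evtsel_pause and evtsel_resume are hypothetical, and the sketch does not write any hardware register or perform any PMU disable/enable sequence. It only applies where those bits are per counter; where the user/kernel bits are shared by all counters it does not carry over.

```c
#include <stdint.h>

/* Bit positions assumed from the x86 PERFEVTSELx layout:
 * bit 16 = count in user mode (USR), bit 17 = count in kernel mode (OS).
 * Only an in-memory value is manipulated here; writing it back to the
 * hardware is deliberately not shown. */
#define EVTSEL_USR (UINT64_C(1) << 16)
#define EVTSEL_OS  (UINT64_C(1) << 17)

/* "Stop" one counter without touching the others: clear its user/kernel
 * bits so it no longer counts in either mode. The event select and
 * unit mask fields are left unchanged. */
uint64_t evtsel_pause(uint64_t evtsel)
{
    return evtsel & ~(EVTSEL_USR | EVTSEL_OS);
}

/* "Restart" the counter by restoring the user/kernel bits that were
 * saved before it was paused. */
uint64_t evtsel_resume(uint64_t evtsel, uint64_t saved_domains)
{
    return evtsel | (saved_domains & (EVTSEL_USR | EVTSEL_OS));
}
```

The caller is expected to save `sel & (EVTSEL_USR | EVTSEL_OS)` before pausing, so that a counter configured for user-only counting resumes as user-only.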
Unfortunately, this isn't possible for ppc64. The control bits you mention (for user/kernel domain) are used for all counters, so there's no fine-grained control there. There are PMCxSEL bits for setting up each counter for what you want it to count (including "count nothing"), but changing these on the fly (i.e., without disabling the PMU) may not have the desired effect. The documentation states that you should first disable the PMU before you change these bits, but it doesn't say what would happen if you didn't disable.

-Maynard
However, maybe this won't work for ppc64. I will have to review the ppc64 hardware manual to see whether this scheme would work.

[snip]


EVENT SPECIFICATION

The performance monitoring events are specified in strings. The
information should, at the very least, include the name of the event being monitored


Will, you allude to this in a later posting, but I'll reiterate here. Should the event name be the native event name for the arch? Or some generic name that is mapped to a native name by some mechanism? Or either (as in PAPI)?


libpfm has some generic names for cycle counts. I expect that event names will be both generic and architecture-specific. The names will be resolved through a lookup in libpfm.
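A lookup that accepts either a generic or a native name (the PAPI-style "either" option) might look like the sketch below. The alias table and the function resolve_event are hypothetical and illustrative only; the entries are not taken from libpfm's real event tables, and libpfm's own lookup functions are not shown.

```c
#include <string.h>

/* Hypothetical alias table; entries are illustrative and are not copied
 * from libpfm's event tables. */
struct event_alias {
    const char *arch;     /* architecture the native name belongs to */
    const char *generic;  /* architecture-neutral name */
    const char *native;   /* native event name on that architecture */
};

static const struct event_alias aliases[] = {
    { "x86_64", "cycles", "CPU_CLK_UNHALTED" },
    { "ppc64",  "cycles", "PM_CYC" },
};

/* Map a generic name to the native name for 'arch'. Any name that is not
 * a known alias is assumed to be a native event name already and is
 * returned unchanged. */
const char *resolve_event(const char *arch, const char *name)
{
    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
        if (strcmp(aliases[i].arch, arch) == 0 &&
            strcmp(aliases[i].generic, name) == 0)
            return aliases[i].native;
    }
    return name;
}
```

Passing unknown names through unchanged keeps native names usable without listing every one of them in the alias table.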

by the counter. Additional information would include an event mask to
specify subevents, whether to count in kernel or user space, whether
to keep track of counts on a per-thread or per-CPU basis, and the
interval for the sampling.

(FIXME more detail on the string compositions)
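Since the string composition is still open, here is one possible shape for it, written as a parser so the fields listed above are explicit. The syntax ("NAME[:mask=N][:u][:k][:cpu][:period=N]"), the struct event_spec, and parse_event_spec are all hypothetical choices made for illustration, not part of the proposal.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a hypothetical event string such as
 *   "CPU_CLK_UNHALTED:mask=0x3:u:period=100000" */
struct event_spec {
    char name[64];
    uint64_t mask;     /* event mask selecting subevents */
    int user, kernel;  /* count in user and/or kernel space */
    int per_cpu;       /* 1 = per-CPU, 0 = per-thread */
    uint64_t period;   /* sampling interval; 0 = plain counting */
};

/* Parse a whole string as an unsigned number (decimal, 0x hex, or 0 octal). */
static int parse_u64(const char *s, uint64_t *out)
{
    char *end;
    if (*s == '\0')
        return -1;
    unsigned long long v = strtoull(s, &end, 0);
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 on a malformed string. */
int parse_event_spec(const char *str, struct event_spec *spec)
{
    char buf[256];
    if (strlen(str) >= sizeof(buf))
        return -1;
    strcpy(buf, str);
    memset(spec, 0, sizeof(*spec));

    char *field = buf;
    int first = 1;
    while (field) {
        char *next = strchr(field, ':');
        if (next)
            *next++ = '\0';            /* split off the next field */
        if (first) {                   /* first field is the event name */
            if (*field == '\0' || strlen(field) >= sizeof(spec->name))
                return -1;
            strcpy(spec->name, field);
            first = 0;
        } else if (strcmp(field, "u") == 0) {
            spec->user = 1;
        } else if (strcmp(field, "k") == 0) {
            spec->kernel = 1;
        } else if (strcmp(field, "cpu") == 0) {
            spec->per_cpu = 1;
        } else if (strncmp(field, "mask=", 5) == 0) {
            if (parse_u64(field + 5, &spec->mask) != 0)
                return -1;
        } else if (strncmp(field, "period=", 7) == 0) {
            if (parse_u64(field + 7, &spec->period) != 0)
                return -1;
        } else {
            return -1;                 /* unknown modifier */
        }
        field = next;
    }
    if (!spec->user && !spec->kernel)
        spec->user = spec->kernel = 1; /* default: count in both spaces */
    return 0;
}
```

For example, `"cycles"` alone would count in both user and kernel space on a per-thread basis with no sampling, while `"cycles:k:cpu:period=100000"` would sample kernel-only cycles per CPU every 100000 events.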


SYSTEMTAP PERFORMANCE HARDWARE ACCESS IMPLEMENTATION


SystemTap access to the performance monitoring hardware is planned to
be built on the perfmon2 kernel support. perfmon2 provides reservation
of and access to the performance monitoring hardware on ia64, i386,
and PowerPC processors. The perfmon2 support is not yet in the
upstream kernels, but patches are available.


As a proof of concept, I agree that this is the best route. Reinventing the wheel would be pointless. Building this prototype might also help with refining the perfmon2 interface.


I have been working on patching oprofile so that it uses the perfmon2 interface. The work is being done on an amd64 machine. This should allow some examination of the mechanisms for setting up the events and sampling. It should be portable to perfmon2 for i386, ppc64, and ia64. I will make the patches available for comment.

The next step would be to prototype a similar operation for systemtap.

I am trying to avoid reinventing the wheel. I am also very concerned that raw access of the performance monitoring hardware will further increase the chances of multiple device drivers stepping on each other without knowing about it.

-Will


