This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



RE: Fw: Introducing power performance counter contacts


See below: 

> -----Original Message-----
> From: Maynard Johnson [mailto:maynardj@us.ibm.com] 
> Sent: Thursday, June 30, 2005 3:21 PM
> To: Chen, Brad; Spirakis, Charles; wcohen@redhat.com; David 
> Gibson; prasadav@us.ibm.com
> Cc: systemtap@sources.redhat.com; gsmith@us.ibm.com; scheel@us.ibm.com
> Subject: Re: Fw: Introducing power performance counter contacts
> 
> Maynard Johnson wrote:
> I've read through the SystemTap documentation a couple of 
> times, trying to absorb the concepts as best I can.  I have a 
> few thoughts.  Go ahead and throw fireballs back at me if I 
> misconstrue anything -- which I'm sure I will since I've 
> probably just skimmed the surface of what you folks have been doing.
> 

No fireballs. If there is a problem, the earlier we spot it, the better.

> The ProfileTapset.txt file was of great interest to me as it 
> seems the intent is to provide a generalized, cross-platform 
> mechanism for interfacing with arbitrary PMUs that could be 
> used by various existing (and future) performance tools.  
> However, it seems to me there's a fundamental mismatch 
> between SystemTap and performance tools; i.e., SystemTap 
> seems geared towards more of an adhoc usage model whereas
> (existing) performance tools API requirements are well-known.

I don't think I understand why you believe there is a mismatch here. In
both the systemtap and profiling world, the goal is to collect data
regarding what is happening on a machine. The mechanism for causing the
"collection point" may be different (synchronous vs. asynchronous in the
systemtap parlance), but the general concept (collect data, aggregate
it, pass it back to the user, etc.) is a common theme. Further, part of
the systemtap goal is to be usable in a production system with low
overhead, a goal that performance tools in general share.
Please help me understand why you think systemtap and profiling are
fundamentally different.

> 
> In one section of the ProfileTapset document, there was 
> mention of the current activity involving a possible merge of 
> Perfmon and PerfCtr with the intent of getting a generalized 
> PMU interface into the mainline kernel.  It's not clear to me 
> whether this SystemTap/Profiling project would somehow build 
> on that interface or replace it.  Could someone clarify the intent?
> 

The systemtap/profiling project would build upon it, not replace it. The
idea is that there is a general need for a profiling infrastructure (aka
service) which would handle certain things for all clients (for example,
arbitration of resources, virtualization, etc.). These needs tend to be
universal for any tool that does profiling. If the need is universal,
why not have all the clients use the same infrastructure? Whether that's
PerfCtr, perfmon or some combination of the two doesn't really matter.
What we need to avoid is fragmenting the tool base (for example, we
don't need two different entities handling PMU arbitration because that
is the same as no one doing arbitration). The kernel doesn't have
multiple different system calls to open a file; it has a common
infrastructure on which other tools can depend. Access to the PMU should
be treated the same way.
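
To make the arbitration point concrete, here is a minimal sketch in
Python (not kernel code). The class and method names are hypothetical
and are not taken from perfmon or perfctr; the only point is that a
single owner of the counters can refuse a request, which two
independent arbiters could not do reliably.

```python
class PmuArbiter:
    """Hypothetical single owner of one CPU's performance counters."""

    def __init__(self, num_counters):
        # counter index -> name of the client holding it (None = free)
        self._owner = [None] * num_counters

    def acquire(self, client):
        """Reserve the first free counter; return None if all are busy."""
        for idx, owner in enumerate(self._owner):
            if owner is None:
                self._owner[idx] = client
                return idx
        return None

    def release(self, client, idx):
        """Free a counter, but only for the client that holds it."""
        if self._owner[idx] != client:
            raise PermissionError(f"{client} does not own counter {idx}")
        self._owner[idx] = None
```

With two counters, a third client is turned away until one of the
first two releases its counter.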


> In the same section of the document, it's stated that "both 
> pmu libraries [perfmon and perfctr] are geared more for 
> user->kernel access rather than kernel->kernel access and we 
> will need to see what can be EXPORT()'ed to make it more 
> kernel module friendly".  Does this imply that all the 
> knowledge of how to program the PMUs would be pushed down 
> into the kernel with a SystemTap solution?  If so, there 
> would need to be a way for performance tools to obtain that 
> knowledge to present to the user all of the arch-specific 
> events and profiling options.

I don't believe we want the knowledge of events in the kernel.
Realistically, however, profiling comes down to three things:

1) Configuration
2) Handling the interrupt/collecting data
3) Providing that data to the user

The systemtap runtime already handles (3) and any work to make that
better suited for profiling helps systemtap as well.

Systemtap as a whole needs to handle "at this point in time, collect
some data". Whether that "at this point in time" was caused by an
interrupt (2) or a branch, the behavior afterwards is the same and
covered by the systemtap language. The time spent in the "collect some
data" (aka probe handler aka interrupt handler) is determined by what
you want to do and what overhead you are willing to accept. The usage
model section was an attempt to explain what profiling people tend to
want to do in their handler. There are comments in other sections that
point out that how that is implemented in systemtap will affect how well
profiling in systemtap meets the goal of low overhead/low impact on the
target environment. Once again, though, the interests are aligned -
anything done to make systemtap more friendly for profiling helps both
systemtap in general and profiling specifically.
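
As a rough illustration of "the behavior afterwards is the same", the
Python sketch below attaches one handler to two kinds of trigger. This
is not systemtap script syntax; the trigger names, the `fire` helper
and the context dictionary are all assumptions made up for the example.

```python
from collections import Counter

# Aggregated data that would eventually be passed back to the user.
samples = Counter()

def handler(context):
    """Collect some data: count hits per instruction address."""
    samples[context["ip"]] += 1

# The same handler is attached to two different kinds of trigger.
# Both trigger names are hypothetical labels, not real probe points.
probes = {
    "sync: function entry": handler,       # execution reached a point
    "async: counter overflow": handler,    # a PMU interrupt fired
}

def fire(trigger, context):
    """Dispatch a trigger; the handler does not know which kind fired."""
    probes[trigger](context)
```

Firing either trigger with the same address lands in the same bucket,
which is all the handler ever sees.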

Which leaves us with configuration (1).

In the end, programming a PMU is basically done by setting up a table of
register/value pairs and then shoving those register/value pairs into
the PMU. Systemtap has a "compiler" which runs in user space to
translate the "systemtap language" into C code which eventually becomes
the loadable kernel module. There is no reason why the translation from
"use event X on processor Y" into a table of register/value pairs can't
be handled by the systemtap compiler. Whether that's a separate library
the compiler uses or whether that's arch specific source files within
the compiler is a design choice, but the basic premise still holds.

For the moment, assume an arch translation library. What would be needed
then (to make the compiler's life easier) is a well-documented API for
conversion - an API that is potentially as simple as, "I pass you a
string, you pass me back an array of reg/value pairs for programming".
As to what events are supported and how complicated the profiling
options are (at least for triggering a PMU interrupt), that work can be
contained in the library and expanded over time - systemtap as a whole
is insulated from the details.
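
A minimal sketch of what such a conversion call might look like, in
Python for brevity. The function name, the architecture name, and every
register number and value below are invented for illustration and
correspond to no real processor or existing library.

```python
# Hypothetical per-architecture event tables. Register numbers and
# values are made up and do not describe any real PMU.
EVENT_TABLES = {
    "example_cpu": {
        "cycles":       [(0x10, 0x0001), (0x11, 0x0000)],
        "cache_misses": [(0x10, 0x0042), (0x11, 0x0100)],
    },
}

def translate_event(arch, event):
    """Return the list of (register, value) pairs that program `event`."""
    try:
        return list(EVENT_TABLES[arch][event])
    except KeyError:
        raise ValueError(f"unsupported event {event!r} on {arch!r}") from None
```

The caller (here, the systemtap compiler) only sees a string going in
and pairs coming out; supporting a new event means adding a table
entry, not changing the caller.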

Thus, the point in the document was that the actual calls to
"start/stop/read/write" the PMU are all (currently) designed to be
called from user space for both perfctr and perfmon. A loadable kernel
module which has a table of register/value pairs could use those same
facilities if they existed in an EXPORT'ed form.
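
The sketch below models that last step in Python. `FakePmu` is a
stand-in for whatever stop/write/start entry points might be exported;
its methods are hypothetical and are not the perfctr or perfmon
interfaces. The table is a list of register/value pairs like the ones
described above.

```python
class FakePmu:
    """Stand-in for exported PMU entry points (hypothetical)."""

    def __init__(self):
        self.regs = {}
        self.running = False
        self.log = []

    def stop(self):
        self.running = False
        self.log.append("stop")

    def write(self, reg, value):
        # Assumed rule for this sketch: registers are only written
        # while counting is stopped.
        if self.running:
            raise RuntimeError("refusing to reprogram a running PMU")
        self.regs[reg] = value
        self.log.append(f"write {reg:#x}={value:#x}")

    def start(self):
        self.running = True
        self.log.append("start")

def program_pmu(pmu, table):
    """Stop the PMU, load every register/value pair, then restart it."""
    pmu.stop()
    for reg, value in table:
        pmu.write(reg, value)
    pmu.start()
```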

> 
> I'm sure I'll have more questions as I digest the information 
> and get more details from you all.
> 
> Thanks.
> Maynard
> 
> >
> >Maynard Johnson
> >LTC Power Linux Toolchain
> >507-253-2650
> >
> >----- Forwarded by Maynard Johnson/Rochester/IBM on 06/30/2005 04:32 PM -----
> >
> >From:    prasadav@us.ltcfwd.linux.ibm.com
> >Date:    06/30/2005 10:23 AM
> >To:      "Chen, Brad" <brad.chen@intel.com>,
> >         "Spirakis, Charles" <charles.spirakis@intel.com>,
> >         William Cohen <wcohen@redhat.com>,
> >         dwg@au1.ibm.com,
> >         Maynard Johnson/Rochester/IBM@IBMUS,
> >         Jeffrey Scheel/Rochester/IBM@IBMUS,
> >         Bill Buros/Austin/IBM@IBMUS
> >cc:      Geoff Smith/Beaverton/IBM@IBMUS,
> >         SystemTAP <systemtap@sources.redhat.com>
> >Subject: Introducing power performance counter contacts
> >
> >I would like to introduce:
> >Maynard Johnson: He is the contact for PAPI; his area of expertise is
> >mostly in user space.
> >David Gibson: He is the main IBM contact for perfctr.
> >Bill Buros: Lead for the performance team on the Power platform.
> >Jeff Scheel: He is the overall architect for Linux on Power.
> >
> >For the benefit of David, Maynard, Bill, and Jeff, I will introduce the
> >others on the To list:
> >
> >Will Cohen: He is the maintainer of OProfile and is heavily involved
> >in SystemTAP.
> >
> >Brad Chen: He is the overall lead from Intel on the SystemTAP project.
> >
> >Charles Spirakis: Charles is looking at the performance counter issue
> >along with Will in the systemtap project. Charles has experience with
> >the VTune project.
> >
> >The SystemTAP home page can be found at http://sourceware.org/systemtap/
> >The home page has links to the mailing lists and all the other
> >project-related documents.
> >
> >Please let me know if I can be of any more help for all of you to make
> >progress on the performance counters work.
> >
> >Thanks,
> >Vara Prasad
> >

