This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: proposed perf.counter enhancement

From: fche at redhat dot com (Frank Ch. Eigler)
To: Stan Cox <scox at redhat dot com>
Cc: systemtap at sourceware dot org
Date: Mon, 19 Nov 2012 14:16:07 -0500
Subject: Re: proposed perf.counter enhancement
References: <50859301.6060409@redhat.com> <y0mobjs7w2m.fsf@fche.csb> <50941763.5020302@redhat.com> <50A9BCF0.7010102@redhat.com>

scox wrote:

> global A
>
> probe perf.type(0).config(0).thread("z") {
>   # this probe, even if it is never hit, sets up the counter "z"
> }

(If you forgive the bikeshedding, I find .thread("z") a little too
opaque.  How about counter("z") or access("z")?)


> probe process("./bench.x").function("main")
> {
>   printf("In %s\n",pp())
>   A <<< @perf("z")
> }
> probe end
> {
>   print (@hist_log(A))
> }

OK, that syntax looks fine to several folks, let's go with it.


I have a whole bunch of comments on the prototype, which will require
some nontrivial reworking (sorry!).  Good job so far, looking forward to
the next version!


>  void
> +typeresolution_info::visit_perf_op (perf_op* e)
> +{
> +  e->type = pe_long;
> +  //  throw semantic_error(_("unexpected @perf"), e->tok);
> +}

(Please toss the // throw and such.)



v> [...]
> +  // Index of desired perf counter, -1 if none
> +  std::vector<long> perf_counter_idx;
> +  // int perf_counter_idx;

Please explain the semantics of the vector index & value.


> +// Parse a @perf().  Given head token has already been consumed.
> +expression* parser::parse_perf_op (const token* t)
> +{
> +  perf_op* pop = new perf_op;
> +  pop->tok = t;
> +  expect_op("(");
> +  pop->operand = parse_expression ();
> +  expect_op(")");
> +  return pop;
> +}

parse_expression() is way too broad.  Try parse_literal() with a
stringness check, or actually forking it into
parse_literal_string()/_number().



> +static long _stp_perf_init (struct stap_perf_probe *stp, struct task_struct* task)
>  {
>  	int cpu;
> +	struct task_struct * perf_task;
> +	
> +	if (stp->attr.sample_period == 0 && task == 0)
> +	  return 0;
> +	if (stp->per_thread && task != 0) {
> +	  if (stp->events != 0) /* already setup */
> +	    return 0;
> +	  else {
> +	    stp->events = _stp_kmalloc(sizeof(struct perf_event*));

Better NULL-check that.


> +	    stp->events[0] = perf_event_create_kernel_counter(&stp->attr,
> +							      -1,
> +							      (struct task_struct *)task,

(Why this cast?)

> [...]
>  		if (IS_ERR(*event)) {
> -			long rc = PTR_ERR(*event);
> -			*event = NULL;
> -			_stp_perf_del(stp);
> -			return rc;
> +		  long rc = PTR_ERR(*event);
> +		  *event = NULL;
> +		  _stp_perf_del(stp);
> +		  return rc;

(Please use diff -w if posting patches.)


> +/*
> +The first call to _stp_perf_init, via systemtap_module_init at runtime, is for
> +setting up aggregate counters.  Per thread counters need to be setup when the
> +thread is known.  This is done by calling _stp_perf_init later when the thread
> +is known.  A per thread perf counter is defined by a "thread("var") suffix on
> +the perf probe.  It is defined by perf_builder.  This counter is read on demand 
> +via the "@perf("var")" builtin which is treated as an expression right hand side
> +which reads the perf counter associated with the previously defined perf
> +counter.  It is expanded by dwarf_var_expanding_visitor
> +*/

Have you considered forking the _stp_perf_init code & data structures for the two
purposes?


> +long _stp_perf_read (int ncpu, unsigned i)
> +{
> +  /* Choose the stap_perf_probes entry */
> +  struct stap_perf_probe* stp = & stap_perf_probes[i];
> +  u64 enabled, running;
> +  struct perf_event *event;

Might want to add a might_sleep() in here to catch problems.

> +  if (stp == NULL)
> +    {
> +      _stp_printf ("perf_event_read_value: stp is null\n");
> +      return 0;
> +    }
> +
> +  if (stp->events == NULL)
> +    {
> +      _stp_printf ("perf_event_read_value: stp->events is null\n");
> +      return 0;
> +    }
> +  else
> +    event = stp->events[0];
> +
> +  if (event == NULL)
> +    {
> +      _stp_printf ("perf_event_read_value: event is null\n");
> +      return 0;
> +    }

If these are cannot-happen type of problems, consider using _stp_error.


> -	/* per-cpu data. allocated with _stp_alloc_percpu() */
> +	/* ! per thread: per-cpu data. allocated with _stp_alloc_percpu() */
> +        /*   per thread: one event allocated with kmalloc */
> +        int per_thread;
>  	struct perf_event **events;

(Or make events[] part of a union?  Certify!  Card-check!  Unite!)


>  
> @@ -26,6 +26,7 @@ struct stap_uprobe_spec {
>    unsigned return_p:1;
>    unsigned long address;
>    unsigned long sdt_sem_offset;
> +  unsigned long perf_counter_idx;
>    struct stap_probe * const probe;
>   };


That's not going to be that simple; there could be more than one
@perf("foo") in a given uprobe.  Perhaps you don't need to store the
index in there at all, and just have the translator emit extra code
into the prologue, repeated for the N @perf("foo")'s.


> +		  for (i=0; i < c->perf_counters_dim; i++) {
> +			  if ((*(c->perf_counters))[i] > -1)
> +			    _stp_perf_read_init ((*(c->perf_counters))[i], task);
> +		        }

(This  * (c->perf_counters) [i]  makes it look like there is perhaps too much indirection
at play here.  Why the "*"?


> --- a/session.h
> +++ b/session.h
> @@ -282,6 +282,7 @@ public:
>    std::vector<stapfile*> files;
>    std::vector<vardecl*> globals;
>    std::map<std::string,functiondecl*> functions;
> +  std::map<std::string,long> perf_counters;

It could be more interesting if instead of using a derivative 'long'
here, there were a pointer to e.g. the struct probe that defined the
counter, for use e.g. in error messages.


> +struct perf_op: public expression
> +{
> +  expression *operand;
> +  void print (std::ostream& o) const;
> +  void visit (visitor* u);
> +};

(As mentioned above, you really want  literal_string *operand here.)



> @@ -501,6 +509,7 @@ struct vardecl: public symboldecl
>    literal *init; // for global scalars only
>    bool synthetic; // for probe locals only, don't init on entry
>    bool wrap;
> +  bool perf;	// Implicit variable for perf counter use
>  };

Are you sure you need a vardecl/array for these @perf("foo")
expressions?  They could be treated as extra fields within the struct
context.


> +  string var;
> +  get_param(parameters, TOK_THREAD, var);
> +  if (var.length() > 0)
> +    {
> +      period = 0;		// perf_event_attr.sample_freq should be 0
> +      int perf_n = 0;
> +      for (unsigned i=0; i<sess.globals.size(); i++)
> +	if (sess.globals[i]->name == var)
> +	  throw parse_error (_("duplicate global name"));
> +        else if (sess.globals[i]->perf)
> +	  perf_n += 1;

Yeah; all this stuff you could make go away if we decide that
the @perf("foo") / probe perf.*.counter("foo") are simply in a totally
separate namespace; don't need to be script-level variables of any sort



> +dwarf_var_expanding_visitor::visit_perf_op (perf_op *e)
> +{
> +  expression *repl = e;
> +  printf ("XXX");
> +  repl->print(cout);
> +  printf ("\n");
> +  token* t = new token;
> +  t->location = e->tok->location;
> +  t->type = tok_identifier;
> +  t->content = ((literal_string*)e->operand)->value;

You won't need that dangerous casting stuff if you make perf_op operands
be literal_strings.


> +  s.perf_counters[c->name] = ((literal_number*)c->init)->value;

(Such casts in general indicate insufficient OO modelling.  They
should not be necessary.)


> [...]
> +  embedded_expr *spr = new embedded_expr;
> +  spr->tok = t;
> +  spr->code = string("_stp_perf_read(smp_processor_id()," + lex_cast(((literal_number*)c->init)->value) + ")");
> [...]

This likely can't work.  _stp_perf_read must be called outside the
probe handler body (which is wrapped by some sort of atomicity
construct).  So instead of generating such code to expand @perf("foo")
to %{ SOMETHING %}, the probe *prologue* will need to have the special
code for it.

That is, the perfctr reading code would need to be inserted within
uprobe probes' common_probe_entryfn_prologue(), before the #if
INTERRUPTIBLE / preempt_disable () / ... stuff.


> +      std::map<std::string,long>::iterator pci;
> +      for (pci = q.dw.sess.perf_counters.begin();
> +	   pci != q.dw.sess.perf_counters.end(); pci++)
> +	{
> +	  this->perf_counter_idx.push_back((*pci).second);
> +	  // this->perf_counter_idx = (*pci).second;
> +	  q.dw.sess.perf_counters.erase(pci);
> +	}

(This is unclear as to purpose.  Perhaps if you switch to a perfctr_name->probe*
mapping instead, as suggested above, it will be more obvious.)


> +      // this->perf_counter_idx = ((struct derived_probe*)(q.base_probe))->perf_counter_idx; // need a way to percolate this forward

(This kind of casting is a warning.)



> diff --git a/testsuite/semok/entry04.stp b/testsuite/semok/entry04.stp
> old mode 100755
> new mode 100644
> diff --git a/testsuite/semok/pretty-uprobes.stp b/testsuite/semok/pretty-uprobes.stp
> old mode 100755
> new mode 100644
> diff --git a/testsuite/semok/thirtysix-utrace.stp b/testsuite/semok/thirtysix-utrace.stp
> old mode 100755
> new mode 100644

(Hm, are those empty?)


- FChE

References:
- proposed perf.counter enhancement
  - From: Stan Cox
- Re: proposed perf.counter enhancement
  - From: Frank Ch. Eigler
- Re: proposed perf.counter enhancement
  - From: Stan Cox
- Re: proposed perf.counter enhancement
  - From: Stan Cox

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]