This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: sort a foreach on a stat value?


Hi -

joshua.i.stone wrote:

> [...]  One thing I've noticed is that our foreach syntax has
> different semantics than other languages [...]

Indeed, just like in awk, we iterate over indexes rather than values.


> [...]
> 	foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> [...]

That sort of thing has some promise for abbreviating the excessive
duplication that hunt made an example of in bug #2115.

While this does not address sorting, another related syntactical
possibility is to infer a "[idx1, idx2]" suffix on undecorated
occurrences of the indexed array within the body of a foreach:

   foreach ([x,y] in thingie)
     total += thingie # implied [x,y]

   foreach ([x,y,z] in mystats)
     printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))

The latter could be abbreviated further to "@count, @sum, @min", to
infer the innermost-looped array itself, plus its index tuple.
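
For comparison, here is a rough sketch of how the second loop has to be
spelled with today's syntax, with the index tuple repeated on every
reference.  The probe point, the choice of index functions, and the
"count" variable of syscall.read are only illustrative assumptions;
any probe feeding a three-index statistics array would do.

   global mystats

   # Assumed data source: sizes of read requests, keyed by pid/tid/cpu.
   probe syscall.read {
     mystats[pid(), tid(), cpu()] <<< count
   }

   probe end {
     # Each extractor names the array and the full index tuple again.
     foreach ([x,y,z] in mystats)
       printf("%d %d %d\n", @count(mystats[x,y,z]),
              @sum(mystats[x,y,z]), @min(mystats[x,y,z]))
   }

Under the inference proposed above, each "mystats[x,y,z]" in the loop
body would shrink to "mystats", or disappear entirely.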

A later independent optimization could make sure that the translator
does not emit duplicate array-lookup operations within loops.


> [...]
> >    foreach (tid in stat) // sort by value -> ???
> >      stat_counts[tid] = @count(stat[tid])
> >    foreach (tid in stat_counts-)
> >      printf("%d: %d\n", tid, stat_counts[tid]) # and/or @avg(stat[tid]) etc.
> >  }
> [...]
> This is a passable workaround, yes.  The downside is that if stat were
> very large, I would have to fudge with the maxaction counter.  If I was
> only interested in maybe the top 20, then with a single loop construct
> it's easy to break out after 20 and not hit the MAXACTION boundary.

Unless I'm mistaken, the current runtime aggregates the whole pmap for
looping/sorting, even if you want just the top 20.  This cost will be
fully reflected in the activity count (bug #1885) at some point, so a
single-loop form is unlikely to cost much less than the explicit
copying loop above.

I wonder if this behavior makes sorting on statistical values
inefficient enough that special syntax is not yet justified, given
that open-coding is possible.
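
To make the open-coded variant concrete, here is a minimal sketch that
copies the counts into a plain array, sorts that array by value, and
stops after the top 20.  The probe point and its "count" variable are
assumptions chosen for illustration, and "n" is just a hypothetical
local counter.

   global stat, stat_counts

   # Assumed data source: sizes of read requests, keyed by thread id.
   probe syscall.read {
     stat[tid()] <<< count
   }

   probe end {
     # Copy one scalar per entry out of the statistics array.
     foreach (tid in stat)
       stat_counts[tid] = @count(stat[tid])

     # The "-" suffix on the array sorts by value, largest first.
     n = 0
     foreach (tid in stat_counts-) {
       if (n++ >= 20) break
       printf("%d: %d (avg %d)\n", tid, stat_counts[tid], @avg(stat[tid]))
     }
   }

The copying loop still visits every entry, so for a very large array
the MAXACTION limit would presumably need raising (e.g. with
-DMAXACTION=...), which is the drawback noted in the quoted text.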


> >> Along the same lines, it would be extremely useful to be able to do
> >> "cascading" sort - i.e. sort by more than one field.
> > 
> > [...]
> >   foreach ([x1+, x2--, y2+++] in array----) { ... }
> 
> That's not a bad suggestion, though I think it's not obvious in which
> order the cascading happens.  [...]

I guess we'd pick and document one of the two interpretations.


- FChE

