This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] randomize benchtests


On Fri, May 17, 2013 at 02:44:24PM +0200, Torvald Riegel wrote:
> > On Fri, 2013-05-17 at 13:47 +0200, Ondřej Bílka wrote:
> > > On Fri, May 17, 2013 at 01:16:01PM +0200, Torvald Riegel wrote:
> > > > On Fri, 2013-05-17 at 12:44 +0200, Ondřej Bílka wrote:
> > > > > On Fri, May 17, 2013 at 12:24:30PM +0200, Torvald Riegel wrote:
> > > > > > On Mon, 2013-04-22 at 14:56 +0200, Ondřej Bílka wrote:
> > > > > > > On Mon, Apr 22, 2013 at 05:44:14PM +0530, Siddhesh Poyarekar wrote:
> > > > > > > > On 22 April 2013 17:30, Ondřej Bílka <neleai@seznam.cz> wrote:
> > > > > > > > +      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
> > > > > > > > +      for (k = 0; k < iters; k++)
> > > > > > > > +        {
> > > > > > > > +         i = rand_r (&seed)%NUM_SAMPLES;
> > > > > > > > +         BENCH_FUNC(i);
> > > > > > > > +        }
> > > > > > > > +      clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);
> > > > > > > 
> > > > > > > This is wrong.  The interval also has the time taken to call rand_r.
> > > > > > > 
> > > > > > This is not wrong.  You are interested only in differences between
> > > > > > implementations, and adding the same time from rand_r calls does not
> > > > > > change that. 
> > > > > 
> > > > > But if we should be changing the rand_r implementation in the future
> > > > > (e.g., if we might be getting HW support for it on a certain
> > > > > architecture), then this will lead to a difference in all our
> > > > 
> > > > From stdlib/rand_r.c:
> > > > 
> > > > /* This algorithm is mentioned in the ISO C standard, here extended
> > > >    for 32 bits.  */
> > > > 
> > > > Given that changing it would break applications that depend on rand_r
> > > > reproducibility, it will not change. 
> > > > If you want a fully specified generator, use drand48.
> > > 
> > > My comment was about the performance of one call to rand_r.  This isn't
> > > about where or how it's specified, it's just about the performance, and
> > > trying to keep it reproducible.
> > Performance does not matter, as we measure the overhead with an empty
> > body and subtract it.
> 
> So you later on added the second option that I mentioned in my previous
> email, calibrating against a loop with just a rand_r call?  If so, then
> that's fine.  But why didn't you mention this right away?
>
For simplicity I deferred this to a later commit.  As it does not change
the performance difference between implementations, it had low priority.
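
To sketch what that follow-up calibration could look like (an
illustration only, not the benchtests harness; BENCH_FUNC, NUM_SAMPLES
and ITERS here are placeholders): time a loop that does only the random
index generation, time the same loop with the benchmarked call added,
and subtract the former from the latter.

/* Sketch only, not the actual benchtests harness.  Both loops use the
   same seed, so they draw the same index sequence; the only difference
   is the benchmarked call.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NUM_SAMPLES 1024
#define ITERS 10000000

/* Stand-in for the routine under test.  */
char buf1[4096], buf2[4096];
#define BENCH_FUNC(i) memcpy (buf1, buf2, (i))

static double
elapsed_ns (const struct timespec *start, const struct timespec *end)
{
  return (end->tv_sec - start->tv_sec) * 1e9
         + (end->tv_nsec - start->tv_nsec);
}

int
main (void)
{
  struct timespec start, end;
  volatile size_t sink = 0;
  unsigned int seed;
  size_t i, k;

  /* Overhead loop: rand_r and the index computation only.  */
  seed = 42;
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
  for (k = 0; k < ITERS; k++)
    {
      i = rand_r (&seed) % NUM_SAMPLES;
      sink += i;
    }
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);
  double overhead = elapsed_ns (&start, &end);

  /* Measured loop: identical, plus the benchmarked call.  */
  seed = 42;
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
  for (k = 0; k < ITERS; k++)
    {
      i = rand_r (&seed) % NUM_SAMPLES;
      sink += i;
      BENCH_FUNC (i);
    }
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);
  double total = elapsed_ns (&start, &end);

  printf ("per-call estimate: %.2f ns\n", (total - overhead) / ITERS);
  return 0;
}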

> > This only adds noise, which can be controlled by a sufficient
> > number of samples.
> > 
> > Reproducibility? These tests are not reproducible nor designed to be
> > reproducible.
> 
> They should be, though not necessarily at the lowest level.  If they
> wouldn't be reproducible in the sense of being completely random, you
> couldn't derive any qualitative statement -- which we want to do,
> ultimately.

You must distinguish between noise (http://en.wikipedia.org/wiki/Noise)
and bias (http://en.wikipedia.org/wiki/Bias).  The expected value of the
former does not depend on the selected implementation, which is not true
of the latter.

What should be reproducible are the ratios between implementations within
a single test (see below).  That is what matters.  When you compare
different runs you introduce bias without care, and as you said:
> Even if there is noise that we can't control, this doesn't mean it's
> fine to add further noise without care (and not calibrating/measuring


> 
> > Runs on the same machine are affected by many environmental
> > effects, and anything other than a randomized comparison of implementations
> > in the same run has bias that makes the data worthless.  
> 
> Even if there is noise that we can't control, this doesn't mean it's
> fine to add further noise without care (and not calibrating/measuring
> against a loop with just the rand_r would be just that).
> 
Adding noise is perfectly fine.  You estimate the variance and, based on
that, choose a number of iterations such that in 99% of cases the
measurement error it causes stays within 1% of the mean.
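
For illustration (a sketch with made-up pilot numbers, not benchtests
code), the usual normal-approximation bound turns a per-iteration mean
and standard deviation estimated from a pilot run into an iteration
count: n >= (2.576 * stddev / (0.01 * mean))^2.

/* Sketch only; link with -lm.  Pick an iteration count so that, under
   a normal approximation, the estimate of the mean is within 1% of the
   true mean in 99% of cases.  */
#include <math.h>
#include <stdio.h>

static size_t
iterations_needed (double mean, double stddev)
{
  const double z99 = 2.576;     /* two-sided 99% normal quantile */
  const double rel_err = 0.01;  /* target: within 1% of the mean */
  double n = pow (z99 * stddev / (rel_err * mean), 2.0);
  return (size_t) ceil (n);
}

int
main (void)
{
  /* Hypothetical pilot-run numbers: mean 100 ns, stddev 40 ns,
     which asks for roughly 10600 iterations.  */
  printf ("%zu iterations\n", iterations_needed (100.0, 40.0));
  return 0;
}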

You probably meant bias, and that is the reason why we do not try to
compare different runs.

> Even if we have different runs on different machines, we can look for
> regressions among similar machines.  Noise doesn't make the data per se
> worthless.  And randomizing certain parameters doesn't necessarily
Did you try running the test twice? 

> remove any bias, because to do that you need to control all the
> parameters with your randomization.  And even if you do that, if you
That you cannot eliminate something completely does not mean that you
should give up.  The goal is to manage systematic error
(http://en.wikipedia.org/wiki/Systematic_error) and keep it within
reasonable bounds. 

Branch misprediction and cache effects are the major issues that can cause
bias, and randomization is crucial to counter them. 
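
To illustrate the branch-prediction point (a standalone sketch, not part
of the patch; the branchy stand-in and constants are made up): with a
fixed index order the predictor learns the branch pattern, while a
rand_r-driven order does not let it, which is closer to what real
callers see.  Both variants call rand_r, so only the index order differs.

/* Sketch only: run the same branchy stand-in with a fixed index order
   and with a rand_r-driven order and compare the timings.  */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_SAMPLES 1024
#define ITERS 20000000

/* Branchy stand-in for the benchmarked routine.  */
static long
branchy (size_t i)
{
  return (i & 1) ? (long) i * 3 : (long) i - 7;
}

static double
run (int randomize)
{
  unsigned int seed = 1;
  struct timespec start, end;
  volatile long sink = 0;
  size_t k, i, r;

  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &start);
  for (k = 0; k < ITERS; k++)
    {
      r = rand_r (&seed) % NUM_SAMPLES;
      i = randomize ? r : k % NUM_SAMPLES;
      sink += branchy (i);
    }
  clock_gettime (CLOCK_PROCESS_CPUTIME_ID, &end);
  return (end.tv_sec - start.tv_sec) * 1e9
         + (end.tv_nsec - start.tv_nsec);
}

int
main (void)
{
  printf ("fixed order:  %.0f ns\n", run (0));
  printf ("random order: %.0f ns\n", run (1));
  return 0;
}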

Other factors are just random and do not favor one implementation
over another in a significant way. 

> use, say, a uniform distribution for a certain input param, but most of
> our users are actually interested in just a subset of the input range most
I do exactly that; see my dryrun/profiler.  I do not know why I am
wasting time trying to improve this.
> of the time, then even your randomization isn't helpful.
> 
> To give an example: It's fine if we get measurements for machines that
> don't control their CPU frequencies tightly, but it's not fine to throw
> away this information (as you indicated by dismissing the idea of a
> warning that someone else brought up).
Where did I write that I dismissed it?  I only said that there are
more important factors to consider.  If you want to write a patch that warns 
and sets the CPU frequency, fine.

