This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>, Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, Andi Kleen <andi at firstfloor dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Sun, 30 Jun 2013 21:52:01 +0200
- Subject: Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.
- References: <51C23583 dot 1070307 at redhat dot com> <CAHjhQ93vWnCiVVU9MPoGptjQtn2J2PCDT2B7ZfXiKt+Cv_Rh_w at mail dot gmail dot com> <51C307A5 dot 7030608 at redhat dot com> <20130620151711 dot GA4891 at domone dot kolej dot mff dot cuni dot cz> <51C317AA dot 6080502 at redhat dot com> <20130621012427 dot GA4574 at domone dot kolej dot mff dot cuni dot cz> <CAAHN_R1HXyy0i25rtYKJ4Zox5u0R57xKbZDq=ZNf0BVm=7biMw at mail dot gmail dot com> <20130621135110 dot GB7973 at domone dot kolej dot mff dot cuni dot cz> <CAHjhQ921kXhi3hfqkHW_5pdYY2QYf6pzQ8OLondc6JJjj++4kQ at mail dot gmail dot com> <51CC602F dot 1010406 at redhat dot com>
On Thu, Jun 27, 2013 at 11:54:23AM -0400, Carlos O'Donell wrote:
> On 06/27/2013 03:24 AM, Liubov Dmitrieva wrote:
> > I think for this particular patch we don't need super accurate
> > benchmarks to see that it is better because we talk not about 20-60%
> > of boost but about several times asymptotically boost as current
> > benchmarks showed. It was a server machine, nobody runs Firefox there.
>
> Agreed, but we still need some kind of reproducible result that shows
> your patch improved performance. I'm not happy with performance patches
> going into glibc without some proof that they made things better.
>
You said proof, but we are not at the proof stage yet. We are not even
at the claim stage. Since these "benchmarks" are like the memchr one,
please answer the questions below about the following code:
  for (i = 0; i < 32; ++i)
    {
      HP_TIMING_NOW (start);
      CALL (impl, s, c, n);
      HP_TIMING_NOW (stop);
      HP_TIMING_BEST (best_time, start, stop);
    }
1. You use a sample of only 32 elements. Can you be sure that this
sample is big enough to be relevant?
2. You take the minimum of these samples. Please explain how this
relates to real performance.
3. You call this code in a loop with the same arguments. Please explain
why real-world use cases are close enough to this that we can observe
the same behaviour there.
Unless you can answer these questions satisfactorily, you have not
proved anything about performance; you have only got some numbers that
are loosely related to it.
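
As a rough illustration of the three questions above, here is a minimal
sketch in C of a harness that takes a larger sample, reports the median
and mean next to the minimum, and varies the arguments between calls.
It is only a sketch under assumptions: memchr is picked as the function
under test, POSIX clock_gettime stands in for HP_TIMING_NOW, and the
names stats, summarize, now_ns and bench_memchr_varied are hypothetical,
not part of the glibc benchmark code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Summary of one timing run, in nanoseconds (hypothetical type).  */
struct stats { uint64_t min, median; double mean; };

static int cmp_u64 (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Sort the samples in place and report minimum, median and mean.  */
static struct stats summarize (uint64_t *t, size_t n)
{
  assert (n > 0);
  qsort (t, n, sizeof *t, cmp_u64);
  double sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += (double) t[i];
  struct stats s = { t[0], t[n / 2], sum / (double) n };
  return s;
}

/* Assumption: a monotonic POSIX clock replaces HP_TIMING_NOW.  */
static uint64_t now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Time `samples` calls of memchr.  Each call gets a different start
   offset and length from a small fixed-seed generator, so the inputs
   are not identical from one iteration to the next.  */
static struct stats bench_memchr_varied (size_t samples)
{
  enum { BUF = 4096 };
  static char buf[BUF];
  memset (buf, 'a', BUF);               /* 'b' never occurs: full scan.  */
  uint64_t *t = malloc (samples * sizeof *t);
  assert (t != NULL);
  uint32_t seed = 12345;
  const void *volatile sink = NULL;     /* Keeps the call from being elided.  */
  for (size_t i = 0; i < samples; ++i)
    {
      seed = seed * 1664525u + 1013904223u;
      size_t off = (seed >> 8) % 64;                /* 0 .. 63  */
      size_t len = 1 + (seed >> 16) % (BUF - 64);   /* off + len < BUF  */
      uint64_t start = now_ns ();
      sink = memchr (buf + off, 'b', len);
      t[i] = now_ns () - start;         /* Includes timer overhead.  */
    }
  (void) sink;
  struct stats s = summarize (t, samples);
  free (t);
  return s;
}
```

Calling bench_memchr_varied (1000) and comparing the three fields gives
a first idea of how far the minimum sits from typical timings. Each
sample also includes the cost of the two clock_gettime calls, so very
short calls are dominated by timer overhead.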