This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- From: Ling Ma <ling dot ma dot program at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Nix <nix at esperi dot org dot uk>, libc-alpha at sourceware dot org, hongjiu dot lu at intel dot com
- Date: Sat, 8 Jun 2013 00:12:56 +0800
- Subject: Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- References: <1370424188-4259-1-git-send-email-ling dot ml at alibaba-inc dot com> <20130605121816 dot GA11269 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dMiD=_Qf1EJ=F3hfyQDtQubDEC5pjpXKDCHrUQwhr=vzg at mail dot gmail dot com> <20130605161954 dot GA26401 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPWPaX5prcL-uAaqS6=_ehzKeBmAFMdwV6aU34jZ0eHtQ at mail dot gmail dot com> <20130606125511 dot GA28565 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPs9geCtrWhU1L_0DEfOWOknpzFSLmYs4gbYzGX8Zn5Hg at mail dot gmail dot com> <20130607104613 dot GA6343 at domone dot kolej dot mff dot cuni dot cz> <8761xqru5w dot fsf at spindle dot srvr dot nix> <CAOGi=dMV5jaS2597cksd0mW84UDd06SovsBkL5=WPez-jZWg4g at mail dot gmail dot com> <20130607160749 dot GA28961 at domone dot kolej dot mff dot cuni dot cz>
> First, it does not randomize size in any way. This will cause branches to
> be predicted, and as branch prediction can account for 20% of the time, the
> results you get will be 20% off.
Ling: A widely held rule of thumb is that "a program spends 90% of its
execution time in only 10% of the code", which is why hardware
implements branch prediction. With a stable pattern history, benchmarks
such as SPEC 2000 see about 95% correct predictions on average; fully
random code would make the predictor useless.
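
To make the disagreement concrete, here is a minimal sketch of a harness that can run the same copy loop with either one fixed size or a pseudo-random size per call. It is not the benchmark from the patch: the helper names (lcg_next, fill_sizes, run_copies) are hypothetical, the LCG is only there to make the size sequence reproducible, and the empty asm statement assumes GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: a small LCG so the size sequence is reproducible. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill sizes[] with the fixed size max, or with pseudo-random sizes
   in [1, max] when randomize is non-zero. */
static void fill_sizes(size_t *sizes, size_t n, size_t max,
                       int randomize, uint32_t seed)
{
    assert(max > 0);
    for (size_t i = 0; i < n; i++)
        sizes[i] = randomize ? (size_t)((lcg_next(&seed) >> 8) % max) + 1
                             : max;
}

/* Copy sizes[i] bytes on each iteration. Returns the total number of
   bytes copied and stores the elapsed wall-clock time in *seconds. */
static size_t run_copies(char *dst, const char *src,
                         const size_t *sizes, size_t n, double *seconds)
{
    struct timespec t0, t1;
    size_t total = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) {
        memcpy(dst, src, sizes[i]);
        /* Assumption: GCC/Clang. Keeps the compiler from dropping the copy. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        total += sizes[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```

Calling fill_sizes once with randomize set to 0 and once with it set to 1, then dividing the byte total from run_copies by the elapsed time for each, gives two throughput figures. The gap between them is a rough indication of how much a fixed-size loop benefits from predictable branches on a given machine.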
> For example, when you ran
> ./memcpy-test-avx2-bench
> the cpu frequency could be 800MHz
> then in
> ./memcpy-test-new-bench
> a governor can decide to switch to 2.5GHz making results above three
> times worse than they are.
Ling: I can confirm it is not an issue in my compare.html, but I would
like to send out a double-checked result.
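
One common way to reduce the frequency-scaling effect described above is to alternate the two implementations within a single process and keep the best time of several rounds, rather than running two separate binaries back to back. The sketch below illustrates that idea only; it is not how compare.html was produced, the names (copy_fn, time_once, bench_interleaved) are hypothetical, and the empty asm statement assumes GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with the memcpy signature can be plugged in. */
typedef void *(*copy_fn)(void *, const void *, size_t);

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time iters calls of fn copying len bytes. */
static double time_once(copy_fn fn, char *dst, const char *src,
                        size_t len, size_t iters)
{
    double t0 = now_seconds();
    for (size_t i = 0; i < iters; i++) {
        fn(dst, src, len);
        /* Assumption: GCC/Clang. Keeps the compiler from dropping the copy. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return now_seconds() - t0;
}

/* Alternate a and b on every round so both are exposed to the same
   frequency changes, and keep the minimum time seen for each. */
static void bench_interleaved(copy_fn a, copy_fn b, char *dst,
                              const char *src, size_t len, size_t iters,
                              int rounds, double *best_a, double *best_b)
{
    assert(rounds > 0);
    *best_a = -1.0;
    *best_b = -1.0;
    for (int r = 0; r < rounds; r++) {
        double ta = time_once(a, dst, src, len, iters);
        double tb = time_once(b, dst, src, len, iters);
        if (*best_a < 0.0 || ta < *best_a)
            *best_a = ta;
        if (*best_b < 0.0 || tb < *best_b)
            *best_b = tb;
    }
}
```

For instance, bench_interleaved(memcpy, memmove, dst, src, len, iters, rounds, &ta, &tb) compares the two libc routines under the same conditions. Pinning the governor to a fixed frequency before the run is a complementary measure that this sketch does not attempt.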
Ondra, if we can test with a real benchmark, that will better approximate
our real-world usage. If anyone knows of good memcpy benchmarks that
represent real-world applications, could you please tell us?
Thanks & Best Regards
Ling