This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction


The CPU2006 benchmark is very hard to improve: the above 5%
improvement for a single core could be the goal of a next-generation
CPU, and the improvement is much smaller for the specjbb benchmark. We
can hardly expect more than a 1% improvement on those industry
benchmarks from an optimized memcpy_avx2 alone, even though it is the
fastest.

We presented the results for two reasons:
1) The Haswell CPU is fully capable of handling the indirect jump
instructions in memcpy_avx2 in a real-world scenario.
2) If we keep repeating the benchmark, we will find out which is
better. For example, we can test memcpy_avx2 and memcpy_new more than
3 times each and see which one gives the better result more often.
Even if the difference is very small, stable results will give us the
right answer.
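
As a rough illustration of the comparison in 2), the sketch below
summarizes the three 403.gcc scores per variant quoted further down.
The pairing of run i with run i for counting "wins" is an assumption
made only for illustration (the runs are not actually paired), and
the helper name summarize is hypothetical.

```python
from statistics import mean, stdev

# 403.gcc scores copied from the quoted message (higher is better).
memcpy_new = [67.63718, 66.899156, 66.982456]
memcpy_avx2 = [66.805236, 67.29362, 67.63718]


def summarize(scores):
    """Return (arithmetic mean, sample standard deviation) of the runs."""
    return mean(scores), stdev(scores)


new_mean, new_sd = summarize(memcpy_new)
avx2_mean, avx2_sd = summarize(memcpy_avx2)

# Relative difference of the means, in percent of the memcpy_new mean.
rel_diff_pct = (avx2_mean - new_mean) / new_mean * 100.0

# Assumption: compare run i of one variant with run i of the other.
avx2_wins = sum(a > n for a, n in zip(memcpy_avx2, memcpy_new))

print(f"memcpy_new : mean={new_mean:.4f} sd={new_sd:.4f}")
print(f"memcpy_avx2: mean={avx2_mean:.4f} sd={avx2_sd:.4f}")
print(f"relative difference: {rel_diff_pct:.3f}%")
print(f"memcpy_avx2 better in {avx2_wins} of {len(memcpy_new)} runs")
```

With only three runs per variant, the gap between the means is smaller
than the run-to-run spread of either variant, which is why more
repetitions are needed before drawing a conclusion.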

Thanks
Ling




2013/6/10, Andreas Jaeger <aj@suse.com>:
> On 06/10/2013 08:17 AM, Ling Ma wrote:
>> Last week, we separated 403.gcc from the cpu2006 benchmark and compiled
>> it with the additional option -mstringop-strategy=libcall to avoid
>> rep_4byte, rep_8byte and rep_byte, which use rep movs instructions.
>> 403.gcc has plenty of branch instructions and is very sensitive to the
>> branch misprediction rate. Currently we are concerned about whether
>> memcpy_avx2 causes more branch mispredictions than it gains in a
>> real-world scenario, so 403.gcc will help us to verify it.
>>
>> We tested 403.gcc linked with memcpy_new and 403.gcc linked with
>> memcpy_avx2, 3 times each:
>>
>> 403.gcc results for memcpy_new are below (higher is better):
>> 1) 67.63718
>> 2) 66.899156
>> 3) 66.982456
>>
>> 403.gcc results for memcpy_avx2 are below:
>>
>> 1) 66.805236
>> 2) 67.29362
>> 3) 67.63718
>>
>> The comparison above indicates that memcpy_avx2 seems to be better,
>> and we would like to do more experiments.
>
>
> If I take the arithmetic mean of these I get:
> 67.17293066666666666666 vs 67.24534866666666666666
>
> That's a difference of far less than 1 percent - so not conclusive at all,
>
> Andreas
> --
>   Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
>    SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
>     GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
>      GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126
>

