This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] Faster memset.


On Tue, Apr 09, 2013 at 07:13:03PM -0400, Carlos O'Donell wrote:
> On 03/26/2013 01:25 PM, Ondřej Bílka wrote:
> > On Sat, Mar 23, 2013 at 03:54:20PM +0100, Ondřej Bílka wrote:
> >> Hello,
> >> I looked at how memset is implemented, and since it uses computed jumps,
> >> which are expensive, I decided to write a different implementation.
> > snip
> >>
> >> For behaviour on unit tests (for real programs I also need to
> >> handle calls from the dynamic linker), see the following:
> >> http://kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile.html
> >>
> > I collected some data on how gcc uses memset. See
> > http://kam.mff.cuni.cz/~ondra/memset_dryrun.tar.bz2
> > I do not yet know which implementation is faster on Intel processors.
> > 
> > Run make, then ./show to see the gcc workload. If you want to compute
> > statistics, the best way is to use a modified replay.c file.
> > 
> > It helps that consecutive memsets are called with the same size in 36%
> > of cases, and in 71% with the size of one of the previous two calls.
> > 
> > On AMD, the ./benchmark script, which runs the current implementation
> > and mine, shows mine to be faster, and I am reasonably sure it is faster
> > in practice too.
> > 
> > On Intel, ./benchmark is faster with the current implementation. The
> > problem is that it does not take into account the cache behaviour that
> > happens in the meantime.
> > In the previous test my implementation gains mostly when the current
> > implementation's computed jump is not in cache, and this benchmark
> > underestimates this factor.
> 
> So a win for one and a loss for the other.
> 
> How much of a win and how much of a loss?
> 
When I did profiling, the results supported the theory that cache cost
dominates and that my implementation is faster. The results are here:

http://kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile/result.html

Results are slower in the random test; I am not sure why.

Ondra

