This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: improving malloc


Ondřej Bílka <neleai@seznam.cz> writes:
>
> It is a bottleneck on Core 2, where compare-and-swap takes 80 cycles.
> The trend is that CAS is faster on modern processors; on Sandy Bridge
> it is only 30 cycles.

The 30 cycles is for the case when the cache line is in the local L1
already.

But the interesting case is what happens when it is in someone else's
cache. Under thread contention you will get latencies that are many
orders of magnitude longer, especially on larger systems.
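
As a rough illustration of the two cases, here is a minimal sketch in C11
(not taken from glibc). All names in it (cas_increment, run, NTHREADS,
ITERS) are made up for the example, and the 64-byte cache line size is an
assumption. With shared != 0 every thread does its CAS on the same cache
line; with shared == 0 each thread has its own line, so the line stays in
the local L1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4        /* hypothetical thread count for the sketch */
#define ITERS    100000   /* CAS increments per thread */

/* One counter shared by all threads: the cache line bounces between cores. */
static _Atomic long shared_counter;

/* One counter per thread, each on its own (assumed 64-byte) cache line. */
struct padded_counter {
    _Alignas(64) _Atomic long value;
};
static struct padded_counter local_counters[NTHREADS];

/* Increment *p with a compare-and-swap retry loop. */
static void cas_increment(_Atomic long *p)
{
    long old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + 1,
               memory_order_relaxed, memory_order_relaxed))
        ;
}

static void *worker(void *arg)
{
    _Atomic long *counter = arg;
    for (int i = 0; i < ITERS; i++)
        cas_increment(counter);
    return NULL;
}

/* Run NTHREADS workers on the shared counter (shared != 0) or on
   per-thread counters (shared == 0); return the total of all counters. */
static long run(int shared)
{
    pthread_t tid[NTHREADS];

    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        atomic_store(&local_counters[i].value, 0);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker,
                       shared ? &shared_counter : &local_counters[i].value);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    long total = atomic_load(&shared_counter);
    for (int i = 0; i < NTHREADS; i++)
        total += atomic_load(&local_counters[i].value);
    return total;
}
```

Both run(1) and run(0) do the same number of CAS operations and return the
same total. Timing the two calls (for example with clock_gettime) should
show the gap, though how large it is depends on the machine and the number
of cores involved.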

Writing a malloc in 2013 that is not thread-local is simply not
acceptable anymore.

Generally, writing a good threaded malloc is tricky. The case
of allocating on one thread and freeing on another is also
important, so you have to have a per-thread data structure that
is still friendly to other threads.
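
One possible shape for such a structure, as a minimal sketch and not how
glibc or tcmalloc actually do it: each thread owns a cache with a private
free list that needs no atomics, plus a lock-free "remote" list that other
threads push onto when they free a block they do not own. The names
(struct cache, cache_alloc, cache_free, PAYLOAD_SIZE) are hypothetical, the
blocks are fixed-size, and plain malloc stands in for the backing
allocator. A real allocator would find its own cache through thread-local
storage and would have to deal with caches whose thread has exited.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define PAYLOAD_SIZE 64   /* hypothetical fixed block size */

struct cache;

struct block {
    struct cache *owner;   /* cache of the thread that allocated the block */
    struct block *next;    /* link used while the block sits on a free list */
    _Alignas(max_align_t) unsigned char payload[PAYLOAD_SIZE];
};

struct cache {
    struct block *local_free;             /* touched only by the owner */
    _Atomic(struct block *) remote_free;  /* pushed to by other threads */
};

static struct block *block_of(void *p)
{
    return (struct block *)((unsigned char *)p
                            - offsetof(struct block, payload));
}

/* Allocate from the caller's own cache 'self'. */
void *cache_alloc(struct cache *self)
{
    /* Local list empty: take everything other threads handed back,
       with a single atomic exchange. */
    if (self->local_free == NULL)
        self->local_free = atomic_exchange_explicit(
            &self->remote_free, NULL, memory_order_acquire);

    struct block *b = self->local_free;
    if (b != NULL) {
        self->local_free = b->next;       /* fast path: no atomics */
    } else {
        b = malloc(sizeof *b);            /* slow path: backing allocator */
        if (b == NULL)
            return NULL;
        b->owner = self;
    }
    return b->payload;
}

/* Free 'p' from the thread whose cache is 'self'. */
void cache_free(struct cache *self, void *p)
{
    struct block *b = block_of(p);
    struct cache *owner = b->owner;

    if (owner == self) {                  /* same thread: plain push */
        b->next = self->local_free;
        self->local_free = b;
        return;
    }

    /* Other thread: CAS-push onto the owner's remote list. */
    struct block *head = atomic_load_explicit(&owner->remote_free,
                                              memory_order_relaxed);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak_explicit(
                 &owner->remote_free, &head, b,
                 memory_order_release, memory_order_relaxed));
}
```

The owner only pays for an atomic operation when its local list runs dry,
and because the remote list is push-only on one side and drained whole on
the other, the usual ABA problem of lock-free stacks does not come up in
this sketch.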

Other considerations include memory fragmentation, how quickly
the allocator can give unused memory back to the OS, and so on.
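
For the "give memory back" part, one mechanism available on Linux is
madvise with MADV_DONTNEED on an anonymous private mapping. The sketch
below only shows that mechanism; the region_* helper names are invented
for the example, and it says nothing about when an allocator should decide
to release pages, which is the hard part.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and madvise on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve 'len' bytes of anonymous memory; returns NULL on failure. */
void *region_reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Tell the kernel the pages are no longer needed. The address range stays
   mapped; on Linux the next access to a private anonymous page after this
   call sees zero-filled memory. 'p' must be page-aligned. */
int region_release_pages(void *p, size_t len)
{
    return madvise(p, len, MADV_DONTNEED);
}

/* Unmap the region entirely. */
int region_destroy(void *p, size_t len)
{
    return munmap(p, len);
}
```

An allocator can call something like region_release_pages on runs of free
pages and keep using the same addresses later, at the cost of page faults
when the memory is touched again.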

Writing a good general-purpose malloc is extremely hard.

I did some experiments with tcmalloc some time ago and it can give
speedups because it has a much faster fast path for the uncontended
case. However, tcmalloc has some issues that keep it from being
fully general purpose, i.e. it is unable to ever give memory back
to the OS.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

