This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: improving malloc
- From: Andi Kleen <andi at firstfloor dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Sat, 05 Jan 2013 23:05:05 -0800
- Subject: Re: improving malloc
- References: <20130105090242.GA4490@domone.kolej.mff.cuni.cz>
Ondřej Bílka <neleai@seznam.cz> writes:
>
> It bottlenecks on Core 2, where compare and swap takes 80 cycles.
> The trend is that CAS is faster on modern processors; on Sandy Bridge
> it's only 30 cycles.
The 30 cycles is for the case when the cache line is in the local L1
already.
But the interesting question is what happens when it is in someone
else's cache. Under thread contention you will get many orders of
magnitude longer latencies, especially on larger systems.
Writing a malloc in 2013 that is not thread local is simply not
acceptable anymore.
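
To see the effect, here is a minimal sketch assuming C11 atomics and
pthreads. All names (run_cas, cas_worker, struct padded) are made up
for illustration, and the 64-byte cache line size is an assumption.
With shared != 0 every thread does its CAS on the same cache line;
with shared == 0 each thread has a line of its own, which roughly
corresponds to the "already in the local L1" case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical worker argument, for illustration only. */
struct worker {
    atomic_long *counter;   /* counter this thread increments */
    long iterations;
};

/* Increment with an explicit CAS loop, the way a lock-free
   allocator fast path might update a shared list head. */
static void *cas_worker(void *arg)
{
    struct worker *w = arg;
    for (long i = 0; i < w->iterations; i++) {
        long old = atomic_load_explicit(w->counter, memory_order_relaxed);
        while (!atomic_compare_exchange_weak(w->counter, &old, old + 1))
            ;   /* on failure 'old' holds the current value; retry */
    }
    return NULL;
}

/* One counter per cache line (64 bytes is an assumption). */
struct padded { _Alignas(64) atomic_long value; };

static struct padded counters[MAX_THREADS];

/* Run nthreads workers. If shared != 0 they all hit counters[0],
   otherwise each thread gets its own line. Returns the total count. */
static long run_cas(int nthreads, long iterations, int shared)
{
    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    long total = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++)
        atomic_store(&counters[i].value, 0);
    for (int i = 0; i < nthreads; i++) {
        w[i].counter = &counters[shared ? 0 : i].value;
        w[i].iterations = iterations;
        int rc = pthread_create(&tid[i], NULL, cas_worker, &w[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < nthreads; i++)
        total += atomic_load(&counters[i].value);
    return total;
}
```

Both variants do the same number of increments, so timing
run_cas(n, iters, 1) against run_cas(n, iters, 0), for example with
clock_gettime, isolates the cost of the cache line moving between
cores. Build with -pthread.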
Generally, writing a good threaded malloc is tricky. The case
of allocating on one thread and freeing on another is also
important, so you have to have a per thread data structure that
is still friendly to other threads.
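
One possible shape for such a structure, as a sketch only and not
what glibc or tcmalloc actually do: each thread owns a plain free
list that needs no atomics, plus a second list that other threads
push onto with CAS. The owner drains that second list with a single
atomic exchange when its own list runs dry. All names here
(thread_cache, cache_alloc, cache_free_local, cache_free_remote) are
hypothetical, there is only one block size, and the caller has to
say which cache owns a block; a real allocator would look that up
from a block header or page metadata.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Single fixed block size to keep the sketch short. */
#define BLOCK_SIZE 64

struct block { struct block *next; };

struct thread_cache {
    struct block *local;             /* touched only by the owning thread */
    _Atomic(struct block *) remote;  /* blocks freed by other threads */
};

/* Free from the owning thread: plain push, no atomics. */
static void cache_free_local(struct thread_cache *tc, void *p)
{
    struct block *b = p;
    b->next = tc->local;
    tc->local = b;
}

/* Free from any other thread: lock-free push onto the owner's
   remote list. Only this path pays for a CAS. */
static void cache_free_remote(struct thread_cache *owner, void *p)
{
    struct block *b = p;
    struct block *head =
        atomic_load_explicit(&owner->remote, memory_order_relaxed);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak_explicit(
                 &owner->remote, &head, b,
                 memory_order_release, memory_order_relaxed));
}

/* Allocate on the owning thread. */
static void *cache_alloc(struct thread_cache *tc)
{
    if (tc->local == NULL)
        /* Take the whole remote list in one atomic exchange. */
        tc->local = atomic_exchange_explicit(&tc->remote, NULL,
                                             memory_order_acquire);
    if (tc->local == NULL)
        return malloc(BLOCK_SIZE);   /* stand-in for the slow path */
    struct block *b = tc->local;
    tc->local = b->next;
    return b;
}
```

Because only the owner ever removes blocks from the remote list, and
it always takes the entire list at once, the usual ABA problem of a
lock-free stack pop does not come up here. The sketch never returns
memory to the OS and does nothing about fragmentation.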
Other considerations are memory fragmentation, how quickly
it can give back unused memory to the OS, etc. etc.
Writing a good general purpose malloc is extremely hard.
I did some experiments with tcmalloc some time ago and it can give
speedups because it has a much faster fast path for the uncontended
case. However tcmalloc has some issues that do not really make
it fully general purpose, i.e. it is unable to ever give memory back
to the OS.
-Andi
--
ak@linux.intel.com -- Speaking for myself only