This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Enhancing malloc


On Wed, May 29, 2013 at 08:29:44AM +0100, Will Newton wrote:
> On 28 May 2013 20:46, Carlos O'Donell <carlos@redhat.com> wrote:
> > On 05/28/2013 08:54 AM, Siddhesh Poyarekar wrote:
> >>> On Tue, May 28, 2013 at 02:33:17PM +0200, Ondřej Bílka wrote:
> >>> Malloc and friends are among the few libc functions that can be
> >>> measured directly. They account for about 50% of the time spent in
> >>> libc. I know that gcc uses malloc heavily. So an authoritative test
> >>> would be whether the following shows an improvement or not:
> >>>
> >>> for I in `seq 1 10`; do
> >>>   echo new
> >>>   LD_PRELOAD=new_malloc.so time gcc test.c
> >>>   echo old
> >>>   time gcc test.c
> >>> done
> >>>
> >>> You must take into account that malloc requests are small. I did
> >>> some measurements at
> >>> http://kam.mff.cuni.cz/~ondra/benchmark_string/malloc_profile_28_11_2012.tar.bz2
> >>
> >> For malloc and friends, the comparison should also include the effect
> >> of the change on fragmentation (internal as well as external) and not
> >> just speed of execution.
> >
> > I agree.
> >
> > In glibc's allocator we consciously try to coalesce fastbins and use
> > MADV_DONTNEED to give back unused pages.
> >
> > We could get a performance boost by looking at the new vrange support.
> > Such support has already been tested in jemalloc and shown to potentially
> > improve performance.
> 
> Are there specific design goals of the current code? For example, if a
> new implementation increased memory usage but increased performance
> would that be acceptable?
> 
> I agree that a comprehensive set of benchmarks would seem to be the
> logical first step.
> 
Ok, there are several requirements.

1. Real-world time and memory performance is the deciding factor in
judging whether an implementation is an improvement.

Unit tests are good for tuning an implementation but can be misleading
in several ways.

The time spent in malloc/free is only part of the picture. The way we
allocate affects cache locality. This is hard to capture in benchmarks,
but it can give savings bigger than the time spent in malloc/free
itself.

There are optimizations, such as prefetching the first 1k of an
allocated block, which can help in practice even though a benchmark
will report them as slower.
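To make the prefetching idea concrete, here is a minimal sketch of my
own (not glibc code; the wrapper name and the 64-byte cache line size
are assumptions) of a wrapper that prefetches the first 1 KiB of a
fresh allocation:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper, not glibc's implementation: after allocating,
   prefetch the first 1 KiB of the block so the caller's first writes
   hit warm cache lines.  A 64-byte line size is assumed, not detected. */
static void *malloc_prefetched(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;

    size_t span = size < 1024 ? size : 1024;
    for (size_t off = 0; off < span; off += 64)
        __builtin_prefetch((char *)p + off, 1, 3); /* prefetch for write */
    return p;
}
```

A microbenchmark that times only the malloc call itself would count the
prefetch loop as pure overhead; any benefit only shows up in the
caller's subsequent accesses, which is exactly why a unit test can be
misleading here.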

The second part is choosing representative workloads. As locking is
part of the problem, you need to exercise malloc in a multithreaded
setting. A particular case is when memory is freed in a different
thread than the one that allocated it. This is rare in practice, so we
can make it slower in order to make the likely case faster, but we must
ensure it stays within performance bounds.
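A minimal sketch of that cross-thread pattern (the function names and
block counts are my own, for illustration): one thread performs all the
allocations and a second thread performs all the frees, so every free
is "remote" with respect to the allocating thread:

```c
#include <pthread.h>
#include <stdlib.h>

#define NBLOCKS 1000

static void *blocks[NBLOCKS];

static void *alloc_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < NBLOCKS; i++)
        blocks[i] = malloc(64);
    return NULL;
}

static void *free_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < NBLOCKS; i++)
        free(blocks[i]);
    return NULL;
}

/* Run the two phases back to back; returns 0 on success.  The join
   between phases avoids any data race on blocks[].  A real benchmark
   would time this and compare it against same-thread free. */
static int run_remote_free(void)
{
    pthread_t a, f;
    if (pthread_create(&a, NULL, alloc_thread, NULL) != 0)
        return -1;
    pthread_join(a, NULL);   /* all blocks allocated in thread a */
    if (pthread_create(&f, NULL, free_thread, NULL) != 0)
        return -1;
    pthread_join(f, NULL);   /* every free happens in another thread */
    return 0;
}
```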

You also need to eliminate bias from the kernel side of things. Since
the kernel needs to zero pages before giving them to a process,
performance at the start of a run differs from the middle, so we should
first do a dry run to reach a stable state (the warm-up cost can itself
be a factor in real performance, but I do not know a better solution).

