This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v3] benchtests: Add malloc microbenchmark


On 25 June 2014 10:29, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Jun 19, 2014 at 05:46:08PM +0100, Will Newton wrote:
>> Add a microbenchmark for measuring malloc and free performance with
>> varying numbers of threads. The benchmark allocates and frees buffers
>> of random sizes in a random order and measures the overall execution
>> time and RSS. Variants of the benchmark are run with 1, 4, 8 and
>> 16 threads.
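As a rough sketch of the per-thread workload described above (illustrative only, not the actual bench-malloc-thread.c; WORKING_SET_SIZE and next_size are assumed names), each thread can keep a fixed working set of blocks and repeatedly free and reallocate a randomly chosen slot, so allocations and frees happen in random order:

#include <stdlib.h>

#define WORKING_SET_SIZE 1024	/* assumed; the real benchmark may differ */

static void
malloc_free_loop (unsigned int *seed, long iters,
                  size_t (*next_size) (unsigned int *))
{
  void *blocks[WORKING_SET_SIZE] = { NULL };

  for (long i = 0; i < iters; i++)
    {
      /* Pick a random slot, free whatever is there (free (NULL) is a
         no-op) and allocate a new block of random size in its place.  */
      size_t idx = (size_t) rand_r (seed) % WORKING_SET_SIZE;
      free (blocks[idx]);
      blocks[idx] = malloc (next_size (seed));
    }

  for (size_t i = 0; i < WORKING_SET_SIZE; i++)
    free (blocks[i]);
}

Wrapping such a loop in a wall-clock timer and reading peak RSS (e.g. from /proc/self/status) would give the two figures reported.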
>>
>> The random block sizes used follow an inverse square distribution
>> which is intended to mimic the behaviour of real applications which
>> tend to allocate many more small blocks than large ones.
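One way to obtain such an inverse square distribution is inverse transform sampling: for pdf(s) proportional to 1/s^2 on [a, b], the CDF is (1/a - 1/s) / (1/a - 1/b), which inverts to s = 1 / (1/a - u * (1/a - 1/b)) for uniform u in [0, 1). A sketch, where the bounds and the sampling method are assumptions rather than what the patch necessarily uses:

#include <stdlib.h>

#define MIN_BLOCK_SIZE 4	/* assumed lower bound */
#define MAX_BLOCK_SIZE 32768	/* assumed upper bound */

static size_t
next_block_size (unsigned int *seed)
{
  /* Uniform deviate in [0, 1).  */
  double u = (double) rand_r (seed) / ((double) RAND_MAX + 1.0);
  /* Invert the CDF of pdf(s) ~ 1/s^2 on [MIN_BLOCK_SIZE, MAX_BLOCK_SIZE].  */
  double inv = 1.0 / MIN_BLOCK_SIZE
               - u * (1.0 / MIN_BLOCK_SIZE - 1.0 / MAX_BLOCK_SIZE);
  return (size_t) (1.0 / inv);
}

A function like this could serve as the next_size callback in the loop sketched earlier; with the bounds assumed above roughly half of the draws are 8 bytes or smaller, so small blocks dominate, which is the intent.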
>>
>> ChangeLog:
>>
>> 2014-06-19  Will Newton  <will.newton@linaro.org>
>>
>>       * benchtests/Makefile (bench-malloc): Add malloc thread
>>       scalability benchmark.
>>       * benchtests/bench-malloc-thread.c: New file.
>> ---
>>  benchtests/Makefile              |  20 ++-
>>  benchtests/bench-malloc-thread.c | 299 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 316 insertions(+), 3 deletions(-)
>>  create mode 100644 benchtests/bench-malloc-thread.c
>>
>> Changes in v3:
>>  - Single executable that takes a parameter for thread count
>>  - Run for a fixed duration rather than a fixed number of loops
>>  - Other fixes in response to review suggestions
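The fixed-duration scheme could look something like the sketch below (names such as BENCH_DURATION, alarm_thread and worker_thread are illustrative, not taken from the patch): a helper thread sleeps for the benchmark duration and then raises a stop flag, and each worker reports how many iterations it completed in the fixed time window rather than how long a fixed number of iterations took.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define BENCH_DURATION 10	/* seconds; assumed value */

static volatile bool timeout;

/* Helper thread: sleep for the benchmark duration, then tell the
   workers to stop.  */
static void *
alarm_thread (void *arg)
{
  sleep (BENCH_DURATION);
  timeout = true;
  return NULL;
}

/* Worker: loop until the flag is raised, counting iterations.  */
static void *
worker_thread (void *arg)
{
  unsigned long iters = 0;

  while (!timeout)
    {
      /* ... one malloc/free round would go here ... */
      iters++;
    }

  printf ("%lu iterations in %d seconds\n", iters, BENCH_DURATION);
  return NULL;
}

int
main (void)
{
  pthread_t timer_tid, worker_tid;

  pthread_create (&timer_tid, NULL, alarm_thread, NULL);
  pthread_create (&worker_tid, NULL, worker_thread, NULL);
  pthread_join (worker_tid, NULL);
  pthread_join (timer_tid, NULL);
  return 0;
}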
>>
>> Example of a plot of the results versus tcmalloc and jemalloc on
>> a 4 core i5:
>>
>> http://people.linaro.org/~will.newton/bench-malloc-threads.png
>>
> That graph looks interesting. It is a little weird that with glibc the
> two- and three-thread runs take nearly the same time, but the
> four-thread run does not.
>
> For the other allocators the dependency is linear. How would you
> explain that?

I expected to see up to two inflection points in the curve: one due to
the single-thread optimization in glibc, which makes the single-threaded
case disproportionately faster, and another at the point where I run out
of free CPU cores (and context switch overhead therefore increases). I
ran the test on a 4 core i5 (hyper-threaded), and I believe that's what
is visible here:

1. The single-threaded case is disproportionately faster.
2. The curve gradient is lower from 1 thread up to the number of cores
(this also seems to be visible in at least tcmalloc).
3. The curve gradient increases and stays roughly constant above the
number of cores.

-- 
Will Newton
Toolchain Working Group, Linaro

