This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Question about madvise(DONTNEED) in glibc malloc
- From: KOSAKI Motohiro <kosaki dot motohiro at gmail dot com>
- To: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>
- Cc: KOSAKI Motohiro <kosaki dot motohiro at gmail dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Sun, 14 Apr 2013 18:09:46 -0700
- Subject: Re: Question about madvise(DONTNEED) in glibc malloc
- References: <516ADB3C dot 9040805 at gmail dot com> <CAAHN_R2FK4Fj4u1hHJJ17fr2X_PJxDs+6h2azWbUzbZth2HdfQ at mail dot gmail dot com>
(4/14/13 10:42 AM), Siddhesh Poyarekar wrote:
> On 14 April 2013 22:07, KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:
>> Hi all,
>>
>> Now, we Linux MM folks are discussing a new memory-discarding feature
>> (https://lkml.org/lkml/2013/3/12/105). The motivation is similar to MADV_FREE's,
>> but it is more efficient. (http://lwn.net/Articles/230799)
>>
>> I played with the ebizzy benchmark a bit, because jemalloc claims to be faster than glibc
>> on it (http://people.freebsd.org/~kris/scaling/ebizzy.html), and the patch author
>> claimed the vrange patch improves on that. I've found that current glibc's MADV_DONTNEED usage
>> is crazy wrong.
>>
>> Please look at the following results. MADV_DONTNEED causes about 5 million minor page faults and
>> drops transaction throughput (records/s) from 73259 to 16833. My machine is a typical
>> laptop: Core i7, 4 CPUs (8 threads), 2 GB RAM. On larger machines, MADV_DONTNEED degrades
>> performance even more.
>>
>>
>> % perf stat ./ebizzy -S 3
>> 16833 records/s
>> real 3.00 s
>> user 6.83 s
>> sys 17.09 s
>>
>> Performance counter stats for './ebizzy -S 3':
>>
>> 23914.067812 task-clock # 7.941 CPUs utilized
>> 2,609 context-switches # 0.109 K/sec
>> 137 CPU-migrations # 0.006 K/sec
>> 4,803,074 page-faults # 0.201 M/sec
>>
>> % MALLOC_DISCARD_HEAP=0 perf stat ./ebizzy -S 3
>> 73259 records/s
>> real 3.00 s
>> user 23.84 s
>> sys 0.05 s
>>
>>
>> Performance counter stats for './ebizzy -S 3':
>>
>> 23919.162533 task-clock # 7.945 CPUs utilized
>> 2,533 context-switches # 0.106 K/sec
>> 77 CPU-migrations # 0.003 K/sec
>> 4,256 page-faults # 0.178 K/sec
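A quick cross-check of the two perf runs above, using only the numbers they print (rounded). perf reports per-second rates against task-clock, i.e. CPU time summed over all threads (23.914 s in the first run), which is why the fault rate below matches its "0.201 M/sec":

```latex
\begin{aligned}
\text{throughput ratio} &= \frac{73259}{16833} \approx 4.35 \\
\text{page-fault ratio} &= \frac{4{,}803{,}074}{4{,}256} \approx 1.13 \times 10^{3} \\
\text{fault rate (first run)} &= \frac{4{,}803{,}074}{23.914\ \text{s}} \approx 2.01 \times 10^{5}\ \text{s}^{-1}
\end{aligned}
```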
>
> This doesn't prove that glibc's use of MADV_DONTNEED is wrong. What
> this proves is that never giving memory back to the system results in
> crazy fast performance, since we reduce syscall overhead. It doesn't
> justify never returning memory to the system, though.
It does. You need to look at the current heap_trim(); otherwise you won't understand
the current DONTNEED design.
> extra = (top_size - pad - MINSIZE - 1) & ~(pagesz - 1);
> if(extra < (long)pagesz)
>   return 0;
> /* Try to shrink. */
> if(shrink_heap(heap, extra) != 0)
>   return 0;
heap_trim() only checks that the extra size is at least one page.
And here is a quote from the madvise man page:
> MADV_DONTNEED
> Do not expect access in the near future. (For the time
> being, the application is finished with the given range, so
> the kernel can free resources associated with it.) Subsequent
> accesses of pages in this range will succeed, but will result
> either in re-loading of the memory contents from the
> underlying mapped file (see mmap(2)) or zero-fill-on-demand
> pages for mappings without an underlying file.
The current implementation clearly does what the documentation says. It's not a bug,
and we have no plans to change it.
>
>> - MADV_DONTNEED assumes the discarded memory will (with 99.999% certainty) not be reused, but
>> current glibc's assumption is clearly the opposite: glibc assumes the cost is very lightweight
>> when its prediction turns out to be wrong. I have no idea where this mismatch comes from.
>>
>> - HPC folks want an allocator that never returns memory to the OS. They are among the main
>> users of MALLOC_TRIM_THRESHOLD. However, the current MADV_DONTNEED usage has no knob to
>> disable it, and I couldn't find any reasonable justification for that.
>
> Because it's yet another knob with a niche use case. Have you tested
> with MALLOC_TRIM_THRESHOLD_ set to a ridiculously high value, such as 2GB or
> 4GB? How does it compare? It ought to give you performance similar to
> the new knob.
_int_free(mstate av, mchunkptr p, int have_lock)
{
  (snip)
    if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
      if ((unsigned long)(chunksize(av->top)) >=
          (unsigned long)(mp_.trim_threshold))
        systrim(mp_.top_pad, av);
#endif
    } else {
      /* Always try heap_trim(), even if the top chunk is not
         large, because the corresponding heap might go away.  */
      heap_info *heap = heap_for_ptr(top(av));
      assert(heap->ar_ptr == av);
      heap_trim(heap, mp_.top_pad);
    }
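MALLOC_TRIM_THRESHOLD_ only applies to the main arena (the main thread). Other arenas always
call heap_trim(), which, as quoted above, only checks for one spare page.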
>> This week, we kernel MM folks plan to discuss this at the Linux MM summit (http://events.linuxfoundation.org/events/lsfmm-summit),
>> so a quick response would be much appreciated, even if it's not accurate.
>
> Were any glibc developers invited? ;)
Sorry, the MM Summit is invitation-only and is a discussion venue for kernel developers.