This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
- From: "H. J. Lu" <hjl at lucon dot org>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: binutils at sources dot redhat dot com
- Date: Fri, 19 Jan 2007 07:03:16 -0800
- Subject: Re: PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
- References: <20060829230012.GA19841@lucon.org> <20070119144609.GI3819@sunsite.mff.cuni.cz>
On Fri, Jan 19, 2007 at 03:46:09PM +0100, Jakub Jelinek wrote:
> On Tue, Aug 29, 2006 at 04:00:12PM -0700, H. J. Lu wrote:
> > When we are comparing symbols in a section against another section in
> > a different file, we swap in the 2 symbol tables, sort them, and free the
> > symbol tables for each section. It is an O(n^2) problem. When there are
> > many sections and symbols in a file, it can be very slow. This patch
> > caches the symbol table unless --reduce-memory-overheads is used. The
> > results for the testcase are
> >
> > /usr/bin/time ./ld -shared -o libfoo.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o
> > 1.06user 0.13system 0:01.20elapsed 99%CPU
> > /usr/bin/time ./ld --reduce-memory-overheads -shared -o libbar.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o
> > 139.24user 8.93system 2:28.19elapsed 99%CPU
> >
> > That is about a 130X speed-up. The only concern I have is the memory
> > overhead; it seems small for this testcase.
>
> I was debugging this too on older (FC6) binutils for
> http://bugzilla.redhat.com/bugzilla/223181
> (initially I thought the main problem was caching, but as a backport of
> http://sources.redhat.com/ml/binutils/2006-11/msg00190.html
> didn't help, I looked at why bfd_elf_match_symbols_in_sections is
> so slow) and came up with a very different patch.
>
> What the routine used to do is:
> 1) swap in both symbol tables
> 2) qsort both of them, with a huge entry size (32 bytes on a 64-bit host)
> such that SHN_UNDEF symbols come last, other symbols are sorted
> based on st_shndx and when st_shndx is equal based on binding
> (this was actually the most expensive part, on the RH#223181
> testcase oprofile said 55.98% of total time spent in mempcpy,
> 19.26% in msort_with_tmp, 10.12% in memcpy and 4.23% in
> elf_sort_elf_symbol, all of which is from this qsort)
> 3) then we scan the sorted array, find the consecutive
> region in it whose st_shndx equals the section's, and
> count how many symbols there are
> 4) if both sections have the same number of such symbols,
> create symtable{1,2} arrays, and for symbols with st_shndx
> equal to shndx{1,2} store in the arrays a pointer to the
> isym and the name pointer
> 5) qsort symtable{1,2} arrays based solely on symbol name
> 6) compare the symbol names, st_info, st_other
>
> Now, unless I grossly misunderstood this, we are doing step 2)
> only to select isyms with isym->st_shndx == shndx{1,2} and count them
> (we don't care about the SHN_UNDEF symbols, since those don't have
> isym->st_shndx equal to shndx{1,2}, and we don't care about the ST_BIND
> sorting, because we re-sort the array based on name shortly afterwards).
> But to select isyms with isym->st_shndx == shndx{1,2} we don't need
> to qsort at all; that can be done linearly.
My original idea was that if we sorted the symbol table first, we could
stop counting symbols as soon as isym->st_shndx != shndx{1,2}. But if
the sorting itself is very expensive, it isn't a good idea. Memory usage
was also a concern.
>
> So the patch below (against 2.17.50.0.6) removes the expensive step 2)
> and step 3), at the expense of allocating bigger temporary symtable{1,2}
> arrays (previously they were just 2 * sizeof (void *) * count{1,2}; with
> this patch they are 2 * sizeof (void *) * symcount{1,2}). But those are
> freed before the function returns, and it pays off a lot. I'm getting a
> similar speedup to what you are getting with your patch.
>
> Now, your patch instead caches the results of steps 1) and 2) when
> reduce_memory_overheads is not set, which is quite memory hungry (some
> symbol tables have a huge number of symbols, mostly SHN_UNDEF or
> belonging to other sections).
>
> I wonder if we can't combine the benefits of both approaches.
>
Let me give it a try.
H.J.