This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.


Re: [patch] ld speedup 1/3 (suffix merge)


On Wed, Sep 10, 2003 at 07:14:57PM +0930, Alan Modra wrote:
> michael's+tweaks
> real    0m11.794s	user    0m11.480s	sys     0m0.310s
> 
> "+tweaks" are:
> - Removing the inner loop Michael+Lars had to handle the case where we
>   find a string that's a tail of another tail.
> - Avoiding the zero terminator length adjustments inside comparison
>   functions by fiddling the length before we call qsort.
>   merge.c:is_suffix still unnecessarily compares zero terminators, but
>   to fix that I'd need to adjust the length back in a separate loop,
>   which would likely be slower.
> - A small bugfix to sec_merge_hash_lookup.
> - Testsuite addition.
> 
> So, hey, let's get those copyright assignments done so this can go in!

This patch seems to merge strings as well as the current CVS merge.c does
only when alignment == entsize (in that case it may even make the
section entsize bytes smaller, by merging '\0' into some other string).

I used the attached program to generate a testcase on IA-32, compiled the
result with -g -O2, applied s/\.rodata\.str/rodata.str/g to the assembly
(I could have changed the linker script instead), and the results are:
without the patch
  [15] rodata.str1.32    PROGBITS        09d366c0 1cee6c0 1491c8 01 AMS  0   0 32
  [16] rodata.str1.1     PROGBITS        09e7f888 1e37888 00596d 01 AMS  0   0  1
  [17] rodata.str4.32    PROGBITS        09e85200 1e3d200 0b91e8 04 AMS  0   0 32
  [18] rodata.str4.4     PROGBITS        09f3e3e8 1ef63e8 005734 04 AMS  0   0  4
with the patch
  [15] rodata.str1.32    PROGBITS        09d366c0 1cee6c0 183128 01 AMS  0   0 32
  [16] rodata.str1.1     PROGBITS        09eb97e8 1e717e8 00596c 01 AMS  0   0  1
  [17] rodata.str4.32    PROGBITS        09ebf160 1e77160 133584 04 AMS  0   0 32
  [18] rodata.str4.4     PROGBITS        09ff26e4 1faa6e4 005730 04 AMS  0   0  4
Times:
without the patch
real    0m2.733s
user    0m0.620s
sys     0m0.820s
with the patch
real    0m2.665s
user    0m0.500s
sys     0m0.780s
(minimum times from a bunch of ld invocations).
This was with N 256, I 1000 and USE_W 1.

With N 32, I 1000 and USE_W 1 I get:
without the patch
  [15] rodata.str1.1     PROGBITS        083d0a48 388a48 003efb 01 AMS  0   0  1
  [16] rodata.str1.32    PROGBITS        083d4960 38c960 000a5f 01 AMS  0   0 32
  [17] rodata.str4.32    PROGBITS        083d53c0 38d3c0 000f00 04 AMS  0   0 32
  [18] rodata.str4.4     PROGBITS        083d62c0 38e2c0 0041bc 04 AMS  0   0  4
with the patch
  [15] rodata.str1.1     PROGBITS        083d0a48 388a48 003efa 01 AMS  0   0  1
  [16] rodata.str1.32    PROGBITS        083d4960 38c960 000a5f 01 AMS  0   0 32
  [17] rodata.str4.32    PROGBITS        083d53c0 38d3c0 000f00 04 AMS  0   0 32
  [18] rodata.str4.4     PROGBITS        083d62c0 38e2c0 0041b8 04 AMS  0   0  4
Times:
without the patch
real    0m0.287s
user    0m0.160s
sys     0m0.130s
with the patch
real    0m0.271s
user    0m0.150s
sys     0m0.120s

With N 32, I 10000 and USE_W 0 I get:
without the patch
  [15] rodata.str1.1     PROGBITS        0891c8c8 8d48c8 028001 01 AMS  0   0  1
  [16] rodata.str1.32    PROGBITS        089448e0 8fc8e0 0067e0 01 AMS  0   0 32
  [17] rodata.str4.4     PROGBITS        0894b0c0 9030c0 000004 04 AMS  0   0  4
with the patch
  [15] rodata.str1.1     PROGBITS        0891c8c8 8d48c8 028000 01 AMS  0   0  1
  [16] rodata.str1.32    PROGBITS        089448e0 8fc8e0 0067e0 01 AMS  0   0 32
  [17] rodata.str4.4     PROGBITS        0894b0c0 9030c0 000004 04 AMS  0   0  4
Times:
without the patch
real    0m2.109s
user    0m1.830s
sys     0m0.280s
with the patch
real    0m1.685s
user    0m1.440s
sys     0m0.250s

All the generated tests returned 0 with both the old ld and the patched ld, so
there are no correctness issues involved, just link speed and quality of string
merging.

libc.so link on IA-32:
without the patch
real    0m11.891s
user    0m5.740s
sys     0m0.790s
with the patch
real    0m11.459s
user    0m5.240s
sys     0m0.810s

Can anyone reproduce this?
Maybe the generator below plus some .exp tweaks would be good for the binutils
testsuite too (probably with fewer than 1000 base strings, since e.g.
N=256, I=1000, USE_W=1 generates a 20M .c file, a 20M .s file, a 33M .o file
and a 32M binary).

	Jakub

Attachment: generate-merge-test.c
Description: Text document

