This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [patch] Use unaligned access on x86_64


>> * the linking time (30 runs average) goes from
>>
>> 1.310065610 seconds time elapsed ( +-  0.19% )
>>
>> to
>>
>> 1.162564763 seconds time elapsed ( +-  0.14% )
>
> Hmmm, I guess x86 has gotten a lot better with this.

I guess. Just to be sure, I ran the test again on both a release (-O3
--gc-sections) and a debug (no optimizations) build of clang.

On the release build the linking time goes from

1.295838977 seconds time elapsed ( +-  0.35% )

to

1.173005344 seconds time elapsed ( +-  0.08% )

and in the debug build it goes from

14.913505831 seconds time elapsed ( +-  0.22% )

to

13.857411643 seconds time elapsed ( +-  0.15% )

> I'd rather have a configure flag that tells us whether the host
> platform can do unaligned access without (much) penalty. I did a quick
> search but didn't come up with anything provided by autoconf. Maybe
> add a configure option like --enable-fast-unaligned-access? Other
> suggestions? Write a micro-benchmark for configure to run on the fly?

The way gold is written, an unaligned access would just fail on an
architecture that is not as lenient as x86. It can be made to work
with something like

#include <string.h>

/* Load an unsigned from a possibly unaligned address. */
unsigned f(const char *p) {
  unsigned x;
  memcpy(&x, p, sizeof(unsigned));
  return x;
}

which compiles to a single load on x86 but to multiple loads on
architectures where unaligned loads are not supported.

Given that, I find it strange to have a configure option that can just
create a crashing gold in some configurations. A benchmark is even
more problematic as we don't control how quiet the machine is while
running configure.

It seems best to run a benchmark (actually using gold to link) for
each architecture that wants to opt-in to the unaligned option and
record the result somewhere.

The result can be recorded as a configure check for the host or as a
series of #ifdefs. I have a small preference for #ifdefs since configure
is just so slow, but I can try to write an m4 macro for it if you prefer.
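
A minimal sketch of what the #ifdef variant could look like. The macro
name GOLD_FAST_UNALIGNED_ACCESS and the helper names are hypothetical,
and the architecture list is an assumption that would be filled in from
measured link times, not something gold defines today:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical opt-in list: only architectures that were benchmarked
   and shown to handle unaligned loads cheaply would be added here. */
#if defined(__x86_64__) || defined(__i386__)
#define GOLD_FAST_UNALIGNED_ACCESS 1
#else
#define GOLD_FAST_UNALIGNED_ACCESS 0
#endif

/* Read a 32-bit value in host byte order from a possibly unaligned
   address.  memcpy keeps this well defined on every host. */
static uint32_t read_u32(const unsigned char *p) {
  uint32_t x;
  memcpy(&x, p, sizeof x);
  return x;
}

/* Return nonzero if data at p can be used in place, i.e. without
   first copying it to an aligned buffer. */
static int can_use_in_place(const void *p, size_t align) {
  if (GOLD_FAST_UNALIGNED_ACCESS)
    return 1;
  return ((uintptr_t)p % align) == 0;
}
```

With something like this, hosts that are not on the list keep the
current copy-to-aligned-buffer behavior, so no configuration can
produce a crashing gold.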

> On the other hand, the archive format should generally keep things on
> 4-byte boundaries -- the magic string is 8 bytes, archive headers are
> 60 bytes, and ELF file members will generally be a multiple of 4 or 8
> bytes in length. The symbol map should be a multiple of 4, but I'll
> bet it's the long-file name table that's throwing everything out of
> alignment. If we could just fix that, we could probably improve
> archive performance on many platforms where unaligned loads are not
> fast. Of course, for 64-bit targets, we're going to insist on 8-byte
> alignment, so to avoid the malloc-and-copy, we'd have to arrange for
> archive members to be 8-byte aligned.

I can't remember the details on how they got there, but I have seen
archive members that are 2 byte aligned.
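
For illustration, here is a small sketch of how member data offsets
fall out of the layout described above (8-byte magic, 60-byte headers),
assuming the usual ar rule that each member is padded to an even
offset. The function name and the member sizes are made up for the
example:

```c
#include <stddef.h>

enum { AR_MAGIC_SIZE = 8, AR_HEADER_SIZE = 60 };

/* Compute the file offset of each member's data, given the member
   sizes in archive order.  Each member is preceded by a 60-byte
   header and followed by one padding byte if it ends on an odd
   offset. */
static void ar_member_offsets(const size_t *sizes, size_t n,
                              size_t *offsets) {
  size_t pos = AR_MAGIC_SIZE;
  for (size_t i = 0; i < n; ++i) {
    pos += AR_HEADER_SIZE;   /* skip the member header */
    offsets[i] = pos;        /* member data starts here */
    pos += sizes[i];
    pos += pos & 1;          /* pad to an even offset */
  }
}
```

With sizes {100, 30, 4096}, standing in for a symbol map, a long-name
table and one object file, the object's data lands at offset 318,
which is only 2-byte aligned.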

> Also, have you tried thin archives?

No, I intentionally wanted to try the case of unaligned files. There
will probably always be a use case for shipping a fat .a (game engine,
llvm itself, etc).

One day I hope to start adding support for various tools for the BSD
(or was it MacOS X?) archive format that puts the name just before the
archive member. In that format it is always possible to align the
member by adding more null characters to the end of the name.
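
A rough sketch of that padding computation, assuming the name is
stored directly between the header and the member data. The function
and parameter names are hypothetical:

```c
#include <stddef.h>

/* Given the file offset where the member header ends, the length of
   the member name, and the desired alignment, return the name length
   after appending enough null characters that the member data starts
   on a multiple of align. */
static size_t padded_name_len(size_t header_end, size_t name_len,
                              size_t align) {
  size_t data_start = header_end + name_len;
  size_t rem = data_start % align;
  return rem ? name_len + (align - rem) : name_len;
}
```

For example, with the header ending at offset 68, a 5-byte name and
8-byte alignment, the name would be padded to 12 bytes so the data
starts at offset 80.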

Cheers,
Rafael

P.S.: I just tried it on gcc-112 (powerpc64le) and it looks like
unaligned is also a win there

from

2.354317085 seconds time elapsed ( +-  1.35% )

to

2.246461651 seconds time elapsed ( +-  0.18% )

