This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: DT_GNU_HASH: ~ 50% dynamic linking improvement


On Thu, Jun 29, 2006 at 07:27:06PM +0100, Michael Meeks wrote:
> 	1. Extensibility
> 		+ I want to see a multiply per lookup.
> 		+ ie. since the L2 cache misses are what hammer us -
> 		  if [sic] we want to extend the .chain -Bdirect
> 		  support in future - being able to put that in
> 		  the same record is really important.
> 		+ a 'record size' field gives us that with ~no
> 		  bloat as of now.
> 		+ [ and I really think some form of direct linking
> 		  is the future - but we can discuss that later 
> 		  clearly ]
> 		+ ie. having an extensibility story that is better
> 		  than 'just add a new section [ somewhere miles
> 		  away in memory/not-in-cache ] is not so fun.

On the other side, it is IMHO much better to keep using one section
for one purpose, as ELF has always been doing.  So, use a hash section
for mapping a symbol name to a set of indexes of dynamic symbol definition
candidates, not less, not more.  That's what DT_HASH has been doing and
DT_GNU_HASH is doing too.  Direct linking is mainly useful for OSes where
there is one entity controlling the whole thing, in the Open Source world
it can IMHO work only in very limited cases for program which have the
vast majority of dependent libraries coming from the same source, with the
same schedule, with bugfixes leading to shipping a new version of the whole
thing.  If the libraries on the other side are updated independently,
without one body controlling them all, symbols keep being added and removed
(for C the latter would be an ABI break, but for C++ the latter often
happens just when you change the internal implementation of some
function/method and it no longer references some function/method which suddenly
does not need to be compiled in) and direct linking just will lead to
horrible ODR and pointer equality violations that programs will sooner or
later break on.  So for direct linking which certainly won't be used for
all programs, but perhaps just a very limited set, you really should use
a separate section.

> 		+ Furthermore - by inspection it can be seen that
> 		  1/2^32 ~=== 1/2^30 [1]
> 			+ using this theorem, we can easily steal
> 			  1/2 bits from the top of the stored hash
> 			  and use them as chain terminators.
> 			+ ie. lets have:
> 
> .hash[symhash % nbuckets] -> .chain[5]
> 
> .chain[5] -> [ (not-end) | hashval[5] & 0x3fff ]
> .chain[6] -> [ (not-end) | hashval[6] & 0x3fff ]
> .chain[7] -> [ (is-end)  | hashval[7] & 0x3fff ]
> 
> 			+ that saves us several bytes per chain,
> 			  and cf. above, no measurable loss in
> 			  performance. [ of course chain[5] 
> 			  <-> .dynsym[5] here as before ]
> 
> 		+ in fact, for extensibility a per-shlib hash
> 		  comparison 'mask' field would make for a rather
> 		  better behaviour - so we can snarf bits from
> 		  here in future if we come up with brighter ideas.

Applications keep growing every year and while I currently counted
~ 500000 different symbol names, it can soon double, triple and keep
increasing.  So, while we have ATM just a few collisions, there can be
more and more in the future.  And handling those is expensive.
And, all linker can do for this is try to keep all the chains short enough
to fit into cachelines.

	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]