This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Patricia trie symbol tables?


On Mon, 2006-06-26 at 18:25 +0200, Florian Weimer wrote:
> * John Moser:
> 
> > Reading what Drepper says about -Wl,-O1 and why to use it, I get the
> > gist of this:  You'll do comparisons against
> > "_ZN14some_namespace22some_longer_class_name" twice, what a waste.  The
> > point of -Wl,-O1 is to make the buckets smaller so this happens less
> > often, as a side effect of making the hash table bigger (in other words,
> > no guarantees; but it'll probably happen).
> 
> The technique described at <http://www.cs.princeton.edu/~rs/strings/>
> might also be a win, compared to hash tables.  (Quoting from my
> bookmark file, as I'm offline at the moment.)

A Radix trie gives a pretty straightforward O(n) number of character
comparisons, and NR_COMMON_PREFIXES branches (i.e. "foobarbaz" when you
have "foojin" "foobarjin" "foobarbaz" forces you to branch 3 times),
which is again O(n) (but you'll never see that case in a hash bucket);
so we're looking at total O(2n) ~= O(n) operations (in other words, it's
about equivalent to having a single symbol in each hash bucket).

A quick glance at the SODA paper gives me the impression that they were
trying to make something fast for sorting and searching changing data
sets.  I can't tell if it's more efficient than radix in our case;
although, it looks like they intend every node to be a single character.
With a radix trie, the nodes contain however many characters go before
you need a branch; this avoids a lot of jumping around.

SODA:
r    --a
o   /   \  (at, an)
o  /     t .-.n
t /       \
V/         --t--r--i--b--u--t--e (attribute..?  It'll probably balloon
i--s (is)                           into a tree here, like with isnt)
  / \
 n   t (isnt)

Radix:

root
V     (at)   (attribute)
---a--t--tribute
 |  |
 |  |-n (an)
 | (is)  (isn't)
 |-is--nt

At least, that's what I get at a quick glance.

Anyone more familiar with the stuff in the SODA paper?
-- 
John Moser <john.r.moser@gmail.com>

Attachment: signature.asc
Description: This is a digitally signed message part


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]