This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
Re: suggestion for dictionary representation
- From: Daniel Berlin <dberlin at dberlin dot org>
- To: Daniel Jacobowitz <drow at mvista dot com>
- Cc: David Carlton <carlton at math dot stanford dot edu>,Jim Blandy <jimb at redhat dot com>, <gdb at sources dot redhat dot com>
- Date: Tue, 24 Sep 2002 00:28:16 -0400 (EDT)
- Subject: Re: suggestion for dictionary representation
On Mon, 23 Sep 2002, Daniel Jacobowitz wrote:
> On Mon, Sep 23, 2002 at 08:34:50PM -0400, Daniel Berlin wrote:
> > >I'm also curious about how it would affect the speed of reading in
> > >symbols. Right now, that should be O(n), where n is the number of
> > >global symbols, right?
> >
> > > If we used expandable hash tables, then I
> > >think it would be amortized O(n) and with the constant factor larger.
> >
> > Nope.
> > Our string hash function is O(N) right now (as are most). Hash tables
> > are only O(N) when the hash function is O(1).
> >
> > Now, if you have it only hash the first x characters, you can make your
> > hash table O(N) again, with the x as the constant. Of course, if only
> > hashing the first x characters causes tons of hash conflicts, it's not
> > going to make your hash table very fast.
>
> Wait a second... aren't you switching N's on us?
Yes, but this just makes it O(MN), where M is your string hash time, N is
your number of global symbols.
Although M is not likely to approach N, it could happen on *very* long
mangled names (i've seen some > 500 characters, in applications with
around that many *global* symbols), or easily if you index on demangled
names.
> is the number of
> global symbols. We're talking about the total time of adding all
> symbols to the table.
I'm quite aware.
> Hashing a string is "effectively" constant time,
It's not.
> because all string lengths are "small".
Bullshit.
libstdc++, fer instance, has an average symbol length of 42 (mangled).
Longest is 131 characters.
Demangled, the average is 86.
Longest is 1116.
>
> > >(But, I think, not larger in a way that would make a difference.) I'm
> > >curious about how often the "amortized" bit would lead to strange
> > >hiccups, but I don't think that's a big deal.
> > >
> > >But for skip lists, wouldn't it be something like O(n log n)? If so,
> > >that's an issue we have to consider.
> >
> > Put it in perspective.
> > for 1 billion symbols, n is 29.89.
> > for 1 million symbols, n is 19.93.
>
> You mean "log n", of course.
yup.
>
>