This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: hash table subtleties


On Thu, 29 Oct 1998, Tel wrote:

> > The magic ratio is chosen so that the expected mean nonempty bucket size
> > after upsize is the same as it is after downsize.  (and then, e.g. if a
> > user inserts just enough entries for the table to grow to 4096 buckets, he
> > must delete half of them before it will shrink back down to 2048). 
> 
> This sounds like the right idea, it will prevent overly frequent resize.
> I don't follow the reasoning behind your magic ratio but if it works then
> it works.
> 

I just refined the magic ratio through trial and error.  I think I'm
actually tring to estimate

1/2 * (number nonempty buckets) / (number buckets0

but no matter: Harvey pretty much convinced me to drop this whole
"nonempty buckets" thing for fear of a very bad worst case scenario.

> Hmmm, since you resize by powers of two, each single bucket in the small
> table splits into exactly two buckets in the large table so a bucket that is
> empty in the small table is guaranteed to stay empty after a grow.
> 
> That means that, when growing the table, you can:
> * make a bitmask equal to the size of the table before the grow (i.e. old size)
> * for each non-empty bucket (indexed 0, 1, 2...) in the old table:
> *** grab the list of keys from this bucket
> *** for each key in the list:
> ---- do a bitwise AND of the bitmask against the hash of the key
> ---- add the result of the previous step to the current bucket index and
>      use this new index to locate a destination bucket in the new table to
>      throw the key into

wow.  I never thought of it that way.  When I convert the hash to an
index, I do this

index = hash & (vec_len - 1);

which is pretty darn fast.  But I think I could coax some extra
performance from the tables if I use your ideas.

> When shrinking a table, it's even better because you just append
> the upper and lower bucket lists together, don't even look at the
> hashes <grin>. I'll guess that should speed your deletion up a bit.
> This works for any shrink operation that halves the size of the table.

> 
> PS: I NEVER did see your source code posting,
> this list seems to discard large messages.

you know, I never got it either.  Nice to see big brother enforcing
mailing-list etiquette ;)

The code and all kinds of goodies are at
http://www.giss.nasa.gov/~jglascoe/hashtab-type/