This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Hashtables in guile.


> Assume we have a hasher which returns an integer and is modulo-safe.
> Let n be the current number of bins, and let m be such that
> 2**(m-1) <= n < 2**m.  To find where an item should go,
> hash the key, modulo it by 2**m, then by n.  Store it in a linked list
> of buckets there (as in the standard separate chaining).  When you
> need to expand the hash table from n to n+1, you only need to re-hash
> the keys in bucket n+1-2**(m-1); they will end up in either the same
> bucket or in bucket n+1.  Similarly for contracting the table. 

OK, I think I understand how this works but I would expect that to find
where an item should go you should use something like:

(define (position HASH M N)
  (let ((p (modulo HASH (expt 2 M))))
    (cond ((< p N) p)
           (else (- p (expt 2 (- M 1)))))))

Which is different to `modulo by 2**m then by n' in as much as the
fold between n and 2**m doesn't depend on n. If you just modulo by n then
surely you will have a bunch of buckets to rehash?

Either way, the double fold of the hash will increase the load on the
middle numbered buckets (roughly double the other buckets) so the ease
of resize comes at some penalty. Thanks for the explanation though,
I'm sure it will be useful somewhere.

> [Hashing] is one area (like crypto) where trusting
> the wizards might be better than rolling your own.  There are hash
> functions designed to be fast and to de-cluster english text as much
> as possible.  Oh, a perfect hash generator would be cool too. 

It is sometimes possible to hash BETTER than a uniform random number
since random numbers will have some clustering too but only if you
have a good idea of what you will be hashing (OK, I'll admit that it
is usually text, maybe not always English though).

	- Tel