This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Hashtables in guile.


Telford Tendys <telford@triangle.triode.net.au> writes:

> In general, B-Trees are not as fast to do an insert or find as what a 
> hash table is. However, they do dynamically resize and never need rehashing.

Another option is skip lists.  I've always thought they were somewhat
interesting.  They are totally insensitive to ordered input (unlike
binary trees) and have no re-balancing costs.

Basically, skip lists are lists where each node has at least one next
pointer and maybe more.  Anytime you create a new node, you decide how
many next pointers it should have randomly.  Like this:

  (while (< (random) 0.5)
    (set! number-of-pointers (+ number-of-pointers 1)))

The level 0 next pointer is the same as the one in a standard linked
list.  The level 1 pointer (if there is one) points to the next item
with a level 1 pointer, etc. So you end up with something that looks
like this (rough approximation):

  3 ---------------------------| |--------------------->()
  2----------------------------| |-------| |----------->()
  1------------>| |->| |-------| |--| |->| |------>| |->()
  0 ->| |->| |->| |->| |->| |->| |->| |->| |->| |->| |->()
      |a|  |b|  |f|  |k|  |n|  |q|  |s|  |u|  |w|  |y|

The expected performance is log N for insertions, deletions, and
lookups, and, since the determination of a node's link-level is
random, the only thing that can cause you to have pathological
structure (like sorted input does for a standard binary tree) is a
really bad random number generator.  There is no input a malicious
agent could choose to get worst case behavior.  Further, the
implementation is *much* simpler than for AVL, RB, or other balanced
binary trees.

Finally, you can choose a different value than 0.5 for the random
link-level assignment above to make the pointers sparser, and you
still get an expected total path length of

  log_{1/p}((n +1)/(1-p))

where p is the value you choose.

Lewis and Denenberg (Data Structures & Their Algorithms) has more
info.

FWIW

-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930