This is the mail archive of the kawa@sourceware.cygnus.com mailing list for the Kawa project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]

Result of research into optimizing implementation ofassociation lists

To: kawa at sourceware dot cygnus dot com
Subject: Result of research into optimizing implementation ofassociation lists
From: "Nic Ferrier" <nferrier at tapsellferrier dot co dot uk>
Date: Sat, 24 Jun 2000 19:13:33 +0100

Marco wrote:

>The scheme solution IMHO is 
>hash table=vector of alists + hash function.
>It has the same (theorical) performance of a java hash 
>table and is portable;

The difference is that it doesn't get you "Java portability". 
ie: you can't pass it to a Java method and have Java know what it's
doing without having Java intimately understand Kawa.

I sat down for a couple of hours and played about with this - some
people may be interested (apologies to the majority who most certainly
won't be).

There are two possibilities for optimizing performance of association
lists by using hashes implemented in Java:

1. overload (cons) to build hashtables when presented with possible
hash candidates

Overload isn't quite the right word - reimplement is more like it. 
It's a bit tricky since Kawa has no "instanceof" equivalent. 
If it did one could so something like this:

(define (cons head tail)
  (if (and (instanceof? tail <Hashtable>) 
              (instanceof? head <Pair>))
     (invoke tail 'put (car head) (cdr head)
     (make <Pair> head tail)
    )
  )

The (ass) procs can of course check the 2nd argument to see if it's a
hash and then use the hash methods. 

Possibly even better is to do:

  (define (assoc key :: <object> list :: <hash>)
    ...)

because it provides compile time check Unfortunately I think this
breaks what one expects of Scheme, even if it is better (!).

Disadvantages:
- leaves the problem of how other procs might work on this list, 
eg: (car <Hashtable>)? but having said that some of them could be
altered to work

- would need a special implementation of hashtable to work well
In order for things like (car <Hashtable>) to work the hash would
have to know about the order that elements came on and off

- adds weight to a fundamental operation like (cons)
This is probably a very bad thing - even though a good JIT would
probably optimize away the conditionals that are required for the
implementation.

2. put hashtable logic into <Pair>

The easiest way of doing this is to have <Pair> implement
java.util.Map to do what the (ass) methods do. That of course brings
no performance benefit whatsoever and adds weight to  <Pair>. It does
acheive the aim of having Java know that the list is a Map though.

The more difficult way of doing this is to have <Pair> understand
it's environment better, what I called an intelligent list in previous
posts:

- on (make <Pair> car cdr) the car should be analysed to see if it is
a <Pair>
  if so then car is added to an internal hash which keys by the (car)
of the value added.

- a <Pair>'s internal hash is only created when car==<Pair> and
cdr==EmptyList
  before the hash is created hash==null

- a reference to the hash is kept on every <Pair> on the list
  by copying the reference to new members of the list when they are
made
  ie: in: (make <Pair> x y)  when y==<Pair> do y.hash=this.hash

- the implementation of (ass) procs can 
  if(list.hash!=null)
    // use list's hash methods

- (cdr) on the list results in the (car) value being removed from the
hash

- (setcar!) on the list results in the (car) value being removed and
the new (car) added to the hash

- as soon as a value is added to the list which is not a <Pair> the
internal hash is zeroed.

Disadvantages:

- it's adding quite a lot of weight to the average list/pair
interaction

- needs a clever implementation of a hashtable (to handle hashing by
(car value))

- needs re-implementation of Kawa <Pair> 
to use methods to set() and get() <Pair>.car and <Pair>.cdr
if methods aren't used then (setcar!) and (cdr) and so forth will
never work. 
Methods obviosuly incur cost, even if a JIT could optimize it away

Obviously both of these incur (probably) unacceptable levels of cost
for ordinary list interactions. None the less; if a program used
association lists a lot then this would increase performance quite
considerably.

There is another possible implementation but it would require that
Kawa supported overloading of core procedures (eg: cons) and to my
knowledge Kawa doesn't support that and it looks like it would be
quite a major change.

The implementation would involve building equivalent hash procedures
of (cons) and all the other procs that might be called to work on
association lists. The compiler could then make a descision about
which method to choose.

Another problem with this is that it would require the procedure:

  (define (cons car :: <Pair> cdr :: <EmptyList>) :: <Hash>
   ...)

and this looks likely to get in the way of a general list
constructor.

I did actually start to implement option 2 inside Kawa but I got
bored with the idea of implementing all the methods of java.util.Map.

Nic Ferrier

Follow-Ups:
- Re: Result of research into optimizing implementation of association lists
  - From: Per Bothner

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]