This is the mail archive of the kawa@sources.redhat.com mailing list for the Kawa project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: string srfi


"Nic Ferrier" <nferrier@tapsellferrier.co.uk> writes:

> I was planning to reimplement it all, over a period of time, using
> Java. Don't see much point in doing most of these things with Scheme.

Well, using Scheme will be simpler, given that there is existing code.
Also, it will be more portable wrt possible changing calling conventions
if at least the wrappers (such as optional argument handling) are
done in Scheme.

But it is reasonable that the guts of the Scheme procedures be basically
invoke-static of methods written in Java.

Note that the string srfi depends on the charset srfi; you should
probably implement that first.  I've started sketching out a
"RangeTable" class, which provides efficient (trie-based) mappings
from int to any Object, optimized for ranges of values that map to the
Object.  You might correctly guess that this is partly intended for
character tables (such as syntax tables and charsets).  I've posted
the current version below.  The idea is that there is a list of
"Range" objects, one for each range of contiguous characters with the
same value.  There is also a trie with branch factor 16 (so each level
of the trie handles 4 bits of the key) for fast indexing.  See the
commented-out lookup procedure.  If all of the elements of a trie
array have the same value, the array is replaced by a pointer direct
to the corresponding Range.  The problem I'm having is how to update
the data structure.  It seems to involve the tedious task of checking
lots of different cases.  Perhaps someone has an insight as to how to
do this better?  Perhaps there is a suitable published algorithm?

By the way:  I'm planning on changing the FString implementation
so that there is a separate int length field.  I.e. the string
length is the value of the length field, not the length of the
array.  This makes some of the Collections methods (such as add)
more efficient; it also will support Common Lisps vectors with
fill pointers.

Perhaps a charset should be implemented using 4 32-bit words
for the character of Latin1 plus a RangeTable to handle other
characters?  Or just use a RangeTable that maps characters
in the set to some value like Boolean.TRUE.

> Ok... it's maybe not a clean room implementation but I would have
> thought it was enough given the risk involved.

I see no problem with that.  But keep notes on the extent to which
you use or study the reference implementation, so we can give
proper credit.

package gnu.kawa.util;

/** Map integers to Object.
 * Future implementaton will be optimized for ranges that map to the
 * same value, but the current implementation is bad except for 0..127. */

public class RangeTable // extends map
{
  Object[] index = new Object[128];
  java.util.Hashtable hash = new java.util.Hashtable(200);

  public Object lookup(int key, Object defaultValue)
  {
    if ((key & 127) == key)
      return index[key];
    return hash.get(new Integer(key));
  }

  public void set(int lo, int hi, Object value)
  {
    if (lo > hi)
      return;
    for (int i = lo;  ; i++)
      {
	if ((i & 127) == i)
	  index[i] = value;
	else
	  hash.put(new Integer(i), value);
	if (i == hi)
	  break;
      }
  }

  public void set(int key, Object value)
  {
    set(key, key, value);
  }

  /*
  Object[] index = null;
  //int index256 = null;

  Range ranges;
  
  private static void update(Object[] index, int lo, int hi, Object value)
  {
  }

  private Object update(int lo, int hi, Object value, int shift, Object[] index)
  {
    if (index == null)
      {
	if (shift == 24)
	  index = new Range(lo, hi, value);
	else
	  index = new Object[16];
      }
    int ilo = (lo >> shift) & 0xf;
    int ihi = (hi >> shift) & 0xf;
    for (int i = ilo;  i < ihi;  i++)
      {
	index[i] = update(lo, hi, shift - 4, index[i]);
      }
  }

  public void set(int lo, int hi, Object value)
  {
    set(lo, ho, value, index, 24);
  }

  private void set(int lo, int hi, Object value, Object[] index, int shift)
  {
    Range r = findLower(lo);
    if (r == null)
      {
	r = new Range(lo, hi, value, index);
	// ???
      }
    // Now: r != null && r.lo <= lo && (r.next == null || r.next.lo > lo).
    Range next = r.next;
    if (lo <= r.hi)
      {
	// Overlap: lo is within r.

	if (lo == r.lo || hi == r.hi)
	  {
	    r.value = value;
	    return;
	  }
	if (hi <= r.hi)
	  {
	    // Completely within r.
	    if (r.value == value)
	      return;
	    Range s = new Range(lo, hi, value, next);
	    if (hi != r.hi)
	      s.next = t = new Range(hi + 1, r.hi, r.value, next);
	    r.hi = lo - 1;
	    r.next = s;
	    // goto fixup;
	  }
      }
    else if (lo == r.hi + 1 && value == r.value)
      {
	r.hi = hi;
	if (next != null)
	  {
	    if (value == next.value && hi + 1 >= next.lo)
	      {
		r.next = next.next;
		r.hi = next.hi;
	      }
	    while (hi >= next.hi)
	      {
		r.hi = hi = next.hi;
		next = next.next;
		r.next = next;
	      }
	    if (hi >= next.lo)
	      {
		next.lo ...
	      }
	  }
      }
  }

  private static Range getLast(Object[] index, int key)
  {
    for (;;)
      {
	Object cur = index[key];
	if (cur == null)
	  {
	    if (key == 0)
	      return null;
	    key--;
	    continue;
	  }
	if (cur instanceof Range)
	  return (Range) cur;
	key = 15;
	index = (Object[]) cur;
      }
  }
  */

  /* Find last range r such that r.hi <= key. */
  /*
  public Range findLower(int key)
  {
    Object cur;
    int shift;
    //if ((key & 0xFF) == key)
    //  {
    //    // Optimize for key in range [0 .. 256].
    //    cur = index256;
    //    shift = 4;
    //  }
    //  else
      {
	cur = index;
	shift = 24;
	// FIXME - handle negative values in proper order?
	// key ^= 0x8000000;
      }
    for (;;)
      {
	int key15 = (key >> shift) & 0xF;
	cur = index[key15];
	if (cur == null)
	  return getLast(index, key15);
	if (cur instanceof Range)
	  return (Range) cur;
	shift -= 24;
      }

  }

  public Object lookup(int key, Object defaultValue)
  {
    Object cur;
    int shift;
    if ((key & 0xFF) == key)
      {
	// Optimize for key in range [0 .. 256].
	cur = index256;
	shift = 4;
      }
    else
      {
	cur = index;
	shift = 24;
	// FIXME - handle negative values in proper order?
	// key ^= 0x8000000;
      }
    for (;;)
      {
	cur = index[(key >> shift) & 0xF];
	if (cur == null)
	  return defaultValue;
	if (cur instanceof Range)
	  {
	    Range range = (Range) cur;
	    if (key <= range.lo && key >= range.hi)
	      return range.value;
	    else
	      return defaultValue;
	  }
	shift -= 24;
      }


  }

  public Object enumerate (RangeHandler handler)
  {
    for (Range range;  range != null; range = range.chain)
      {
	handel.handle(range.lo, range.hi, range.value);
      }
  }
  */
}

/*
interface RangeHandler
{
  public Object handle(int lo, int hi, Object value);
}

class Range
{
  int lo;
  int hi;
  Object value;
  Range next;

  public Range(int lo, int hi, Object value, Range next)
  {
    this.lo = lo;
    this.hi = hi;
    this.value = value;
    this.next = next;
  }
}
*/
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]