This is the mail archive of the
kawa@sources.redhat.com
mailing list for the Kawa project.
Re: volunteer project: srfi 14 (char-sets)
Jim White wrote:
I intend to take a crack at this, but I don't want to use bitmaps.
Bitmaps are not appropriate - thwy'd need up to 2*20 bits for the
multiple Unicode planes. That's why I sugested in my sketch
an implementation based on sorted ranges. Logarithmic search;
linear update including set operations and copy.
Of course they're inefficient for an Ascii-only set that has many
short ranges - but how often do people need that? If that's an
issue, we can have a third concrete implementation type that uses
bit vectors. Or we could have a hybrid implemenmtation: Use one
bit in the starting-index of a pair to indicate that the following
int is a 32-bit mask, instead of a range end. Then we're only
taking at most twice as much space as bit vector implementation,
though we still have to do binary search to find the correct
(start, mask) pair. And update operations become more complex.
--
--Per Bothner
per@bothner.com http://per.bothner.com/