This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: conservative gc



The other explanations are perfectly good; but here's mine:

Any garbage collector needs to know which objects a given object
refers to.  If O is an object, let REFS(O) be the set of objects it
refers to.  So, if O is a pair, then REFS(O) contains exactly (car O)
and (cdr O).  If O is a closure, then REFS(O) is the variables it has
closed over.

Clearly, in order to compute REFS(O), you generally need to know the
type of O, and its layout.  However, sometimes it's very difficult to
know the layout of an object.  For example, suppose O is a
continuation or a thread; in Guile, these objects actually include C
stacks, laid out by the C compiler.  The C compiler doesn't even let
you walk stack frames, let alone tell you which slots are references
to Scheme objects.

A conservative garbage collector is one which, for at least some
datatypes, uses a conservative estimate of REFS.  For example, Guile
does this for continuations and threads.  It treats the stack as a
dumb array of bytes, and then scans it for any word which looks like a
reference to a Scheme object.  It may mistake integers or frame
pointers or random binary data for a Scheme object, but it will
certainly catch all the valid references.  For correctness, this is
all the GC needs.

Guile is designed in certain ways to make this process work better.
First, it only uses conservative scanning on objects where it can't do
better.  All ordinary Scheme objects are scanned precisely.  Secondly,
the representation of Scheme objects was chosen to make it efficient
to recognize pointers to actual objects.

The benefit is convenience; you can write ordinary C code which
manipulates Scheme objects, and the collector will just find all your
references, and keep the objects alive for you.  There are GC'd
systems which require you to explicitly record your heap references;
this is, effectively, describing the layout of your stack at run time.
I've worked extensively on one such system, Emacs.  It was an utter
pain to maintain; even Richard Stallman couldn't do it, and we had
GC-related bugs in every release.  Given the reality of human
maintainers, conservative GC is more reliable than Emacs's scheme.

There is a price, however.  Because you're not *sure* whether this
word you found is really a pointer to a Scheme object, you can't
relocate that Scheme object, because you can't fix up that pointer.
It might actually be part of a string, or an integer, or encrypted
data.  There are GC's out there that pin individual objects and
relocate all the rest (e.g., Joel Bartlett's "Mostly Copying"
collector).  Guile's GC doesn't do that.

There is also a risk.  Your compiler would be perfectly within its
rights to save all pointers on the stack exclusive-or'd with
0xdeadbeef, and only xor them into their natural state after bringing
them into registers.  More realistically, your compiler might store
pointers offset by some constant; there are plausible cases where this
can generate pointers outside the object entirely.  These `pointer
hiding' optimizations do not seem to be a serious problem in practice,
although they do mean that conservative GC is `incorrect', in some
sense.