This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Outline for Guile Generational GC


Greg Harvey wrote:

> To keep the correspondance between what we're currently calling heap
> segments (larger chunks of contiguous memory), let's call them
> chunklets (which also has the benefit that, if they're implemented,
> and there are bugs, I can say 'Argh!  Chunklet fudge!').

> Actually, one of the reasons I went beyond a byte is that we probably
> should be keeping more than (8-16) possible generations. In the small
> case, it seems like overkill, but if we stop at such a small number,
> the last generation is likely to become large quickly. Also, since we
> are going to have extra info per object, it's a good idea to put the
> regular gc marks in there as well (we'll need two to flip flop, I
> think), so that we don't need to trot all over a bunch of heap
> segments removing marks to prevent guile from seeing illegal values (I
> learned that one the hard way ;).
>

OK, how about:struct Cell_Info {    int generation:8;
    int car_newer:1;
    int car_older:1;
    int cdr_newer:1;
    int cdr_older:1;
    int gc_mark:1;
    int hunoz:3;
    };

 struct Chunklet {   /*  sizeof(Chunklet) == 128 */
    Cell_Info cell_info[12];
    Cell cells[12];
    char any_out;
    char junk[7];
   };

Another possibility is 25 cells per chunklet, which wastes only 4 cells.
Basically pick some n such that 10*n + 2 is near but less than a power of 2.
The larger n is the more probable any_out will be set and you have to scan the
chunklet.

I briefly considered putting mark bits in the Cell_Info, but thought the less
disruption to the rest of guile the better.  It does
make unmarking in the sweep phase less disruptive, though.  Could you elaborate
on why a second mark bit might be needed?

> An improvement here is to keep at least some minimal information on a
> per segment basis (where we could point... it's not perfect, but
> avoids some unnecessary scanning). The part we really want to avoid is
> scans through any large amount of heap space, because this is almost
> always going to cost more than tracing live objects.
>

Well, if segments aren't too big we could keep a bit map of the any-out flags on
a segment basis.  That could greatly accelerate scanning a segment if there were
few pointers leading out it.  It would slow down the write barriersomewhat,
though.

> > We may prematurely
> > copy some babies created at the end of the cycle that will soon become
> > garbage.
>
>
> This can still be avoided by keeping a bit per object in the incubator
> (I like that :), and moving them on if the bit is set (means a bit
> more junk in the incubator, but we are talking about a very small
> space, that won't be too difficult to scan). Or a second incubator
> (probably better) that they have to live through to make it to the
> heap proper. This is definately worth the mem, since the overhead for
> an object that graduates is greater.
>

I think the second incubator is a better idea; leaving them in the first would
mean having to builda free list.

> 1) Normal 'scheme c' code will use something like SCM_ASSIGN(x, val),
>    rather than x=val.

What's SCM_ASSIGN?  My (rather outdated now) sources don't mention it.

> 2) User smobs will have to take special care when they can contain
>    pointers to the heap
>

Won't the required call to scm_gc_mark handle this?

> My inclination is to make all segments small (currently, a small
> multiple of the page size, though it can go to the page size
> itself).

I would agree with this.

> Would it be alright with you if I put up your mail on my guile page?
> There's certainly enough here that it will take a while for me to
> integrate the ideas with the notes (is that ok, as well?)
>

Go for it.  I'm happy to contribute any way I can.

Dale.