This is the mail archive of the guile@cygnus.com mailing list for the guile project.
First-class environment proposal

To: Guile Discussion <guile@cygnus.com>
Subject: First-class environment proposal
From: Jim Blandy <jimb@red-bean.com>
Date: Wed, 10 Mar 1999 23:22:49 -0500

(No, I'm not dead...)

$Id: env.texi,v 1.4 1999/02/18 19:07:19 jimb Exp jimb $

   This is a draft proposal for a new datatype for representing
top-level environments in Guile.  Upon completion, this proposal will
be posted to the mailing list `guile@cygnus.com' for discussion,
revised in light of whatever insights it may produce, and eventually
implemented.

   Note that this is *not* a proposal for a module system; rather, it
is a proposal for a data structure which encapsulates the ideas one
needs when writing a module system, and, most importantly, a fixed
interface which insulates the interpreter from the details of the
module system.
Using these environments, one could implement any module system one
pleased, without changing the interpreter.

   I hope this text will eventually become a chapter of the Guile
manual; thus, the description of environments is written in the present
tense, as if it were already implemented, not in the future tense.
However, this text does not actually describe the present state of
Guile.

   I'm especially interested in improving the vague, rambling
presentation of environments in the section "Modules and Environments".
I'm trying to orient the user for the discussion that follows, but I
wonder if I'm just confusing the issue.  I would appreciate suggestions
if they are concrete -- please provide new wording.

   Note also: I'm trying out a convention I'm considering for use in the
manual.  When a Scheme procedure is directly implemented by a C
procedure, and both are useful to call from their respective languages,
we document the Scheme procedure only, and call it a "Primitive".  If a
Scheme function is marked as a primitive, you can derive the name of the
corresponding C function by changing `-' to `_', `!' to `_x', `?' to
`_p', and prepending `scm_'.  The C function's arguments will be all of
the Scheme procedure's arguments, both required and optional; if the
Scheme procedure takes a "rest" argument, that will be a final argument
to the C function.  The C function's arguments, as well as its return
type, will be `SCM'.  Thus, a procedure documented like this:

 - Primitive: set-car! PAIR VALUE

   has a corresponding C function which would be documented like this:

 - Libguile function: SCM scm_set_car_x (SCM PAIR, SCM VALUE)

   The hope is that this will be an uncluttered way to document both
the C and Scheme interfaces, without unduly confusing users interested
only in the Scheme level.
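
   As a rough illustration of the naming rule, here is a small C sketch
that derives the C function name from a primitive's Scheme name.  The
helper `scheme_to_c_name' is hypothetical and not part of libguile; it
applies only the substitutions listed above and nothing else.

```c
#include <string.h>

/* Hypothetical helper, not part of libguile.  Write into OUT (which
   holds OUT_SIZE bytes) the C function name for the Scheme primitive
   SCHEME_NAME, using only the rules stated above: `-' becomes `_',
   `!' becomes `_x', `?' becomes `_p', and `scm_' is prepended.
   Return 0 on success, or -1 if OUT is too small.  */
static int
scheme_to_c_name (const char *scheme_name, char *out, size_t out_size)
{
  size_t used;
  const char *p;

  if (out_size < sizeof "scm_")
    return -1;
  strcpy (out, "scm_");
  used = strlen (out);

  for (p = scheme_name; *p != '\0'; p++)
    {
      const char *piece;
      char single[2];
      size_t len;

      switch (*p)
        {
        case '-': piece = "_";  break;
        case '!': piece = "_x"; break;
        case '?': piece = "_p"; break;
        default:
          /* Any other character is copied unchanged.  */
          single[0] = *p;
          single[1] = '\0';
          piece = single;
          break;
        }

      len = strlen (piece);
      if (used + len + 1 > out_size)
        return -1;
      memcpy (out + used, piece, len + 1);
      used += len;
    }
  return 0;
}
```

   Given `set-car!', this helper produces `scm_set_car_x', matching the
example above.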

   When there is a C function which provides the same functionality as a
primitive, but with a different interface tailored for C's needs, it
usually has the same name as the primitive's C function, with the suffix
`_internal'.  Thus, `scm_env_ref_internal' is almost identical to
`scm_env_ref', except that it indicates an unbound variable in a manner
friendlier to C code.

   Copyright 1999 Free Software Foundation, Inc.

Top-Level Environments in Guile
*******************************

   In Guile, an environment is a mapping from symbols onto variables,
and a variable is a location containing a value.  Guile uses the
datatype described here to represent its top-level environments.

Modules and Environments
========================

   Guile distinguishes between environments and modules.  A module is a
unit of code sharing; it has a name, like `(math random)', an
implementation (e.g., Scheme source code, a dynamically linked library,
or a set of primitives built into Guile), and finally, an environment
containing the definitions which the module exports for its users.

   An environment, by contrast, is simply an abstract data type
representing a mapping from symbols onto variables which the Guile
interpreter uses to look up top-level definitions.  The `eval'
procedure interprets its first argument, an expression, in the context
of its second argument, an environment.

   Guile uses environments to implement its module system.  A module
created by loading Scheme code might be built from several environments.
In addition to the environment of exported definitions, such a module
might have an internal top-level environment, containing both exported
and private definitions, and perhaps environments for imported
definitions alone and local definitions alone.

   The interface described here includes a full set of functions for
mutating environments, and the system goes to some length to maintain
its consistency as environments' bindings change.  This is necessary
because Guile is an interactive system.  The user may create new
definitions or modify and reload modules while Guile is running; the
system should handle these changes in a consistent and predictable way.

   A typical Guile system will have several distinct top-level
environments.  (This is why we call them "top-level", and not
"global".)  For example, consider the following fragment of an
interactive Guile session:

     guile> (use-modules (ice-9 regex))
     guile> (define pattern "^(..+)\\1+$")
     guile> (string-match pattern "xxxx")
     #("xxxx" (0 . 4) (0 . 2))
     guile> (string-match pattern "xxxxx")
     #f
     guile>

Guile evaluates the expressions the user types in a top-level
environment reserved for that purpose; the definition of `pattern' goes
there.  That environment is distinct from the one holding the private
definitions of the `(ice-9 regex)' module.  At the Guile prompt, the
user does not see the module's private definitions, and the module is
unaffected by definitions the user makes at the prompt.  The
`use-modules' form copies the module's public bindings into the user's
environment.

   All Scheme evaluation takes place with respect to some top-level
environment.  Just as the procedure created by a `lambda' form closes
over any local scopes surrounding that form, it also closes over the
surrounding top-level environment.  Thus, since the `string-match'
procedure is defined in the `(ice-9 regex)' module, it closes over that
module's top-level environment.  Consequently, when the user calls
`string-match' from the Guile prompt, any free variables in
`string-match''s definition are resolved with respect to the module's
top-level environment, not the user's.

   Although the Guile interaction loop maintains a "current" top-level
environment in which it evaluates the user's input, it would be
misleading to extend the concept of a "current top-level environment"
to the system as a whole.  Each procedure closes over its own top-level
environment, in which that procedure will find bindings for its free
variables.  Thus, the top-level environment in force at any given time
depends on the procedure Guile happens to be executing.  The global
"current" environment is a figment of the interaction loop's
imagination.

   Since environments provide all the operations the Guile interpreter
needs to evaluate code, they effectively insulate the interpreter from
the details of the module system.  Without changing the interpreter, you
can implement any module system you like, as long as its efforts produce
an environment object the interpreter can consult.

Common Environment Operations
=============================

   This section describes the common set of operations that all
environment objects support.  To create an environment object, or to
perform an operation specific to a particular kind of environment, see
*Note Standard Environment Types::.

   In this section, the following names for formal parameters imply that
the actual parameters must have a certain type:

ENV
     an environment

SYMBOL
     a symbol

PROC
     a procedure

VALUE
OBJECT
     an arbitrary Scheme value

Examining Environments
----------------------

 - Primitive: env? OBJECT
     Return `#t' if OBJECT is an environment, or `#f' otherwise.

 - Primitive: env-ref ENV SYMBOL
     Return the value of the location bound to SYMBOL in ENV.  If
     SYMBOL is unbound in ENV, signal an `env:unbound' error (*note
     Environment Errors::.).

 - Primitive: env-bound? ENV SYMBOL
     Return `#t' if SYMBOL is bound in ENV, or `#f' otherwise.

 - Primitive: env-fold ENV PROC INIT
     Iterate over all the bindings in an environment, accumulating some
     value.

     For each binding in ENV, apply PROC to the symbol bound, its
     value, and the result from the previous application of PROC.  Use
     INIT as PROC's third argument the first time PROC is applied.

     If ENV contains no bindings, this function simply returns INIT.

     If ENV binds the symbol SYM1 to the value VAL1, SYM2 to VAL2, and
     so on, then this procedure computes:
          (PROC SYM1 VAL1
                (PROC SYM2 VAL2
                      ...
                      (PROC SYMN VALN
                            INIT)))

     Each binding in ENV will be processed exactly once.  `env-fold'
     makes no guarantees about the order in which the bindings are
     processed.

     Here is a function which, given an environment, constructs an
     association list representing that environment's bindings, using
     `env-fold':
          (define (env->alist env)
            (env-fold env
                      (lambda (sym val tail)
                        (cons (cons sym val) tail))
                      '()))

 - Libguile macro: int SCM_ENVP (OBJECT)
     Return non-zero iff OBJECT is an environment.

 - Libguile function: SCM scm_env_ref_internal (SCM ENV, SCM SYMBOL)
     This C function is identical to `env-ref', except that if SYMBOL
     is unbound in ENV, it returns the value `SCM_UNDEFINED', instead
     of signalling an error.

 - Libguile function: SCM scm_env_fold_internal (SCM ENV,
          scm_env_folder *PROC, SCM DATA, SCM INIT)
     This is the C-level analog of `env-fold'.  For each binding in
     ENV, make the call:
          (*PROC) (DATA, SYMBOL, VALUE, PREVIOUS)

     where PREVIOUS is the value returned from the last call to
     `*PROC', or INIT for the first call.  If ENV contains no bindings,
     return INIT.

 - Libguile data type: scm_env_folder SCM (SCM DATA, SCM SYMBOL, SCM
          VALUE, SCM TAIL)
     The type of a folding function to pass to `scm_env_fold_internal'.
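
   The calling convention of `scm_env_fold_internal' can be sketched
with a toy model.  Everything below is an assumption made for
illustration: plain C strings and `long' values stand in for `SCM', an
array stands in for the environment, and the `toy_' names are
hypothetical.  Only the shape of the fold follows the description
above.

```c
#include <stddef.h>

/* Toy stand-in for one binding; real code would use SCM values.  */
struct toy_binding
{
  const char *symbol;
  long value;
};

/* Toy analog of scm_env_folder.  */
typedef long toy_folder (void *data, const char *symbol, long value,
                         long previous);

/* Toy analog of scm_env_fold_internal: call PROC once per binding,
   threading the previous result through.  With no bindings, return
   INIT unchanged.  */
static long
toy_env_fold (const struct toy_binding *bindings, size_t count,
              toy_folder *proc, void *data, long init)
{
  long result = init;
  size_t i;

  for (i = 0; i < count; i++)
    result = (*proc) (data, bindings[i].symbol, bindings[i].value, result);
  return result;
}

/* Example folder: count the bindings whose value exceeds the
   threshold that DATA points to.  */
static long
count_above (void *data, const char *symbol, long value, long previous)
{
  long threshold = *(const long *) data;

  (void) symbol;   /* unused in this folder */
  return previous + (value > threshold ? 1 : 0);
}
```

   The DATA argument plays the role a closure would play in Scheme: it
carries whatever state the folding function needs.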

Changing Environments
---------------------

   Here are functions for changing symbols' bindings and values.

   Although it is common to say that an environment binds a symbol to a
value, this is not quite accurate; an environment binds a symbol to a
location, and the location contains a value.  In the descriptions below,
we will try to make clear how each function affects bindings and
locations.

   Note that some environments may contain some immutable bindings, or
may bind symbols to immutable locations.  If you attempt to change an
immutable binding or value, these functions will signal an
`env:immutable-binding' or `env:immutable-location' error.  However,
simply because a binding cannot be changed via these functions does
*not* imply that it is constant.  Mechanisms outside the scope of this
section (say, re-loading a module's source code) may change a binding
or value which is immutable via these functions.

 - Primitive: env-define ENV SYMBOL VALUE
     Bind SYMBOL to a new location containing VALUE in ENV.  If SYMBOL
     is already bound to another location in ENV, that binding is
     replaced.  The new binding and location are both mutable.  The
     return value is unspecified.

     If SYMBOL is already bound in ENV, and the binding is immutable,
     signal an `env:immutable-binding' error.

 - Primitive: env-undefine ENV SYMBOL
     Remove any binding for SYMBOL from ENV.  If SYMBOL is unbound in
     ENV, do nothing.  The return value is unspecified.

     If SYMBOL is already bound in ENV, and the binding is immutable,
     signal an `env:immutable-binding' error.

 - Primitive: env-set! ENV SYMBOL VALUE
     If ENV binds SYMBOL to some location, change that location's value
     to VALUE.  The return value is unspecified.

     If SYMBOL is not bound in ENV, signal an `env:unbound' error.  If
     ENV binds SYMBOL to an immutable location, signal an
     `env:immutable-location' error.
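
   The difference between re-binding a symbol (`env-define') and
assigning to its location (`env-set!') can be sketched in a toy model.
The `loc_' names and the fixed-size table are assumptions made for
illustration only; the model ignores immutability and is not how Guile
would represent environments.

```c
#include <stdlib.h>
#include <string.h>

#define LOC_MAX 8

/* A location holds a value.  */
struct loc_location
{
  long value;
};

/* A toy environment binds symbols to locations, not to values.  */
struct loc_env
{
  const char *symbols[LOC_MAX];
  struct loc_location *locations[LOC_MAX];
  size_t count;
};

/* Return the location bound to SYMBOL, or NULL if SYMBOL is unbound.  */
static struct loc_location *
loc_lookup (const struct loc_env *env, const char *symbol)
{
  size_t i;

  for (i = 0; i < env->count; i++)
    if (strcmp (env->symbols[i], symbol) == 0)
      return env->locations[i];
  return NULL;
}

/* Like env-define: bind SYMBOL to a *new* location holding VALUE.
   Return 0 on success, -1 if out of memory or table space.  */
static int
loc_define (struct loc_env *env, const char *symbol, long value)
{
  struct loc_location *loc = malloc (sizeof *loc);
  size_t i;

  if (loc == NULL)
    return -1;
  loc->value = value;

  for (i = 0; i < env->count; i++)
    if (strcmp (env->symbols[i], symbol) == 0)
      {
        /* Replace the binding.  The old location is left untouched;
           this toy leaks it, where Guile would rely on the GC.  */
        env->locations[i] = loc;
        return 0;
      }

  if (env->count == LOC_MAX)
    {
      free (loc);
      return -1;
    }
  env->symbols[env->count] = symbol;
  env->locations[env->count] = loc;
  env->count++;
  return 0;
}

/* Like env-set!: change the value in the existing location.
   Return -1 (standing in for env:unbound) if SYMBOL is unbound.  */
static int
loc_set (struct loc_env *env, const char *symbol, long value)
{
  struct loc_location *loc = loc_lookup (env, symbol);

  if (loc == NULL)
    return -1;
  loc->value = value;
  return 0;
}
```

   Code holding a pointer to the old location sees the effect of
`loc_set', but not the effect of a later `loc_define', which installs
a fresh location.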

Caching Environment Lookups
---------------------------

   Some applications refer to variables' values so frequently that the
overhead of `env-ref' and `env-set!' is unacceptable.  For example,
variable reference speed is a critical factor in the performance of the
Guile interpreter itself.  If an application can tolerate some
additional complexity, the `env-cell' function described here can
provide very efficient access to variable values.

   In Guile, most variables are represented by pairs; the CDR of the
pair holds the variable's value.  Thus, a variable reference corresponds
to taking the CDR of one of these pairs, and setting a variable
corresponds to a `set-cdr!' operation.  A pair used to represent a
variable's value in this manner is called a "value cell".  Value cells
represent the "locations" to which environments bind symbols.

   The `env-cell' function returns the value cell bound to a symbol.
For example, an interpreter might make the call `(env-cell ENV SYMBOL
#f)' to find the value cell which ENV binds to SYMBOL, and then use
`cdr' and `set-cdr!' to reference and assign to that variable, instead
of calling `env-ref' or `env-set!' for each variable reference.

   There are a few caveats that apply here:

   * Environments are not required to represent variables' values using
     value cells.  An environment is free to return `#f' in response to
     a request for a symbol's value cell; in this case, the caller must
     use `env-ref' and `env-set!' to manipulate the variable.

   * An environment's binding for a symbol may change.  For example,
     the user could override an imported variable with a local
     definition, associating a new value cell with that symbol.  If an
     interpreter has used `env-cell' to obtain the variable's value
     cell, it no longer needs to use `env-ref' and `env-set!' to access
     the variable, and it may not see the new binding.

     Thus, code which uses `env-cell' should almost always use
     `env-observe' to track changes to the symbol's binding; this is the
     additional complexity hinted at above.  *Note Observing Changes to
     Environments::.

   * Some variables should be immutable.  If a program uses `env-cell'
     to obtain the value cell of such a variable, then it is impossible
     for the environment to prevent the program from changing the
     variable's value, using `set-cdr!'.  However, this is discouraged;
     it is probably better to redesign the interface than to disregard
     such a request.  To make it easy for programs to honor the
     immutability of a variable, `env-cell' takes an argument
     indicating whether the caller intends to mutate the cell's value;
     if this argument is true and the variable is immutable, then
     `env-cell' signals an `env:immutable-location' error.

     Programs should therefore make separate calls to `env-cell' to
     obtain value cells for reference and for assignment.  It is
     incorrect for a program to call `env-cell' once to obtain a value
     cell, and then use that cell for both reference and mutation.

 - Primitive: env-cell ENV SYMBOL FOR-WRITE
     Return the value cell which ENV binds to SYMBOL, or `#f' if the
     binding does not live in a value cell.

     The argument FOR-WRITE indicates whether the caller intends to
     modify the variable's value by mutating the value cell.  If
     FOR-WRITE is true and the variable is immutable, then `env-cell'
     signals an `env:immutable-location' error.

     If SYMBOL is unbound in ENV, signal an `env:unbound' error.

     If you use this function, you should consider using `env-observe',
     to be notified when SYMBOL gets re-bound to a new value cell, or
     becomes undefined.

 - Libguile function: SCM scm_env_cell_internal (SCM ENV, SCM SYMBOL,
          int for_write)
     This C function is identical to `env-cell', except that if SYMBOL
     is unbound in ENV, it returns the value `SCM_UNDEFINED', instead
     of signalling an error.
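
   The role of the FOR-WRITE argument can be sketched with a toy model
of `env-cell'.  All names here are hypothetical, status codes stand in
for the error conditions, and a small struct stands in for the pair
Guile would use as a value cell.

```c
#include <stddef.h>
#include <string.h>

enum cell_status { CELL_OK, CELL_UNBOUND, CELL_IMMUTABLE_LOCATION };

/* Toy value cell; in Guile this would be a pair whose CDR holds the
   variable's value.  */
struct toy_cell
{
  const char *symbol;
  long value;
  int immutable;   /* non-zero if the location must not be written */
};

/* Toy analog of env-cell.  On success, store the cell in *OUT.  A
   request for a writable cell of an immutable variable is refused;
   a read-only request for the same variable succeeds.  */
static enum cell_status
toy_env_cell (struct toy_cell *cells, size_t count, const char *symbol,
              int for_write, struct toy_cell **out)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (strcmp (cells[i].symbol, symbol) == 0)
      {
        if (for_write && cells[i].immutable)
          return CELL_IMMUTABLE_LOCATION;
        *out = &cells[i];
        return CELL_OK;
      }
  return CELL_UNBOUND;
}
```

   A caller that wants both to read and to assign a variable makes two
calls, one with FOR-WRITE false and one with FOR-WRITE true, as
recommended above.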

   [[After we have some experience using this, we may find that we want
to be able to explicitly ask questions like, "Is this variable mutable?"
without the annoyance of error handling.  But maybe this is fine.]]

Observing Changes to Environments
---------------------------------

   The procedures described here allow you to add and remove "observing
procedures" for an environment.

   A program may register an "observing procedure" for an environment,
which will be called whenever a binding in a particular environment
changes.  For example, if the user changes a module's source code and
re-loads the module, other parts of the system may want to throw away
information they have cached about the bindings of the older version of
the module.  To support this, each environment retains a set of
observing procedures which it will invoke whenever its bindings change.
We say that these procedures "observe" the environment's bindings.  You
can register new observing procedures for an environment using
`env-observe'.

 - Primitive: env-observe ENV PROC
     Whenever ENV's bindings change, apply PROC to ENV.

     This function returns an object, TOKEN, which you can pass to
     `env-unobserve' to remove PROC from the set of procedures
     observing ENV.  The type and value of TOKEN are unspecified.

 - Primitive: env-unobserve TOKEN
     Cancel the observation request which returned the value TOKEN.
     The return value is unspecified.

     If a call `(env-observe ENV PROC)' returns TOKEN, then the call
     `(env-unobserve TOKEN)' will cause PROC to no longer be called
     when ENV's bindings change.

   There are some limitations on observation:

   * These procedures do not allow you to observe specific bindings; you
     can only observe an entire environment.

   * These procedures observe bindings, not locations.  There is no way
     to receive notification when a location's value changes, using
     these procedures.

   * These procedures do not promise to call the observing procedure
     for each individual binding change.  However, if multiple bindings
     do change between calls to the observing procedure, those changes
     will appear atomic to the entire system.

   * Since a single environment may have several procedures observing
     it, a correct design obviously may not assume that nothing else in
     the system has yet observed a given change.
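
   The register/notify/cancel cycle can be sketched in C as follows.
This is a toy model under stated assumptions: the `obs_' names are
hypothetical, the token is simply a pointer to a table slot, and
observers are held strongly.  It shows an observer invalidating a
cache, the use case mentioned above.

```c
#include <stddef.h>

#define OBS_MAX 4

struct obs_env;

/* Toy analog of scm_env_observer.  */
typedef void obs_proc (struct obs_env *env, void *data);

struct obs_slot
{
  obs_proc *proc;   /* NULL marks a free slot */
  void *data;
};

struct obs_env
{
  struct obs_slot slots[OBS_MAX];
};

/* Register PROC to observe ENV.  Return a token for obs_unobserve,
   or NULL if the table is full.  */
static struct obs_slot *
obs_observe (struct obs_env *env, obs_proc *proc, void *data)
{
  int i;

  for (i = 0; i < OBS_MAX; i++)
    if (env->slots[i].proc == NULL)
      {
        env->slots[i].proc = proc;
        env->slots[i].data = data;
        return &env->slots[i];
      }
  return NULL;
}

/* Cancel the observation request that returned TOKEN.  */
static void
obs_unobserve (struct obs_slot *token)
{
  if (token != NULL)
    token->proc = NULL;
}

/* The environment calls this after one or more of its bindings have
   changed.  Observers learn only that something changed, not what.  */
static void
obs_notify (struct obs_env *env)
{
  int i;

  for (i = 0; i < OBS_MAX; i++)
    if (env->slots[i].proc != NULL)
      (*env->slots[i].proc) (env, env->slots[i].data);
}

/* Example observer: mark a cache, represented by an int flag, as
   stale.  */
static void
invalidate_cache (struct obs_env *env, void *data)
{
  (void) env;
  *(int *) data = 0;
}
```

   Because an observer is told only that the environment changed, a
client caching value cells would have to discard or re-validate all of
its cached cells for that environment when notified.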

   When writing observing procedures, pay close attention to garbage
collection issues.  If you use `env-observe' to register observing
procedures for an environment, the environment will hold a reference to
those procedures; while that environment is alive, its observing
procedures will live, as will any data they close over.  If this is not
appropriate, you can use the `env-observe-weak' procedure to create a
weak reference from the environment to the observing procedure.

   For example, suppose an interpreter uses `env-cell' to reference
variables efficiently, as described above in *Note Caching Environment
Lookups::.  That interpreter must register observing procedures to track
changes to the environment.  If those procedures retain any reference to
the data structure representing the program being interpreted, then that
structure cannot be collected as long as the observed environment lives.
This is almost certainly incorrect -- if there are no other references
to the structure, it can never be invoked, so it should be collected.
In this case, the interpreter should register its observing procedure
using `env-observe-weak', and retain a pointer to it from the code it
updates.  Thus, when the code is no longer referenced elsewhere in the
system, the weak link will be broken, and Guile will collect the code
(and its observing procedure).

 - Primitive: env-observe-weak ENV PROC
     This function is the same as `env-observe', except that the
     reference ENV retains to PROC is a weak reference.  This means
     that, if there are no other live, non-weak references to PROC, it
     will be garbage-collected, and dropped from ENV's list of
     observing procedures.

   It is also possible to write code that observes an environment in C.
The `scm_env_observe_internal' function registers a C function to
observe an environment.  The typedef `scm_env_observer' is the type a C
observer function must have.

 - Libguile function: SCM scm_env_observe_internal (SCM ENV,
          scm_env_observer *proc, SCM DATA, int weak_p)
     This is the C-level analog of the Scheme function `env-observe'.
     Whenever ENV's bindings change, call the function PROC, passing it
     ENV and DATA.  If WEAK_P is non-zero, ENV will retain only a weak
     reference to DATA, and if DATA is garbage collected, the entire
     observation will be dropped.

     This function returns a token, with the same meaning as those
     returned by `env-observe'.

 - Libguile data type: scm_env_observer void (SCM ENV, SCM DATA)
     The type for observing functions written in C.  A function meant
     to be passed to `scm_env_observe_internal' should have the type
     `scm_env_observer'.

   Note that, like all other primitives, `env-observe' is also
available from C, under the name `scm_env_observe'.

Environment Errors
------------------

   Here are the error conditions signalled by the environment routines
described above.  In these conditions, FUNC is a string naming a
particular procedure.

 - Condition: env:unbound FUNC MESSAGE ARGS ENV SYMBOL
     By calling FUNC, the program attempted to retrieve the value of
     SYMBOL in ENV, but SYMBOL is unbound in ENV.

 - Condition: env:immutable-binding FUNC MESSAGE ARGS ENV SYMBOL
     By calling FUNC, the program attempted to change the binding of
     SYMBOL in ENV, but that binding is immutable.

 - Condition: env:immutable-location FUNC MESSAGE ARGS ENV SYMBOL
     By calling FUNC, the program attempted to change the value of the
     location to which SYMBOL is bound in ENV, but that location is
     immutable.

Standard Environment Types
==========================

   Guile supports several different kinds of environments.  The
operations described above are actually only the common functionality
provided by all the members of a family of environment types, each
designed for a separate purpose.

   Each environment type has a constructor procedure for building
elements of that type, and extends the set of common operations with
its own procedures, providing specialized functions.  For an example of
how these environment types work together, see *Note Modules of
Interpreted Scheme Code::.

   Guile allows users to define their own environment types.  Given a
set of procedures that implement the common environment operations,
Guile will construct a new environment object based on those procedures.

Finite Environments
-------------------

   A "finite" environment is simply a mutable set of definitions.  A
finite environment supports no operations beyond the common set.

 - Primitive: make-finite-env
     Create a new finite environment, containing no bindings.  All
     bindings and locations in the new environment are mutable.

 - Primitive: finite-env? OBJECT
     Return `#t' if OBJECT is a finite environment, or `#f' otherwise.

   In Guile, each module of interpreted Scheme code uses a finite
environment to hold the definitions made in that module.

Eval Environments
-----------------

   A module's source code refers to definitions imported from other
modules, and definitions made within itself.  An "eval" environment
combines two environments -- a "local" environment and an "imported"
environment -- to produce a new environment in which both sorts of
references can be resolved.

 - Primitive: make-eval-env LOCAL IMPORTED
     Return a new environment object EVAL whose bindings are the union
     of those in LOCAL and IMPORTED, both environments, with bindings
     from LOCAL taking precedence.  Definitions made in EVAL are placed
     in LOCAL.

     That is, EVAL binds SYMBOL to LOCATION iff LOCAL does, or SYMBOL
     is unbound in LOCAL, and IMPORTED binds SYMBOL to LOCATION.

     Applying `env-define' or `env-undefine' to EVAL has the same
     effect as applying the procedure to LOCAL.

     Note that EVAL incorporates LOCAL and IMPORTED *by reference* --
     if, after creating EVAL, the program changes the bindings of LOCAL
     or IMPORTED, those changes will be visible in EVAL.

     Since most Scheme evaluation takes place in EVAL environments,
     they transparently cache the bindings received from LOCAL and
     IMPORTED.  Thus, the first time the program looks up a symbol in
     EVAL, EVAL may make calls to LOCAL or IMPORTED to find their
     bindings, but subsequent references to that symbol will be as fast
     as references to bindings in finite environments.

     In typical use, LOCAL will be a finite environment, and IMPORTED
     will be an import environment, described below.

 - Primitive: eval-env? OBJECT
     Return `#t' if OBJECT is an eval environment, or `#f' otherwise.

 - Primitive: eval-env-local ENV
 - Primitive: eval-env-imported ENV
     Return the LOCAL or IMPORTED environment of ENV; ENV must be an
     eval environment.
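
   The lookup rule for eval environments (LOCAL first, then IMPORTED,
both held by reference) can be sketched as below.  The struct and
function names are hypothetical, `long *' stands in for a location, and
the sketch deliberately omits the caching described above.

```c
#include <stddef.h>
#include <string.h>

/* Toy environment: symbols[i] is bound to the location locations[i].  */
struct simple_env
{
  const char **symbols;
  long **locations;
  size_t count;
};

/* Return the location bound to SYMBOL in ENV, or NULL if unbound.  */
static long *
simple_lookup (const struct simple_env *env, const char *symbol)
{
  size_t i;

  for (i = 0; i < env->count; i++)
    if (strcmp (env->symbols[i], symbol) == 0)
      return env->locations[i];
  return NULL;
}

/* Toy eval environment: it holds pointers to its two component
   environments, so later changes to them are visible through it.  */
struct eval_env
{
  const struct simple_env *local;
  const struct simple_env *imported;
};

/* Bindings from LOCAL take precedence over those from IMPORTED.  */
static long *
eval_lookup (const struct eval_env *eval, const char *symbol)
{
  long *location = simple_lookup (eval->local, symbol);

  return location != NULL ? location : simple_lookup (eval->imported, symbol);
}
```

   Since `eval_lookup' consults the component environments on every
call, removing a binding from the local environment immediately exposes
the imported one.  A caching implementation would need to observe both
components to preserve that behavior.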

Import Environments
-------------------

   An "import" environment combines the bindings of a set of argument
environments, and checks for naming clashes.

 - Primitive: make-import-env IMPORTS CONFLICT-PROC
     Return a new environment IMP whose bindings are the union of the
     bindings from the environments in IMPORTS; IMPORTS must be a list
     of environments.  That is, IMP binds SYMBOL to LOCATION iff some
     element of IMPORTS does.

     If two different elements of IMPORTS have a binding for the same
     symbol, apply CONFLICT-PROC to the two environments.  If the
     bindings of any of the IMPORTS ever change, check for conflicts
     again.

     All bindings in IMP are immutable.  If you apply `env-define' or
     `env-undefine' to IMP, Guile will signal an
     `env:immutable-binding' error.  However, notice that the set of
     bindings in IMP may still change, if one of its imported
     environments changes.

 - Primitive: import-env? OBJECT
     Return `#t' if OBJECT is an import environment, or `#f' otherwise.

 - Primitive: import-env-imports ENV
     Return the list of ENV's imported environments; ENV must be an
     import env.

 - Primitive: import-env-set-imports! ENV IMPORTS
     Change ENV's list of imported environments to IMPORTS, and check
     for conflicts.

   I'm not at all sure about the way CONFLICT-PROC works.  I think
module systems should warn you if it seems you're likely to get the
wrong binding, but exactly how and when those warnings should be
generated, I don't know.
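
   Leaving open when and how warnings should be generated, the basic
check described for `make-import-env' might look like the following
sketch.  The `imp_' names are hypothetical, and the sketch makes one
assumption the text does not settle: it reports each conflicting pair
of environments once, however many symbols the pair shares.

```c
#include <stddef.h>
#include <string.h>

/* Toy imported environment: just the set of symbols it binds.  */
struct imp_env
{
  const char **symbols;
  size_t count;
};

/* Toy analog of CONFLICT-PROC: it receives the two environments.  */
typedef void imp_conflict_proc (const struct imp_env *a,
                                const struct imp_env *b, void *data);

/* Return non-zero if ENV binds SYMBOL.  */
static int
imp_binds (const struct imp_env *env, const char *symbol)
{
  size_t i;

  for (i = 0; i < env->count; i++)
    if (strcmp (env->symbols[i], symbol) == 0)
      return 1;
  return 0;
}

/* Apply PROC to each pair of elements of IMPORTS that bind a common
   symbol.  Return the number of conflicting pairs found.  */
static size_t
imp_check_conflicts (const struct imp_env **imports, size_t n,
                     imp_conflict_proc *proc, void *data)
{
  size_t conflicts = 0;
  size_t i, j, k;

  for (i = 0; i < n; i++)
    for (j = i + 1; j < n; j++)
      for (k = 0; k < imports[i]->count; k++)
        if (imp_binds (imports[j], imports[i]->symbols[k]))
          {
            (*proc) (imports[i], imports[j], data);
            conflicts++;
            break;   /* report each pair only once */
          }
  return conflicts;
}

/* Example conflict procedure: count how often it is called.  */
static void
count_conflict (const struct imp_env *a, const struct imp_env *b, void *data)
{
  (void) a;
  (void) b;
  ++*(int *) data;
}
```

   A real implementation would rerun this check whenever one of the
imported environments changes, for instance from an observing
procedure.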

Export Environments
-------------------

   An export environment restricts an environment to a specified set of
bindings.

 - Primitive: make-export-env PRIVATE SIGNATURE
     Return a new environment EXP containing only those bindings in
     PRIVATE whose symbols are present in SIGNATURE.  The PRIVATE
     argument must be an environment.

     The environment EXP binds SYMBOL to LOCATION iff PRIVATE does, and
     SYMBOL is exported by SIGNATURE.

     SIGNATURE is a list specifying which of the bindings in PRIVATE
     should be visible in EXP.  Each element of SIGNATURE should be a
     list of the form:
          (SYMBOL ATTRIBUTE ...)

     where each ATTRIBUTE is one of the following:
    the symbol `mutable-location'
          EXP should treat the location bound to SYMBOL as mutable.
          That is, EXP will pass calls to `env-set!' or `env-cell'
          directly through to PRIVATE.

    the symbol `immutable-location'
          EXP should treat the location bound to SYMBOL as immutable.
          If the program applies `env-set!' to EXP and SYMBOL, or calls
          `env-cell' to obtain a writable value cell, the call will
          signal an `env:immutable-location' error.

          Note that, even if an export environment treats a location as
          immutable, the underlying environment may treat it as
          mutable, so its value may change.

     It is an error for an element of SIGNATURE to specify both
     `mutable-location' and `immutable-location'.  If neither is
     specified, `immutable-location' is assumed.

     As a special case, if an element of SIGNATURE is a lone symbol
     SYM, it is equivalent to an element of the form `(SYM)'.

     All bindings in EXP are immutable.  If you apply `env-define' or
     `env-undefine' to EXP, Guile will signal an
     `env:immutable-binding' error.  However, notice that the set of
     bindings in EXP may still change, if the bindings in PRIVATE
     change.

 - Primitive: export-env? OBJECT
     Return `#t' if OBJECT is an export environment, or `#f' otherwise.

 - Primitive: export-env-private ENV
 - Primitive: export-env-set-private! ENV
 - Primitive: export-env-signature ENV
 - Primitive: export-env-set-signature! ENV
     Accessors and mutators for the private environment and signature of
     ENV; ENV must be an export environment.
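
   The filtering an export environment performs can be sketched as
follows.  This is a toy model: the `exp_' names are hypothetical,
status codes stand in for the error conditions, and a signature entry
is reduced to a symbol plus one flag, with zero modeling the default
`immutable-location'.

```c
#include <stddef.h>
#include <string.h>

enum exp_status { EXP_OK, EXP_UNBOUND, EXP_IMMUTABLE_LOCATION };

/* One signature element.  */
struct exp_sig_entry
{
  const char *symbol;
  int mutable_location;   /* 0 models `immutable-location' */
};

/* Toy binding in the private environment.  */
struct exp_binding
{
  const char *symbol;
  long value;
};

struct exp_env
{
  struct exp_binding *private_bindings;
  size_t n_private;
  const struct exp_sig_entry *signature;
  size_t n_signature;
};

/* Return SYMBOL's binding if the private environment binds it and the
   signature exports it, storing the signature entry in *SIG.  Return
   NULL otherwise.  */
static struct exp_binding *
exp_visible (const struct exp_env *exp, const char *symbol,
             const struct exp_sig_entry **sig)
{
  size_t i, j;

  for (i = 0; i < exp->n_signature; i++)
    if (strcmp (exp->signature[i].symbol, symbol) == 0)
      for (j = 0; j < exp->n_private; j++)
        if (strcmp (exp->private_bindings[j].symbol, symbol) == 0)
          {
            *sig = &exp->signature[i];
            return &exp->private_bindings[j];
          }
  return NULL;
}

/* Toy analog of env-ref applied to an export environment.  */
static enum exp_status
exp_ref (const struct exp_env *exp, const char *symbol, long *out)
{
  const struct exp_sig_entry *sig;
  struct exp_binding *binding = exp_visible (exp, symbol, &sig);

  if (binding == NULL)
    return EXP_UNBOUND;
  *out = binding->value;
  return EXP_OK;
}

/* Toy analog of env-set! applied to an export environment.  */
static enum exp_status
exp_set (struct exp_env *exp, const char *symbol, long value)
{
  const struct exp_sig_entry *sig;
  struct exp_binding *binding = exp_visible (exp, symbol, &sig);

  if (binding == NULL)
    return EXP_UNBOUND;
  if (!sig->mutable_location)
    return EXP_IMMUTABLE_LOCATION;
  binding->value = value;   /* passed through to the private env */
  return EXP_OK;
}
```

   Note that `exp_set' refusing a write does not freeze the value: code
with access to the private environment can still change it, and the
change is visible through the export environment.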

Implementing Environments
=========================

   This section describes how to implement new environment types in
Guile.

   Guile's internal representation of environments allows you to extend
Guile with new kinds of environments without modifying Guile itself.
Every environment object carries a pointer to a structure of pointers to
functions implementing the common operations for that environment.  The
procedures `env-ref', `env-set!', etc. simply find this structure and
invoke the appropriate function.

Environment Function Tables
---------------------------

   An environment object is a smob whose CDR is a pointer to a pointer
to a `struct env_funcs':
     struct env_funcs {
       SCM  (*ref) (SCM self, SCM symbol);
       SCM  (*fold) (SCM self, scm_env_folder *proc, SCM data, SCM init);
       void (*define) (SCM self, SCM symbol, SCM value);
       void (*undefine) (SCM self, SCM symbol);
       void (*set) (SCM self, SCM symbol, SCM value);
       SCM  (*cell) (SCM self, SCM symbol, int for_write);
       SCM  (*observe) (SCM self, scm_env_observer *proc, SCM data, int weak_p);
       void (*unobserve) (SCM self, SCM token);
       SCM  (*mark) (SCM self);
       scm_sizet (*free) (SCM self);
       int  (*print) (SCM self, SCM port, scm_print_state *pstate);
     };

   You can use the following macro to access an environment's function
table:

 - Libguile macro: struct env_funcs *SCM_ENV_FUNCS (ENV)
     Return a pointer to the `struct env_funcs' for the environment ENV.
     If ENV is not an environment object, the behavior of this macro
     is undefined.

   Here is what each element of ENV_FUNCS must do to correctly
implement an environment.  In all of these calls, SELF is the
environment whose function is being invoked.

`SCM ref (SCM self, SCM symbol);'
     This function must have the effect described above for the C call:
          scm_env_ref_internal (SELF, SYMBOL)
     *Note Examining Environments::.

     Note that the `ref' element of a `struct env_funcs' may be zero if
     a `cell' function is provided.

`SCM fold (SCM self, scm_env_folder *proc, SCM data, SCM init);'
     This function must have the effect described above for the C call:
          scm_env_fold_internal (SELF, PROC, DATA, INIT)
     *Note Examining Environments::.

`void define (SCM self, SCM symbol, SCM value);'
     This function must have the effect described above for the Scheme
     call:
          (env-define SELF SYMBOL VALUE)
     *Note Changing Environments::.

`void undefine (SCM self, SCM symbol);'
     This function must have the effect described above for the Scheme
     call:
          (env-undefine SELF SYMBOL)
     *Note Changing Environments::.

`void set (SCM self, SCM symbol, SCM value);'
     This function must have the effect described above for the Scheme
     call:
          (env-set! SELF SYMBOL VALUE)
     *Note Changing Environments::.

     Note that the `set' element of a `struct env_funcs' may be zero if
     a `cell' function is provided.

`SCM cell (SCM self, SCM symbol, int for_write);'
     This function must have the effect described above for the C call:
          scm_env_cell_internal (SELF, SYMBOL, FOR_WRITE)
     *Note Caching Environment Lookups::.

`SCM observe (SCM self, scm_env_observer *proc, SCM data, int weak_p);'
     This function must have the effect described above for the C call:
          scm_env_observe_internal (SELF, PROC, DATA, WEAK_P)
     *Note Observing Changes to Environments::.

`void unobserve (SCM self, SCM token);'
     Cancel the request to observe SELF that returned TOKEN.  *Note
     Observing Changes to Environments::.

`SCM mark (SCM self);'
     Set the garbage collection mark on all Scheme cells referred to by
     SELF.  Assume that SELF itself is already marked.  Return a final
     object to be marked recursively.

`scm_sizet free (SCM self);'
     Free all non-cell storage associated with SELF; return the number
     of bytes freed that were obtained using `scm_must_malloc' or
     `scm_must_realloc'.

`int print (SCM self, SCM port, scm_print_state *pstate);'
     Print an external representation of SELF on PORT, passing PSTATE
     to any recursive calls to the object printer.
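
   The notes above say that `ref' and `set' may be zero when a `cell'
function is provided.  One way generic code could fall back on `cell'
is sketched below.  This is only an illustration under several
assumptions: `SCM', `struct mock_cell', and the macro definitions are
stand-ins rather than libguile's, a value cell is assumed to keep the
binding's value in its CDR, FOR_WRITE is passed as 0 for reads and 1
for writes, and unbound symbols are not handled at all.

```c
#include <assert.h>
#include <stddef.h>

typedef void *SCM;   /* stand-in for libguile's SCM */

/* Stand-in pair; used both for the smob cell and for value cells.  */
struct mock_cell { void *car; void *cdr; };

/* Abbreviated table: only the three slots this sketch touches.  */
struct env_funcs {
  SCM  (*ref) (SCM self, SCM symbol);
  void (*set) (SCM self, SCM symbol, SCM value);
  SCM  (*cell) (SCM self, SCM symbol, int for_write);
};

#define SCM_ENV_FUNCS(env) \
  (*(struct env_funcs **) ((struct mock_cell *) (env))->cdr)

/* Assumption: a value cell keeps the binding's value in its CDR.  */
#define CELL_VALUE(c) (((struct mock_cell *) (c))->cdr)

static SCM
env_ref (SCM env, SCM symbol)
{
  struct env_funcs *funcs = SCM_ENV_FUNCS (env);
  if (funcs->ref)
    return funcs->ref (env, symbol);
  assert (funcs->cell != NULL);
  return CELL_VALUE (funcs->cell (env, symbol, 0));
}

static void
env_set (SCM env, SCM symbol, SCM value)
{
  struct env_funcs *funcs = SCM_ENV_FUNCS (env);
  if (funcs->set)
    {
      funcs->set (env, symbol, value);
      return;
    }
  assert (funcs->cell != NULL);
  CELL_VALUE (funcs->cell (env, symbol, 1)) = value;
}

/* Toy type providing only `cell': every symbol shares one value cell.  */
static struct mock_cell shared_cell;

static SCM
shared_cell_fn (SCM self, SCM symbol, int for_write)
{
  (void) self;
  (void) symbol;
  (void) for_write;
  return &shared_cell;
}

static struct env_funcs cell_only_funcs = { NULL, NULL, shared_cell_fn };
```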

Environment Data
----------------

   When you implement a new environment type, you will likely want to
associate some data of your own design with each environment object.
Since ANSI C promises that casts will safely convert between a pointer
to a structure and a pointer to its first element, you can have the CDR
of an environment smob point to your structure, as long as your
structure's first element is a pointer to a `struct env_funcs'.  Then,
your code can use the macro below to retrieve a pointer to the
structure, and cast it to the appropriate type.

 - Libguile macro: struct env_funcs **SCM_ENV_DATA (ENV)
     Return the CDR of ENV, as a pointer to a pointer to an `env_funcs'
     structure.
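
   The layout rule can be sketched as follows.  Nothing here is
libguile code: `SCM', `struct mock_cell', and the two macro definitions
are stand-ins assumed for the example, the table is abbreviated to one
slot, and `struct counting_env' is a hypothetical environment type that
binds every symbol to one value and counts lookups.

```c
#include <assert.h>
#include <stddef.h>

typedef void *SCM;                             /* stand-in for SCM */
struct mock_cell { void *car; void *cdr; };    /* stand-in smob cell */

/* Abbreviated table: one of the proposal's eleven slots.  */
struct env_funcs {
  SCM (*ref) (SCM self, SCM symbol);
};

/* Assumed definitions matching the contracts described in the text.  */
#define SCM_ENV_DATA(env) \
  ((struct env_funcs **) ((struct mock_cell *) (env))->cdr)
#define SCM_ENV_FUNCS(env) (*SCM_ENV_DATA (env))

/* Hypothetical per-environment data.  The table pointer must be the
   first member so that the cast in counting_ref is valid.  */
struct counting_env {
  struct env_funcs *funcs;
  SCM value;
  unsigned long lookups;
};

static SCM
counting_ref (SCM self, SCM symbol)
{
  /* Recover our own structure from the generic pointer.  */
  struct counting_env *data = (struct counting_env *) SCM_ENV_DATA (self);
  (void) symbol;
  data->lookups++;
  return data->value;
}

static struct env_funcs counting_funcs = { counting_ref };
```

   Generic code sees only the leading `struct env_funcs *'; the
remaining members are private to the environment type.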

Environment Example
-------------------

   [[perhaps a simple environment based on association lists]]
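
   One possible shape for such an example is sketched below.  It is
not written against libguile and should be read as an illustration
only: `SCM' and `struct mock_cell' are stand-ins, symbols are plain C
strings, NULL stands for "unbound", error signalling and the `fold',
`cell', observer, and smob slots are omitted, and every name beginning
with `alist_' is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *SCM;                             /* stand-in for SCM */
struct mock_cell { void *car; void *cdr; };    /* stand-in smob cell */

/* Abbreviated table: four of the proposal's eleven slots.  */
struct env_funcs {
  SCM  (*ref) (SCM self, SCM symbol);
  void (*define) (SCM self, SCM symbol, SCM value);
  void (*undefine) (SCM self, SCM symbol);
  void (*set) (SCM self, SCM symbol, SCM value);
};

#define SCM_ENV_DATA(env) \
  ((struct env_funcs **) ((struct mock_cell *) (env))->cdr)

/* One association; symbols are C strings in this sketch.  */
struct binding { const char *name; SCM value; struct binding *next; };

struct alist_env {
  struct env_funcs *funcs;                     /* must come first */
  struct binding *bindings;
};

/* Return the link that points at SYMBOL's binding, or the final NULL
   link if SYMBOL is unbound.  */
static struct binding **
alist_find (SCM self, SCM symbol)
{
  struct alist_env *env = (struct alist_env *) SCM_ENV_DATA (self);
  struct binding **p = &env->bindings;
  while (*p && strcmp ((*p)->name, (const char *) symbol) != 0)
    p = &(*p)->next;
  return p;
}

static SCM
alist_ref (SCM self, SCM symbol)
{
  struct binding *b = *alist_find (self, symbol);
  return b ? b->value : NULL;
}

static void
alist_define (SCM self, SCM symbol, SCM value)
{
  struct binding **p = alist_find (self, symbol);
  if (*p == NULL)
    {
      *p = malloc (sizeof **p);
      assert (*p != NULL);
      (*p)->name = (const char *) symbol;
      (*p)->next = NULL;
    }
  (*p)->value = value;
}

static void
alist_undefine (SCM self, SCM symbol)
{
  struct binding **p = alist_find (self, symbol);
  struct binding *dead = *p;
  if (dead)
    {
      *p = dead->next;
      free (dead);
    }
}

static void
alist_set (SCM self, SCM symbol, SCM value)
{
  struct binding *b = *alist_find (self, symbol);
  if (b)                        /* this sketch ignores unbound symbols */
    b->value = value;
}

static struct env_funcs alist_funcs
  = { alist_ref, alist_define, alist_undefine, alist_set };
```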

Switching to Environments
=========================

   Here's what we'd need to do to today's Guile to install the system
described above.  This work would probably be done on a branch, because
it involves crippling Guile while a lot of work gets done.  Also, it
could change the default set of bindings available pretty drastically,
so the next minor release should not contain these changes.

   After each step here, we should have a Guile that we can at least
interact with, perhaps with some limitations.

   * For testing purposes, make an utterly minimal version of
     `boot-9.scm': no module system, no R4RS, nothing.  I think a simple
     REPL is all we need.

   * Implement the environment datatypes in libguile, and test them
     using this utterly minimal system.

   * Change the interpreter to use `env-cell' and `env-observe'
     instead of the symbol value slots, first-class variables, etc.
     Modify the rest of libguile as necessary to register all the
     primitives in a single environment.  We'll segregate them into
     modules later.

   * Reimplement the current module system in terms of environments.  It
     should still be in Scheme.

   * Reintegrate the rest of `boot-9.scm'.  This might be a good point
     to move it into modules.

   * Do some profiling and optimization.

   Once this is done, we can make the following simplifications to
Guile:

   * A good portion of symbols.c can go away.  Symbols no longer need
     value slots.  The mishmash of `scm_sym2ovcell',
     `scm_intern_obarray_soft', etc. can go away.  `intern' becomes
     simpler.

   * Remove first-class variables: `variables.c' and `variables.h'.

   * Organize the primitives into environments.

   * The family of environment types is clearly an abstract
     class/concrete subclass arrangement.  We should provide GOOPS
     classes/metaclasses that make defining new environment types easy
     and consistent.

Modules
*******

   The material here is just a sketch.  Don't take it too seriously.
The point is that environments allow us to experiment without getting
tangled up with the interpreter.

Modules of Guile Primitives
===========================

Modules of Interpreted Scheme Code
==================================

   If a module is implemented by interpreted Scheme code, Guile
represents it using several environments:

the "local" environment
     This environment holds all the definitions made locally by the
     module, both public and private.

the "import" environment
     This environment holds all the definitions this module imports from
     other modules.

the "evaluation" environment
     This is the environment in which the module's code is actually
     evaluated, and the one closed over by the module's procedures, both
     public and private.  Its bindings are the union of the LOCAL and
     IMPORT environments, with local bindings taking precedence.

the "exported" environment
     This environment holds the module's public definitions.  This is
     the only environment that the module's users have access to.  It
     is the EVALUATION environment, restricted to the set of exported
     definitions.

   Each of these environments is implemented using a separate
environment type.  Some of these types, like the evaluation and import
environments, actually just compute their bindings by consulting other
environments; they have no bindings in their own right.  They implement
operations like `env-ref' and `env-define' by passing them through to
the environments from which they are derived.  For example, the
evaluation environment will pass definitions through to the local
environment, and search for references and assignments first in the
local environment, and then in the import environment.
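
   The pass-through behavior described above can be sketched in a few
lines of C.  This is a toy model rather than an implementation on top
of the environment interface: bindings map C strings to C strings, NULL
stands for "unbound", and `struct table', `struct eval_env',
`struct export_env', and their functions are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 8

/* A toy environment with bindings of its own; it stands in for the
   local and import environments.  */
struct table {
  const char *names[MAX_BINDINGS];
  const char *values[MAX_BINDINGS];
  size_t count;
};

static const char *
table_ref (const struct table *t, const char *name)
{
  size_t i;
  for (i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return t->values[i];
  return NULL;                          /* unbound */
}

static void
table_define (struct table *t, const char *name, const char *value)
{
  size_t i;
  for (i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      {
        t->values[i] = value;
        return;
      }
  assert (t->count < MAX_BINDINGS);
  t->names[t->count] = name;
  t->values[t->count] = value;
  t->count++;
}

/* The evaluation environment has no bindings in its own right.  */
struct eval_env { struct table *local; const struct table *import; };

/* References search the local environment first, then the imports.  */
static const char *
eval_ref (const struct eval_env *e, const char *name)
{
  const char *v = table_ref (e->local, name);
  return v ? v : table_ref (e->import, name);
}

/* Definitions pass through to the local environment.  */
static void
eval_define (struct eval_env *e, const char *name, const char *value)
{
  table_define (e->local, name, value);
}

/* The exported environment is the evaluation environment restricted
   to a fixed list of names.  */
struct export_env {
  const struct eval_env *eval;
  const char *const *exported;
  size_t n_exported;
};

static const char *
export_ref (const struct export_env *x, const char *name)
{
  size_t i;
  for (i = 0; i < x->n_exported; i++)
    if (strcmp (x->exported[i], name) == 0)
      return eval_ref (x->eval, name);
  return NULL;                          /* not exported */
}
```

   With this arrangement a local definition shadows an imported one of
the same name, and a private helper stays invisible to the module's
users even though the module's own code can see it.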