This is the mail archive of the guile@sources.redhat.com mailing list for the Guile project.



Language translation proposal


[Please reply to this message instead of the previous one, which had
 the wrong address for the Guile list.]

* Introduction

This is a proposal for how Guile could interface with language
translators.  It will be posted on the Guile list and revised for a
short time (days rather than weeks) before being implemented.

The document can be found in the CVS repository as
guile-core/devel/translation/lantools.text.  All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.

Ideas and comments are welcome.

For clarity, the proposal is partially written as if describing an
already existing system.

MDJ 000812 <djurfeldt@nada.kth.se>

* Language names

A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.

To make things simple, the name of the language is closely related to
the name of the translator module.

Languages have long and short names.  The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)' etc.

Languages with the long name `(lang IDENTIFIER)' can be referred to
with the short name IDENTIFIER, for example `emacs-lisp'.
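
The naming rule can be sketched in a few lines of Scheme.  The helper
name `language-long-name' is made up for illustration and is not part
of the proposed interface:

```scheme
;; Hypothetical helper, not part of the proposal's stated interface.
(define (language-long-name name)
  ;; A symbol is a short name and expands to (lang NAME);
  ;; a non-empty list is already a long name and is returned unchanged.
  (cond ((symbol? name) (list 'lang name))
        ((and (list? name) (not (null? name))) name)
        (else (error "not a language name:" name))))

(language-long-name 'emacs-lisp)            ; => (lang emacs-lisp)
(language-long-name '(my-modules foo-lang)) ; => (my-modules foo-lang)
```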

* How to tell Guile to read code in a different language (than Scheme)

There are four methods of specifying which translator to use when
reading a file:

** Command option

The options to the guile command are parsed linearly from left to
right.  You can change the language at zero or more points using the
option

 -t, --language LANGUAGE

Example:

  guile -t emacs-lisp -l foo -l bar -t scheme -l baz

will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".

You can use this technique in a script together with the meta switch:

#!/usr/local/bin/guile \
-t emacs-lisp -s
!#
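
The left-to-right rule can be illustrated with a small sketch.  It
only models the -t and -l options from the example above; the
procedure name `files-with-languages' is hypothetical, and the real
option parser may well be structured differently:

```scheme
;; Hypothetical sketch: walk the argument list left to right, tracking
;; the current language, and return a list of (FILE . LANGUAGE) pairs.
(define (files-with-languages args)
  (let loop ((args args) (lang 'scheme) (acc '()))
    (cond ((null? args) (reverse acc))
          ;; -t / --language switches the language for what follows.
          ((and (member (car args) '("-t" "--language"))
                (pair? (cdr args)))
           (loop (cddr args) (string->symbol (cadr args)) acc))
          ;; -l loads a file using the language currently in effect.
          ((and (string=? (car args) "-l") (pair? (cdr args)))
           (loop (cddr args) lang (cons (cons (cadr args) lang) acc)))
          ;; Anything else is ignored in this sketch.
          (else (loop (cdr args) lang acc)))))

(files-with-languages
 '("-t" "emacs-lisp" "-l" "foo" "-l" "bar" "-t" "scheme" "-l" "baz"))
;; => (("foo" . emacs-lisp) ("bar" . emacs-lisp) ("baz" . scheme))
```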

** Commentary in file

When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.

If found, the corresponding translator is loaded and used to read the
file.
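
A minimal sketch of such a scan, assuming the marker sits on a single
line and that "the first few lines" is a caller-supplied limit.  The
procedure name `find-language-marker' is hypothetical:

```scheme
(use-modules (ice-9 regex) (ice-9 rdelim))

;; Hypothetical helper: scan up to MAX-LINES lines of PORT for
;; "-*- LANGNAME -*-".  Return LANGNAME as a symbol (short form) or a
;; list of symbols (long form), or #f if no marker is found.
(define (find-language-marker port max-lines)
  (let loop ((n 0))
    (if (>= n max-lines)
        #f
        (let ((line (read-line port)))
          (cond ((eof-object? line) #f)
                ((string-match "-\\*-(.*)-\\*-" line)
                 => (lambda (m)
                      ;; `read' parses both "emacs-lisp" and "(lang ctax)".
                      (call-with-input-string
                       (string-trim-both (match:substring m 1))
                       read)))
                (else (loop (+ n 1))))))))

(call-with-input-string ";; -*- emacs-lisp -*-\n(defun foo ())\n"
  (lambda (port) (find-language-marker port 5)))
;; => emacs-lisp
```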

** File extension

Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:

  (REGEXP . LANGNAME)

where REGEXP is a string and LANGNAME a symbol or a list of symbols.

The alist can be accessed using `language-alist' which is exported
by the module `(core config)':

  (language-alist)			--> current alist
  (language-alist ALIST) 		sets the alist to ALIST
  (language-alist ALIST :prepend)	prepends ALIST onto the current list
  (language-alist ALIST :append)	appends ALIST after current list

The `load' command will match filenames against this alist and choose
the translator to use accordingly.
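
As a sketch of the lookup `load' might perform: the alist entries
below are invented for illustration, and `filename->language' is a
hypothetical helper, not part of `(core config)':

```scheme
(use-modules (ice-9 regex))

;; Example alist in the (REGEXP . LANGNAME) format described above.
;; The entries themselves are made up.
(define example-alist
  '(("\\.el$"   . emacs-lisp)
    ("\\.ctax$" . (lang ctax))
    ("\\.scm$"  . scheme)))

;; Hypothetical helper: return the LANGNAME of the first entry whose
;; REGEXP matches FILENAME, or DEFAULT if none matches.
(define (filename->language filename alist default)
  (cond ((null? alist) default)
        ((string-match (caar alist) filename) (cdar alist))
        (else (filename->language filename (cdr alist) default))))

(filename->language "foo.el" example-alist 'scheme)    ; => emacs-lisp
(filename->language "bar.ctax" example-alist 'scheme)  ; => (lang ctax)
(filename->language "README" example-alist 'scheme)    ; => scheme
```

Taking the first match means that entries added with :prepend
override existing ones; that ordering is an assumption of this sketch.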

** Module header

The module header of the current module system is the form

  (define-module NAME OPTION1 ...)

You can specify a translator using the option

  :language LANGNAME

where LANGNAME is the long or short form of language name as described
above.

The translator is fed characters from the module file, starting
immediately after the end-parenthesis of the module header form.

NOTE: There can be only one module header per file.

It is also possible to put the module header in a separate file and
use the option

  :file FILENAME

to point out a file containing the actual code.

Example:

foo.gm:
----------------------------------------------------------------------
(define-module (foo)
  :language emacs-lisp
  :file "foo.el"
  :export (foo bar)
  )
----------------------------------------------------------------------

foo.el:
----------------------------------------------------------------------
(defun foo ()
  ...)

(defun bar ()
  ...)
----------------------------------------------------------------------

* Language modules

A language module is an ordinary Guile module importing bindings from
other modules and exporting bindings through its public interface.

It is required to export the following procedures:

  language-environment --> ENVIRONMENT

    Returns a fresh top-level ENVIRONMENT (a module) where expressions
    in this language are evaluated by default.

    Modules using this language will by default have this environment
    on their use list.

    The intention is for this procedure to provide the "run-time
    environment" for the language.

  read-expression PORT --> EXPRESSION

    Read the next expression in the foreign syntax from PORT and
    return an object EXPRESSION representing it.

    It is entirely up to the language module to define what one
    expression is.  The representation of EXPRESSION is also chosen by
    the language module.

    This procedure will be called during interactive use (the user
    types expressions at a prompt) and when the system `read'
    procedure is called when a module using this language is selected.

  translate EXPRESSION --> SCHEMECODE

    Translate an EXPRESSION into SCHEMECODE.

    EXPRESSION can be anything returned by `read-expression'.

    SCHEMECODE is Scheme source code represented using ordinary Scheme
    data.  It will be passed to `eval' in an environment containing
    bindings in the environment returned by `language-environment'.

    This procedure will be called during interactive use and when the
    system `eval' procedure is called when a module using this
    language is selected.

  translate-all PORT --> THUNK

    Translate the entire stream of characters from PORT until #<eof>.
    Return a THUNK which can be called repeatedly like this:

      THUNK --> SCHEMECODE

    Each call will yield a new piece of scheme code.  #f is returned
    to signal the end of the stream of scheme expressions.

    This procedure will be called by the system `load' command and by
    the module system when loading files.

    The intentions are:

    1. To let the language module decide when and in how large chunks
       to do the processing.  It may choose to do all processing at
       the time translate-all is called, all processing when THUNK is
       called the first time, or small pieces of processing each time
       THUNK is called, or any conceivable combination.

    2. To let the language module decide in how large chunks to output
       the resulting Scheme code in order not to overload memory.

    3. To enable the language module to use temporary files, and
       whole-module analysis and optimization techniques.

  untranslate SCHEMECODE --> EXPRESSION

    Attempt to do the inverse of `translate'.  An approximation is
    OK.  It is also OK to return #f.  This procedure will be called
    from the debugger, when generating error messages, backtraces etc.
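
To make the interface concrete, here is a sketch of three of the
procedures for a toy language in which every line has the form
"A OP B".  The toy language and the driver `load-from-port' are
assumptions for illustration; `language-environment' and
`untranslate' are left out, and a real translator would live in its
own module:

```scheme
(use-modules (ice-9 rdelim))

;; read-expression PORT --> EXPRESSION (here: a list of token strings),
;; or the eof object at end of input.
(define (read-expression port)
  (let ((line (read-line port)))
    (if (eof-object? line)
        line
        (string-tokenize line))))

;; translate EXPRESSION --> SCHEMECODE, e.g. ("1" "+" "2") --> (+ 1 2)
(define (translate expr)
  (list (string->symbol (cadr expr))
        (string->number (car expr))
        (string->number (caddr expr))))

;; translate-all PORT --> THUNK.  This variant works lazily: each call
;; translates one expression, and #f signals the end of the stream.
(define (translate-all port)
  (lambda ()
    (let ((expr (read-expression port)))
      (if (eof-object? expr)
          #f
          (translate expr)))))

;; Hypothetical driver showing how a loader might consume the thunk.
(define (load-from-port port env)
  (let ((next (translate-all port)))
    (let loop ((results '()))
      (let ((code (next)))
        (if code
            (loop (cons (eval code env) results))
            (reverse results))))))

(call-with-input-string "1 + 2\n6 * 7\n"
  (lambda (port) (load-from-port port (interaction-environment))))
;; => (3 42)
```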

* Error handling

** Errors during translation

Errors during translation are generated as usual by calling scm-error
(from Scheme) or scm_misc_error etc (from C).  The effect of
throwing errors from within `translate-all' is the same as when they
are generated within a call to the THUNK returned from
`translate-all'.

scm-error takes a fifth argument.  This is a property list (alist)
which you can use to pass extra information to the error reporting
machinery.

Currently, the following properties are supported:

  filename  filename of file being translated
  line      line number of the erring expression
  column    column number
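
A sketch of passing these properties from Scheme.  The message, the
procedure name, and the file name are invented; the handler merely
shows that the alist travels with the error, and says nothing about
how the error reporting machinery would present it:

```scheme
;; The property keys follow the list above; everything else here is
;; made up for illustration.
(define (report-translation-error)
  (scm-error 'misc-error "translate" "unexpected token ~S" (list "}")
             '((filename . "foo.ctax") (line . 12) (column . 4))))

;; The handler receives the key followed by the remaining four
;; arguments of scm-error; the last one is the property alist.
(catch 'misc-error
  report-translation-error
  (lambda (key subr message args data)
    (list (assq-ref data 'filename)
          (assq-ref data 'line)
          (assq-ref data 'column))))
;; => ("foo.ctax" 12 4)
```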

** Run-time errors (errors in SCHEMECODE)

This section pertains to what happens when a run-time error occurs
during evaluation of the translated code.

In order to get "foreign code" in error messages, make sure that
`untranslate' yields good output.  Note the possibility of maintaining
a table (preferably using weak references) mapping SCHEMECODE to
EXPRESSION.
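
One way to keep such a table, sketched with Guile's weak-key hash
tables.  The names `origin-table' and `remember-origin!' are
hypothetical; a translator would call the latter from `translate':

```scheme
;; Weak-key table: an entry can be collected once its SCHEMECODE key
;; is no longer referenced elsewhere.
(define origin-table (make-weak-key-hash-table))

;; Hypothetical wrapper: record where SCHEMECODE came from and
;; return SCHEMECODE unchanged.
(define (remember-origin! schemecode expression)
  (hashq-set! origin-table schemecode expression)
  schemecode)

;; untranslate SCHEMECODE --> EXPRESSION, or #f if unknown.
(define (untranslate schemecode)
  (hashq-ref origin-table schemecode #f))

(define code (remember-origin! (list '+ 1 2) "1 + 2"))
(untranslate code)          ; => "1 + 2"
(untranslate (list '+ 1 2)) ; => #f, a different (non-eq) object
```

The lookup is by identity (eq?), so only the very object produced by
the translator maps back to its source expression.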

Note the availability of source-properties for attaching filename,
line, and column number, as well as other information such as
EXPRESSION, to SCHEMECODE.  If filename, line, and column properties
are defined, they will be automatically used by the error reporting
machinery.

* Proposed changes to Guile

** Implement the above proposal.

*** Add new fields `reader' and `translator' to all module objects

Make sure they are initialized when a language is specified.

*** Use `untranslate' during error handling.

*** Implement the use of arg 5 to scm-error

(specified in "Errors during translation")

** Implement a generic lexical analyzer with interface similar to read/rp

Mikael is working on this.  (It might take a few days, since he is
busy with his studies right now.)

** Remove scm:eval-transformer

This is replaced by new fields in each module object (environment).

`eval' will instead directly use the `transformer' field in the module
passed as second arg.

Internal evaluation will, similarly, use the transformer of the module
representing the top-level of the local environment.

Note that this level of transformation is something independent of
language translation.  *This* is a hook for adding Scheme macro
packages and belongs to the core language.

We also need to check the new `translator' field, potentially using
it.

** Package local environments as smobs

so that environment list structures can't leak out on the Scheme
level.  (This has already been done in SCM.)

** Introduce "read-states" (symmetrical to "print-states")

These carry state information belonging to a read call chain, such
as which keyword syntax to support, whether to be case sensitive or
not, and which lexical grammar to use.

** Move configuration of keyword syntax and case sensitivity to the read-state

Add new fields to the module objects for these values, so that the
read-state can be initialized from them.

  *fixme* When? Why? How?

Probably as soon as the language has been determined during file loading.

Need to figure out how to set these values.


Local Variables:
mode: outline
End:
