This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.



Re: binary-io, opposable-thumb, pack/unpack (was Re: binary-io (was Re: rfc 2045 base64 encoding/decoding module))


> From: Per Bothner <per@bothner.com>
> Date: 15 Feb 2000 10:56:00 -0800
> 
> sen_ml@eccosys.com writes:
> 
> >   -ports are defined in terms of chars, and chars might not be
> >    fixed at 8 bits wide in the future.
> 
> The way I think it should work (at least this is what I'm doing in Kawa):
> 
> You need four kinds of (generic) ports:
> 
> byte-input-port:  Reads a sequence of 8-bit bytes
> char-input-port:  Reads a sequence of (wide) characters (e.g. Unicode).
> byte-output-port:  Writes a sequence of 8-bit bytes
> char-output-port:  Writes a sequence of (wide) characters (e.g. Unicode).
> 
> For compatibility and convenience, you want a procedure like read-char
> to accept either a byte-input-port or a char-input-port.  If the
> specified port is a byte-input-port, the result should be a character,
> as if returned by (integer->char BYTE).  Similarly, write-char
> works on both byte-output-port and char-output-port.
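
A rough sketch of the dispatch being proposed, written in Python only for illustration.  The class and method names (ByteInputPort, CharInputPort, read_byte, read_wide_char) are hypothetical and are not the Kawa or Guile API; None stands in for the Scheme EOF object.

```python
class ByteInputPort:
    """Hypothetical port that yields 8-bit bytes."""
    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def read_byte(self):
        if self._pos >= len(self._data):
            return None  # stands in for the EOF object
        b = self._data[self._pos]
        self._pos += 1
        return b


class CharInputPort:
    """Hypothetical port that yields (wide) characters."""
    def __init__(self, text: str):
        self._text = text
        self._pos = 0

    def read_wide_char(self):
        if self._pos >= len(self._text):
            return None
        c = self._text[self._pos]
        self._pos += 1
        return c


def read_char(port):
    """Accept either kind of port, as the proposal suggests."""
    if isinstance(port, CharInputPort):
        return port.read_wide_char()
    if isinstance(port, ByteInputPort):
        b = port.read_byte()
        # Each byte becomes the character with that code,
        # like (integer->char BYTE) in Scheme.
        return None if b is None else chr(b)
    raise TypeError("not an input port")
```

Note that reading a byte port this way amounts to a fixed Latin-1 interpretation: byte 0xE9 comes back as the character U+00E9.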

I'm not sure I understand this proposal completely, since I don't see
what you gain by having two kinds of ports.  Wouldn't they be confusing
to work with?  For example, if you were reading a stream of arbitrary
data, would you read from one port whenever you needed to unpack bytes
into Scheme values, and from the other whenever you expected a character?

It seems simpler to me to treat an input port as a source of bytes,
with read-char a procedure for unpacking bytes into characters.  To
support multiple encodings, the port could have a "current encoding"
that could be changed at will.  (This is really just a way to avoid
adding an extra, incompatible argument to read-char.  An alternative
would be to have read-char default to a global locale setting and add
read-char/charset, or something similar, to specify other encodings.)
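
A minimal sketch of the single-port alternative, again in Python for illustration.  EncodedInputPort and set_encoding are hypothetical names; the decoding uses Python's standard codecs incremental decoders.  It assumes the encoding is only switched at a character boundary, so no partially decoded character is lost.

```python
import codecs


class EncodedInputPort:
    """Hypothetical port: a byte source with a changeable current encoding."""
    def __init__(self, data: bytes, encoding: str = "utf-8"):
        self._data = data
        self._pos = 0
        self.set_encoding(encoding)

    def set_encoding(self, encoding: str):
        # A fresh decoder; assumes no multi-byte sequence is half-read.
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def read_byte(self):
        """Raw access, for unpacking binary data."""
        if self._pos >= len(self._data):
            return None  # stands in for the EOF object
        b = self._data[self._pos]
        self._pos += 1
        return b

    def read_char(self):
        """Unpack bytes into one character using the current encoding."""
        while True:
            b = self.read_byte()
            if b is None:
                # final=True raises if a truncated sequence is pending.
                tail = self._decoder.decode(b"", final=True)
                return tail or None
            out = self._decoder.decode(bytes([b]))
            if out:  # the decoder returns "" until a character is complete
                return out
```

Binary and character reads then go through the same port: for example, read two UTF-8 characters, switch to Latin-1 for one more, and pull the remaining bytes with read_byte.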

Individual characters are only part of the problem anyway: code that
follows the custom of treating strings as byte arrays would also break.
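
To make the breakage concrete, here is a small Python illustration (the helper decodes_cleanly is hypothetical) showing that once characters can take more than one byte, character counts and byte offsets stop agreeing.

```python
# Code that treats a string as a byte array assumes one character == one byte.
s = "caf\u00e9"            # "café"
encoded = s.encode("utf-8")

char_count = len(s)         # 4 characters
byte_count = len(encoded)   # 5 bytes: U+00E9 takes two bytes in UTF-8

# Cutting at a byte offset can split a character in half.
first_four_bytes = encoded[:4]


def decodes_cleanly(data: bytes) -> bool:
    """True if data is complete, well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```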
