This is the mail archive of the guile@cygnus.com mailing list for the guile project.



substrings (Re: Truely annoying regular expressions.)


> guile> (define parser (make-regexp "^[ \t]*([^ \t:]*):[ \t]*(.*[^ \t])[ \t]*$"))
> guile> (define m (regexp-exec parser "Concept: Joe learns guile"))
> guile> (define (capitalize! s)
>   (do ((i 1 (1+ i))
>        (l (string-length s)))
>       ((>= i l)
>        (if (>= l 1)
> 	   (string-set! s 0 (char-upcase (string-ref s 0))))
>        s)
>     (string-set! s i (char-downcase (string-ref s i)))))
> guile> (capitalize! (match:substring m 1))
> ERROR: In procedure string-ref in expression (string-ref s i):
> ERROR: Bad memory access (Segmentation violation)
> ABORT: (signal)

You can reproduce the crash more easily than that:

guile> (define s "Some long string")
guile> (define b (make-shared-substring s 5 9))
guile> (string-set! b 2 #\r)
guile> s
"Some long string"
guile> b
ERROR: Bad memory access (Segmentation violation)
ABORT: (signal)

It seems to me that the problem is that (string-set!) cannot
cope with substrings -- it thinks it can but it trashes the
data structures in unpredictable ways. To typecheck its input
arguments, scm_string_set_x() uses the SCM_STRINGP macro,
which in turn uses the SCM_TYP7S macro, which masks the bit
that distinguishes a string from a substring.
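
To make the masking concrete, here is a small stand-alone sketch in C. It does not use the Guile headers: the tag values, the two masks and the names TYP7, TYP7S, loose_stringp and strict_stringp are assumptions chosen so that the two tags differ in exactly one bit, which is the situation described above.

```c
#include <assert.h>

/* Illustrative values only (assumed, not copied from the Guile source):
   the two tags differ in a single bit. */
enum {
    TC7_STRING    = 21,   /* assumed tag of an ordinary string */
    TC7_SUBSTRING = 23    /* assumed tag of a shared substring */
};

#define TYP7(tag)   ((tag) & 0x7f)  /* keeps all seven tag bits */
#define TYP7S(tag)  ((tag) & 0x7d)  /* also clears the bit that marks a substring */

/* Loose check, in the spirit of SCM_STRINGP: accepts both kinds. */
static int loose_stringp(int tag)  { return TYP7S(tag) == TC7_STRING; }

/* Strict check: accepts only an ordinary string. */
static int strict_stringp(int tag) { return TYP7(tag) == TC7_STRING; }

/* Usage: the loose check cannot tell the two apart, the strict one can. */
static void run_checks(void)
{
    assert(loose_stringp(TC7_STRING));
    assert(loose_stringp(TC7_SUBSTRING));
    assert(strict_stringp(TC7_STRING));
    assert(!strict_stringp(TC7_SUBSTRING));
}
```

With a loose check of this kind a substring passes the typecheck, and the code after it then treats the object as if it had the layout of an ordinary string.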

Seems to me that an easy fix is to go to strings.c and replace:

SCM_ASSERT (SCM_NIMP (str) && SCM_STRINGP (str), str, SCM_ARG1, s_string_set_x);

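with something like the following, which compares the full tag
(SCM_TYP7 rather than SCM_TYP7S, so the substring bit is not masked):

SCM_ASSERT( SCM_NIMP( str ) && ( SCM_TYP7( str ) == scm_tc7_string ),
		str, SCM_ARG1, s_string_set_x );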

and just document the limitation that (string-set!) can't cope with
any data of type substring.

By the way, from the point of view of debugging, the (tag) procedure
is quite useful since it returns an integer indicating the internal
type number of any data object. This allows you to create predicates
like:

(define (substring? x) (= utag_substring (tag x)))

Some way of converting the integer type-tag into a string tag name
would also be handy at times (it would save looking in the source code).
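
A minimal sketch of what such a tag-to-name lookup could look like, written in C since that is where the tags are defined. Nothing here is an existing Guile function: tag_to_name and the table are hypothetical, and the numeric values are placeholders for illustration rather than values taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct tag_name {
    int         tag;    /* integer type tag, as (tag) would return it */
    const char *name;   /* human-readable name for that tag */
};

/* Hypothetical table; the numbers are placeholders, not Guile's real tags. */
static const struct tag_name tag_names[] = {
    { 21, "string"    },
    { 23, "substring" },
};

/* Linear search is fine for a table this small. */
static const char *tag_to_name(int tag)
{
    size_t n = sizeof tag_names / sizeof tag_names[0];
    for (size_t i = 0; i < n; i++)
        if (tag_names[i].tag == tag)
            return tag_names[i].name;
    return "unknown";
}

/* Usage: known tags map to their names, anything else to "unknown". */
static void run_checks(void)
{
    assert(strcmp(tag_to_name(23), "substring") == 0);
    assert(strcmp(tag_to_name(99), "unknown") == 0);
}
```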

I might finish up by pointing out the errors in the documentation
and suggesting changes:

[old]     (define foo "the quick brown fox")
[old]     (define bar (make-shared-substring some-string 4 9))
[new]     (define foo "the quick brown fox")
[new]     (define bar (make-shared-substring foo 4 9))

[old]     (string-length? bar) => 5  ; bar only extends from indices 4 to 9
[new]     (string-length bar) => 5  ; bar only extends from indices 4 to 9

Also a note should be added to say that using (string-set! bar 3 #\r)
will fail (because of the above typecheck) and is not allowed.

> 1. Why the call to make-shared-substring?

Performance is better than with (substring), since a shared substring
reuses the parent string's storage instead of copying it.
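
The difference can be sketched in plain C. This is not how libguile represents strings; struct view, shared_substring and copied_substring are hypothetical names used only to contrast "point into the parent" with "allocate and copy".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A shared substring is just a window onto the parent's characters. */
struct view {
    char  *start;
    size_t len;
};

/* Constant time: no allocation, no copying. */
static struct view shared_substring(char *parent, size_t from, size_t to)
{
    struct view v = { parent + from, to - from };
    return v;
}

/* Allocates and copies (to - from) characters; the caller frees the result. */
static char *copied_substring(const char *parent, size_t from, size_t to)
{
    size_t len = to - from;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, parent + from, len);
    copy[len] = '\0';
    return copy;
}

/* Usage: a write through the view changes the parent, a write to the copy does not. */
static void run_checks(void)
{
    char s[] = "Some long string";
    struct view b = shared_substring(s, 5, 9);
    b.start[2] = 'r';
    assert(strcmp(s, "Some lorg string") == 0);

    char *c = copied_substring(s, 5, 9);
    assert(c != NULL && strcmp(c, "lorg") == 0);
    c[0] = 'L';
    assert(s[5] == 'l');
    free(c);
}
```

The price of sharing is visible in the usage function: any write through the shared view is a write to the parent string.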

> 2. If I try to modify a "shared" string, shouldn't I get an error
>    message that's a little more illuminating than a seg fault?

You will get a typecheck error using the above fix.

	- Tel