This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
I believe that subdirectory of Rx 1.5 is just the old GNU regexp, the one that's still used in Emacs. It's a well-coded traditional backtracking system, as opposed to a state-caching lazy nfa->dfa converter, which is what Rx is. I wouldn't mind using it, but I don't want to distribute a regexp with Guile that's likely to conflict with what's already there on the system. Credit where credit is due: I didn't write or design the old GNU regexp; that's (I believe) Mike Haertel and Stallman. Haertel did grep, too. GNU Grep uses a hierarchy of "false positive" tests (i.e., tests that may report a match where there is none, but will never fail to report a real match), which bottom out to the good old (but relatively slow) GNU backtracking regexp system. The advantage of the false positive tests is that, if you don't have to be 100% right, you can be much faster. For example, back references make regular expressions actually much more powerful than their name would suggest: (a*)b\1b\1 is actually a context-sensitive language. So you can't build a DFA (strictly speaking) for a Posix regexp. But you can build a DFA that recognizes a strict superset of a Posix regexp --- for example (a*)b(a*)b(a*) is a superset of the expression given above, and it's regular. It'll run quickly, but it'll accept stuff you don't want. So GNU grep runs the DFA, and if the DFA matches, then it runs the full Posix regexp matcher. I think grep actually does yet another test before the DFA, using Boyer-Moore to check for all constant strings in the pattern that it can prove must be part of any match.