This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Jim Blandy <jimb@red-bean.com> writes:

> An Emacs buffer must hold large amounts of text, and must also serve as the
> operand to editing and searching commands. It is terribly clumsy to
> use a variable-length encoding in buffers.

Why? It seems terribly elegant to use UTF-8 for buffers.

The problem with variable-length encodings is that character indexing is not constant-time. But why would you need constant-time indexing for buffers? There are no common user-level operations that require it. ELisp uses buffer indexes, but there is no inherent reason they need to be character indexes; they can be byte indexes or some magic cookie instead. All most ELisp code cares about is that buffer indexes increase monotonically and can be represented as fixnums. The code likely to break is anything that subtracts buffer indexes and assumes the difference is the number of characters in the sub-range. There are probably a fair number of places that would need to be fixed, but it is certainly a reasonable option. (And I gather this is what [FSF] Emacs 20 did. Perhaps Stallman is (partly) right after all ...)

The other concerns about variable-width-encoded strings do not apply to buffers using a gap. If you replace a single-byte character with one that needs multiple bytes, no problem: this is what we have a buffer gap for.

Searching commands work fine on UTF-8. Plain (non-regexp) searching works with no change, as long as both the buffer and the search string are UTF-8. Regexp searching would require some hacking, since a pattern element that matches a single character might have to match multiple bytes in the buffer.

Note that it is still possible to define a buffer as a sequence of *characters* (not bytes), as the XEmacs folks want, while using a variable-width encoding and still allowing buffer indexes to be byte indexes. Whether this is the right engineering choice for Emacs is not obvious, but it certainly could work quite well.
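To make the byte-index and buffer-gap points concrete, here is a minimal sketch in Python. It is not Emacs or XEmacs code, and the GapBuffer class and its method names are hypothetical, chosen for illustration. It assumes the buffer holds UTF-8 bytes on either side of a gap and that positions are byte offsets. Replacing a one-byte character with a two-byte one just consumes gap space, plain search is a byte-level find, and a character count has to be computed by decoding rather than by subtracting positions.

```python
class GapBuffer:
    """Hypothetical sketch: a gap buffer holding UTF-8 bytes.

    Positions are byte offsets into the text, not character counts.
    """

    def __init__(self, text: str, gap_size: int = 16):
        data = text.encode("utf-8")
        self.buf = bytearray(data) + bytearray(gap_size)
        self.gap_start = len(data)   # first byte of the gap
        self.gap_end = len(self.buf)  # first byte after the gap

    def _move_gap(self, pos: int) -> None:
        # Move the gap so that it begins at byte position pos.
        if pos < self.gap_start:
            n = self.gap_start - pos
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start = pos
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            self.buf[self.gap_start:self.gap_start + n] = self.buf[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def _ensure_gap(self, needed: int) -> None:
        # Grow the gap (by inserting zero bytes at its end) if it is too small.
        if self.gap_end - self.gap_start < needed:
            extra = max(needed, 16)
            self.buf[self.gap_end:self.gap_end] = bytearray(extra)
            self.gap_end += extra

    def replace(self, start: int, end: int, text: str) -> None:
        # Replace bytes [start, end) with the UTF-8 encoding of text.
        # The new text may be longer or shorter in bytes; the gap absorbs it.
        self._move_gap(end)
        self.gap_start = start  # the old bytes become part of the gap
        data = text.encode("utf-8")
        self._ensure_gap(len(data))
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def contents(self) -> bytes:
        # The text is everything outside the gap.
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    def search(self, needle: str, start: int = 0) -> int:
        # Plain (non-regexp) search: compare UTF-8 bytes directly.
        # Returns a byte position, or -1 if not found.
        return self.contents().find(needle.encode("utf-8"), start)

    def char_count(self, start: int, end: int) -> int:
        # Subtracting byte positions does not give a character count;
        # decoding the sub-range does.
        return len(self.contents()[start:end].decode("utf-8"))


if __name__ == "__main__":
    b = GapBuffer("naive cafe")
    i = b.search("i")             # byte position 2
    b.replace(i, i + 1, "\u00ef")  # 1-byte "i" -> 2-byte "ï", absorbed by the gap
    print(b.contents().decode("utf-8"))  # naïve cafe
    pos = b.search("cafe")        # byte position 7
    print(pos, b.char_count(0, pos))     # 7 6
```

Because UTF-8 lead bytes and continuation bytes are distinct, a byte-level match of a valid UTF-8 search string can never start or end in the middle of a buffer character, which is why the plain search needs no change.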
-- 
Per Bothner
Cygnus Solutions   bothner@cygnus.com
http://www.cygnus.com/~bothner