This is the mail archive of the guile@cygnus.com mailing list for the guile project.
> There are a lot of languages which are not yet
> representable. Among them are several Asian languages which might
> again occupy a lot of room (I don't know this for sure).

Almost half of the "General Scripts" area is still unassigned, so
there is quite a bit of room for (non-ideographic) scripts (at least
3000 characters).

> Well, applications certainly could handle this. But you must see that
> if surrogates are possible in the "wide strings" these are not anymore
> wide strings and all the string handling functions must be changed to
> handle surrogates.

Not necessarily. String handling functions can treat surrogates as
regular uninterpreted characters, just the way they treat accents as
uninterpreted characters. I.e. I'm proposing that string handling
routines should not treat either surrogates or accents specially, but
treat them as plain characters. This includes the standard
string-handling functions of Scheme, Lisp, Java, and C/C++. Instead,
they should leave the job to higher-level software, which may need to
handle ligatures, accents, language, hyphenation, surrogates, font
substitution, etc, etc. (Also software that translates from Unicode to
some other encoding cannot in general look at individual characters in
isolation. They too may have to consider multiple Unicode characters
as a unit.)

> (Plus handling of 16bit values is on many platforms slower than
> reading 32bit values.)

But usually not. Consider that memory tends to be the bottle-neck in
modern processors, and most string handling is sequential. Anything
that means you need twice as much memory will hurt your data cache and
paging system.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner
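
To illustrate the surrogate point above, here is a minimal Python sketch, assuming a string is held as a sequence of 16-bit UTF-16 code units. The helper names (`to_code_units`, `unit_length`, `code_points`) are hypothetical and not part of Guile or any standard library. The low-level length routine treats surrogates as plain uninterpreted units, and only the separate higher-level routine pairs them up.

```python
import struct

HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogate range


def to_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(raw) // 2), raw))


def unit_length(units):
    """Low-level length: counts code units, with no special case for surrogates."""
    return len(units)


def code_points(units):
    """Higher-level view: combine each high/low surrogate pair into one code point.

    An unpaired surrogate is passed through unchanged (an assumption of this
    sketch; real software might reject it instead).
    """
    i = 0
    while i < len(units):
        u = units[i]
        if (HIGH_START <= u <= HIGH_END and i + 1 < len(units)
                and LOW_START <= units[i + 1] <= LOW_END):
            yield 0x10000 + ((u - HIGH_START) << 10) + (units[i + 1] - LOW_START)
            i += 2
        else:
            yield u
            i += 1


# "a", U+10400 (outside the 16-bit range), "b"
units = to_code_units("a\U00010400b")
print(unit_length(units))               # 4 code units
print(len(list(code_points(units))))    # 3 code points
```

Indexing, copying, and concatenation can all work on `units` directly; only code that cares about whole characters (display, transcoding) needs something like `code_points`.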