This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: support C/C++ identifiers named with non-ASCII characters


> From: <Paul.Koning@dell.com>
> CC: <zjz@zjz.name>, <gdb-patches@sourceware.org>
> Date: Mon, 21 May 2018 14:12:12 +0000
> 
> > Given unlimited time, would the right solution be to use a lib to parse the
> > string as utf-8, and reject strings that are not valid utf-8?
> 
> This sounds like a scenario where "stringprep" is helpful (or necessary).
> It checks that strings are valid UTF-8, can check that they obey certain
> rules (such as "word elements only", which rejects punctuation and the
> like), and can convert them to a canonical form so that equal strings
> match whether or not they were encoded the same way.

Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
cannot include invalid UTF-8 sequences?
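
For concreteness, the three steps described in the quoted text (validate
as UTF-8, check a "word elements only" rule, convert to a canonical form)
could look roughly like the sketch below. It is Python rather than GDB
code, and it is not stringprep itself: the function name
canonical_identifier is hypothetical, NFC is assumed as the canonical
form, and Python's own str.isidentifier() stands in for the rule check,
which is not the same rule set that C or C++ uses for identifiers.

```python
import unicodedata


def canonical_identifier(raw: bytes):
    """Return a canonical form of raw, or None if it is rejected.

    Hypothetical helper for illustration only; not part of GDB or of
    any stringprep library.
    """
    # Step 1: strict decoding rejects invalid UTF-8 sequences.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

    # Step 2: normalize so that equal strings compare equal however
    # they were encoded (NFC is an assumption made for this sketch).
    text = unicodedata.normalize("NFC", text)

    # Step 3: "word elements only" check. Python's identifier rules are
    # used as a stand-in; they reject punctuation and the like.
    if not text.isidentifier():
        return None

    return text
```

With this sketch, a precomposed "é" and "e" followed by a combining acute
accent yield the same result, while a byte string that is not valid UTF-8
is rejected outright. Whether GDB should reject such input at all is
exactly the question asked above.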

