This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Re: support C/C++ identifiers named with non-ASCII characters

From: Eli Zaretskii <eliz at gnu dot org>
To: <Paul dot Koning at dell dot com>
Cc: simark at simark dot ca, zjz at zjz dot name, gdb-patches at sourceware dot org
Date: Mon, 21 May 2018 19:12:37 +0300
Subject: Re: support C/C++ identifiers named with non-ASCII characters
References: <9418d4f0-f22a-c587-cc34-2fa67afbd028@zjz.name> <8c8af079-dbb8-207b-5edf-86b99e9f5db8@simark.ca> <CF83AA8F-D3F8-446C-A078-252ADFB6D4C8@dell.com>
Reply-to: Eli Zaretskii <eliz at gnu dot org>

> From: <Paul.Koning@dell.com>
> CC: <zjz@zjz.name>, <gdb-patches@sourceware.org>
> Date: Mon, 21 May 2018 14:12:12 +0000
> 
> > Given unlimited time, would the right solution be to use a lib to parse the
> > string as utf-8, and reject strings that are not valid utf-8?
> 
> This sounds like a scenario where "stringprep" is helpful (or necessary).  It checks that strings are valid UTF-8, can check that they obey certain rules (such as "word elements only", which rejects punctuation and the like), and can convert them to a canonical form so that equal strings match regardless of how they were originally encoded.

Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
cannot include invalid UTF-8 sequences?
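
For concreteness, the three steps described in the quoted suggestion
(reject invalid UTF-8, accept only word-like characters, convert to a
canonical form) could be sketched roughly as below.  This is only an
illustration under stated assumptions: it is not stringprep and not
anything in GDB.  It substitutes Python's standard unicodedata module
and str.isidentifier(), which follows Python's identifier rules rather
than those of C/C++, and the name canonical_identifier is hypothetical.

```python
import unicodedata
from typing import Optional


def canonical_identifier(raw: bytes) -> Optional[str]:
    """Return a canonical form of raw, or None if it is rejected.

    Hypothetical helper, not part of GDB or stringprep.
    """
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

    # NFC normalization, so precomposed and decomposed spellings of
    # the same name compare equal afterwards.
    text = unicodedata.normalize("NFC", text)

    # Assumption: Python's identifier rules stand in for the
    # "word elements only" check; C/C++ rules differ in detail.
    if not text.isidentifier():
        return None
    return text
```

With this sketch, b"\xff\xfe" and b"a-b" are both rejected, while the
precomposed and decomposed encodings of the same accented name map to
the same string.  Whether identifiers in debug info are in fact
guaranteed to be UTF-8 is exactly the open question above; the sketch
simply assumes they are.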

Follow-Ups:
- Re: support C/C++ identifiers named with non-ASCII characters
  - From: Paul.Koning

References:
- support C/C++ identifiers named with non-ASCII characters
  - From: 張俊芝
- Re: support C/C++ identifiers named with non-ASCII characters
  - From: Simon Marchi
- Re: support C/C++ identifiers named with non-ASCII characters
  - From: Paul.Koning