This is the mail archive of the
glibc-bugs-regex@sourceware.org
mailing list for the glibc project.
[Bug regex/1278] regex undefined behavior with shifting past word length
- From: "eggert at gnu dot org" <sourceware-bugzilla at sources dot redhat dot com>
- To: glibc-bugs-regex at sources dot redhat dot com
- Date: 2 Sep 2005 23:17:22 -0000
- Subject: [Bug regex/1278] regex undefined behavior with shifting past word length
- References: <20050831193645.1278.eggert@gnu.org>
- Reply-to: sourceware-bugzilla at sources dot redhat dot com
------- Additional Comments From eggert at gnu dot org 2005-09-02 23:17 -------
Andreas is right. For example, "unsigned long int x = ~0u;" will not
have an all-1s value on most 64-bit hosts, since ~0u has type unsigned
int (typically 32 bits) and is zero-extended to the wider type.
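
A minimal sketch of the difference, assuming a host where unsigned
long int is wider than unsigned int (as on typical LP64 systems). The
function names are hypothetical and are not part of the regex code:

```c
#include <limits.h>

/* ~0u has type unsigned int.  Converting it to a wider unsigned type
   zero-extends it, so the high-order bits stay 0.  */
unsigned long int
all_ones_from_tilde_0u (void)
{
  unsigned long int x = ~0u;
  return x;
}

/* -1 is converted modulo 2**N, where N is the width of the
   destination type, so the result is that type's maximum value.  */
unsigned long int
all_ones_from_minus_1 (void)
{
  unsigned long int x = -1;
  return x;
}
```

Here all_ones_from_minus_1 () always returns ULONG_MAX, whereas
all_ones_from_tilde_0u () returns UINT_MAX, which equals ULONG_MAX
only when the two types have the same width.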
In this particular hunk, ~0u would also work since the destination
type is unsigned short int. So if you'd really rather use ~0u I
guess that would be OK. However, as a style matter, it is confusing
to use ~0u in some unsigned contexts, while using -1 in other unsigned
contexts. Since -1 always works, it's more consistent to use it in
all unsigned contexts.
For example, suppose someone later changes eps_reachable_subexps_map
from unsigned short int to unsigned long int, for performance reasons.
If the code used ~0u here, it would have to be changed to ~ (unsigned
long int) 0, and it's quite possible that people would forget to make
that change. Whereas if we simply change it to -1 now, it will work
regardless of later changes like this.
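
The scenario above could be sketched as follows. The struct and
function names are hypothetical stand-ins for the field before and
after such a change, not the actual glibc declarations:

```c
#include <limits.h>

/* Hypothetical field before and after being widened.  */
struct node_before { unsigned short int map; };
struct node_after { unsigned long int map; };

struct node_before
node_before_init (void)
{
  struct node_before n;
  n.map = -1;   /* All 1s for the narrow type: USHRT_MAX.  */
  return n;
}

struct node_after
node_after_init (void)
{
  struct node_after n;
  n.map = -1;   /* Same initializer, still all 1s: ULONG_MAX.  */
  return n;
}

struct node_after
node_after_init_stale (void)
{
  struct node_after n;
  n.map = ~0u;  /* Left over from the narrow version: only UINT_MAX.  */
  return n;
}
```

The -1 initializer needs no edit when the field type changes, while
the stale ~0u initializer silently leaves the high-order bits clear on
hosts where unsigned long int is wider than unsigned int.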
I should mention that the situation is different in signed contexts.
In general one must use ~ (SIGNED_TYPE) 0 in that case to get an
all-1s pattern. But signed bit-twiddling is trickier (since one must
in general worry about ~0 == 0 and overflow issues), and I'd rather
that the regex code stuck with unsigned bit-twiddling.
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1278