This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
[regex] Fix BZ429
- From: Paolo Bonzini <paolo dot bonzini at lu dot unisi dot ch>
- To: libc-alpha at sources dot redhat dot com, jakub at redhat dot com
- Date: Wed, 10 Nov 2004 12:09:52 +0100
- Subject: [regex] Fix BZ429
To fix BZ429, it is enough to do some kind of caching and cut the number
of invocations of calc_dst_limits_pos. Its current implementation is
naive: Complexity is exponential in N because every epsilon closure
visits N backreferences: this without any hope of succeeding, because no
OP_{OPEN,CLOSE}_SUBEXP node is reachable from the OP_BACK_REF nodes.
The tests in bug-regex11.c now complete in a sane amount of time (11
seconds each on a G4 PowerMac), but still not exactly small so I've not
enabled them. Unluckily this does not speed up other testcases, but the
profiles for these pathologic regexes look more similar to the common
one (sift_states_bkref is at the top).
Paolo
2004-09-11 Paolo Bonzini <bonzini@gnu.org>
* posix/regex_internal.h (struct re_backref_cache_entry):
Add a map of reachable subexpression nodes from each backreference
cache entry.
* posix/regexec.c (check_dst_limits_calc_pos_1): Use the map
to cut recursive paths. Make exit condition more precise.
(match_ctx_add_entry): Initialize the map.
--- orig/lib/regex_internal.h
+++ mod/lib/regex_internal.h
@@ -547,9 +547,9 @@ struct re_backref_cache_entry
int str_idx;
int subexp_from;
int subexp_to;
- /* We need only one byte from the following field. If other small
- fields are added the type could be changed to 'char'. */
- int more;
+ char more;
+ char unused;
+ short eps_reachable_subexps_map;
};
typedef struct
--- orig/lib/regexec.c
+++ mod/lib/regexec.c
@@ -1884,7 +1884,12 @@ check_dst_limits_calc_pos_1 (mctx, bound
{
int dst, cpos;
- if (ent->node != node || ent->subexp_from != ent->subexp_to)
+ if (ent->node != node)
+ continue;
+
+ if (subexp_idx <= 8 * sizeof (ent->eps_reachable_subexps_map)
+ && (ent->eps_reachable_subexps_map
+ & (1 << (subexp_idx - 1))) == 0)
continue;
/* Recurse trying to reach the OP_OPEN_SUBEXP and
@@ -1905,11 +1910,13 @@ check_dst_limits_calc_pos_1 (mctx, bound
cpos = check_dst_limits_calc_pos_1 (mctx, boundaries,
subexp_idx, dst, bkref_idx);
- if (cpos == -1 && (boundaries & 1))
+ if (cpos == -1 /* && (boundaries & 1) */)
return -1;
- if (cpos == 0 /* && (boundaries & 2) */)
+ if (cpos == 0 && (boundaries & 2))
return 0;
+
+ ent->eps_reachable_subexps_map &= ~(1 << (subexp_idx - 1));
}
while (ent++->more);
break;
@@ -4169,6 +4176,18 @@ match_ctx_add_entry (mctx, node, str_idx
mctx->bkref_ents[mctx->nbkref_ents].str_idx = str_idx;
mctx->bkref_ents[mctx->nbkref_ents].subexp_from = from;
mctx->bkref_ents[mctx->nbkref_ents].subexp_to = to;
+
+ /* This is a cache that saves negative results of check_dst_limits_calc_pos.
+ If bit N is clear, means that this entry won't epsilon-transition to
+ an OP_OPEN_SUBEXP or OP_CLOSE_SUBEXP for the N+1-th subexpression. If
+ it is set, check_dst_limits_calc_pos_1 will recurse and try to find one
+ such node.
+
+ A backreference does not epsilon-transition unless it is empty, so set
+ to all zeros if FROM != TO. */
+ mctx->bkref_ents[mctx->nbkref_ents].eps_reachable_subexps_map
+ = (from == to ? ~0 : 0);
+
mctx->bkref_ents[mctx->nbkref_ents++].more = 0;
if (mctx->max_mb_elem_len < to - from)
mctx->max_mb_elem_len = to - from;