This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[regex] Fix BZ429


To fix BZ429, it is enough to do some kind of caching and cut the number of invocations of calc_dst_limits_pos. Its current implementation is naive: Complexity is exponential in N because every epsilon closure visits N backreferences: this without any hope of succeeding, because no OP_{OPEN,CLOSE}_SUBEXP node is reachable from the OP_BACK_REF nodes.

The tests in bug-regex11.c now complete in a sane amount of time (11 seconds each on a G4 PowerMac), but still not exactly small so I've not enabled them. Unluckily this does not speed up other testcases, but the profiles for these pathologic regexes look more similar to the common one (sift_states_bkref is at the top).

Paolo
2004-09-11  Paolo Bonzini  <bonzini@gnu.org>

	* posix/regex_internal.h (struct re_backref_cache_entry):
	Add a map of reachable subexpression nodes from each backreference
	cache entry.
	* posix/regexec.c (check_dst_limits_calc_pos_1): Use the map
	to cut recursive paths.  Make exit condition more precise.
	(match_ctx_add_entry): Initialize the map.

--- orig/lib/regex_internal.h
+++ mod/lib/regex_internal.h
@@ -547,9 +547,9 @@ struct re_backref_cache_entry
   int str_idx;
   int subexp_from;
   int subexp_to;
-  /* We need only one byte from the following field.  If other small
-     fields are added the type could be changed to 'char'.  */
-  int more;
+  char more;
+  char unused;
+  short eps_reachable_subexps_map;
 };
 
 typedef struct


--- orig/lib/regexec.c
+++ mod/lib/regexec.c
@@ -1884,7 +1884,12 @@ check_dst_limits_calc_pos_1 (mctx, bound
 	      {
 		int dst, cpos;
 
-		if (ent->node != node || ent->subexp_from != ent->subexp_to)
+		if (ent->node != node)
+		  continue;
+
+		if (subexp_idx <= 8 * sizeof (ent->eps_reachable_subexps_map)
+		    && (ent->eps_reachable_subexps_map
+			& (1 << (subexp_idx - 1))) == 0)
 		  continue;
 
 		/* Recurse trying to reach the OP_OPEN_SUBEXP and
@@ -1905,11 +1910,13 @@ check_dst_limits_calc_pos_1 (mctx, bound
 		cpos = check_dst_limits_calc_pos_1 (mctx, boundaries,
 						    subexp_idx, dst, bkref_idx);
 
-		if (cpos == -1 && (boundaries & 1))
+		if (cpos == -1 /* && (boundaries & 1) */)
 		  return -1;
 
-		if (cpos == 0 /* && (boundaries & 2) */)
+		if (cpos == 0 && (boundaries & 2))
 		  return 0;
+
+		ent->eps_reachable_subexps_map &= ~(1 << (subexp_idx - 1));
 	      }
 	    while (ent++->more);
 	    break;
@@ -4169,6 +4176,18 @@ match_ctx_add_entry (mctx, node, str_idx
   mctx->bkref_ents[mctx->nbkref_ents].str_idx = str_idx;
   mctx->bkref_ents[mctx->nbkref_ents].subexp_from = from;
   mctx->bkref_ents[mctx->nbkref_ents].subexp_to = to;
+
+  /* This is a cache that saves negative results of check_dst_limits_calc_pos.
+     If bit N is clear, means that this entry won't epsilon-transition to
+     an OP_OPEN_SUBEXP or OP_CLOSE_SUBEXP for the N+1-th subexpression.  If
+     it is set, check_dst_limits_calc_pos_1 will recurse and try to find one
+     such node.
+
+     A backreference does not epsilon-transition unless it is empty, so set
+     to all zeros if FROM != TO.  */
+  mctx->bkref_ents[mctx->nbkref_ents].eps_reachable_subexps_map
+    = (from == to ? ~0 : 0);
+
   mctx->bkref_ents[mctx->nbkref_ents++].more = 0;
   if (mctx->max_mb_elem_len < to - from)
     mctx->max_mb_elem_len = to - from;




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]