This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug libc/5514] New: memmem is O(n^2), but should be O(n)


The worst-case complexity of memmem is currently quadratic, O(m*n) where m is
the length of the haystack and n the length of the needle.  The worst case
occurs when the distinguishing feature of the needle is at the end, the haystack
consists of repetitions of the needle's prefix, and the needle is either not
present or occurs late in the haystack.  Since each iteration examines the bulk
of the needle, but only advances one byte per iteration, you have quadratic
performance.  Using the Knuth-Morris-Pratt algorithm or Boyer-Moore algorithm
would guarantee worst-case complexity of O(m+n), at the expense of a memory
allocation of O(n).

Gnulib recently converted its memmem implementation to use KMP, along with an
allocation amortization that avoids KMP until after a bounded number of failed
naive iterations, and falling back to the naive algorithm when KMP cannot
allocate enough memory.

http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=69d2edc;hb=fc068cf

It looks like strstr could also benefit from this algorithmic speedup.

-- 
           Summary: memmem is O(n^2), but should be O(n)
           Product: glibc
           Version: 2.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: ebb9 at byu dot net
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=5514

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]