This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
RFA/symtab: (Almost) always hash blocks when searching them

From: Daniel Jacobowitz <drow at mvista dot com>
To: gdb-patches at sources dot redhat dot com
Date: Sun, 9 Feb 2003 17:03:21 -0500
Subject: RFA/symtab: (Almost) always hash blocks when searching them
I'm working on modifying the symbol lookup functions to return multiple
symbols when there are multiple possible matches, and I didn't want to have
to modify all three kinds of binary search in lookup_block_symbol.  First I
tried fixing mdebugread.c to generate hashed blocks properly; it was too
messy, and I couldn't build an mdebug toolchain to test with [mips-ecoff was
my best guess, and it's been broken for months.  Part of it was my fault and
then GCC started segfaulting after I fixed that].

So instead, I added a new function to hash a block retroactively.  Then, in
lookup_block_symbol, where we would previously have done a binary search we
instead hash the block and do a hash table search.  Amortized cost is
somewhat lower, complexity cost is much lower.  I like it.

I also updated the comments; the bit about not matching demangled names was
out of date.  Symbols are hashed by their demangled name, if any.

Is this patch OK?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

2003-02-09  Daniel Jacobowitz  <drow@mvista.com>

	* Makefile.in (symtab.o): Add $(buildsym_h) and $(gdb_assert_h).
	* buildsym.c (hash_block): New function.
	* buildsym.h: Add prototype for hash_block.
	* symtab.c: Include "buildsym.h" and "gdb_assert.h".
	(lookup_block_symbol): Use hash_block.  Remove binary search.
	Update comments.

Index: Makefile.in
===================================================================
RCS file: /cvs/src/src/gdb/Makefile.in,v
retrieving revision 1.328
diff -u -p -r1.328 Makefile.in
--- Makefile.in	7 Feb 2003 05:33:44 -0000	1.328
+++ Makefile.in	9 Feb 2003 21:52:05 -0000
@@ -2216,7 +2216,7 @@ symtab.o: symtab.c $(defs_h) $(symtab_h)
 	$(gdbcmd_h) $(call_cmds_h) $(gdb_regex_h) $(expression_h) \
 	$(language_h) $(demangle_h) $(inferior_h) $(linespec_h) \
 	$(filenames_h) $(gdb_obstack_h) $(gdb_string_h) $(gdb_stat_h) \
-	$(cp_abi_h) $(source_h)
+	$(cp_abi_h) $(source_h) $(buildsym_h) $(gdb_assert_h)
 target.o: target.c $(defs_h) $(gdb_string_h) $(target_h) $(gdbcmd_h) \
 	$(symtab_h) $(inferior_h) $(bfd_h) $(symfile_h) $(objfiles_h) \
 	$(gdb_wait_h) $(dcache_h) $(regcache_h)
Index: buildsym.c
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.c,v
retrieving revision 1.27
diff -u -p -r1.27 buildsym.c
--- buildsym.c	14 Jan 2003 00:15:05 -0000	1.27
+++ buildsym.c	9 Feb 2003 21:52:05 -0000
@@ -203,6 +203,56 @@ free_pending_blocks (void)
   pending_blocks = NULL;
 }
 
+/* Convert a block BLOCK with unhashed symbols into a block with
+   hashed symbols.
+
+   All blocks coming out of the debug info readers should either be
+   function blocks (which contain a list of arguments in addition to
+   locals, so we don't hash them [yet] in order to preserve the order
+   of the arguments) or already hashed.  However, some older readers
+   don't use finish_block, so they don't create hashed blocks.  The only
+   offender as of 2003-02-09 is mdebugread.  */
+
+void
+hash_block (struct block *block)
+{
+  int i, nsyms;
+  struct symbol **syms;
+
+  gdb_assert (BLOCK_FUNCTION (block) == NULL);
+  gdb_assert (BLOCK_HASHTABLE (block) == 0);
+
+  nsyms = BLOCK_NSYMS (block);
+
+  /* Save the current list of symbols.  */
+  syms = xmalloc (nsyms * sizeof (struct symbol *));
+  memcpy (syms, block->sym, nsyms * sizeof (struct symbol *));
+  memset (block->sym, 0, nsyms * sizeof (struct symbol *));
+
+  BLOCK_HASHTABLE (block) = 1;
+
+  /* Just reuse the already-allocated symbol pointers as our hashtable.
+     Normally we'd use a smaller hash table than this, but reclaiming
+     the memory at this point is hard.  */
+  BLOCK_BUCKETS (block) = nsyms;
+
+  for (i = 0; i < nsyms; i++)
+    {
+      struct symbol *sym;
+      unsigned int hash_index;
+      const char *name = SYMBOL_DEMANGLED_NAME (syms[i]);
+      if (name == NULL)
+	name = SYMBOL_NAME (syms[i]);
+      hash_index = msymbol_hash_iw (name);
+      hash_index = hash_index % BLOCK_BUCKETS (block);
+      sym = BLOCK_BUCKET (block, hash_index);
+      BLOCK_BUCKET (block, hash_index) = syms[i];
+      syms[i]->hash_next = sym;
+    }
+
+  xfree (syms);
+}
+
 /* Take one of the lists of symbols and make a block from it.  Keep
    the order the symbols have in the list (reversed from the input
    file).  Put the block on the list of pending blocks.  */
Index: buildsym.h
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.h,v
retrieving revision 1.10
diff -u -p -r1.10 buildsym.h
--- buildsym.h	9 Jan 2003 18:30:32 -0000	1.10
+++ buildsym.h	9 Feb 2003 21:52:05 -0000
@@ -227,6 +227,8 @@ extern void add_symbol_to_list (struct s
 extern struct symbol *find_symbol_in_list (struct pending *list,
 					   char *name, int length);
 
+extern void hash_block (struct block *block);
+
 extern void finish_block (struct symbol *symbol,
 			  struct pending **listhead,
 			  struct pending_block *old_blocks,
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.87
diff -u -p -r1.87 symtab.c
--- symtab.c	4 Feb 2003 18:07:01 -0000	1.87
+++ symtab.c	9 Feb 2003 21:52:06 -0000
@@ -40,17 +40,17 @@
 #include "linespec.h"
 #include "source.h"
 #include "filenames.h"		/* for FILENAME_CMP */
+#include "cp-abi.h"
+#include "buildsym.h"
 
 #include "hashtab.h"
-
 #include "gdb_obstack.h"
-
 #include <sys/types.h>
 #include <fcntl.h>
 #include "gdb_string.h"
 #include "gdb_stat.h"
 #include <ctype.h>
-#include "cp-abi.h"
+#include "gdb_assert.h"
 
 /* Prototypes for local functions */
 
@@ -1572,30 +1572,23 @@ find_main_psymtab (void)
   return (NULL);
 }
 
-/* Search BLOCK for symbol NAME in NAMESPACE.
-
-   Note that if NAME is the demangled form of a C++ symbol, we will fail
-   to find a match during the binary search of the non-encoded names, but
-   for now we don't worry about the slight inefficiency of looking for
-   a match we'll never find, since it will go pretty quick.  Once the
-   binary search terminates, we drop through and do a straight linear
-   search on the symbols.  Each symbol which is marked as being a C++
-   symbol (language_cplus set) has both the encoded and non-encoded names
-   tested for a match.
-
-   If MANGLED_NAME is non-NULL, verify that any symbol we find has this
-   particular mangled name.
-*/
+/* Search BLOCK for symbol NAME in NAMESPACE.  If MANGLED_NAME is non-NULL,
+   verify that any symbol we find has this particular mangled name.  */
 
 struct symbol *
 lookup_block_symbol (register const struct block *block, const char *name,
 		     const char *mangled_name,
 		     const namespace_enum namespace)
 {
-  register int bot, top, inc;
-  register struct symbol *sym;
-  register struct symbol *sym_found = NULL;
-  register int do_linear_search = 1;
+  int bot, top;
+  struct symbol *sym;
+  struct symbol *sym_found = NULL;
+
+  if (BLOCK_SHOULD_SORT (block))
+    {
+      hash_block (block);
+      gdb_assert (BLOCK_HASHTABLE (block));
+    }
 
   if (BLOCK_HASHTABLE (block))
     {
@@ -1613,88 +1606,8 @@ lookup_block_symbol (register const stru
       return NULL;
     }
 
-  /* If the blocks's symbols were sorted, start with a binary search.  */
-
-  if (BLOCK_SHOULD_SORT (block))
-    {
-      /* Reset the linear search flag so if the binary search fails, we
-         won't do the linear search once unless we find some reason to
-         do so */
-
-      do_linear_search = 0;
-      top = BLOCK_NSYMS (block);
-      bot = 0;
-
-      /* Advance BOT to not far before the first symbol whose name is NAME. */
-
-      while (1)
-	{
-	  inc = (top - bot + 1);
-	  /* No need to keep binary searching for the last few bits worth.  */
-	  if (inc < 4)
-	    {
-	      break;
-	    }
-	  inc = (inc >> 1) + bot;
-	  sym = BLOCK_SYM (block, inc);
-	  if (!do_linear_search && (SYMBOL_LANGUAGE (sym) == language_java))
-	    {
-	      do_linear_search = 1;
-	    }
-	  if (SYMBOL_SOURCE_NAME (sym)[0] < name[0])
-	    {
-	      bot = inc;
-	    }
-	  else if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
-	    {
-	      top = inc;
-	    }
-	  else if (strcmp (SYMBOL_SOURCE_NAME (sym), name) < 0)
-	    {
-	      bot = inc;
-	    }
-	  else
-	    {
-	      top = inc;
-	    }
-	}
-
-      /* Now scan forward until we run out of symbols, find one whose
-         name is greater than NAME, or find one we want.  If there is
-         more than one symbol with the right name and namespace, we
-         return the first one; I believe it is now impossible for us
-         to encounter two symbols with the same name and namespace
-         here, because blocks containing argument symbols are no
-         longer sorted.  The exception is for C++, where multiple functions
-	 (cloned constructors / destructors, in particular) can have
-	 the same demangled name.  So if we have a particular
-	 mangled name to match, try to do so.  */
-
-      top = BLOCK_NSYMS (block);
-      while (bot < top)
-	{
-	  sym = BLOCK_SYM (block, bot);
-	  if (SYMBOL_NAMESPACE (sym) == namespace
-	      && (mangled_name
-		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
-		  : SYMBOL_MATCHES_NAME (sym, name)))
-	    {
-	      return sym;
-	    }
-          if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
-            {
-              break;
-            }
-	  bot++;
-	}
-    }
-
-  /* Here if block isn't sorted, or we fail to find a match during the
-     binary search above.  If during the binary search above, we find a
-     symbol which is a Java symbol, then we have re-enabled the linear
-     search flag which was reset when starting the binary search.
-
-     This loop is equivalent to the loop above, but hacked greatly for speed.
+  /* If a block isn't hashable, we do a linear search here.  Right now
+     this means only for blocks associated with a function.
 
      Note that parameter symbols do not always show up last in the
      list; this loop makes sure to take anything else other than
@@ -1702,55 +1615,53 @@ lookup_block_symbol (register const stru
      last resort.  Note that this only takes up extra computation
      time on a match.  */
 
-  if (do_linear_search)
-    {
-      top = BLOCK_NSYMS (block);
-      bot = 0;
-      while (bot < top)
+  top = BLOCK_NSYMS (block);
+  bot = 0;
+  while (bot < top)
+    {
+      sym = BLOCK_SYM (block, bot);
+      if (SYMBOL_NAMESPACE (sym) == namespace
+	  && (mangled_name
+	      ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
+	      : SYMBOL_MATCHES_NAME (sym, name)))
 	{
-	  sym = BLOCK_SYM (block, bot);
-	  if (SYMBOL_NAMESPACE (sym) == namespace
-	      && (mangled_name
-		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
-		  : SYMBOL_MATCHES_NAME (sym, name)))
+	  /* If SYM has aliases, then use any alias that is active
+	     at the current PC.  If no alias is active at the current
+	     PC, then use the main symbol.
+
+	     ?!? Is checking the current pc correct?  Is this routine
+	     ever called to look up a symbol from another context?
+
+	     FIXME: No, it's not correct.  If someone sets a
+	     conditional breakpoint at an address, then the
+	     breakpoint's `struct expression' should refer to the
+	     `struct symbol' appropriate for the breakpoint's
+	     address, which may not be the PC.
+
+	     Even if it were never called from another context,
+	     it's totally bizarre for lookup_symbol's behavior to
+	     depend on the value of the inferior's current PC.  We
+	     should pass in the appropriate PC as well as the
+	     block.  The interface to lookup_symbol should change
+	     to require the caller to provide a PC.  */
+
+	  if (SYMBOL_ALIASES (sym))
+	    sym = find_active_alias (sym, read_pc ());
+
+	  sym_found = sym;
+	  if (SYMBOL_CLASS (sym) != LOC_ARG &&
+	      SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
+	      SYMBOL_CLASS (sym) != LOC_REF_ARG &&
+	      SYMBOL_CLASS (sym) != LOC_REGPARM &&
+	      SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
+	      SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
 	    {
-	      /* If SYM has aliases, then use any alias that is active
-	         at the current PC.  If no alias is active at the current
-	         PC, then use the main symbol.
-
-	         ?!? Is checking the current pc correct?  Is this routine
-	         ever called to look up a symbol from another context?
-
-		 FIXME: No, it's not correct.  If someone sets a
-		 conditional breakpoint at an address, then the
-		 breakpoint's `struct expression' should refer to the
-		 `struct symbol' appropriate for the breakpoint's
-		 address, which may not be the PC.
-
-		 Even if it were never called from another context,
-		 it's totally bizarre for lookup_symbol's behavior to
-		 depend on the value of the inferior's current PC.  We
-		 should pass in the appropriate PC as well as the
-		 block.  The interface to lookup_symbol should change
-		 to require the caller to provide a PC.  */
-
-	      if (SYMBOL_ALIASES (sym))
-		sym = find_active_alias (sym, read_pc ());
-
-	      sym_found = sym;
-	      if (SYMBOL_CLASS (sym) != LOC_ARG &&
-		  SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
-		  SYMBOL_CLASS (sym) != LOC_REF_ARG &&
-		  SYMBOL_CLASS (sym) != LOC_REGPARM &&
-		  SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
-		  SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
-		{
-		  break;
-		}
+	      break;
 	    }
-	  bot++;
 	}
+      bot++;
     }
+
   return (sym_found);		/* Will be NULL if not found. */
 }
Follow-Ups:
- Re: RFA/symtab: (Almost) always hash blocks when searching them
  - From: Daniel Jacobowitz
- Re: RFA/symtab: (Almost) always hash blocks when searching them
  - From: David Carlton
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]