This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: RFA/symtab: Let search_symbols find exact matches
- From: Elena Zannoni <ezannoni at redhat dot com>
- To: David Carlton <carlton at math dot stanford dot edu>
- Cc: Daniel Jacobowitz <drow at mvista dot com>, gdb-patches at sources dot redhat dot com, Elena Zannoni <ezannoni at redhat dot com>, Jim Blandy <jimb at redhat dot com>
- Date: Fri, 21 Feb 2003 14:19:01 -0500
- Subject: Re: RFA/symtab: Let search_symbols find exact matches
- References: <20030210160107.GA587@nevyn.them.org><ro1k7fu4ltd.fsf@jackfruit.Stanford.EDU>
David Carlton writes:
> On Mon, 10 Feb 2003 11:01:07 -0500, Daniel Jacobowitz <drow at mvista dot com> said:
>
> > This patch renames search_symbols to search_symbols_aux, and lets it take an
> > argument that specified regex or exact (well, strcmp_iw) matching. Then
> > search_symbols becomes a wrapper.
>
> > The new function is used by make_symbol_overload_list, which shaves 50% time
> > off of some tests in the C++ directory if your system libc has debugging
> > symbols; it removes the bogusity where all psymtabs were converted to
> > symtabs during overload resolution. Whew. This cuts memory for
> > namespace.exp from 70MB to 7MB, allowing some of my hardware to actually run
> > the test without crashing.
>
> Here's take two on my comments on this. First, I should say that I
> totally agree that the aforementioned bogusity in
> make_symbol_overload_list should go; as discussed in
> <http://sources.redhat.com/ml/gdb-patches/2003-02/msg00261.html>,
> though, there are ways of doing that that involve much less drastic
> changes.
>
> So the question is: do search_symbols and make_symbol_overload_list
> have enough of a common core such that extracting that common core
> into a single function would clean up the code, reducing maintenance
> overhead, or would that common code be artificial, increasing
> maintenance overhead?
>
> Here's an analysis of the two functions. It may well be incorrect; I
> understand GDB's symbol mechanisms pretty well (better than I'd like,
> I sometimes think...), and I understand make_symbol_overload_list
> pretty well, but I haven't spent too much time with search_symbols
> yet.
>
<archeology>
BTW, make_symbol_overload_list came in via the HP merge. Roughly at
the same time (in parallel) the search_symbols function was
introduced. I.e. HP didn't have search_symbols available. Actually
the same functionality of search_symbol was provided by another
function called list_symbols. I suspect that list_symbols was broken
down into several functions, to be used in gdbtk, when search_symbols
was introduced.
</archeology>
So, if you change search_symbols we should make sure gdbtk still
works.
I agree it could be a good idea to clean up search_symbols, but keep
the history above in mind.
elena
>
> Loosely speaking, this is what search_symbols does:
>
> 1) If the regexp is non-NULL, it cleans it up a bit and compiles it.
>
> 2) It looks at every partial symtab; in each one, it looks at every
> partial symbol until it finds one that matches, at which point it
> reads in that partial symtab.
>
> 3) It (usually) looks at every single minimal symbol, to see if thre's
> a corresponding symbol.
>
> 4) It then looks at every single global and static symbol, to find the
> ones that match.
>
> 5) If there were minimal symbols without corresponding symbols, then
> it looks at every single minimal symbol again.
>
> Here, on the other hand, is what make_symbol_overload_list does.
>
> 1) It looks at every partial symtab, and reads them all in (in an
> amusingly verbose fashion). (My e-mail referenced above contains a
> patch fixing this.)
>
> 2) It looks at local variables, to find the ones that match.
>
> 3) It looks at every single global and static symbol, to find the ones
> that match.
>
>
> So: what's the same, what's different.
>
> Things that only search_symbols does:
>
> * It does some regexp-related stuff at the beginning.
>
> * It looks at every single minimal symbol at least once, possibly
> twice.
>
> Things that only make_symbol_overload_list does:
>
> * It looks at local variables.
>
> Things that both functions do:
>
> * Determine which psymtabs to read in.
>
> * Look at every single global and static symbol, to find the ones that
> match.
>
>
> So already I'm suspicious: those differences seem potentially
> significant to me. Some of them may well be due to accidents in the
> way the two functions were written (which would be exactly the sort of
> accidents that patches like Daniel's would help prevent!); but I think
> that, at the least, we need more analysis to see if that's the case.
> When doing this sort of refactoring, I think the proper order is to
> first minimize the differences and only then to combine them.
> (Shrinking the functions first will help, too: see below.)
>
> But even the similarities aren't as close as they seem. In both
> cases, make_symbol_overload_list can potentially use a much faster
> algorithm than search_symbols can, one that avoids looking at the vast
> majority of partial or regular symbols. For the psymtab case, it can
> just do one call to lookup_partial_symbol per psymtab, rather than
> look at (almost) every partial symbol. For the symbols, all the
> matching symbols will be inside the same hash bucket, so we could
> conceivably just look in one bucket per symtab (well, two, one for the
> globals and one for the statics), instead of looking at every symbol.
> (We don't currently have an appropriate iterator to do that; I do have
> one in my branch, that I plan to try to move to mainline in a few
> weeks.)
>
>
> So. I like the basic idea of combining the functions as a longer-term
> goal, but I'm pretty sure it's not a good idea right now. I think a
> better thing to do would be to get a good handle on the difference
> between search_symbol and make_symbol_overload_list. (And
> make_symbol_completion_list: it may well be easier to combine
> make_symbol_overload_list and make_symbol_completion_list as a first
> step.)
>
> One possible concrete recommendation as a step towards this
> refactoring would be to extract the body of search_symbol into 5
> separate functions along the lines of my analysis above, to extract
> the body of make_symbol_overload_list into 3 separate functions along
> the lines of my analysis above, and similarly with
> make_symbol_completion_list. (This is like what I did with
> lookup_symbol_aux.) That will enable us to do two things:
>
> * By looking at the resulting much shorter versions of those
> functions, it'll be much clearer what each one does in broad terms.
> We can then think about whether the differences between them are
> important or not.
>
> * When an extracted function from, say, make_symbol_overload_list is
> playing a similar role to an extracted function from, say,
> make_symbol_completion_list, then examine those functions to see if
> they can be profitably unified into a single function.
>
> That may sound like a lot of work, but having gone through it a few
> times, it's really not so bad. And it's a great way to understand and
> clean up functions. It doesn't interact well with GDB's approval
> process, though; I have some ideas about that, too, but I don't think
> this is the place for them.
>
> David Carlton
> carlton at math dot stanford dot edu