This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [RFA] bug in symtab.c:lookup_block_symbol()'s search method


Jason Molenda <jason-swarelist@molenda.com> writes:

> On Mon, Sep 10, 2001 at 02:50:29PM -0400, Daniel Berlin wrote:
>
>> Your test above isn't quite correct, actually.
>> There is a corner case of the first character being a space. I don't
>> think this can ever occur in c or C++, no clue about other languages.
>
> The initial binary search in lookup_block_symbol() isn't correct
> for the same reason - it uses individual character comparisons to
> determine less-than/greater-than relationships on the first character
> of the symbol, and then uses the stock strcmp to narrow in on the
> rest of the symbol.  Nevertheless, it all generally works.
Yet I remember just a few messages ago someone complaining that my
patches don't handle the corner cases, which strangely, aren't handled
anyway.
It's like speed was being put ahead of correctness.
> The binary search gets us to the beginning of the valid matches; the
> follow-up linear search finds the right symbol.
>
> My instinct is to change strcmp_iw() to streq_iw() (as well as all
> the callers) and add a new strcmp_iw() function which returns a
> less-than/greater-than relationship instead of just match/not-match.
> That would be the best way to deal with all of this.
 The best way to deal with this is to get rid of most of it.
>
>
> IMNSHO gdb 5.1 can not be released with the symbol binary search
> lookup broken as it has been for the last year.
Broken?
You mean slower.
It works *correctly*, just not as *efficiently* as it could.
Big difference.
>   The patch I sent
> is a conservative one which brings the lookup cost near the original
> gdb search time. 

Errr, realize I did *extensive* performance testing on that patch
before submitting it.
On files ranging from 1's to 100's of megs of debug info.
The lookup cost in general was a fraction of what it used to be, even with the
bug.
In no case was the lookup time greater in any programs i profiled gdb
on.

> the behavior of strcmp_iw() so that it can be used here, but that's
> a slightly more pervasive change.
As I pointed out to you in private email, a better solution is to
avoid the basic need to always binary search altogether. It's quite
easy to implement, taking somewhere around 5 hours to throw together
something that worked, and passed all the regression tests.

Assign each block a unique id (easy, since blocks are only created
once each).
Create a mapping of symbol names to block id's they appear in (easy,
since we can just add the symbol names to the map as we see them while
creating the blocks containing them.), storing the block id's for a
particular symbol in a sparse bitmap.
For symbol lookups:
Lookup the name in the mapping once at the beginning of the symbol
lookup.
Where we call lookup_block_symbol now, just test to see if the id'd
bit is set in the sparse bitmap for the symbol.
If so, call lookup block symbol.
Otherwise, continue on.

This moves us from O(blocks log n) worst case to O(blocks) worst case
(where n is the number of symbols in a given block)

When you have blocks with 100k symbols, it makes a large difference (a
factor of 16, in fact), since we'll never look in blocks which won't
contain the symbol.

We already have minsym_hash_iw, which gives you the hash function to
use for the mapping.

I went down the "make lookup_block_symbol itself faster" road myself,
fixing not only that bug, but replacing the linear array in the blocks
with a hashtable.
It's still slower than just avoiding the block lookup in the first place.
>
> Jason

-- 
"I have an answering machine in my car.  It says, "I'm home now.
But leave a message and I'll call when I'm out."
"-Steven Wright


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]