This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



Re: struct environment


Daniel Jacobowitz <drow@mvista.com> writes:
> On Fri, Sep 06, 2002 at 10:20:38AM -0700, David Carlton wrote:

>> instead of requiring code to calculate the size of environments
>> before building them, we might as well support environments that
>> can be built incrementally and then finalized.  That way the common
>> bookkeeping code gets moved into the environment stuff, so people
>> don't have to keep on redoing it.

> The question is whether those partial environments should be treated
> as environments at all - do we even need to do lookups on them?  I
> believe we never should.

Fair enough: partial environments should be a separate concept, and
one which might not even be necessary (see below).

>> And there's another, even more important reason: the global
>> environment (and, in the future, namespaces).  The global
>> environment spans multiple files, uses lazy evaluation
>> (i.e. partial_symbols), and there's even minimal_symbols to
>> shoehorn in there somehow.  So it's going to need its own special
>> implementation (which will be much more complicated than the
>> implementations for blocks).

> This is why I don't like the environment == list-of-symbols thing.
> An environment may HAVE a list of symbols, but it is not its list of
> symbols.  You shouldn't grow the list of symbols in the global
> environment when a new file is added.  Instead you should associate
> a new list of symbols with it.  Files can be removed as well as
> added, remember.  We don't do that very well right now.

> Search the archives of this list for:
>   Date: 11 Jun 2001 12:55:42 -0400
>   From: Daniel Berlin <dan@cgsoftware.com>
>   Subject: [RFC] Symbol table improvements

> for some other suggested reading on this topic.  I like this
> architecture he describes, even if we're not quite ready for it yet.
> We could be.

Thanks for the pointer.  Certainly I'm not at all thrilled with the
way that the global symbol lookup functions work currently.  One
thought that I had was to use a growable but quickly-searchable data
structure for global lookups; but that has the problems that it makes
removing individual files complicated (incidentally, I had been
wondering if that could happen, but I guess from your response above
that the answer is 'yes') and it might make psymtab->symtab conversion
a bit trickier.

Skimming the message you're referring to, it seems like Daniel Berlin
is proposing keeping the idea that the global environment is made out
of a bunch of blocks from different files, but speeding up the process
of going from a symbol name to the specific blocks to search.  That
sounds like a good idea: it's fast, yet it still allows a
per-file granularity which seems like it would be useful.
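
To make that concrete for myself, here is a minimal sketch of a global
environment that keeps one block per file and uses a cheap per-file
filter to skip files that cannot contain a given name.  Every name in
it (struct file_block, global_env_add_file, name_filter, and so on) is
hypothetical, and the one-word bit filter is only my stand-in for
whatever name-to-block index Daniel Berlin actually has in mind; none
of this is existing GDB code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not GDB's real declarations.  */
struct symbol { const char *name; };

/* One file's contribution to the global environment.  */
struct file_block
{
  struct symbol **syms;
  size_t nsyms;
  unsigned long name_filter;    /* one bit set per hashed symbol name */
  struct file_block *next;
};

struct global_env { struct file_block *files; };

/* Map a name to one of 32 filter bits.  */
static unsigned long
name_bit (const char *name)
{
  unsigned long h = 5381;
  while (*name != '\0')
    h = h * 33 + (unsigned char) *name++;
  return 1UL << (h % 32);
}

/* Associate a new file's symbols with the global environment.  */
static void
global_env_add_file (struct global_env *g, struct file_block *fb)
{
  size_t i;
  assert (g != NULL && fb != NULL);
  fb->name_filter = 0;
  for (i = 0; i < fb->nsyms; i++)
    fb->name_filter |= name_bit (fb->syms[i]->name);
  fb->next = g->files;
  g->files = fb;
}

/* Removing a file only unlinks its block; other files are untouched.  */
static void
global_env_remove_file (struct global_env *g, struct file_block *fb)
{
  struct file_block **p;
  for (p = &g->files; *p != NULL; p = &(*p)->next)
    if (*p == fb)
      {
        *p = fb->next;
        fb->next = NULL;
        return;
      }
}

/* Search only the files whose filter says the name might be present.  */
static struct symbol *
global_env_lookup (const struct global_env *g, const char *name)
{
  unsigned long bit = name_bit (name);
  const struct file_block *fb;
  size_t i;
  for (fb = g->files; fb != NULL; fb = fb->next)
    {
      if ((fb->name_filter & bit) == 0)
        continue;               /* this file cannot contain NAME */
      for (i = 0; i < fb->nsyms; i++)
        if (strcmp (fb->syms[i]->name, name) == 0)
          return fb->syms[i];
    }
  return NULL;
}
```

The point of the sketch is only that adding or removing a file never
touches any other file's symbols, which is the per-file granularity I
was referring to.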

>> At some point in the future, it might be possible to use a single
>> implementation for everything; it would have to support:

>> * Growing as needed
>> * Fast lookups
>> * Being able to retrieve all entries in the order in which they were
>> added
>> * Whatever extra cruft is necessary to support environments like the
>> global environment.

> I don't see the benefit of maintaining order-added information in the
> general case; we want it only for specific small lists.

I think I agree.  For now, I'd prefer to have a choice between
fixed-size hash tables or fixed-size small lists, rather than a single
general structure: having multiple internal implementations is easy
enough to do, and there's no need to pay the extra costs of the
general structure if it's not necessary.
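
As a rough illustration of what I mean by multiple internal
implementations, here is a minimal sketch in which a function pointer
chosen at construction time selects between a linear small list and a
fixed-size hash table.  The names (env_init_linear, env_init_hashed,
env_lookup, ENV_NBUCKETS) and the layout are assumptions made up for
this message, not a proposal for the actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not GDB's real declarations.  */
struct symbol
{
  const char *name;
  struct symbol *hash_next;     /* chain link, used only by the hashed kind */
};

#define ENV_NBUCKETS 8

struct environment;
typedef struct symbol *(*env_lookup_fn) (const struct environment *,
                                         const char *);

struct environment
{
  env_lookup_fn lookup;         /* selects the internal implementation */
  struct symbol **syms;         /* all symbols, in the order given */
  size_t nsyms;
  struct symbol *buckets[ENV_NBUCKETS];  /* used only by the hashed kind */
};

static unsigned int
hash_name (const char *name)
{
  unsigned int h = 5381;
  while (*name != '\0')
    h = h * 33 + (unsigned char) *name++;
  return h % ENV_NBUCKETS;
}

static struct symbol *
linear_lookup (const struct environment *env, const char *name)
{
  size_t i;
  for (i = 0; i < env->nsyms; i++)
    if (strcmp (env->syms[i]->name, name) == 0)
      return env->syms[i];
  return NULL;
}

static struct symbol *
hashed_lookup (const struct environment *env, const char *name)
{
  struct symbol *sym;
  for (sym = env->buckets[hash_name (name)]; sym != NULL; sym = sym->hash_next)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}

/* Small list: keeps the order in which the symbols were supplied.  */
static void
env_init_linear (struct environment *env, struct symbol **syms, size_t nsyms)
{
  size_t b;
  assert (env != NULL);
  env->lookup = linear_lookup;
  env->syms = syms;
  env->nsyms = nsyms;
  for (b = 0; b < ENV_NBUCKETS; b++)
    env->buckets[b] = NULL;
}

/* Fixed-size hash table, filled once from the complete symbol list.  */
static void
env_init_hashed (struct environment *env, struct symbol **syms, size_t nsyms)
{
  size_t i;
  env_init_linear (env, syms, nsyms);
  env->lookup = hashed_lookup;
  for (i = 0; i < nsyms; i++)
    {
      unsigned int b = hash_name (syms[i]->name);
      syms[i]->hash_next = env->buckets[b];
      env->buckets[b] = syms[i];
    }
}

/* Callers use this and never see which implementation is behind it.  */
static struct symbol *
env_lookup (const struct environment *env, const char *name)
{
  return env->lookup (env, name);
}
```

Both constructors take the whole symbol list at once, so neither kind
has to grow after it has been built.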

>> I'll look at how the pending lists work; if all the code uses the
>> same data structure, then of course there should be a constructor
>> that takes the entire pending list rather than requiring each
>> symbol to be added individually.

> Look at buildsym.c:add_symbol_to_list, find_symbol_in_list, and
> finish_block.
...
> It already is centralized in buildsym, for all but the crufty readers.
...
> We've pretty much got this already.  That's finish_block; you just
> described its purpose.

Great.  So, modulo jv-lang.c and mdebugread.c, there's no need for
incremental construction of environments.  The only question, then, is
whether it would be easier to convert those special cases to
buildsym's mechanisms, or to handle them specially in the environment
code; if both are equally easy, then obviously the former is
preferable.

>> I think that I'm tentatively planning to defer issues like mangled
>> vs. demangled names until I convert over the global environment:
>> that will be the time to think about exactly what sorts of deep
>> lookup functions will be necessary, for example.  Having said that,
>> I just looked at lookup_block_symbol more closely, and I'm not sure
>> that my planned iterators would be quite enough to handle it.  (But
>> I might be able to delete the relevant code, see below.)

> The case I'm talking about is tied to some of our C++ evilness;
> there are multiple global symbols with the same demangled name, and
> it is vital that we know how to breakpoint the correct one, or the
> breakpoint will not be hit.

> That said this needs cleanups elsewhere.  We might be able to handle
> it some other way...

Hmm.  Clearly I'll need to examine this more closely.

>> Right, that's what I was thinking of: this could replace that easily
>> enough as long as there weren't two copies of ALL_BLOCK_SYMBOLS
>> running on the same block simultaneously, which seems plausible to
>> me.

> Probably.  And you could add assertions to ensure this.

I'm not sure that we can use assertions to help here.  The issue, as I
see it, is:

* If you're willing to make sure that you tell the iterator when
  you're done with it, then you might as well make it possible to keep
  around multiple iterators: that's more flexible.

* If you're not willing to make sure that you tell the iterator when
  you're done with it, then it's easiest if you have only one iterator
  active on any given environment at any one time, because otherwise
  you have to take care to avoid memory leaks.

It seems to me that we're in the latter situation: currently, GDB has
code that does an ALL_BLOCK_SYMBOLS and that, once it's found the
symbol that it's looking for, breaks out of the ALL_BLOCK_SYMBOLS by
returning from the function.  So we'd have to modify those situations
to discard the iterator before returning, and it would be easy for
future users of ALL_BLOCK_SYMBOLS to forget that.

But, if we're in the latter situation, I don't see how to get
assertions to work.

At some point, I'll give it a look: ALL_BLOCK_SYMBOLS is only used in
29 places, so it should be tractable to survey them by hand.

Actually, now that I think about it, there's another possibility that
would avoid memory leaks: if we can make sure that all memory used by
the iterator gets put on the stack then we're set.  We could do this
with code like

  struct env_iterator it;

  for (env_iterator_initialize (env, &it);
       env_iterator_not_done (&it);
       env_iterator_advance (&it)) {
    do_something_with (env_iterator_current_symbol (&it));
  }

This works fine even if the body of the for loop exits in an
unexpected manner.  And that idiom could be encapsulated with a macro
like ALL_BLOCK_SYMBOLS, of course.  I'm not thrilled with this idea,
but it's certainly workable.
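
For what it's worth, here is a fuller sketch of the stack-allocated
variant, under the assumption that an environment is just an array of
symbol pointers.  The struct layouts, the ALL_ENV_SYMBOLS macro, and
find_symbol are hypothetical and exist only for illustration; the four
env_iterator_* functions follow the loop shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not GDB's real declarations.  */
struct symbol { const char *name; };

struct environment
{
  struct symbol **syms;         /* array of NSYMS symbol pointers */
  size_t nsyms;
};

/* All iterator state lives in this struct, so an iterator declared
   on the stack needs no cleanup when the loop is abandoned.  */
struct env_iterator
{
  const struct environment *env;
  size_t index;
};

static void
env_iterator_initialize (const struct environment *env,
                         struct env_iterator *it)
{
  assert (env != NULL && it != NULL);
  it->env = env;
  it->index = 0;
}

static int
env_iterator_not_done (const struct env_iterator *it)
{
  return it->index < it->env->nsyms;
}

static void
env_iterator_advance (struct env_iterator *it)
{
  it->index++;
}

static struct symbol *
env_iterator_current_symbol (const struct env_iterator *it)
{
  return it->env->syms[it->index];
}

/* Hypothetical wrapper in the spirit of ALL_BLOCK_SYMBOLS.  SYM is
   assigned only while the iterator is not done.  */
#define ALL_ENV_SYMBOLS(env, it, sym)                           \
  for (env_iterator_initialize ((env), &(it));                  \
       env_iterator_not_done (&(it))                            \
         && ((sym) = env_iterator_current_symbol (&(it)), 1);   \
       env_iterator_advance (&(it)))

/* Returns from the middle of the loop; there is nothing to free.  */
static struct symbol *
find_symbol (const struct environment *env, const char *name)
{
  struct env_iterator it;
  struct symbol *sym;

  ALL_ENV_SYMBOLS (env, it, sym)
    if (strcmp (sym->name, name) == 0)
      return sym;
  return NULL;
}
```

The cost is that struct env_iterator has to be a complete type in the
header, so callers can see its fields even though they shouldn't touch
them.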

David Carlton
carlton@math.stanford.edu

