This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Lazy CU expansion (Was: Will therefore GDB utilize C++ or not?)

From: Tom Tromey <tromey at redhat dot com>
To: John Gilmore <gnu at toad dot com>
Cc: Jan Kratochvil <jan dot kratochvil at redhat dot com>, Pedro Alves <palves at redhat dot com>, gdb at sourceware dot org
Date: Fri, 18 May 2012 12:51:29 -0600
Subject: Lazy CU expansion (Was: Will therefore GDB utilize C++ or not?)
References: <20120330161403.GA17891@host2.jankratochvil.net> <87aa2rjkb8.fsf@fleche.redhat.com> <4F832D5B.9030308@redhat.com> <20120409190519.GA524@host2.jankratochvil.net> <4F833D29.4050102@redhat.com> <20120416065456.GA30097@host2.jankratochvil.net> <4F8ECB72.70708@redhat.com> <20120418151553.GA16768@host2.jankratochvil.net> <4F8EDD7B.2010602@redhat.com> <20120418155354.GA17912@host2.jankratochvil.net> <201204181748.q3IHm1cF002815@new.toad.com> <87pqb4q2on.fsf@fleche.redhat.com> <201204182309.q3IN9FcF019607@new.toad.com>

>>>>> "John" == John Gilmore <gnu@toad.com> writes:

Some comments on your comments about lazy CU expansion.

John> The whole design of partial_symbols was that they're only needed when
John> the real symbols haven't been read in.  This is well documented.  In
John> fact the partial_symtab for a file can be (or used to be able to be)
John> thrown away when the real symtab is created, and many symbol-readers
John> never bothered to create partial_symbols.

I don't think it's been possible to discard a partial symtab for many
years now.

It doesn't seem very worthwhile, given that the bulk of the memory is
in the psymbols themselves, and these can never be deleted.  But maybe
it would be worth trying.

John> Partial symtabs were only a
John> speed optimization to avoid parsing Stabs debugging info when host
John> machines ran at 20 megahertz.  You could probably get rid of them
John> entirely nowadays.

This seems unlikely to me, but because of memory use, not CPU time.

Partial symbols take much less memory than full symbols, partly because
they are smaller, but mainly because they can be put into the bcache,
which stores identical objects only once.  This sharing is quite
effective in practice.

John> The GDB Internals manual (which I originated when I discovered that
John> there was no internals documentation) makes it clear that there are
John> only a few ways to look up a symbol.  Has that nice clean bit of
John> modular C programming been retained over the last decade?

No, the symbol tables are a total mess, and the internals manual is out
of date.

John> So how is this idea of pointing to psymbols going to save any
John> memory?

'struct symbol' starts with a 'general_symbol_info', and also includes
'domain' and 'aclass' fields -- all of which are duplicated in the
partial symbol.

So pointing to the partial symbol would save at least
sizeof (struct general_symbol_info) - sizeof (void *) bytes per symbol;
on x86-64 that is 32 bytes.  More aggressive packing might save more.

More importantly, this approach would let GDB instantiate a full symtab
without re-parsing the DWARF.  Re-parsing is slow, and also mostly
pointless, since most symbols in a given CU are never used.

John> And if you're going to have to allocate all the memory for the
John> struct symbol, then why not populate it with the real information
John> for the symbol, instead of just a psymbol pointer?

Reading the remaining information is slow and uses memory, but the
results are often not used.  So it would be preferable to fill in the
details on demand.

Skipping function bodies alone saves about 30% of the CU expansion time.

John> It's much simpler to read all the symbols in a symbol file, in
John> order, and once you're doing that anyway, you might as well save
John> them all.

Yes, it is simpler, and it is what is done now.  But I don't think it
scales very well: Jan has dug up some C++ libraries with a single
enormous CU that takes a lot of time whenever you happen to have to
expand it.

Tom> Full symbols are already reasonably C++y, what with the ops vector.

John> It looks to me like the "ops vector" in symbols in gdb-7.4 is pretty
John> minimal, only applying to a tiny number of symbol categories (and the
John> comments in findvar.c -- from 2004 -- report that DWARF2 symbols screw
John> up the ops vector anyway).  Large parts of GDB touch symbols; is the
John> idea that all of these will be rewritten to indirect through an
John> ops-table (either explicitly in C, or implicitly in C++) without ever
John> accessing fields (like SYMBOL_CLASS(sym)) directly?  Do you think this
John> will make GDB faster and smaller?  I don't.

I doubt it would be smaller.  History indicates this is of zero
importance.

It would probably be faster.  At least for lazy CU expansion, the
changes are of the form:

#define SYMBOL_TYPE(sym) \
  ((sym)->type ? (sym)->type : compute_symbol_type (sym))

... or moral equivalent.

Rewriting is not necessary; you can just redefine the macros.
Rewriting the uses would be better if we were moving to C++, but that
is easy too.

John> (There's a comment in symtab.c from 2003 that says address classes and
John> ops vectors should be merged.  But clearly nobody has felt like doing
John> that work in the last 9 years -- probably because so many places in the
John> code would need to be touched.)

I'm not sure I trust that comment.  In general, I find that comments in
GDB about future maintenance work are often questionable.

Tom
