This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 26/40] Optimize .gdb_index symbol name searching
- From: Keith Seitz <keiths at redhat dot com>
- To: Pedro Alves <palves at redhat dot com>, gdb-patches at sourceware dot org
- Date: Tue, 8 Aug 2017 13:31:52 -0700
- Subject: Re: [PATCH 26/40] Optimize .gdb_index symbol name searching
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=keiths at redhat dot com
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 27FED4F93B
- References: <1496406158-12663-1-git-send-email-palves@redhat.com> <1496406158-12663-27-git-send-email-palves@redhat.com>
On 06/02/2017 05:22 AM, Pedro Alves wrote:
> I got, before the previous patch (-O2, x86-64):
>
> real 0m1.773s
> user 0m1.737s
> sys 0m0.040s
>
> and after this patch:
>
> real 0m1.361s
> user 0m1.315s
> sys 0m0.040s
The results on my computer are slightly more dramatic, running with no
optimization, your test case (using Fedora 21 system gdb w/index debuginfo)
goes from about 15 seconds down to about 2.5 seconds. Very nice!
> That resulted in 1351355 name_components. Each entry takes 8 bytes,
> so that's 10810840 bytes (ignoring std::vector overhead), or ~10.3 MB.
> That's IMO too small to worry about, given GDB was using over 7400MB
> total at that point. I.e., we're talking about 0.1% increase.
Indeed. I'd sacrifice that kind of memory for the kind of speed increase
you've achieved -- in a heartbeat!
> with only 8-bit and 32-bit tables, that'd be:
>
> 1349057 * 1 + 2298 * 4 + 4 * 1351355 = 6763669 bytes, or ~6.5MB.
>
> I don't think we need to bother though.
I'm all for memory usage optimization and whatnot, but since the benefit is
so small (55% of these new tables saved but only 0.06% of total memory),
I prefer simplicity. So you won't get anything but agreement from me on this.
> I also timed:
>
> $ time gdb --batch -q -p `pidof firefox`
> $ time gdb --batch -q -p `pidof firefox` -ex "b main"
> $ time gdb --batch -q -p `pidof firefox` -ex "set max-completion unlimited" -ex "complete b "
I'd like to reproduce this, but my computer is incapable of running this test.
I'll take your word for it. ;-)
> gdb/ChangeLog:
> yyyy-mm-dd Pedro Alves <palves@redhat.com>
>
> * dwarf2read.c
> (mapped_index::name_components): New field.
> (mapped_index::symbol_name_at): New method.
Silly nit: Isn't the form most are using "(tag name) <field>: New field."?
I know I've relied on this several times to find changes in the ChangeLog.
> (create_addrmap_from_index): Call mapped_index ctor.
I don't see any changes to this function in the patch -- attributed to wrong
function?
> diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
> index f523326..e955131 100644
> --- a/gdb/dwarf2read.c
> +++ b/gdb/dwarf2read.c
> @@ -178,6 +178,51 @@ DEF_VEC_I (offset_type);
[snip]
> +
> +/* An index into a (C++) symbol name component in a symbol name as
> + recorded in the mapped_index's symbol table. For each C++ symbol
> + in the symbol table, we record one entry for the start of each
> + component in the symbol in a table of name components, and then
> + sort the table, in order to be able to binary search symbol names,
> + ignoring leading namespaces, both completion and regular look up.
> + For example, for symbol "A::B::C", we'll have an entry that points
> + to "A::B::C", another that points to "B::C", and another for "C".
> + Note that function symbols in GDB index have no parameter
> + information, just the function/method names. You can convert a
> + name_component to a "const char *" using the
> + 'mapped_index::symbol_name_at(offset_type)' method. */
missing nl?
> +struct name_component
> +{
> + /* Offset in the symbol name where the component starts. Stored as
> + a (32-bit) offset instead of a pointer to save memory and improve
> + locality on 64-bit architectures. */
> + offset_type name_offset;
> +
> + /* The symbol's index in the symbol and constant pool tables of a
> + mapped_index. */
> + offset_type idx;
> +};
> +
> /* A description of the mapped index. The file format is described in
> a comment by the code that writes the index. */
> struct mapped_index
> @@ -3390,6 +3424,7 @@ dwarf2_read_index (struct objfile *objfile)
> create_addrmap_from_index (objfile, &local_map);
>
> map = XOBNEW (&objfile->objfile_obstack, struct mapped_index);
> + map = new (map) mapped_index ();
> *map = local_map;
>
> dwarf2_per_objfile->index_table = map;
This function (dwarf2_read_index) is not mentioned in the ChangeLog. Could
this actually be incorrectly listed in the ChangeLog under
create_addrmap_from_index?
> @@ -4095,6 +4134,22 @@ gdb_index_symbol_name_matcher::matches (const char *symbol_name)
> }
>
> static void
> +dw2_expand_marked_cus
> + (mapped_index &index, offset_type idx,
> + struct objfile *objfile,
> + gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
> + gdb::function_view<expand_symtabs_exp_notify_ftype> expansion_notify,
> + search_domain kind);
> +
> +static void
> +dw2_expand_symtabs_matching_symbol
> + (mapped_index &index,
> + const lookup_name_info &lookup_name_in,
> + gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
> + enum search_domain kind,
> + gdb::function_view<void (offset_type)> on_match);
Isn't it rather unusual for us to have forward decls in the middle of a file?
> +
> +static void
> dw2_expand_symtabs_matching
> (struct objfile *objfile,
> gdb::function_view<expand_symtabs_file_matcher_ftype> file_matcher,
> @@ -4186,30 +4239,214 @@ dw2_expand_symtabs_matching
[snip]
> +static void
> +dw2_expand_symtabs_matching_symbol
> + (mapped_index &index,
> + const lookup_name_info &lookup_name,
> + gdb::function_view<expand_symtabs_symbol_matcher_ftype> symbol_matcher,
> + enum search_domain kind,
> + gdb::function_view<void (offset_type)> match_callback)
> +{
[snip]
> +
> + /* Sort name_comp elements by name. */
I presume that "name_comp" is really "name_components"?
[snip]
> + std::vector<offset_type> matches;
> + matches.reserve (std::distance (lower, upper));
> +
> + for (;lower != upper; ++lower)
> + {
> + const char *qualified = index.symbol_name_at (lower->idx);
> +
> + if (!lookup_name_matcher.matches (qualified)
> + || (symbol_matcher != NULL && !symbol_matcher (qualified)))
> continue;
>
> - /* The name was matched, now expand corresponding CUs that were
> - marked. */
> - vec = (offset_type *) (index->constant_pool
> - + MAYBE_SWAP (index->symbol_table[idx + 1]));
> + matches.push_back (lower->idx);
> + }
> +
> + std::sort (matches.begin (), matches.end ());
> +
> + /* Finally call the callback, once per match. */
> + ULONGEST prev = -1;
> + for (offset_type idx : matches)
> + {
> + if (prev != idx)
> + {
> + match_callback (idx);
> + prev = idx;
> + }
> + }
I admit, I'm a little surprised by the number of steps involved here: push
every element in the range into a vector, sort, then de-dup & perform callback.
Had I implemented this, my initial attempt would have been to use a htab_up
and take the sorting time on insertion.
I can imagine that for very large ranges, your approach could outperform
an htab; do you expect these ranges to be that large? Have I overlooked
something? [I'm just curious. Not suggesting any changes need to be made.]
> +
> + /* Above we use a type wider than idx's for 'prev', since 0 and
> + (offset_type)-1 are both possible values. */
> + static_assert (sizeof (prev) > sizeof (offset_type), "");
> +}
> +
[snip]
Keith