Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
- From: Keith Seitz <keiths@redhat.com>
- To: Tom Tromey <tom@tromey.com>
- Cc: gdb-patches@sourceware.org
- Date: Tue, 18 Dec 2018 09:28:30 -0800
- Subject: Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
- References: <20181109022847.32049-1-keiths@redhat.com> <87lg4p4d92.fsf@tromey.com>
Hi, Tom,
On 12/16/18 9:01 AM, Tom Tromey wrote:
>
> I wonder a bit more about the memory overhead, though.
I see that I didn't include this information in my original submission.
I apologize for that, since, of course, our own gdb.perf suite collects
it.
For completeness, I've copied the vmsize results that I collected during
my original testing.
Difference in vmsize relative to unpatched master (at the time of
submission; multiply by 100 for a percentage):
gmonster1:gmonster-null-lookup       10000-cus   0.00132458225987
gmonster1:gmonster-pervasive-typedef 10000-cus  -0.000998352718015
gmonster1:gmonster-print-cerr        10000-cus  -0.018685850806828
gmonster1:gmonster-ptype-string      10000-cus   0.005297872904029
gmonster1:gmonster-runto-main        10000-cus   0.018689093578962
gmonster1:gmonster-select-file       10000-cus   0.007689710164794
gmonster2:gmonster-null-lookup       1000-sos    0.009959564169472
gmonster2:gmonster-pervasive-typedef 1000-sos    0.000
gmonster2:gmonster-print-cerr        1000-sos    0.002044195506858
gmonster2:gmonster-ptype-string      1000-sos   -0.021910168309929
gmonster2:gmonster-runto-main        1000-sos    0.006132085113341
gmonster2:gmonster-select-file       1000-sos    0.038964767646938
Using various methods of checking memory consumption (massif, smem), the
average additional memory this patch adds when debugging firefox is
approximately 1.3%. That doesn't seem too far out of line with the
gdb.perf results. We could probably shave a bit off by using a bitfield
or a char instead of an unsigned short, too.
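Something along these lines, e.g. (just a sketch on my part, against the
struct you quote below):

  struct multidictionary
  {
    struct dictionary **dictionaries;

    /* A byte is plenty for a count of languages; a bitfield such as
       "unsigned int n_allocated_dictionaries : 8;" would work equally
       well.  (Though with an 8-byte-aligned pointer ahead of it,
       struct padding may eat most of the savings.)  */
    unsigned char n_allocated_dictionaries;
  };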
> Keith> +struct multidictionary
> Keith> +{
> Keith> +  /* An array of dictionaries, one per language.  All dictionaries
> Keith> +     must be of the same type.  This should be free'd for expandable
> Keith> +     dictionary types.  */
> Keith> +  struct dictionary **dictionaries;
> Keith> +
> Keith> +  /* The number of language dictionaries currently allocated.
> Keith> +     Only used for expandable dictionaries.  */
> Keith> +  unsigned short n_allocated_dictionaries;
> Keith> +};
>
> I think this is the main possibly contentious bit of this series.
Is it just the increase in memory usage introduced by this struct that
concerns you?
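(For concreteness: on a typical LP64 host the struct itself is 8 bytes
for the pointer plus 2 for the count, padded to 16 bytes per block,
plus the dictionary pointer array, one entry per language actually
present.)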
> I wondered when reading this series why it isn't possible to just put
> symbols of multiple languages into a single hash table. That would seem
> to make the series as a whole much simpler: no mass renaming, no need to
> store multiple dictionaries in a block, etc.
>
> Maybe one problem is that when searching you may only have the
> searched-for symbol name; so you'd have to maybe keep a bitset of all
> the languages in a dictionary and then do multiple hash lookups? But
> still that seems better.
Yeah, this is the main implementation problem, exemplified by
iter_match_first_hashed:
  const language_defn *lang = DICT_LANGUAGE (dict);
  unsigned int hash_index = (name.search_name_hash (lang->la_language)
                             % DICT_HASHED_NBUCKETS (dict));
  symbol_name_matcher_ftype *matches_name
    = get_symbol_name_matcher (lang, name);

  /* Loop through the symbols in the given bucket, breaking when SYM
     first matches.  If SYM never matches, it will be set to NULL;
     either way, we have the right return value.  */
  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* Warning: the order of arguments to compare matters!  */
      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }
Some of this is fairly trivial to fix up to permit multiple languages in
a dictionary, e.g., moving the get_symbol_name_matcher call into the
loop. That adds per-iteration cost, but I have not attempted to quantify
it.
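For that part, I'm thinking of something roughly like this (untested
sketch; language_def and SYMBOL_LANGUAGE are the existing accessors):

  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* With mixed languages in one dictionary, the matcher depends
         on each symbol's language, so it can no longer be hoisted
         out of the loop.  */
      symbol_name_matcher_ftype *matches_name
        = get_symbol_name_matcher (language_def (SYMBOL_LANGUAGE (sym)),
                                   name);

      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }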
The bigger problem is computing the hash_index for the bucket we want to
loop over. With no language definition associated with the dictionary,
we would have to loop over /all/ buckets, computing a hash from the first
symbol in each bucket until we find a matching bucket, if any. An
alternative is storing the hash in each bucket to avoid recomputing it,
but that would consume even more memory.
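To illustrate the first option (again, just a sketch):

  for (unsigned int i = 0; i < DICT_HASHED_NBUCKETS (dict); ++i)
    {
      struct symbol *first = DICT_HASHED_BUCKET (dict, i);

      if (first == NULL)
        continue;

      /* Re-derive the hash using the first symbol's language and
         check whether NAME could even hash to this bucket.  */
      unsigned int hash = name.search_name_hash (SYMBOL_LANGUAGE (first));
      if (hash % DICT_HASHED_NBUCKETS (dict) != i)
        continue;

      /* ... walk the bucket as above ...  */
    }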
This sounded incredibly wasteful to me when I wrote this patch.
Keith