Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
- From: Keith Seitz <keiths@redhat.com>
- To: Tom Tromey <tom@tromey.com>
- Cc: gdb-patches@sourceware.org
- Date: Tue, 18 Dec 2018 09:28:30 -0800
- Subject: Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's
- References: <20181109022847.32049-1-keiths@redhat.com> <87lg4p4d92.fsf@tromey.com>
Hi, Tom,
On 12/16/18 9:01 AM, Tom Tromey wrote:
>
> I wonder a bit more about the memory overhead, though.
I see that I didn't include this information in my original submission.
I apologize for that, since, of course, our own gdb.perf suite collects
it.
For completeness, I've copied the vmsize results that I collected during
my original testing.
Difference in vmsize relative to unpatched master (at the time of
submission; multiply by 100 for a percentage):
gmonster1:gmonster-null-lookup       10000-cus   0.00132458225987
gmonster1:gmonster-pervasive-typedef 10000-cus  -0.000998352718015
gmonster1:gmonster-print-cerr        10000-cus  -0.018685850806828
gmonster1:gmonster-ptype-string      10000-cus   0.005297872904029
gmonster1:gmonster-runto-main        10000-cus   0.018689093578962
gmonster1:gmonster-select-file       10000-cus   0.007689710164794
gmonster2:gmonster-null-lookup       1000-sos    0.009959564169472
gmonster2:gmonster-pervasive-typedef 1000-sos    0.000
gmonster2:gmonster-print-cerr        1000-sos    0.002044195506858
gmonster2:gmonster-ptype-string      1000-sos   -0.021910168309929
gmonster2:gmonster-runto-main        1000-sos    0.006132085113341
gmonster2:gmonster-select-file       1000-sos    0.038964767646938
Using various methods of checking memory consumption (massif, smem), the
average additional memory this patch adds when debugging firefox is
approximately 1.3%. That doesn't seem too far out of line with the
gdb.perf results. We could probably shave a bit off by using a bitfield
or a char instead of an unsigned short, too.
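Something along these lines, e.g. (just a sketch on my part, against the
struct you quote below):

  struct multidictionary
  {
    struct dictionary **dictionaries;

    /* A byte is plenty for a count of languages; a bitfield such as
       "unsigned int n_allocated_dictionaries : 8;" would work equally
       well.  (Though with an 8-byte-aligned pointer ahead of it,
       struct padding may eat most of the savings.)  */
    unsigned char n_allocated_dictionaries;
  };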
> Keith> +struct multidictionary
> Keith> +{
> Keith> +  /* An array of dictionaries, one per language.  All dictionaries
> Keith> +     must be of the same type.  This should be free'd for expandable
> Keith> +     dictionary types.  */
> Keith> +  struct dictionary **dictionaries;
> Keith> +
> Keith> +  /* The number of language dictionaries currently allocated.
> Keith> +     Only used for expandable dictionaries.  */
> Keith> +  unsigned short n_allocated_dictionaries;
> Keith> +};
>
> I think this is the main possibly contentious bit of this series.
Is it just the increase in memory usage introduced by this struct that
concerns you?
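(For concreteness: on a typical LP64 host the struct itself is 8 bytes
for the pointer plus 2 for the count, padded to 16 bytes per block,
plus the dictionary pointer array, one entry per language actually
present.)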
> I wondered when reading this series why it isn't possible to just put
> symbols of multiple languages into a single hash table. That would seem
> to make the series as a whole much simpler: no mass renaming, no need to
> store multiple dictionaries in a block, etc.
>
> Maybe one problem is that when searching you may only have the
> searched-for symbol name; so you'd have to maybe keep a bitset of all
> the languages in a dictionary and then do multiple hash lookups? But
> still that seems better.
Yeah, this is the main implementation problem, exemplified by
iter_match_first_hashed:
  const language_defn *lang = DICT_LANGUAGE (dict);
  unsigned int hash_index = (name.search_name_hash (lang->la_language)
                             % DICT_HASHED_NBUCKETS (dict));
  symbol_name_matcher_ftype *matches_name
    = get_symbol_name_matcher (lang, name);

  /* Loop through the symbols in the given bucket, breaking when SYM
     first matches.  If SYM never matches, it will be set to NULL;
     either way, we have the right return value.  */
  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* Warning: the order of arguments to compare matters!  */
      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }
Some of this is fairly trivial to fix up to permit multiple languages in
a dictionary, e.g., moving the get_symbol_name_matcher call into the
loop. That adds per-iteration cost, but I have not attempted to quantify
it.
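For that part, I'm thinking of something roughly like this (untested
sketch; language_def and SYMBOL_LANGUAGE are the existing accessors):

  for (sym = DICT_HASHED_BUCKET (dict, hash_index);
       sym != NULL;
       sym = sym->hash_next)
    {
      /* With mixed languages in one dictionary, the matcher depends
         on each symbol's language, so it can no longer be hoisted
         out of the loop.  */
      symbol_name_matcher_ftype *matches_name
        = get_symbol_name_matcher (language_def (SYMBOL_LANGUAGE (sym)),
                                   name);

      if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
        break;
    }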
The bigger problem is computing the hash_index for the bucket we want to
loop over. With no language definition associated with the dictionary,
we would have to loop over /all/ buckets, computing a hash from the first
symbol in each bucket until we find a matching bucket, if any. An
alternative is storing the hash in each bucket to avoid recomputing it,
but that would consume even more memory.
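To illustrate the first option (again, just a sketch):

  for (unsigned int i = 0; i < DICT_HASHED_NBUCKETS (dict); ++i)
    {
      struct symbol *first = DICT_HASHED_BUCKET (dict, i);

      if (first == NULL)
        continue;

      /* Re-derive the hash using the first symbol's language and
         check whether NAME could even hash to this bucket.  */
      unsigned int hash = name.search_name_hash (SYMBOL_LANGUAGE (first));
      if (hash % DICT_HASHED_NBUCKETS (dict) != i)
        continue;

      /* ... walk the bucket as above ...  */
    }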
This sounded incredibly wasteful to me when I wrote this patch.
Keith