Re: [PATCH 1/5] gdb/23712: Introduce multidictionary's


On Tue, 18 Dec 2018, Keith Seitz wrote:

> Hi, Tom,
> 
> On 12/16/18 9:01 AM, Tom Tromey wrote:
> > 
> > I wonder a bit more about the memory overhead, though.
> 
> I see that I didn't include this information in my original submission.
> I apologize for that because, of course, our own gdb.perf has this
> information.
> 
> For completeness, I've copied the vmsize results that I collected during
> my original testing.
> 
> Difference in vmsize relative to unpatched master (at the time of
> submission); multiply by 100 for a percentage:
> 
> gmonster1:gmonster-null-lookup        10000-cus   0.00132458225987
> gmonster1:gmonster-pervasive-typedef  10000-cus  -0.000998352718015
> gmonster1:gmonster-print-cerr         10000-cus  -0.018685850806828
> gmonster1:gmonster-ptype-string       10000-cus   0.005297872904029
> gmonster1:gmonster-runto-main         10000-cus   0.018689093578962
> gmonster1:gmonster-select-file        10000-cus   0.007689710164794
> gmonster2:gmonster-null-lookup        1000-sos    0.009959564169472
> gmonster2:gmonster-pervasive-typedef  1000-sos    0.000
> gmonster2:gmonster-print-cerr         1000-sos    0.002044195506858
> gmonster2:gmonster-ptype-string       1000-sos   -0.021910168309929
> gmonster2:gmonster-runto-main         1000-sos    0.006132085113341
> gmonster2:gmonster-select-file        1000-sos    0.038964767646938
> 
> Measuring memory consumption in various ways (massif, smem), the
> additional memory this patch adds when debugging Firefox averages
> approximately 1.3%.  That seems roughly in line with the gdb.perf
> results.  We could probably shave a bit off by using a bitfield or a
> char instead of an unsigned short, too.
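> 
> As a rough sketch of that tweak to the struct quoted just below
> (an assumption on my part that the count always fits in one byte):
> 
>   struct multidictionary
>   {
>     /* An array of dictionaries, one per language.  */
>     struct dictionary **dictionaries;
> 
>     /* GDB defines far fewer than 256 languages, so a single byte
>        (or a small bitfield) should suffice, though pointer alignment
>        padding may absorb part of the saving on 64-bit hosts.  */
>     unsigned char n_allocated_dictionaries;
>   };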
> 
> > Keith> +struct multidictionary
> > Keith> +{
> > Keith> +  /* An array of dictionaries, one per language.  All dictionaries
> > Keith> +     must be of the same type.  This should be free'd for expandable
> > Keith> +     dictionary types.  */
> > Keith> +  struct dictionary **dictionaries;
> > Keith> +
> > Keith> +  /* The number of language dictionaries currently allocated.
> > Keith> +     Only used for expandable dictionaries.  */
> > Keith> +  unsigned short n_allocated_dictionaries;
> > Keith> +};
> > 
> > I think this is the main possibly contentious bit of this series.
> 
> Is it just the increase in memory usage introduced by this struct
> that concerns you?
> 
> > I wondered when reading this series why it isn't possible to just put
> > symbols of multiple languages into a single hash table.  That would seem
> > to make the series as a whole much simpler: no mass renaming, no need to
> > store multiple dictionaries in a block, etc.
> > 
> > Maybe one problem is that when searching you may only have the
> > searched-for symbol name; so you'd have to maybe keep a bitset of all
> > the languages in a dictionary and then do multiple hash lookups?  But
> > still that seems better.
> 
> Yeah, this is the main implementation problem, exemplified by
> iter_match_first_hashed:
> 
>   const language_defn *lang = DICT_LANGUAGE (dict);
>   unsigned int hash_index = (name.search_name_hash (lang->la_language)
> 			     % DICT_HASHED_NBUCKETS (dict));
>   symbol_name_matcher_ftype *matches_name
>     = get_symbol_name_matcher (lang, name);
> 
>   /* Loop through the symbols in the given bucket, breaking when SYM
>      first matches.  If SYM never matches, it will be set to NULL;
>      either way, we have the right return value.  */
>   
>   for (sym = DICT_HASHED_BUCKET (dict, hash_index);
>        sym != NULL;
>        sym = sym->hash_next)
>     {
>       /* Warning: the order of arguments to compare matters!  */
>       if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
> 	break;
>     }
> 
> Some of this is fairly trivial to fix up to permit multiple languages
> in a dictionary, e.g., moving the get_symbol_name_matcher call into
> the loop.  That adds per-iteration cost, but I have not attempted to
> quantify it.
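> 
> For illustration, that change might look roughly like this (a sketch
> only; it assumes each symbol's own language, via SYMBOL_LANGUAGE,
> selects the matcher):
> 
>   for (sym = DICT_HASHED_BUCKET (dict, hash_index);
>        sym != NULL;
>        sym = sym->hash_next)
>     {
>       /* Pick the matcher per symbol, since symbols of several
>          languages may now share a bucket.  */
>       symbol_name_matcher_ftype *matches_name
>         = get_symbol_name_matcher (language_def (SYMBOL_LANGUAGE (sym)),
>                                    name);
> 
>       if (matches_name (SYMBOL_SEARCH_NAME (sym), name, NULL))
>         break;
>     }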
> 
> The bigger problem is computing the hash_index of the bucket we want
> to loop over.  With no language definition associated with the
> dictionary, we would have to loop over /all/ buckets, computing a
> hash for the first symbol in each bucket until we find a matching
> bucket, if any.  An alternative is storing the hash in each bucket to
> facilitate this, but that would consume even more memory.
> 
> This sounded incredibly wasteful to me when I wrote this patch.
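> 
> For concreteness, Tom's bitset suggestion might avoid that full scan
> roughly as follows (hypothetical language_mask field and names, not
> from the patch):
> 
>   /* The dictionary records which languages occur in it; lookup
>      probes one bucket per recorded language instead of scanning
>      every bucket.  */
>   for (unsigned lang = 0; lang < nr_languages; ++lang)
>     {
>       if ((dict->language_mask & (1u << lang)) == 0)
>         continue;
> 
>       unsigned int hash_index
>         = (name.search_name_hash ((enum language) lang)
>            % DICT_HASHED_NBUCKETS (dict));
>       /* ... then walk that bucket as iter_match_first_hashed does.  */
>     }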

I wonder whether we have a "context" language for 'name' that already
determines which hash function and which matcher we should use.  That
is, in gdb, user expressions already have to be written in the
currently active language, so even for a "C" struct symbol x I have to
write x%f when in Fortran mode, and x.f doesn't work.
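
A rough sketch of what I mean, assuming GDB's global current_language
tracks the language the expression was written in (as it does for
user input):

  /* Hash and match with the expression's context language instead of
     a per-dictionary language.  */
  unsigned int hash_index
    = (name.search_name_hash (current_language->la_language)
       % DICT_HASHED_NBUCKETS (dict));
  symbol_name_matcher_ftype *matches_name
    = get_symbol_name_matcher (current_language, name);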

Note that while poking at the bug myself, I found that the language a
symbol ends up associated with is also inconsistent, depending on who
creates the symbol.

Richard.

> Keith
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

