This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/2] arm-tdep: sort mapping symbols after parsing all minimal symbols


On 2019-06-25 10:33 a.m., Alan Hayward wrote:
> 
> 
>> On 25 Jun 2019, at 02:37, Simon Marchi <simon.marchi@polymtl.ca> wrote:
>>
>> Somebody on IRC reported a while ago that loading a big ARM program in
>> GDB was very slow.  Their profiling pointed out that a big amount of
>> time was spent in
>>
>>   VEC_safe_insert (arm_mapping_symbol_s, *map_p, idx, &new_map_sym);
>>
>> I was able to verify this as well.
>>
>> ARM mapping symbols are special ELF symbols named $a, $d and $t
>> indicating that symbols starting at this address up to the next mapping
>> symbol (in terms of address) are of type "ARM code", "data" and "Thumb
>> code", respectively.  GDB records these symbols in vectors (one for each
>> section) in arm-tdep.c.  These vectors are sorted by symbol address, to
>> allow for quick lookup.  The current approach is to insert new symbols
>> at the right position to keep the vectors sorted at all time.  This is
>> done based on the assumption that mapping symbols come already almost
>> sorted from the binary, as explains this comment in
>> arm_record_special_symbol:
>>
>> /* Assume that most mapping symbols appear in order of increasing
>>    value.  If they were randomly distributed, it would be faster to
>>    always push here and then sort at first use.  */
> 
> I’ve been wondering where this assumption came from, whether there is any
> evidence to back it up, one way or the other.
> 
> Sadly, the original patch doesn’t give anymore clues (it was pushed without
> any obvious review):
> https://sourceware.org/ml/gdb-patches/2008-05/msg00113.html

Ah I hadn't thought of checking the original patch, thanks.

> I had a look at some binaries:
> *gdb itself and binaries built by the test suite - random order.
> *system binaries on Aarch32 Linux - no mapping symbols.
> 
> I then looked at a binary built with the Arm Compiler, and found that the
> symbols are in increasing numerical order.  That might explain where the
> assumption came from.

Ok, good to know.

>> I tried for fun on my Raspberry Pi 3, the run time of
>> elf_symtab_read goes from ~259s to ~9s, reading the same file.
> 
> That’s quite a significant change!

Yes, a bit too significant I would say!  I would appreciate if somebody else
could run a similar experiment to confirm that it's not just me who messed
something up.

> What would the effect be with your code if the symbol addresses were already
> sorted - would we expect a slowdown compared with the existing version?

If you can give me a binary of significant size that has symbols already sorted (so,
compiled with the ARM compiler), I could do the experiment.  I'd be curious to know.

My guess is that it won't be significantly slower.  I don't know what happens internally
when sorting an already sorted vector, but it seems like it's quite efficient:

https://stackoverflow.com/questions/6567326/does-stdsort-check-if-a-vector-is-already-sorted
https://lemire.me/blog/2016/09/28/sorting-already-sorted-arrays-is-much-faster/

> I’ve built this patch series and ran the test suite on an Aarch32 box, and
> saw no regressions.
> 
> As far as the code changes go, the patch below LGTM.
> 
> I looked at the previous path, I’m not an expert on std:vector, but the changes
> LGTM too.

Thanks, I'll push both patches shortly, with Tom's comments on the first patch
addressed.

Simon


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]