[PATCH v2] dwarf_getaranges: Build aranges list from CUs instead of .debug_aranges

Tue Feb 20 22:23:32 GMT 2024

Hi Aaron,

We already discussed this on IRC, but just for the record:

On Mon, Feb 19, 2024 at 11:20:13PM -0500, Aaron Merey wrote:
> On Tue, Feb 13, 2024 at 8:28 AM Mark Wielaard <mark@klomp.org> wrote:
> >
> > > This patch's method of building the aranges list is slower than simply
> > > reading .debug_aranges.  On my machine, running eu-stack on a 2.9G
> > > firefox core file takes about 8.7 seconds with this patch applied,
> > > compared to about 3.3 seconds without this patch.
> >
> > That is significant. 2.5 times slower.
> > Did you check with perf or some other profiler where exactly the extra
> > time goes? Does the new method find more aranges (and so produce
> > "better" backtraces)?
> 
> I took another look at the performance and realized I made a silly
> mistake when I originally tested this.  My build that was 2.5x slower
> was compiled with -O0 but I tested it against an -O2 build.  Oops!
> 
> With the optimization level set to -O2 in all cases, the runtime of
> 'eu-stack -s' on the original 2.9G firefox core file is only about
> 9% slower: 3.6 seconds with the patch applied compared to 3.3
> seconds without the patch.

OK, still a slowdown, but 9% is much more reasonable given we are
doing more work now. Good.
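
For the record, the arithmetic behind those two figures, as a small
sketch. The helper name slowdown_percent is made up for illustration;
the timings are the ones quoted above.

```python
def slowdown_percent(baseline_s, patched_s):
    """Relative slowdown of patched_s versus baseline_s, in percent."""
    return (patched_s - baseline_s) / baseline_s * 100.0


# Both builds at -O2: 3.3 seconds without the patch, 3.6 seconds with it.
print(f"{slowdown_percent(3.3, 3.6):.1f}%")  # 9.1%

# The earlier, unfair comparison: -O0 patched build vs -O2 baseline.
print(f"{8.7 / 3.3:.1f}x")  # 2.6x
```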

> As for the number of aranges found, there is a difference for libxul.so:
> 250435 with the patch compared to 254832 without.  So 4397 fewer aranges
> are found when using the new CU iteration method.  I'll dig into this and
> see if there is a problem or if it's just due to some redundancy in
> libxul's .debug_aranges.  FWIW there was no change to the aranges counts
> for the other modules searched during this eu-stack firefox corefile test.

A quick way to see where the differences are is to compare the output
of eu-readelf --debug-dump=decodedaranges before and after your patch.

This is the opposite of what I expected. I had expected there to be
more ranges, not fewer. The difference is less than 2%, but it would
still be interesting to know where it comes from and why.

Were there any differences in the backtraces? If not, then those
ranges might not actually have been mapping to code.
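
To tell redundancy apart from real loss of coverage, one could compare
the address space covered rather than the number of entries. A minimal
sketch, assuming the two dumps have already been parsed into lists of
(start, length) tuples. The parsing step is left out on purpose, and
merge_ranges and uncovered are hypothetical helper names, not part of
elfutils.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, length) ranges.

    Returns a sorted list of (start, end) spans, end exclusive.
    """
    spans = sorted((start, start + length)
                   for start, length in ranges if length > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(old_ranges, new_ranges):
    """Return the (start, end) spans covered by old_ranges but not new_ranges."""
    new_spans = merge_ranges(new_ranges)
    gaps = []
    for start, end in merge_ranges(old_ranges):
        pos = start
        for n_start, n_end in new_spans:
            if n_end <= pos:
                continue  # Entirely before the part still to be checked.
            if n_start >= end:
                break  # Entirely after this old span.
            if n_start > pos:
                gaps.append((pos, n_start))
            pos = max(pos, n_end)
            if pos >= end:
                break
        if pos < end:
            gaps.append((pos, end))
    return gaps


# Made-up example: four old entries, one new entry, but only one real gap.
old = [(0x1000, 0x100), (0x1000, 0x100), (0x1080, 0x100), (0x3000, 0x10)]
new = [(0x1000, 0x180)]
for start, end in uncovered(old, new):
    print(f"missing: {start:#x}..{end:#x}")
```

If the list of missing spans is empty for libxul.so, the 4397 extra
entries were duplicates or overlaps. If not, the remaining spans are
the addresses to check against the backtraces.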

> > Might it be an idea to leave dwarf_getaranges as it is and introduce a
> > new (internal) function to get "dynamic" ranges? It looks like what
> > programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> > and dwfl_module_addrdie. These are currently built on dwarf_getaranges,
> > but could maybe use a new interface?
> 
> IMO this depends on what users expect from dwarf_getaranges.  Do they
> want the exact contents of .debug_aranges (whether or not it's complete)
> or should dwarf_getaranges go beyond .debug_aranges to ensure the most
> complete results?
> 
> The comment for dwarf_getaranges in libdw.h simply reads "Return list
> address ranges".  Since there's no mention of .debug_aranges specifically,
> I think it's fair if dwarf_getaranges does whatever it can to ensure
> comprehensive results.  In which case dwarf_getaranges should probably
> dynamically generate aranges.

You might be right that no user really cares. But as seen in the
eu-readelf code, it might also be that people expected it to map to
the ranges from .debug_aranges.

So I would be happier if we just kept the dwarf_getaranges code as is
and only changed the code in dwarf_addrdie and dwfl_module_addrdie.

We could then also introduce a new public function, dwarf_getdieranges
(?), that builds the ranges from the CUs. But it doesn't have to be
public on the first try, as long as dwarf_addrdie and
dwfl_module_addrdie work. (We might want to change the interface of
dwarf_getdieranges so it can be "lazy", for example.)
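
To illustrate what "lazy" could mean here, a toy model in Python, not a
proposal for the actual libdw interface. LazyRangeIndex and
load_cu_ranges are made-up names, and the model assumes the ranges do
not overlap. The point is only that the table is built on the first
address lookup, so callers that never look up an address never pay for
the CU iteration.

```python
import bisect


class LazyRangeIndex:
    """Toy model: build a sorted address-range table on first lookup."""

    def __init__(self, load_cu_ranges):
        # load_cu_ranges is a hypothetical callable returning an iterable
        # of (start, end, cu_name) tuples, end exclusive, non-overlapping.
        self._load = load_cu_ranges
        self._entries = None
        self._starts = None

    def _ensure_built(self):
        # The expensive walk over all CUs happens at most once.
        if self._entries is None:
            self._entries = sorted(self._load())
            self._starts = [entry[0] for entry in self._entries]

    def lookup(self, addr):
        """Return the CU name whose range covers addr, or None."""
        self._ensure_built()
        # Find the last range starting at or before addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, cu_name = self._entries[i]
            if start <= addr < end:
                return cu_name
        return None


# Made-up data standing in for ranges collected from CU DIEs.
def example_loader():
    return [(0x2000, 0x2100, "b.c"), (0x1000, 0x1040, "a.c")]


index = LazyRangeIndex(example_loader)
print(index.lookup(0x1010))
print(index.lookup(0x2100))
```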

Cheers,

Mark