This is the mail archive of the archer@sourceware.org mailing list for the Archer project.
Re: Cross-CU C++ DIE references vs. mangling
- From: Sami Wagiaalla <swagiaal at redhat dot com>
- To: Roland McGrath <roland at redhat dot com>
- Cc: Jan Kratochvil <jan dot kratochvil at redhat dot com>, archer at sourceware dot org, Keith Seitz <keiths at redhat dot com>
- Date: Mon, 12 Apr 2010 14:46:48 -0400
- Subject: Re: Cross-CU C++ DIE references vs. mangling
- References: <20100310191833.GA2816@host0.dyn.jankratochvil.net> <20100310193207.GA6147@host0.dyn.jankratochvil.net> <20100311060305.B177A7D5E@magilla.sf.frob.com>
So after a few (really, many) reads of this email, I think I can
summarize the issues and solutions discussed there. I just want to
make sure I understand the issue properly before filing a GCC
feature request. So, is this a correct summary?
The goal is to help GDB find the proper location of variables whose
declarations and definitions are split across CUs or shared objects.
Why can't GDB do this by itself? Because:
- It requires searching all other CUs/shared objects to locate the
definition, which is inefficient but also inaccurate, since
- the scope of the declaration can differ from that of the
definition (e.g. class members). DW_AT_MIPS_linkage_name, when
available, can be used to resolve this; however,
- if the definition is in a stripped DSO, there is indeed a definition
(an ELF symbol), but no DW_AT_location anywhere points to it. Also,
- it is possible to have two variables defined in two separate shared
objects with the same linkage name, e.g.:
> Consider:
>
> $ g++ -g -c -fPIC -o foo1.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { int i; };')
> $ g++ -g -c -fPIC -o foo2.o -xc++ <(echo 'namespace internal __attribute__((visibility("hidden"))) { extern int i; }; int foo () { return internal::i; }')
> $ gcc -g -shared -o foo.so foo1.o foo2.o
> $ g++ -g -c -fPIC -o bar1.o -xc++ <(echo 'namespace internal { int i; };')
> $ g++ -g -c -fPIC -o bar2.o -xc++ <(echo 'namespace internal { extern int i; }; int bar () { return internal::i; }')
> $ gcc -g -shared -o bar.so bar1.o bar2.o
> $ eu-readelf -sr -winfo foo.so bar.so
>
> Now imagine a program linking in both foo.so and bar.so. There are
> two different things that are both separate but equal and both truly
> internal::i and both truly _ZN8internal1iE. By any method, there is
> no one answer to, "What is internal::i?" The only answers are
> context-specific.
>
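To make the ambiguity concrete, here is a minimal Python sketch that models a process which has loaded both foo.so and bar.so. The per-object symbol tables, their addresses, and the helper names `lookup_by_name` and `lookup_from` are all hypothetical; the point is only that a process-wide lookup by the mangled name returns two candidates, while a lookup anchored in the accessing object gives one answer.

```python
# Hypothetical per-object symbol tables for a process that loaded both
# foo.so and bar.so from the example above. Addresses are invented.
MANGLED = "_ZN8internal1iE"  # internal::i

symbol_tables = {
    "foo.so": {MANGLED: 0x7F0000201000},  # hidden visibility: private to foo.so
    "bar.so": {MANGLED: 0x7F0000401000},  # default visibility
}


def lookup_by_name(name):
    """Process-wide lookup by name: every object that defines `name`."""
    return {obj: table[name] for obj, table in symbol_tables.items() if name in table}


def lookup_from(accessing_object, name):
    """Context-specific lookup from the object whose code performs the access.

    Simplification: this ignores symbol interposition, which can redirect a
    default-visibility access in bar.so to a definition in another object.
    """
    return symbol_tables[accessing_object][name]


candidates = lookup_by_name(MANGLED)
print(len(candidates))  # prints 2: no single answer to "what is internal::i?"
```

The context-specific answer is exactly what a per-scope DW_AT_location would encode, without the debugger having to reconstruct it from names.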
Proposed solution:
Teach the compiler to generate, for a non-defining declaration, a
DW_AT_location that is applicable in that DIE's scope. That location
expression would parallel the assembly code generated for the symbol:
> The key is that you can have the same(ish) relocs using the same
> symbols in the code and DWARF as assembled. Then whatever happens
> in linking stages later should be the same[...]
So,
> For non-PIC code, the actual code looks like:
>
> movl _ZN8internal1iE(%rip), %eax
>
> and the DWARF bit could look like:
>
> .byte DW_OP_addr
> .quad _ZN8internal1iE
>
[...]
> These get resolved at link time to absolute addresses, et voila.
And,
> In a PIC access, what the final code will actually do is not really
> related to anything about ELF symbols. It's just memory indirection.
> The PIC code is:
>
> movq _ZN8internal1iE@GOTPCREL(%rip), %rax
> movl (%rax), %eax
>
[...]
> .byte DW_OP_addr
> .quad _ZN8internal1iE@GOT
> .byte DW_OP_deref
>
> This generates R_X86_64_GOT64. At link time, this too goes away and
> becomes the "absolute" address of the .got slot.
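The two location expressions quoted above can be checked with a tiny evaluator. This is a minimal Python sketch covering only DW_OP_addr (0x03) and DW_OP_deref (0x06), with a dict standing in for the inferior's memory. The addresses are invented, and they are assumed to be already adjusted for the shared object's load bias, which a real debugger has to add itself.

```python
import struct

DW_OP_addr = 0x03   # operand: one target address (8 bytes on x86-64)
DW_OP_deref = 0x06  # pop an address, push the 8-byte value stored there


def eval_location(expr, memory):
    """Evaluate the two-opcode subset of DWARF expressions used above.

    `memory` is a stand-in for the inferior's address space
    (address -> 8-byte value).
    """
    stack = []
    pc = 0
    while pc < len(expr):
        op = expr[pc]
        pc += 1
        if op == DW_OP_addr:
            (addr,) = struct.unpack_from("<Q", expr, pc)
            pc += 8
            stack.append(addr)
        elif op == DW_OP_deref:
            stack.append(memory[stack.pop()])
        else:
            raise NotImplementedError(f"opcode {op:#04x} is outside this sketch")
    return stack.pop()


# Invented run-time addresses for internal::i and for its .got slot.
VAR_ADDR = 0x601040
GOT_SLOT = 0x600FF8

# .byte DW_OP_addr / .quad _ZN8internal1iE
non_pic = bytes([DW_OP_addr]) + struct.pack("<Q", VAR_ADDR)
# .byte DW_OP_addr / .quad _ZN8internal1iE@GOT / .byte DW_OP_deref
pic = bytes([DW_OP_addr]) + struct.pack("<Q", GOT_SLOT) + bytes([DW_OP_deref])

memory = {GOT_SLOT: VAR_ADDR}  # the dynamic linker filled the .got slot
assert eval_location(non_pic, memory) == VAR_ADDR
assert eval_location(pic, memory) == VAR_ADDR

# If the .got slot points elsewhere (e.g. an interposing definition), the PIC
# expression follows it, just as the code's indirect load would.
interposed = {GOT_SLOT: 0x7F0000401000}
assert eval_location(pic, interposed) == 0x7F0000401000
```

The design point is that the PIC expression never names an ELF symbol at debug time: it reads the same .got slot the code reads, so it gets the same answer the code gets.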
The following part I don't quite understand:
> We could certainly teach GCC to do this.
> It would then be telling us more pieces of direct truth about the code.
> Would that not be the best thing ever?
> Well, almost.
>
> First, what about a defining declaration in a PIC CU?
>
> In the abstract, a defining declaration can be considered as talking
> about two different things. One is its declarationhood, wherein it
> says that the containing scope has this name visible. For that
> purpose, it could reasonably be expected to be like a non-defining
> declaration: say how code in this scope accesses the variable--the
> truth about what's in the assembly code for any accesses in that CU.
> But the other thing is its definitionhood, wherein it says what data
> address contains the data cell and thus (optionally) implies what
> object file position holds the initializer image--another truth about
> what's in the assembly code for the definition in this CU.
>
> In non-PIC code, these two truths match. Both use direct address
> constants (as relocated at link time). But in PIC code, the truth
> about the definition is an address constant, while the truth about the
> access is an indirection through .got. (If you have PIC code that
> uses __attribute__((visibility("hidden"))) then it's direct access,
> though PC-relative, and thus "non-PIC" ("absolute") for DWARF
> purposes, so both truths match as in truly non-PIC code.)
>
> Personally, I would be all for having it both ways. In a CU where a
> defining declaration is actually used by PIC accesses, then you could
> generate a second non-defining declaration (even for C). Give it
> DW_AT_artificial, DW_AT_declaration, DW_AT_specification pointing to
> the defining declaration (in lieu of DW_AT_name, DW_AT_type, et al),
> and then DW_AT_location with the PIC style using indirection.
>
> With that, you could know that if you got a DW_AT_location from any
> DIE with DW_AT_declaration then you're done and have the real truth
> for accesses. If we presume no CUs from pre-apocalyptic compilers now
> that we are in these here end times, then we are finally free from
> ever having to rely on discerning the right ELF symbol from a name we
> surmised from DWARF (be it via DW_AT_MIPS_linkage_name or mangling).
>
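The consumer-side rule Roland describes ("if you got a DW_AT_location from any DIE with DW_AT_declaration then you're done") could be sketched roughly as below. This is a hypothetical Python model, not GDB code: the `Die` class, `access_address`, and the `(kind, address)` encoding of location expressions are all invented for illustration, and the addresses are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    """Hypothetical, simplified DIE: a tag plus attribute name -> value."""
    tag: str
    attrs: dict = field(default_factory=dict)


def access_address(die, eval_expr, elf_lookup):
    """Return the address that code in `die`'s scope accesses, or None."""
    # Proposed rule: a declaration carrying DW_AT_location is the "real
    # truth" for accesses; no ELF symbol guessing is needed.
    if die.attrs.get("DW_AT_declaration") and "DW_AT_location" in die.attrs:
        return eval_expr(die.attrs["DW_AT_location"])
    # Otherwise fall back to surmising an ELF symbol name from DWARF.
    name = die.attrs.get("DW_AT_MIPS_linkage_name")
    return elf_lookup(name) if name is not None else None


GOT_SLOT, VAR_ADDR = 0x600FF8, 0x601040  # invented addresses
memory = {GOT_SLOT: VAR_ADDR}


def eval_expr(loc):
    """Stand-in evaluator: (kind, address) pairs instead of DWARF bytes."""
    kind, addr = loc
    return memory[addr] if kind == "via_got" else addr


defining = Die("DW_TAG_variable", {
    "DW_AT_name": "i", "DW_AT_type": "int",
    "DW_AT_location": ("direct", VAR_ADDR),  # truth about the definition
})
artificial = Die("DW_TAG_variable", {
    "DW_AT_artificial": True,
    "DW_AT_declaration": True,
    "DW_AT_specification": defining,          # in lieu of name, type, ...
    "DW_AT_location": ("via_got", GOT_SLOT),  # truth about PIC accesses
})
legacy_decl = Die("DW_TAG_variable", {
    "DW_AT_declaration": True,
    "DW_AT_MIPS_linkage_name": "_ZN8internal1iE",
})

symbols = {"_ZN8internal1iE": VAR_ADDR}
assert access_address(artificial, eval_expr, symbols.get) == VAR_ADDR
assert access_address(legacy_decl, eval_expr, symbols.get) == VAR_ADDR
```

In this model the artificial DIE is what lets the access path (through .got) coexist with the defining DIE's direct address, which is the situation the question below asks about.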
Why is there a need for a second, artificial DIE describing the location?
As I understand it, declarationhood is specified by the DIE's nesting in
the DIE hierarchy, not by its DW_AT_location. In other words, what is
missing from the way GCC currently specifies locations for defining
declarations?
This summary does not cover the part of the email from "Before dynamic
linker startup" to the end, mainly because I am assuming that the main
use case is after dynamic linker startup.