This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 3/9] Handle TLS variable lookups when using separate debug object files.


On 2019-02-05 17:33, John Baldwin wrote:
On 2/5/19 2:21 PM, John Baldwin wrote:
So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that get the "original" objfile to pass to target_translate_tls_address, whereas the OP_VAR_MSYM_VALUE case ends up using a separate object file. This might in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV don't seem quite right. I have to cast TLS variables to their types for
example:

(gdb) p __je_tsd_initialized
'__je_tsd_initialized` has unknown type; cast it to its declared type
(gdb) p (bool)__je_tsd_initialized
$2 = 1 '\001'

Also, for me TLS variables in the main executable did not work for me on RISCV, only TLS variables in shared libraries, unlike on other architectures I tested where TLS variables in both the executable and shared libraries
worked.

I should have said: I think this means I probably need to rework the commit
message a bit to explain that this doesn't always happen with separate
debug files, but it can, and if so this fix is needed.

After discussing with John on IRC and trying by all means to trigger this bug, this is what I ended up with. On x86/GNU/Linux, it is possible to trigger this bug in a rather contrived way, but at least it shows that there is indeed a bug in GDB. The requirements are:

1. Have a TLS variable in a shared library ("libfoo.so")
2. The .o containing the TLS variable should not be described with DWARF. This can be done simply by compiling it without -g. 3. The shared lib should still have a separate debug info file ("libfoo.so.debug"). 4. The shared lib should be stripped of its unnecessary ELF symbols (ran through the strip utility) 5. The shared library should not be the first object file in the program to have TLS. This can be done by adding a TLS variable in the main executable ("main"). 6. Because we don't have debug info for the variable we try to print, we need to cast it to its expected type, e.g. "print (int)foo_id".

With all this, when parsing "(int)foo_id", we find a minimal symbol matching foo_id in the objfile representing libfoo.so.debug. find_minsym_type_and_address is eventually called with that objfile, which calls target_translate_tls_address, which calls svr4_fetch_objfile_link_map. The latter obviously can't find a link_map matching the separate-debug-info objfile, and wrong things happen after.

Without #2 above, the DWARF symbol is found and some other code path is taken, where the separate-debug-info objfile is replaced with the "real" objfile at some point.

Without #3, we obviously wouldn't end up with a separate-debug-info objfile in svr4_fetch_objfile_link_map.

Without #4 above, we would find the objfile for libfoo.so first, so we wouldn't end up with the seaprate-debug-info objfile in svr4_fetch_objfile_link_map.

Without #5 we actually get the right result by chance. This is because we end up in the special case "else" of thread_db_target::get_thread_local_address. That special case hardcodes the module id to read from to 1. If the shared lib is the only module to have TLS, then it happens to be the right one. By making the main executable have some TLS, we will end up reading the TLS for the wrong module, and thus trigger the bug.

I have not yet found the motivation to write a proper test for this (in particular, I am not sure how to build the lib with separate debug info in the testsuite). But I attached a reproducer for future reference. You should just need to "make" and "gdb -x run.gdb". The wrong value is printed with today's GDB.

In John's specific case, apparently the DWARF debug info for TLS variables on riscv is broken, which brought him to roughly the same state. I don't have any more info about that.

All this to say that I think this patch is fine. I wondered whether it was better instead to make svr4_fetch_objfile_link_map assert that it receives a "real" objfile (objfile->separate_debug_objfile_backlink == NULL) and push the responsibility to the caller to provide a correct objfile. But in the end I didn't find a compelling to do this rather than what John did in this patch. So therefore, the patch LGTM.

Simon


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]