This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 3/9] Handle TLS variable lookups when using separate debug object files.
On 2/5/19 12:06 PM, Simon Marchi wrote:
> On 2019-02-04 15:02, John Baldwin wrote:
>> On 2/2/19 7:52 AM, Simon Marchi wrote:
>>> On 2019-01-22 13:42, John Baldwin wrote:
>>>> The object files stored in the shared object list are the original
>>>> object files, not the separate debug object files. However, when
>>>> resolving a TLS variable, the variable is resolved against a separate
>>>> debug object file if it exists. In this case,
>>>> svr4_fetch_objfile_link_map was failing to find the link map entry
>>>> since the debug object file is not in its internal list, only the
>>>> original object file.
>>>
>>> Does this solve an existing issue, or an issue that would come up with
>>> the following patches?
>>
>> I only noticed while working on these patches, but I believe it is a
>> generic issue. I tried to reproduce on a Linux box by compiling a
>> small
>> library with separate debug symbols and a program that linked against
>> it
>> and running it under gdb, but TLS variables didn't work for me even
>> without
>> separate debug symbols in my testing. :(
>>
>> $ cat foo.c
>> #include "foo.h"
>>
>> static __thread int foo_id;
>>
>> void
>> set_foo_id(int id)
>> {
>> foo_id = id;
>> }
>>
>> int
>> get_foo_id(void)
>> {
>> return foo_id;
>> }
>> $ cat foo.h
>> void set_foo_id(int id);
>> int get_foo_id(void);
>> $ cat main.c
>> #include <stdio.h>
>>
>> #include "foo.h"
>>
>> int
>> main(void)
>> {
>>
>> set_foo_id(47);
>> printf("foo id = %d\n", get_foo_id());
>> return (0);
>> }
>> $ cc -g -fPIC -shared foo.c -o libfoo.so.full
>> $ objcopy --only-keep-debug libfoo.so.full libfoo.so.debug
>> $ objcopy --strip-debug --add-gnu-debuglink=libfoo.so.debug
>> libfoo.so.full libfoo.so
>> $ cc -g main.c -o foo -lfoo -L.
>> $ gdb foo
>> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
>> ...
>> (gdb) start
>> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
>> Starting program: /home/john/tls_lib/foo
>>
>> Temporary breakpoint 1, main () at main.c:9
>> 9 set_foo_id(47);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3970, shared library
>> libfoo.so:
>> Cannot find thread-local variables on this target
>>
>> Then tried it without separate debug file, but that didn't work either:
>>
>> $ cc -g -fPIC -shared foo.c -o libfoo.so
>> $ gdb foo
>> GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
>> ...
>> (gdb) start
>> Temporary breakpoint 1 at 0x7ae: file main.c, line 9.
>> Starting program: /home/john/tls_lib/foo
>>
>> Temporary breakpoint 1, main () at main.c:9
>> 9 set_foo_id(47);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3982, shared library
>> libfoo.so:
>> Cannot find thread-local variables on this target
>> (gdb) n
>> 10 printf("foo id = %d\n", get_foo_id());
>> (gdb)
>> foo id = 47
>> 11 return (0);
>> (gdb) p foo_id
>> Cannot find thread-local storage for process 3982, shared library
>> libfoo.so:
>> Cannot find thread-local variables on this target
>>
>> I would have expected the second case to work and the first to fail and
>> for
>> the patch to fix the first case.
>
> I did a similar test and hit the same message. I figured it's because
> you need to build with -pthread. With -pthread, GDB manages to print
> the value in both cases (with and without separate debug info). That's
> why I ended up asking you this question, I thought that maybe I just
> hadn't reproduced the issue correctly.
>
> I have attached my test case (so we don't have to rewrite the same test
> over and over), which you can easily build with and without debug info
> (see the Makefile). Does it trigger the problem on FreeBSD? Does it
> work for you in both cases on Linux, like it does for me?
>
> Now that I take a look, there might be the gdb.threads/tls-shared.exp
> test that does the exact same thing. But there doesn't seem to be a
> board file to test with separate debug files out of the box...
So, it works fine for me without this patch on FreeBSD. However, I dug
around a bit more as I had only noticed this before when trying to debug
a core dump for riscv (as FreeBSD/riscv64 core dumps have the tp register
in them). That is when I get a separate objfile passed to
svr4_fetch_objfile_link_map:
Breakpoint 3, svr4_fetch_objfile_link_map (objfile=0x8038a1600) at ../../gdb/solib-svr4.c:1541
1541 struct svr4_info *info = get_svr4_info ();
(top-gdb) p *objfile
$3 = {next = 0x8038a1480,
original_name = 0x8038df010 "/usr/lib/debug//ufs/riscv64/rootfs/lib/libc.so.7.debug", addr_low = 0x0, flags = {
...
(top-gdb) where
#0 svr4_fetch_objfile_link_map (objfile=0x8038a1600)
at ../../gdb/solib-svr4.c:1541
#1 0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x803800010, objfile=0x8038a1600) at ../../gdb/gdbarch.c:3019
#2 0x00000000019ba415 in target_translate_tls_address (objfile=0x8038a1600,
offset=0x1800) at ../../gdb/target.c:705
#3 0x00000000016ea2a1 in find_minsym_type_and_address (msymbol=0x803cfcb18,
objfile=0x8038a1600, address_p=0x7fffffffd2e8) at ../../gdb/parse.c:500
#4 0x00000000014afeb7 in evaluate_var_msym_value (noside=EVAL_NORMAL,
objfile=0x8038a1600, msymbol=0x803cfcb18) at ../../gdb/eval.c:743
#5 0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0,
exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL)
at ../../gdb/eval.c:1326
#6 0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x8034f0430,
pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704
#7 0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x8034f0430,
pos=0x7fffffffde34, noside=EVAL_NORMAL) at ../../gdb/eval.c:80
#8 0x00000000014ae48b in evaluate_expression (exp=0x8034f0430)
at ../../gdb/eval.c:141
#9 0x000000000173184f in print_command_1 (
exp=0x7fffffffe101 "__je_tsd_initialized", voidprint=1)
at ../../gdb/printcmd.c:1187
#10 0x000000000172e5ff in print_command (
exp=0x7fffffffe101 "__je_tsd_initialized", from_tty=1)
at ../../gdb/printcmd.c:1200
(top-gdb) frame 5
#5 0x00000000014b0290 in evaluate_subexp_standard (expect_type=0x0,
exp=0x8034f0430, pos=0x7fffffffde34, noside=EVAL_NORMAL)
at ../../gdb/eval.c:1326
1326 value *val = evaluate_var_msym_value (noside,
(top-gdb) l
1321 case OP_VAR_MSYM_VALUE:
1322 {
1323 (*pos) += 3;
1324
1325 minimal_symbol *msymbol = exp->elts[pc + 2].msymbol;
1326 value *val = evaluate_var_msym_value (noside,
1327 exp->elts[pc + 1].objfile,
1328 msymbol);
1329
1330 type = value_type (val);
whereas on a live amd64 process this was called from a different place:
(top-gdb) where
#0 svr4_fetch_objfile_link_map (objfile=0x803879380)
at ../../gdb/solib-svr4.c:1541
#1 0x00000000015085e7 in gdbarch_fetch_tls_load_module_address (gdbarch=0x8039b7010, objfile=0x803879380) at ../../gdb/gdbarch.c:3019
#2 0x00000000019ba3f5 in target_translate_tls_address (objfile=0x803879380,
offset=6168) at ../../gdb/target.c:705
#3 0x000000000141f361 in dwarf_evaluate_loc_desc::get_tls_address (
this=0x7fffffffc578, offset=6168) at ../../gdb/dwarf2loc.c:609
#4 0x00000000014077dd in dwarf_expr_context::execute_stack_op (
this=0x7fffffffc578, op_ptr=0x803dea6fb "\002S\177\001",
op_end=0x803dea6fb "\002S\177\001") at ../../gdb/dwarf2expr.c:1175
#5 0x00000000014056b1 in dwarf_expr_context::eval (this=0x7fffffffc578,
addr=0x803dea6f1 "\016\030\030", len=10) at ../../gdb/dwarf2expr.c:301
#6 0x000000000140e99d in dwarf2_evaluate_loc_desc_full (type=0x8040ad930,
frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190,
subobj_type=0x8040ad930, subobj_byte_offset=0)
at ../../gdb/dwarf2loc.c:2170
#7 0x000000000140e7d2 in dwarf2_evaluate_loc_desc (type=0x8040ad930,
frame=0x0, data=0x803dea6f1 "\016\030\030", size=10, per_cu=0x8039cb190)
at ../../gdb/dwarf2loc.c:2352
#8 0x0000000001412794 in locexpr_read_variable (symbol=0x80416c930, frame=0x0)
at ../../gdb/dwarf2loc.c:3503
#9 0x00000000014e3743 in default_read_var_value (var=0x80416c930,
var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:610
#10 0x00000000014e4881 in read_var_value (var=0x80416c930,
var_block=0x8041724e0, frame=0x0) at ../../gdb/findvar.c:815
#11 0x0000000001a99abe in value_of_variable (var=0x80416c930, b=0x8041724e0) at ../../gdb/valops.c:1292
#12 0x00000000014afd7b in evaluate_var_value (noside=EVAL_NORMAL, blk=0x8041724e0, var=0x80416c930) at ../../gdb/eval.c:721
#13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:1312
#14 0x00000000012a8562 in evaluate_subexp_c (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/c-lang.c:704
#15 0x00000000014ae17b in evaluate_subexp (expect_type=0x0, exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL) at ../../gdb/eval.c:80
#16 0x00000000014ae48b in evaluate_expression (exp=0x803507830) at ../../gdb/eval.c:141
#17 0x000000000173184f in print_command_1 (exp=0x7fffffffd7d1 "__je_tsd_initialized", voidprint=1) at ../../gdb/printcmd.c:1187
#18 0x000000000172e5ff in print_command (exp=0x7fffffffd7d1 "__je_tsd_initialized", from_tty=1) at ../../gdb/printcmd.c:1200
(top-gdb) frame 13
#13 0x00000000014b0217 in evaluate_subexp_standard (expect_type=0x0,
exp=0x803507830, pos=0x7fffffffd504, noside=EVAL_NORMAL)
at ../../gdb/eval.c:1312
1312 return evaluate_var_value (noside, exp->elts[pc + 1].block, var);
(top-gdb) l
1307 (*pos) += 3;
1308 symbol *var = exp->elts[pc + 2].symbol;
1309 if (TYPE_CODE (SYMBOL_TYPE (var)) == TYPE_CODE_ERROR)
1310 error_unknown_type (SYMBOL_PRINT_NAME (var));
1311 if (noside != EVAL_SKIP)
1312 return evaluate_var_value (noside, exp->elts[pc + 1].block, var);
1313 else
1314 {
1315 /* Return a dummy value of the correct type when skipping, so
1316 that parent functions know what is to be skipped. */
So it seems that the OP_VAR_VALUE path calls down into the dwarf bits that
get the "original" objfile to pass to target_translate_tls_address, whereas
the OP_VAR_MSYM_VALUE case ends up using a separate object file. This might
in fact be due to bugs in the RISCV GCC backend as the TLS symbols for RISCV
don't seem quite right. I have to cast TLS variables to their types for
example:
(gdb) p __je_tsd_initialized
'__je_tsd_initialized` has unknown type; cast it to its declared type
(gdb) p (bool)__je_tsd_initialized
$2 = 1 '\001'
Also, for me TLS variables in the main executable did not work for me on
RISCV, only TLS variables in shared libraries, unlike on other architectures
I tested where TLS variables in both the executable and shared libraries
worked.
--
John Baldwin