This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: GDB internal error in pc_in_thread_step_range


> Date: Wed, 19 Dec 2018 19:16:15 -0500
> From: Simon Marchi <simon.marchi@polymtl.ca>
> Cc: gdb-patches@sourceware.org
> 
> >   (top-gdb) p msymbol
> >   $3 = {minsym = 0x10450d38, objfile = 0x10443b48}
> >   (top-gdb) p msymbol.minsym.mginfo.name
> >   $4 = 0x104485cd "__register_frame_info"
> >   (top-gdb) p msymbol.minsym.mginfo
> >   $5 = {name = 0x104485cd "__register_frame_info", value = {ivalue = 0,
> >       block = 0x0, bytes = 0x0, address = 0x0, common_block = 0x0,
> >       chain = 0x0}, language_specific = {obstack = 0x0,
> >       demangled_name = 0x0},
> >     language = language_auto, ada_mangled = 0, section = 0}
> 
> Ok.  Well this is already strange.  Why is there an mst_text (code) 
> symbol with a value of 0?

Its address is zero because it's an unresolved symbol:

  d:\usr\eli>nm -A hello0.exe | fgrep " U "
  hello0.exe:         U ___deregister_frame_info
  hello0.exe:         U ___register_frame_info
  hello0.exe:         U __Jv_RegisterClasses

This symbol comes from a weak symbol defined in MinGW crtbegin.o:

  d:\usr\eli>nm -A lib/gcc/mingw32/6.3.0/crtbegin.o | fgrep _frame_info
  lib/gcc/mingw32/6.3.0/crtbegin.o:00000000 A .weak.___deregister_frame_info.___EH_FRAME_BEGIN__
  lib/gcc/mingw32/6.3.0/crtbegin.o:00000000 A .weak.___register_frame_info.___EH_FRAME_BEGIN__
  lib/gcc/mingw32/6.3.0/crtbegin.o:         w ___deregister_frame_info
  lib/gcc/mingw32/6.3.0/crtbegin.o:         w ___register_frame_info

> If your binary is anything like those I can 
> produce with x86_64-w64-mingw32-gcc (and it looks similar, given the 
> addresses you show), your "image base" is likely 0x400000, and "base of 
> code" 0x1000 (0x401000 in absolute).  I found this information using 
> "objdump -x", in the header somewhere.  I therefore expect all text 
> symbols to be >= 0x401000.  I would start digging why this text symbol 
> with a value of 0 exists.

See above.  But please note that I use mingw.org's MinGW, and my
executables are 32-bit, whereas you use MinGW64 and 64-bit
executables.  So some details might be different; in particular, I
don't think MinGW64 has this problematic symbol, because it's specific
to the DWARF2 exception unwinding implemented in libgcc, which 64-bit
Windows executables don't use.

> It would be interesting to look at some other symbols in the msymbols 
> vector.  Are the other mst_text symbols >= 0x401000?

There are two more unresolved mst_text symbols (see the nm output
above); all three have a zero address.  All the others are indeed at
or above 0x401000.
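
As a minimal sketch of that check, assuming the image base of 0x400000
and base of code of 0x1000 quoted above (other PE images will differ),
the following tests whether an address falls in the code region.  The
helper name is hypothetical and is not part of GDB:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread for this executable; they are
   assumptions for any other image ("objdump -x" prints them).  */
#define IMAGE_BASE   UINT32_C (0x400000)
#define BASE_OF_CODE UINT32_C (0x1000)

/* Return true if ADDR is at or above the start of the code region.
   Hypothetical helper, not a GDB function.  */
bool
addr_in_code_region (uint32_t addr)
{
  /* Sanity check on the constants: the sum must not wrap around.  */
  assert (IMAGE_BASE + BASE_OF_CODE >= IMAGE_BASE);
  return addr >= IMAGE_BASE + BASE_OF_CODE;
}
```

With these values, the three unresolved symbols (address 0) fail the
check, while _mingw32_init_mainargs at 0x4012a0 passes it.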

The lowest-address resolved minimal symbol whose type is mst_text is
this:

  (top-gdb) p msymbol[22]
  $112 = {mginfo = {name = 0x10447d95 "_mingw32_init_mainargs", value = {
	ivalue = 4199072, block = 0x4012a0 <_mingw32_init_mainargs>,
	bytes = 0x4012a0 <_mingw32_init_mainargs> "Æ\222?<\215D$,\307D$\004",
	address = 0x4012a0, common_block = 0x4012a0 <_mingw32_init_mainargs>,
	chain = 0x4012a0 <_mingw32_init_mainargs>}, language_specific = {
	obstack = 0x0, demangled_name = 0x0}, language = language_auto,
      ada_mangled = 0, section = 0}, size = 0, filename = 0x0, type = mst_text,
    created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 0,
    hash_next = 0x0, demangled_hash_next = 0x0}

Interestingly, objdump shows this symbol in section 1:

  [  0](sec  1)(fl 0x00)(ty  20)(scl   2) (nx 0) 0x000002a0 __mingw32_init_mainargs

whereas the above minsym information shows section = 0.  Is this
expected?  If "real" symbols were to have section > 0, we could
perhaps reject the unresolved ones.

> Assuming this minimal symbol is wrong and assuming it wasn't there, then 
> I guess the search would fail and we would fall in the "Cannot find 
> bounds of current function" case of prepare_one_step?  That would be 
> appropriate in this case.

It's not wrong, but perhaps lookup_minimal_symbol_by_pc_section should
reject unresolved symbols for this purpose.  However, the question is
how?  One possibility is by their zero address.  (I don't see the weak
attribute, or any other indication of its being unresolved, in the
minimal symbol attributes.)
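
To make the zero-address idea concrete, here is a toy model of a
"preceding symbol" search that ignores zero-address text symbols.  It
is only a sketch: the struct, enum and function names are hypothetical
and much simpler than GDB's minimal symbol tables, and it is not the
code of lookup_minimal_symbol_by_pc_section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_sym_type { TOY_SYM_TEXT, TOY_SYM_ABS };

/* Simplified stand-in for a minimal symbol (hypothetical).  */
struct toy_sym
{
  const char *name;
  uint32_t address;
  enum toy_sym_type type;
};

/* Return the last text symbol at or below PC in SYMS, which must be
   sorted by address.  Zero-address text symbols are treated as
   unresolved and skipped.  Return NULL if nothing qualifies.  */
const struct toy_sym *
toy_lookup_by_pc (const struct toy_sym *syms, size_t n, uint32_t pc)
{
  const struct toy_sym *best = NULL;

  assert (syms != NULL || n == 0);
  for (size_t i = 0; i < n && syms[i].address <= pc; i++)
    {
      if (syms[i].type != TOY_SYM_TEXT)
        continue;
      if (syms[i].address == 0)
        continue;               /* Heuristic: assume unresolved.  */
      best = &syms[i];
    }
  return best;
}
```

In this model, a PC of 0x401288 with only __register_frame_info (at 0)
and _mingw32_init_mainargs (at 0x4012a0) in the table yields NULL,
i.e. no symbol at all.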

In any case, if we do raise the "Cannot find bounds of current
function" error, that will throw to the command loop, which I think is
undesirable in this case.  We want GDB to step out of this code, not
to error out.

> Ok, from what I understand, all these "mst_abs" symbols do not represent 
> addresses.  They just represent numerical "values", like version 
> numbers, alignment sizes, etc.  So it seems right to skip them when 
> looking for the minimal symbol preceding pc.
> 
> It looks like minimal_symbol_upper_bound is buggy, in that it should not 
> consider these mst_abs.  If we are looking for the end of a memory 
> range, we should not consider those symbols that do not even represent 
> memory addresses...

Indeed, the following change is enough to avoid the internal error:

--- gdb/minsyms.c~0	2018-07-04 18:41:59.000000000 +0300
+++ gdb/minsyms.c	2018-12-20 08:06:11.516834500 +0200
@@ -1514,7 +1514,8 @@ minimal_symbol_upper_bound (struct bound
     {
       if ((MSYMBOL_VALUE_RAW_ADDRESS (msymbol + i)
 	   != MSYMBOL_VALUE_RAW_ADDRESS (msymbol))
-	  && MSYMBOL_SECTION (msymbol + i) == section)
+	  && MSYMBOL_SECTION (msymbol + i) == section
+	  && MSYMBOL_TYPE (msymbol + i) != mst_abs)
 	break;
     }
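
The effect of the extra condition can be illustrated with a toy model
of the loop.  This is a sketch under stated assumptions: the types and
names are hypothetical, the table is passed with an explicit length,
and the mst_abs entry at address 0x4 is invented for illustration (the
real entries are in the attached objdump output).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_type { TOY_TEXT, TOY_ABS };

/* Simplified stand-in for a minimal symbol (hypothetical).  */
struct toy_minsym
{
  const char *name;
  uint32_t address;
  int section;
  enum toy_type type;
};

/* Return the address of the first symbol after SYMS[START] that has a
   different address, the same section and, if SKIP_ABS is nonzero, a
   type other than TOY_ABS.  Return FALLBACK_END if there is none.  */
uint32_t
toy_upper_bound (const struct toy_minsym *syms, size_t n, size_t start,
                 int skip_abs, uint32_t fallback_end)
{
  assert (syms != NULL && start < n);
  for (size_t i = start + 1; i < n; i++)
    {
      if (syms[i].address != syms[start].address
          && syms[i].section == syms[start].section
          && !(skip_abs && syms[i].type == TOY_ABS))
        return syms[i].address;
    }
  return fallback_end;
}
```

For a table holding a text symbol at 0, an absolute symbol at 0x4 and
a text symbol at 0x4012a0, all in section 0, the bound for the first
symbol is 0x4 without the filter and 0x4012a0 with it, so only the
filtered range contains a PC of 0x401288.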
 
However, it still shows the incorrect function name from the
zero-address symbol:

  7       }
  (gdb) n
  0x00401288 in __register_frame_info ()
  (gdb) n
  Single stepping until exit from function __register_frame_info,
  which has no line number information.
  [Inferior 1 (process 10424) exited normally]

I think if we want to avoid showing __register_frame_info, we need
further changes in lookup_minimal_symbol_by_pc_section.  But I don't
see how this will help us, unless we also allow displaying the above
message for functions whose names we don't know, perhaps saying
something like

  Single stepping until exit from function <unknown>

> > That's what I did.  The problem seems to be that the low value of PC
> > doesn't allow GDB to find a reasonable symbol; what it finds are
> > symbols with very low addresses, which don't look like symbols
> > relevant to the issue at hand.  I see the same symbols and addresses
> > in the output of "objdump -t" (I can show it if you want).
> 
> If you could pastebin it, or send it as an attachment, I think it would 
> be useful.  Consider sending the output of "objdump -x", which I think 
> gives a superset of "objdump -t".

Attached.

> > Where do we go from here?
> 
> I would say
> 
> 1. investigate if the text symbol at address 0 really has business being 
> there.

Done.

> 2. investigate if there should be some text symbol that should really 
> contain 0x0040126d, that for some reason does not end up in GDB's 
> minimal symbol table.

The function in which the PC value of 0x401288 lives is
__mingw_CRTStartup, which ends like this:

  /* Call the main() function. If the user does not supply one
   * the one in the 'libmingw32.a' library will be linked in, and
   * that one calls WinMain().  See main.c in the 'lib' directory
   * for more details.
   */
  nRet = main (_argc, _argv, environ);

  /* Perform exit processing for the C library. This means flushing
   * output and calling atexit() registered functions.
   */
  _cexit ();

  ExitProcess (nRet);
}

This function is declared in the MinGW runtime sources as follows:

  static __MINGW_ATTRIB_NORETURN void __mingw_CRTStartup (void);

But its symbol is not in the symbol table.  Not sure why, perhaps
because it's a static function?  But the code is there, starting at
the address 0x4011b0.  The last part, which runs after 'main' returns
and corresponds to the source snippet above, is this:

  (gdb) disassemble 0x401283,0x401294
  Dump of assembler code from 0x401283 to 0x401294:
     0x00401283 <__register_frame_info+4199043>:  call   0x401460 <main>
     0x00401288 <__register_frame_info+4199048>:  mov    %eax,%ebx
     0x0040128a <__register_frame_info+4199050>:  call   0x403a90 <_cexit>
     0x0040128f <__register_frame_info+4199055>:  mov    %ebx,(%esp)
     0x00401292 <__register_frame_info+4199058>:  call   0x403b28 <ExitProcess@4>

So when this problem happens, we are at the "mov %eax,%ebx"
instruction after exiting 'main', as I'd expect.
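
The large offsets in the disassembly labels are simply the PC minus
the symbol's address, which is zero here.  A small sketch of that
arithmetic follows; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the offset of PC from a symbol located at SYM_ADDR, as shown
   in "<symbol+offset>" labels.  Hypothetical helper, not GDB code.  */
uint32_t
sym_offset (uint32_t sym_addr, uint32_t pc)
{
  assert (pc >= sym_addr);
  return pc - sym_addr;
}
```

Relative to __register_frame_info at address 0, a PC of 0x401288 gives
an offset of 4199048, matching the label above.  Relative to the start
of the startup code at 0x4011b0, the same PC would be at offset 216
(0xd8).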

Attachment: objdump_x.txt
Description: Text document

