This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: Strange ld.gold segmentation error issues.
- From: Cary Coutant <ccoutant at google dot com>
- To: "ISHIKAWA,chiaki" <ishikawa at yk dot rim dot or dot jp>
- Cc: Binutils <binutils at sourceware dot org>
- Date: Mon, 2 Jun 2014 11:30:40 -0700
- Subject: Re: Strange ld.gold segmentation error issues.
- References: <537A2682 dot 40706 at yk dot rim dot or dot jp> <538A0A46 dot 2030102 at yk dot rim dot or dot jp>
> Oh, I am passing the -gsplit-dwarf switch to gcc and g++, and
> --gdb-index to gold. Does it matter? (With the older version of
> GNU gold, it did not cause this segmentation error.)
The crash is in the code that creates the .gdb_index section, so that
option is definitely significant.
> Initially, I suspected that it could be an OOM issue since there are
> many processes running under "make -j4 ...", but then I found that more
> than 2.5 GB of main memory was free at the time the GNU gold binary was
> invoked to produce a .so library, just before the segfault occurs. The
> library is NOT THAT big.
>
> Also, to my surprise, changing the make switch from "-j4" to "-j3" did
> not change the outcome. Even with fewer processes invoked by make,
> the segfault still occurred. So OOM is unlikely, and come to think of
> it, if OOM had happened, the kernel should have recorded it, but I did
> not see such messages in the kernel logs.
>
> Anyway, I modified my local version of "ld" to invoke a shell
> script, which checks whether the particular library (here libnspr4.so)
> is about to be created, and if so, invokes ld.new (the gold binary)
> under gdb to see what happens. Otherwise it simply invokes the gold
> binary with the passed arguments.
> (In the previous posting, I thought it was libmozalloc.so that caused
> the blowup, but as it turned out it is the next target, libnspr4.so)
>
>
> Funny, the first few times, it did not trip ?!??
> Maybe I was doing something wrong.
> On the third try, I could capture the stacktrace.
This is puzzling. I'd think a problem like this would be reproducible,
unless there's some sort of race going on with the .o files. Does the
problem go away if you change make to use "-j1"?
> I obtained three dumps.
> The first was taken with my stock ~/.gdbinit tailored to mozilla thunderbird
> debugging, but it contained a set of spurious warnings related to
> files referenced in .gdbinit.
> The second was obtained after this .gdbinit file was renamed to .gdbinit.save
> to remove the spurious warnings.
> The 3rd one was obtained after I cleared ccache completely.
> I cleared ccache's cache to make sure
> that I am not using corrupt object files (for some mysterious
> reason). I use a version of ccache enhanced to support -gsplit-dwarf.
> https://bitbucket.org/zephyrus00jp/ccache-gsplit-dwarf-support
> https://bugzilla.samba.org/show_bug.cgi?id=10005
>
> The second and third stack trace matched completely (except for the
> process ID that is printed at the end.) So I am sure ccache is not
> involved with the problem.
> So I am showing the 3rd dump below.
>
> Funny thing is that I can re-invoke top-most make -f client.mk with
> suitable environment variable setting, etc., and can create a working
> mozilla thunderbird (!?) I wonder in what condition the left over
> libnspr4.so is. Maybe the link/build system of mozilla thunderbird is
> clever enough to figure out that libnspr4.a is used instead(?), but I
> digress.
Since the linker is crashing early during the first pass, it will not
have even created the output file yet, so you are probably left with
an older copy from a previous link that did not crash.
> Program received signal SIGSEGV, Segmentation fault.
> gold::Gdb_index::add_symbol (this=0x901e90, cu_index=3,
> sym_name=0x2aaaaaaec000 <Address 0x2aaaaaaec000 out of bounds>,
> flags=0 '\000') at gdb-index.cc:1128
> 1128 reinterpret_cast<const unsigned char*>(sym_name));
> (gdb) #0 gold::Gdb_index::add_symbol (this=0x901e90, cu_index=3,
> sym_name=0x2aaaaaaec000 <Address 0x2aaaaaaec000 out of bounds>,
> flags=0 '\000') at gdb-index.cc:1128
> #1 0x0000000000517602 in gold::Gdb_index_info_reader::read_pubtable (
> this=0x7fffffff5a30, table=0x9022d0, offset=<optimized out>)
> at gdb-index.cc:879
This is definitely helpful -- thanks for going through so much trouble
to get these stack traces. This shows that we are in the middle of
hashing a name from the .debug_pubnames (or .debug_gnu_pubnames)
table, but for some reason we have a name that runs off the end of the
table with no null-termination. That should not happen, and suggests a
corrupt .o file. It would be helpful to figure out which .o file we're
reading at this point, but I'll need you to do a bit more to collect
that...
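
To illustrate the failure mode, here is a minimal sketch (not gold's
actual code) of reading a name from a table while checking that it is
terminated before the end of the table. The helper read_bounded_name and
its parameters are hypothetical names chosen for this example; it
assumes the caller knows where the table data ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of gold: read a NUL-terminated name that
// must lie entirely inside [p, end). Returns std::nullopt when no
// terminator is found before the end of the buffer.
std::optional<std::string_view>
read_bounded_name(const unsigned char* p, const unsigned char* end)
{
  if (p == nullptr || p >= end)
    return std::nullopt;
  const std::size_t avail = static_cast<std::size_t>(end - p);
  // Search only the bytes that belong to the table.
  const void* nul = std::memchr(p, '\0', avail);
  if (nul == nullptr)
    return std::nullopt;  // The name runs off the end of the table.
  const std::size_t len =
      static_cast<std::size_t>(static_cast<const unsigned char*>(nul) - p);
  return std::string_view(reinterpret_cast<const char*>(p), len);
}
```

An unbounded hash over a name like the one in frame #0 would keep
reading past the mapped section, which is consistent with the "out of
bounds" address gdb reports; a check of this kind would turn that into a
detectable error instead.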
> #2 0x00000000005176c9 in
> gold::Gdb_index_info_reader::read_pubnames_and_pubtypes
> (this=0x7fffffff5a30, die=0x7fffffff5960) at gdb-index.cc:942
> #3 0x0000000000518009 in gold::Gdb_index_info_reader::visit_top_die (
> this=0x7fffffff5a30, die=0x7fffffff5960) at gdb-index.cc:379
> #4 0x00000000005180d3 in
> gold::Gdb_index_info_reader::visit_compilation_unit
> (this=0x7fffffff5a30, cu_offset=<optimized out>,
> cu_length=<optimized out>, root_die=<optimized out>) at gdb-index.cc:326
> #5 0x000000000062a8f2 in gold::Dwarf_info_reader::do_parse<false> (
> this=this@entry=0x7fffffff5a30) at dwarf_reader.cc:1363
> #6 0x000000000062746e in gold::Dwarf_info_reader::parse (
> this=this@entry=0x7fffffff5a30) at dwarf_reader.cc:1234
> #7 0x00000000005187b1 in gold::Gdb_index::scan_debug_info (this=0x901e90,
> is_type_unit=is_type_unit@entry=false, object=object@entry=0x946f90,
> symbols=0x2aaaaaaeb150 "", symbols@entry=0xb <Address 0xb out of bounds>,
> symbols_size=symbols_size@entry=504, shndx=<optimized out>,
> reloc_shndx=9, reloc_type=4) at gdb-index.cc:1119
> #8 0x0000000000550939 in gold::Layout::add_to_gdb_index<64, false> (
> this=this@entry=0x7fffffff6f30, is_type_unit=is_type_unit@entry=false,
> object=object@entry=0x946f90, symbols=0xb <Address 0xb out of bounds>,
> symbols@entry=0x2aaaaaaeb150 "", symbols_size=symbols_size@entry=504,
> shndx=<optimized out>, reloc_shndx=9, reloc_type=4) at layout.cc:1569
In frame #8, the value of object->name_ would tell you which .o file
it's reading. If you can find this and send me a copy of that .o file,
I'd like to take a look at it. (Since you say this is actually a
fairly small link, you could just send me all the .o files.)
-cary