This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Re: Strange ld.gold segmentation error issues.
- From: "ISHIKAWA,chiaki" <ishikawa at yk dot rim dot or dot jp>
- To: Cary Coutant <ccoutant at google dot com>, Binutils <binutils at sourceware dot org>
- Date: Tue, 10 Jun 2014 22:01:47 +0900
- Subject: Re: Strange ld.gold segmentation error issues.
- References: <537A2682 dot 40706 at yk dot rim dot or dot jp> <538A0A46 dot 2030102 at yk dot rim dot or dot jp> <CAHACq4qrW+so6g5mvU3hrNgxAJWRerfDrCB=ceECJyadsJQeaA at mail dot gmail dot com> <538F30D0 dot 1010409 at yk dot rim dot or dot jp> <538F344E dot 9020404 at yk dot rim dot or dot jp> <53906D48 dot 3050305 at yk dot rim dot or dot jp> <539136A1 dot 8080507 at yk dot rim dot or dot jp> <CAHACq4pMRoU_dijhkbeXfhMGd=Avmexzx4kurDJ=fLRPd_Xkrw at mail dot gmail dot com>
Dear Cary,
(2014/06/10 6:44), Cary Coutant wrote:
==4001==
==4001== Conditional jump or move depends on uninitialised value(s)
==4001== at 0x4017777: strlen (rtld-strlen.S:65)
==4001== by 0x40050BA: fillin_rpath (dl-load.c:492)
==4001== by 0x4007C95: _dl_init_paths (dl-load.c:866)
==4001== by 0x4002BB9: dl_main (rtld.c:1344)
==4001== by 0x4015214: _dl_sysdep_start (dl-sysdep.c:249)
==4001== by 0x40049F5: _dl_start (rtld.c:332)
==4001== by 0x4001187: ??? (in /lib/x86_64-linux-gnu/ld-2.18.so)
==4001== by 0x60: ???
==4001== by 0xFFEFFF66E: ???
==4001== by 0xFFEFFF67E: ???
==4001== by 0xFFEFFF68A: ???
==4001== by 0xFFEFFF695: ???
==4001==
==4001== Invalid read of size 1
==4001== at 0x517240: gold::Gdb_index::add_symbol(int, char const*, unsigned char) (gdb-index.cc:164)
This memcheck output really isn't telling us anything new -- this is
right where the segfault is happening.
Because the bug appeared only intermittently under normal
circumstances, it was a pleasant surprise that execution under valgrind's
memcheck always produced the segfault. I think the particular mmap layout,
etc., caused by memcheck triggered the issue, as you explained below.
The object file you sent me does appear to have a problem, and the
gold linker isn't being careful enough to detect the corruption. The
problem is with the .debug_pubnames section:
[11] .debug_pubnames PROGBITS 0000000000000000 00023d 00041b 00 0 0 1
Note that its length is 0x41b (1051 bytes). readelf -wp shows this:
Contents of the .debug_pubnames section:
Length: 12506
Version: 2
Offset into .debug_info section: 0x0
Size of area in .debug_info section: 2522
Offset Name
242 PR_FAILURE
24f PR_SUCCESS
289 IPPROTO_IP
296 IPPROTO_HOPOPTS
2a8 IPPROTO_ICMP
...
Note that the unit_length field is given as 12506 bytes. Gold should
have checked that the length given there was no larger than the number
of bytes remaining in the section, but it doesn't, so it runs off the
end of the section, trying to add more names than there actually are
in the table. Under the wrong conditions, this could run off the end
of an mmap'ed region, which explains why it only sometimes crashes.
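The missing check can be sketched as follows. This is a minimal Python sketch, not gold's C++ code and not the fix that was committed; it assumes 32-bit little-endian DWARF (as in this x86-64 object), and the function name `iter_pubnames_units` is hypothetical. It walks the section unit by unit and rejects any unit whose unit_length exceeds the bytes left in the section, instead of trusting the header.

```python
import struct
from typing import Iterator, Tuple

# Header size after unit_length in 32-bit DWARF: version (2) +
# debug_info_offset (4) + debug_info_length (4).
HEADER_SIZE = 10

def iter_pubnames_units(section: bytes) -> Iterator[Tuple[int, Tuple[int, int, int], bytes]]:
    """Yield (section_offset, (version, info_offset, info_size), body) for
    each unit of a .debug_pubnames section, validating unit_length first.
    Hypothetical helper; assumes 32-bit little-endian DWARF."""
    pos = 0
    while pos < len(section):
        if len(section) - pos < 4:
            raise ValueError(f"truncated unit_length at offset {pos:#x}")
        (unit_length,) = struct.unpack_from("<I", section, pos)
        if unit_length >= 0xFFFFFFF0:
            # 64-bit DWARF and reserved values are out of scope for this sketch.
            raise ValueError(f"unsupported unit_length {unit_length:#x} at offset {pos:#x}")
        remaining = len(section) - pos - 4
        if unit_length > remaining:
            # The check that was missing: the header claims more bytes than
            # the section holds, so stop instead of reading past the end.
            raise ValueError(
                f"unit_length {unit_length} at offset {pos:#x} exceeds "
                f"the {remaining} bytes left in the section")
        if unit_length < HEADER_SIZE:
            raise ValueError(f"unit at offset {pos:#x} is too short for its header")
        unit = section[pos + 4 : pos + 4 + unit_length]
        header = struct.unpack_from("<HII", unit, 0)
        # The body holds the (DIE offset, name) entries and the terminator.
        yield pos, header, unit[HEADER_SIZE:]
        pos += 4 + unit_length
```

Given the 1051-byte section above, the first unit's claimed length of 12506 bytes fails the check immediately, so nothing beyond the end of the section is ever read.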
I finally understand the root cause of the issue now.
You weren't seeing this problem with older versions of gold because
the older versions would stop reading the pubnames table as soon as
they saw a pubnames table entry with a DIE offset == 0. But with
-gsplit-dwarf and -fdebug-types-section, it's possible to have
pubnames table entries where we have no valid DIE offset to output, so
GCC outputs those entries with a 0 offset, and gold needs to keep on
reading, or it will fail to add all the names to the GDB index.
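The difference between the two reading strategies can be sketched like this: instead of stopping at the first zero DIE offset, the reader consumes entries until the end of the unit (as bounded by unit_length) and treats only the final 4-byte word as the terminator. This is again a Python sketch assuming 32-bit little-endian DWARF; `read_pubnames_entries` is a hypothetical name, and this is not gold's actual implementation.

```python
import struct
from typing import List, Tuple

def read_pubnames_entries(body: bytes) -> List[Tuple[int, str]]:
    """Return (die_offset, name) pairs from one pubnames unit body, i.e. the
    bytes after the 10-byte header up to the end given by unit_length.
    Hypothetical helper; assumes 32-bit little-endian DWARF."""
    entries = []
    pos = 0
    while len(body) - pos >= 4:
        (die_offset,) = struct.unpack_from("<I", body, pos)
        pos += 4
        if pos == len(body):
            # Only the last 4-byte word of the unit is the terminator;
            # a zero offset anywhere earlier is an ordinary entry.
            break
        end = body.find(b"\x00", pos)
        if end < 0:
            raise ValueError(f"unterminated name at body offset {pos:#x}")
        entries.append((die_offset, body[pos:end].decode("utf-8", "replace")))
        pos = end + 1
    return entries
```

A reader that stopped at the first zero offset would return only the entries before it; this one returns the zero-offset entries and everything after them, leaving the caller to decide how to index the zero-offset names.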
Hmm. So it is a complex interaction of options, plus GCC failing to do
the proper thing in this case, I suppose.
I'll commit a fix to gold that will prevent it from running past the
end of the section, and I'll look into the GCC bug (it may already
have been fixed in a more recent version of GCC).
By the way, for best results with split DWARF, I recommend using GCC
4.9, which generates more information in the pubnames tables (now
named .debug_gnu_pubnames/pubtypes). The extra information allows gold
to generate a better .gdb_index, which lets GDB run faster.
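The extra information in .debug_gnu_pubnames is a one-byte flag field in each entry, placed between the DIE offset and the name, that records the symbol kind and whether the symbol is static. The sketch below assumes the .gdb_index version 7 layout described in the GDB manual (CU index in bits 0-23, symbol kind in bits 28-30, static flag in bit 31 of each CU-vector word), with the pubnames flag byte supplying the top 8 bits. The third `unsigned char` parameter of `gold::Gdb_index::add_symbol` in the trace above is presumably this byte. The helper names are hypothetical and this is not gold's code.

```python
from typing import Tuple

# Symbol kinds for bits 28-30 of a .gdb_index (version 7) CU-vector word.
KINDS = {0: "none", 1: "type", 2: "variable", 3: "function", 4: "other"}

def gdb_index_cu_word(cu_index: int, flags: int) -> int:
    """Combine a CU number with a .debug_gnu_pubnames flag byte into one
    CU-vector word: the flag byte becomes bits 24-31."""
    if not 0 <= cu_index < (1 << 24):
        raise ValueError("CU index must fit in 24 bits")
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must be a single byte")
    return (flags << 24) | cu_index

def describe_flags(flags: int) -> Tuple[str, bool]:
    """Decode a flag byte into (symbol kind, is_static)."""
    kind = (flags >> 4) & 0x7      # word bits 28-30
    is_static = bool(flags & 0x80)  # word bit 31
    return KINDS.get(kind, f"reserved({kind})"), is_static
```

With plain .debug_pubnames there is no flag byte, so an index builder has no kind or static information to record for each name; that is the gap the newer sections fill.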
I will try to use GCC 4.9 then. Debian GNU/Linux seems to have made that
version available in the last few weeks, but I wanted to track down the
segfault issue, so I stuck with 4.8 until now.
Thanks for taking the time to collect so much information about what's
going wrong! It helped quite a bit in tracking this down.
I am very happy to be of any help here.
Like I said, GNU gold has really improved linking speed, and
its memory footprint is much smaller when we link Mozilla's Thunderbird
(a very large C++ program). Using GNU gold is the only way I have found to
link it within a 32-bit address space with all the debug symbols, etc.
(Ordinary ld runs out of memory during linking! When I hit this
ceiling, someone suggested using GNU gold, and I have been using it for
the last 12 months or so.)
Happy Hacking
Chiaki Ishikawa