This is the mail archive of the
binutils@sources.redhat.com
mailing list for the binutils project.
Patch for -relax bug
- To: <binutils at sourceware dot cygnus dot com>
- Subject: Patch for -relax bug
- From: "Kenneth Block" <block at zk3 dot dec dot com>
- Date: Fri, 8 Sep 2000 16:57:56 -0400
- Cc: "Kenneth Block (E-mail)" <block at zk3 dot dec dot com>
- Reply-To: <block at zk3 dot dec dot com>
There is a very short patch to fix this problem.
PROBLEM DESCRIPTION
==================
ld performs LITUSE relocations incorrectly if -relax is specified on the ld
command line and the displacement field of the load or store instruction is
non-zero. This results in incorrect results at runtime.
Consider this C program in two files (bug.c, bug2.c):
------------- bug.c ---------------------
extern void print();
struct {
char* x;
int y;
} a;
int main()
{
a.x = "hello";
a.y = 1;
print();
return 0;
}
----------- bug2.c ----------------------------------
#include <stdio.h>
extern struct {
char* x; int y;
} a;
void print()
{
printf("x = %s, y = %d\n", a.x, a.y);
}
-----------------------------------------------------
should produce the output:
x = hello, y = 1
and does so when built with the Compaq C compiler using the command lines:
ccc -c bug.c bug2.c
gcc -o bug bug.o bug2.o
(we're doing the link here with gcc so that you guys can reproduce the
problem without installing Compaq C-all you need is the two .o files) but if
it is built with the command lines:
ccc -c bug.c bug2.c
gcc -o bug bug.o bug2.o -relax
the result is a runtime segmentation fault.
ANALYSIS
=======
The debugger shows that the segmentation fault happens in printf() and is
because the low 32 bits of a.x are 0 instead of the correct value. This
happens because ld has incorrectly performed a LITUSE link-time
optimization. The output from objdump -d -r bug.o shows the following:
0000000000000000 <main>:
0: 00 00 bb 27 ldah gp,0(t12)
0: GPDISP *ABS*+0x4
4: 00 00 bd 23 lda gp,0(gp)
8: 00 00 5d a4 ldq t1,0(gp)
8: ELF_LITERAL a
c: 00 00 7d a7 ldq t12,0(gp)
c: ELF_LITERAL print
10: 00 00 7d a4 ldq t2,0(gp)
10: ELF_LITERAL .data
14: 05 34 e0 47 mov 0x1,t4
18: f0 ff de 23 lda sp,-16(sp)
1c: 00 00 62 b4 stq t2,0(t1)
1c: LITUSE a+0x1
20: 08 00 a2 b0 stl t4,8(t1)
20: LITUSE a+0x1
24: 00 00 5e b7 stq ra,0(sp)
28: 00 00 fe 2f unop
2c: 00 40 5b 6b jsr ra,(t12),30 <main+0x30>
2c: LITUSE print+0x3
2c: HINT print
30: 00 00 ba 27 ldah gp,0(ra)
30: GPDISP *ABS*+0x8 34: 00 00 5e a7 ldq ra,0(sp)
38: 00 00 bd 23 lda gp,0(gp)
3c: 00 04 ff 47 clr v0
40: 10 00 de 23 lda sp,16(sp)
44: 01 80 fa 6b ret zero,(ra),0x1
The instructions involved in the initialization of the two fields of struct
a are:
8: 00 00 5d a4 ldq t1,0(gp) // get address of struct a
8: ELF_LITERAL a
10: 00 00 7d a4 ldq t2,0(gp) // get address of "hello"
10: ELF_LITERAL .data
14: 05 34 e0 47 mov 0x1,t4 // get literal value 1
1c: 00 00 62 b4 stq t2,0(t1) // initialize a.x
1c: LITUSE a+0x1
20: 08 00 a2 b0 stl t4,8(t1) // initialize a.y
20: LITUSE a+0x1
Note that the STL that sets size to 0 has a displacement of 8 and that both
the STQ and STL use the same GOT entry.
When -relax is not specified, objdump -d -r of the executable image shows
the expected code:
0000000120000400 <main>:
120000400: 11 00 bb 27 ldah gp,17(t12)
120000404: c8 81 bd 23 lda gp,-32312(gp)
120000408: 30 80 5d a4 ldq t1,-32720(gp)
12000040c: 00 80 7d a7 ldq t12,-32768(gp)
120000410: 50 80 7d a4 ldq t2,-32688(gp)
120000414: 05 34 e0 47 mov 0x1,t4
120000418: f0 ff de 23 lda sp,-16(sp)
12000041c: 00 00 62 b4 stq t2,0(t1)
120000420: 08 00 a2 b0 stl t4,8(t1)
120000424: 00 00 5e b7 stq ra,0(sp)
120000428: 00 00 fe 2f unop
12000042c: 08 40 5b 6b jsr ra,(t12),120000450 <print> 120000430: 11 00 ba
27 ldah gp,17(ra)
120000434: 00 00 5e a7 ldq ra,0(sp)
120000438: 98 81 bd 23 lda gp,-32360(gp)
12000043c: 00 04 ff 47 clr v0
120000440: 10 00 de 23 lda sp,16(sp)
120000444: 01 80 fa 6b ret zero,(ra),0x1
However, if -relax is specified objdump -d -r shows this code in the
executable image:
0000000120000400 <main>:
120000400: 11 00 bb 27 ldah gp,17(t12)
120000404: c8 81 bd 23 lda gp,-32312(gp)
120000408: 00 00 e0 2f unop
12000040c: 00 00 e0 2f unop
120000410: 40 80 7d a4 ldq t2,-32704(gp)
120000414: 05 34 e0 47 mov 0x1,t4
120000418: f0 ff de 23 lda sp,-16(sp)
12000041c: a0 81 7d b4 stq t2,-32352(gp)
120000420: a0 81 bd b0 stl t4,-32352(gp)
120000424: 00 00 5e b7 stq ra,0(sp)
120000428: 00 00 fe 2f unop
12000042c: 0a 00 40 d3 bsr ra,120000458 <print+0x8> 120000430: 11 00 ba 27
ldah gp,17(ra)
120000434: 00 00 5e a7 ldq ra,0(sp)
120000438: 98 81 bd 23 lda gp,-32360(gp)
12000043c: 00 04 ff 47 clr v0
120000440: 10 00 de 23 lda sp,16(sp)
120000444: 01 80 fa 6b ret zero,(ra),0x1
ld has recognized that symbol a is within 32K of the GP value and therefore
the LDQ of register t1 can be turned into a NOP and the stores can be done
directly into offsets off of GP. Unfortunately, though, the computed offset
on the STL has failed to take into account the 8 that was in the
displacement field of the original instruction. The result is that the STL
stores to the same address as the STQ, thereby overwriting the low 32 bits
of the value stored by the STQ with zeros. This results in the segmentation
fault at run-time.
This doesn't happen with gcc (at least not on this program) because gcc
loads the base address of a twice, once for each of the assignments to the
members of the struct. The gcc code for the assignments is one instruction
longer than the ccc code.
Examining the source file elf64-alpha.c shows that the problem is in routine
elf64_alpha_relax_with_lituse. The code for optimizing a LITUSE on a
memory-format instruction that has 16-bit displacements reads:
case 1: /* MEM FORMAT */
/* We can always optimize 16-bit displacements. */ if (fits16)
{
/* FIXME: sanity check the insn for mem format with
zero addend. */
/* Take the op code and dest from this insn, take the base
register from the literal insn. Leave the offset alone. */
insn = (insn & 0xffe00000) | (lit_insn & 0x001f0000);
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;
bfd_put_32 (info->abfd, insn, info->contents + urel->r_offset);
info->changed_contents = true;
}
Note that this line:
insn = (insn & 0xffe00000) | (lit_insn & 0x001f0000);
has the side effect of zeroing out the low 16 bits (the displacement field)
of insn. I think we have run afoul of the FIXME comment here- the code for
doing memory-format LITUSE optimizations assumes that all of the LITUSEs
corresponding to a LITERAL have zero displacements.
If this is not the case, there are two problems with the code:
1) As shown by our bug reproducer, the displacement field of the instruction
with the LITUSE is not preserved by the optimization-it is zeroed.
2) The booleans fits16 and fits32, which tell whether the 16-bit (and
corresponding 32-bit) optimizations can be performed, are computed once
outside the loop through the LITUSEs. This is OK if all displacements are
zero but not valid if this is not the case-it's possible for the
displacement to put the distance from GP outside the range permitted for the
optimization. fits16 and fits32 must take into account the displacement and
they must be calculated for each LITUSE that's examined.
Here is a suggested correction to the code (warning: I haven't tested it):
-----------------------------------------------------------------------
/* A little preparation for the loop... */
disp = symval - info->gp;
for (urel = irel+1, i = 0; i < count; ++i, ++urel)
{
unsigned int insn;
int insn_disp;
bfd_signed_vma xdisp;
insn = bfd_get_32 (info->abfd, info->contents + urel->r_offset); switch
(urel->r_addend)
{
default: /* 0 = ADDRESS FORMAT */
/* This type is really just a placeholder to note that all
uses cannot be optimized, but to still allow some. */ all_optimized =
false; break;
case 1: /* MEM FORMAT */
/* We can always optimize 16-bit displacements. */
/* extract the displacement from the instruction,
sign-extending it if necessary, then test whether it
is with 16 or 32 bits of GP */
insn_disp = insn & 0x0000ffff;
if (insn_disp & 0x00008000)
insn_disp |= 0xffff0000; /* negative; sign-extend */
xdisp = disp + insn_disp; fits16 = (xdisp >= -(bfd_signed_vma)0x8000 &&
xdisp < 0x8000); fits32 = (xdisp >= -(bfd_signed_vma)0x80000000 && xdisp <
0x7fff8000); if (fits16)
{
/* Take the op code and dest from this insn, take the base
register from the literal insn. Leave the offset alone. */
insn = (insn & 0xffe0ffff) | (lit_insn & 0x001f0000);
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;
bfd_put_32 (info->abfd, insn, info->contents + urel->r_offset);
info->changed_contents = true;
}
/* If all mem+byte, we can optimize 32-bit mem displacements. */ else if
(fits32 && !(flags & ~6))
{
/* FIXME: sanity check that lit insn Ra is mem insn Rb */
irel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELHIGH);
lit_insn = (OP_LDAH << 26) | (lit_insn & 0x03ff0000);
bfd_put_32 (info->abfd, lit_insn,
info->contents + irel->r_offset); lit_reused = true; info->changed_contents
= true;
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;
}
else
all_optimized = false; break;
----------------------------------------------------------------------
REPRODUCING THE PROBLEM
======================
Link the supplied object files bug.o and bug2.o using the gcc command:
gcc -o good bug.o bug2.o
gcc -o bad bug.o bug2.o -relax
Executable image good produces the expected results.
Executable image bad gets a segmentation fault.
[End of problem report]
ld_relax.patch
bug.o
bug2.o
Changelog