This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Patch for -relax bug


There is a very short patch to fix this problem.

PROBLEM DESCRIPTION
==================

ld performs LITUSE relocations incorrectly if -relax is specified on the ld
command line and the displacement field of the load or store instruction is
non-zero.  This results in incorrect results at runtime.
Consider this C program in two files (bug.c, bug2.c):
------------- bug.c ---------------------
extern void print();
struct {
char* x;
int y;
} a;

int main()
{
a.x = "hello";
a.y = 1;
print();
return 0;
}
----------- bug2.c ----------------------------------
#include <stdio.h>

extern struct {
char* x; int y;
} a;
void print()
{
printf("x = %s, y = %d\n", a.x, a.y);
}
-----------------------------------------------------

should produce the output:
x = hello, y = 1
and does so when built with the Compaq C compiler using the command lines:
ccc -c bug.c bug2.c
gcc -o bug bug.o bug2.o

(we're doing the link here with gcc so that you guys can reproduce the
problem without installing Compaq C-all you need is the two .o files) but if
it is built with the command lines:
ccc -c bug.c bug2.c
gcc -o bug bug.o bug2.o -relax

the result is a runtime segmentation fault.

ANALYSIS
=======

The debugger shows that the segmentation fault happens in printf() and is
because the low 32 bits of a.x are 0 instead of the correct value.  This
happens because ld has incorrectly performed a LITUSE link-time
optimization.  The output from objdump -d -r bug.o shows the following:
0000000000000000 <main>:
0: 00 00 bb 27  ldah gp,0(t12)
0: GPDISP  *ABS*+0x4
4: 00 00 bd 23  lda  gp,0(gp)
8: 00 00 5d a4  ldq  t1,0(gp)
8: ELF_LITERAL a
c: 00 00 7d a7  ldq  t12,0(gp)
c: ELF_LITERAL print
10: 00 00 7d a4  ldq  t2,0(gp)
10: ELF_LITERAL .data
14: 05 34 e0 47  mov  0x1,t4
18: f0 ff de 23  lda  sp,-16(sp)
1c: 00 00 62 b4  stq  t2,0(t1)
1c: LITUSE a+0x1
20: 08 00 a2 b0  stl  t4,8(t1)
20: LITUSE a+0x1
24: 00 00 5e b7  stq  ra,0(sp)
28: 00 00 fe 2f  unop
2c: 00 40 5b 6b  jsr  ra,(t12),30 <main+0x30>
2c: LITUSE print+0x3
2c: HINT print
30: 00 00 ba 27  ldah gp,0(ra)
30: GPDISP *ABS*+0x8 34: 00 00 5e a7  ldq  ra,0(sp)
38: 00 00 bd 23  lda  gp,0(gp)
3c: 00 04 ff 47  clr  v0
40: 10 00 de 23  lda  sp,16(sp)
44: 01 80 fa 6b  ret  zero,(ra),0x1

The instructions involved in the initialization of the two fields of struct
a are:
8: 00 00 5d a4  ldq  t1,0(gp)    // get address of struct a
8: ELF_LITERAL a
10: 00 00 7d a4  ldq  t2,0(gp)    // get address of "hello"
10: ELF_LITERAL .data
  14: 05 34 e0 47  mov  0x1,t4      // get literal value 1
  1c: 00 00 62 b4  stq  t2,0(t1)    // initialize a.x
1c: LITUSE a+0x1
20: 08 00 a2 b0  stl  t4,8(t1)    // initialize a.y
20: LITUSE a+0x1
Note that the STL that sets size to 0 has a displacement of 8 and that both
the STQ and STL use the same GOT entry.
When -relax is not specified, objdump -d -r of the executable image shows
the expected code:
0000000120000400 <main>:
120000400: 11 00 bb 27  ldah gp,17(t12)
120000404: c8 81 bd 23  lda gp,-32312(gp)
120000408: 30 80 5d a4  ldq t1,-32720(gp)
12000040c: 00 80 7d a7  ldq t12,-32768(gp)
120000410: 50 80 7d a4  ldq t2,-32688(gp)
120000414: 05 34 e0 47  mov 0x1,t4
120000418: f0 ff de 23  lda sp,-16(sp)
12000041c: 00 00 62 b4  stq t2,0(t1)
120000420: 08 00 a2 b0  stl t4,8(t1)
120000424: 00 00 5e b7  stq ra,0(sp)
120000428: 00 00 fe 2f  unop
12000042c: 08 40 5b 6b  jsr ra,(t12),120000450 <print> 120000430: 11 00 ba
27  ldah gp,17(ra)
120000434: 00 00 5e a7  ldq ra,0(sp)
120000438: 98 81 bd 23  lda gp,-32360(gp)
12000043c: 00 04 ff 47  clr v0
120000440: 10 00 de 23  lda sp,16(sp)
120000444: 01 80 fa 6b  ret zero,(ra),0x1

However, if -relax is specified objdump -d -r shows this code in the
executable image:
0000000120000400 <main>:
120000400: 11 00 bb 27  ldah gp,17(t12)
120000404: c8 81 bd 23  lda gp,-32312(gp)
120000408: 00 00 e0 2f  unop
12000040c: 00 00 e0 2f  unop
120000410: 40 80 7d a4  ldq t2,-32704(gp)
120000414: 05 34 e0 47  mov 0x1,t4
120000418: f0 ff de 23  lda sp,-16(sp)
12000041c: a0 81 7d b4  stq t2,-32352(gp)
120000420: a0 81 bd b0  stl t4,-32352(gp)
120000424: 00 00 5e b7  stq ra,0(sp)
120000428: 00 00 fe 2f  unop
12000042c: 0a 00 40 d3  bsr ra,120000458 <print+0x8> 120000430: 11 00 ba 27
ldah gp,17(ra)
120000434: 00 00 5e a7  ldq ra,0(sp)
120000438: 98 81 bd 23  lda gp,-32360(gp)
12000043c: 00 04 ff 47  clr v0
120000440: 10 00 de 23  lda sp,16(sp)
120000444: 01 80 fa 6b  ret zero,(ra),0x1

ld has recognized that symbol a is within 32K of the GP value and therefore
the LDQ of register t1 can be turned into a NOP and the stores can be done
directly into offsets off of GP.  Unfortunately, though, the computed offset
on the STL has failed to take into account the 8 that was in the
displacement field of the original instruction.  The result is that the STL
stores to the same address as the STQ, thereby overwriting the low 32 bits
of the value stored by the STQ with zeros.  This results in the segmentation
fault at run-time.
This doesn't happen with gcc (at least not on this program) because gcc
loads the base address of a twice, once for each of the assignments to the
members of the struct.  The gcc code for the assignments is one instruction
longer than the ccc code.
Examining the source file elf64-alpha.c shows that the problem is in routine
elf64_alpha_relax_with_lituse.  The code for optimizing a LITUSE on a
memory-format instruction that has 16-bit displacements reads:
case 1: /* MEM FORMAT */
/* We can always optimize 16-bit displacements.  */ if (fits16)
            {
/* FIXME: sanity check the insn for mem format with
zero addend.  */
/* Take the op code and dest from this insn, take the base
register from the literal insn.  Leave the offset alone.  */
insn = (insn & 0xffe00000) | (lit_insn & 0x001f0000);
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;

bfd_put_32 (info->abfd, insn, info->contents + urel->r_offset);
info->changed_contents = true;
            }

Note that this line:
insn = (insn & 0xffe00000) | (lit_insn & 0x001f0000);
has the side effect of zeroing out the low 16 bits (the displacement field)
of insn.  I think we have run afoul of the FIXME comment here- the code for
doing memory-format LITUSE optimizations assumes that all of the LITUSEs
corresponding to a LITERAL have zero displacements.
If this is not the case, there are two problems with the code:
1)	As shown by our bug reproducer, the displacement field of the instruction
with the LITUSE is not preserved by the optimization-it is zeroed.
2)	The booleans fits16 and fits32, which tell whether the 16-bit (and
corresponding 32-bit) optimizations can be performed, are computed once
outside the loop through the LITUSEs.  This is OK if all displacements are
zero but not valid if this is not the case-it's possible for the
displacement to put the distance from GP outside the range permitted for the
optimization.  fits16 and fits32 must take into account the displacement and
they must be calculated for each LITUSE that's examined.

Here is a suggested correction to the code (warning:  I haven't tested it):
-----------------------------------------------------------------------
/* A little preparation for the loop... */
disp = symval - info->gp;
for (urel = irel+1, i = 0; i < count; ++i, ++urel)
    {
unsigned int insn;
int insn_disp;
bfd_signed_vma xdisp;

insn = bfd_get_32 (info->abfd, info->contents + urel->r_offset); switch
(urel->r_addend)
        {
default: /* 0 = ADDRESS FORMAT */
/* This type is really just a placeholder to note that all
uses cannot be optimized, but to still allow some.  */ all_optimized =
false; break;
case 1: /* MEM FORMAT */
/* We can always optimize 16-bit displacements.  */
/* extract the displacement from the instruction,
sign-extending it if necessary, then test whether it
is with 16 or 32 bits of GP */
insn_disp = insn & 0x0000ffff;
if (insn_disp & 0x00008000)
insn_disp |= 0xffff0000; /* negative; sign-extend */
xdisp = disp + insn_disp; fits16 = (xdisp >= -(bfd_signed_vma)0x8000 &&
xdisp < 0x8000); fits32 = (xdisp >= -(bfd_signed_vma)0x80000000 && xdisp <
0x7fff8000); if (fits16)
            {
/* Take the op code and dest from this insn, take the base
register from the literal insn.  Leave the offset alone.  */
insn = (insn & 0xffe0ffff) | (lit_insn & 0x001f0000);
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;

bfd_put_32 (info->abfd, insn, info->contents + urel->r_offset);
info->changed_contents = true;
            }

/* If all mem+byte, we can optimize 32-bit mem displacements.  */ else if
(fits32 && !(flags & ~6))
            {
/* FIXME: sanity check that lit insn Ra is mem insn Rb  */
irel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELHIGH);
lit_insn = (OP_LDAH << 26) | (lit_insn & 0x03ff0000);
bfd_put_32 (info->abfd, lit_insn,
info->contents + irel->r_offset); lit_reused = true; info->changed_contents
= true;
urel->r_info = ELF64_R_INFO (ELF64_R_SYM (irel->r_info),
R_ALPHA_GPRELLOW);
urel->r_addend = irel->r_addend;
info->changed_relocs = true;
            }
else
all_optimized = false; break;
----------------------------------------------------------------------


REPRODUCING THE PROBLEM
======================

Link the supplied object files bug.o and bug2.o using the gcc command:
gcc -o good bug.o bug2.o
gcc -o bad bug.o bug2.o -relax

Executable image good produces the expected results.
Executable image bad gets a segmentation fault.
[End of problem report]



ld_relax.patch

bug.o

bug2.o

Changelog


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]