This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [RFA] rs6000-tdep.c: Improve prologue handling with code motion.


Thank you very much for doing this.

I agree that both breakpoint placements are better with (b) or (c),
and with your examples it is quite obvious that a dumb find_pc_line test
produces worse results with future versions of gcc.

> > On the other hand we have prove that the patch doesn't introduce any
> > testsuite regressions, even with -O2.
> 
> Hmm.  I don't see how this is going to be possible.  I examined
> quite a few of the test cases and there were some where it was
> shear luck that (a) passed some of the tests that (b) and (c) had
> failed upon.

As I said, I was using stock gcc-2.95.2 when running the testsuite.
It seems that gcc-2.95.2 does not yet perform code motion into the prologue,
to whit (with gcc-2.95.2 -O2):

0x100004cc <factorial>: mflr    r0
0x100004d0 <factorial+4>:       stw     r31,-4(r1)
0x100004d4 <factorial+8>:       stw     r0,8(r1)
0x100004d8 <factorial+12>:      stwu    r1,-64(r1)
0x100004dc <factorial+16>:      mr      r31,r3
0x100004e0 <factorial+20>:      cmpwi   r31,1
0x100004e4 <factorial+24>:      ble     0x100004f4 <factorial+40>

0x10000340 <main>:      mflr    r0
0x10000344 <main+4>:    stw     r0,8(r1)
0x10000348 <main+8>:    stwu    r1,-56(r1)
0x1000034c <main+12>:   bl      0x10000590 <__main>
0x10000350 <main+16>:   nop
0x10000354 <main+20>:   lwz     r3,80(r2)
0x10000358 <main+24>:   li      r4,1
0x1000035c <main+28>:   bl      0x1000060c <printf>


I like your refine_prologue_limit approach, but I have one objection, I think
that we never should modify lim_pc if it is non-zero.

In these cases lim_pc is obtained from the current state of the program,
and it is questionable if we really want to modify it.
I'd really like to see an example where modifying a non-zero lim_pc
is necessary, and even then we should think about the consequences for other
cases.

> 
> On Nov 10,  1:26pm, Peter.Schauer wrote:
> 
> > I was also trying to minimize the risk of overscan and performance impacts.
> [...]
> > I don't know if we should include the find_pc_line test or not.
> 
> I have an alternate proposal (that uses find_pc_line) which attempts
> to limit the possibility of overscan.  (See below.)
> 
> > Would you be willing to run the GDB testsuite with your version of the
> > compiler, using -O2 ?
> > It will almost certainly cause many failures, but it would be interesting,
> > if you get better results with my original patch, or still even better
> > results when you leave out the find_pc_line test.
> 
> This was certainly an interesting exercise, but the results are
> inconclusive.  I do get slightly better results when I use your
> patch *with* the find_pc_line test.  Here are the numerical
> results:
> 
> (a) FAILs w/ patch with find_pc_line:        421
> (b) FAILs w/ patch without find_pc_line:     422
> (c) FAILs w/ limit refinement patch (below): 422
> 
> (There's some non-determinism in one of the threads tests; Sometimes
> one additional failure would appear.)
> 
> When I compare the specific FAILures, I find that the FAILures for
> (b) and (c) to be identical.  However, I think (c) performs better
> and I would expect there to be cases where (c) would result in a
> better prologue analysis due to the fact that it will prevent an
> overscan.
> 
> The results are interesting when you compare the specific FAILs in (a)
> and (b).  These tests do have some FAILs in common, but a large number
> of them are different.  I.e, there were a lot of tests which got a
> PASS for (a) and a FAIL for (b) and vice versa.
> 
> I will now describe two of the more interesting cases.
> 
> Case 1: gdb.base/break.exp: run until file:function(6) breakpoint
> 
> This FAILs with (b) and (c), but passes with (a).  The reason for
> (b) and (c) FAILing is that the breakpoint gets set on an instruction
> corresponding to line 97 in break.c instead of line 96 which is
> expected by the test suite.
> 
> The generated code for factorial looks like this:
> 
> 0x100005f8 <factorial>: stwu    r1,-16(r1)
> 0x100005fc <factorial+4>:       mflr    r0
> 0x10000600 <factorial+8>:       stw     r31,12(r1)
> 0x10000604 <factorial+12>:      mr      r31,r3
> 0x10000608 <factorial+16>:      cmpwi   r31,1
> 0x1000060c <factorial+20>:      stw     r0,20(r1)
> 0x10000610 <factorial+24>:      addi    r3,r31,-1
> 0x10000614 <factorial+28>:      ble     0x10000624 <factorial+44>
> 0x10000618 <factorial+32>:      crclr   4*cr1+eq
> 0x1000061c <factorial+36>:      bl      0x100005f8 <factorial>
> 0x10000620 <factorial+40>:      mullw   r31,r31,r3
> 0x10000624 <factorial+44>:      lwz     r0,20(r1)
> 0x10000628 <factorial+48>:      mr      r3,r31
> 0x1000062c <factorial+52>:      lwz     r31,12(r1)
> 0x10000630 <factorial+56>:      mtlr    r0
> 0x10000634 <factorial+60>:      addi    r1,r1,16
> 0x10000638 <factorial+64>:      blr
> 
> (b) and (c) put the breakpoint for "break break.c:factorial" on
> factorial+24.  In my opinion, this is correct since the stw
> instruction at factorial+20 is actually the last instruction in the
> prologue.  Note also that factorial+24 corresponds to line 97, but
> that factorial+28 corresponds to line 96.
> 
> (a) puts the breakpoint on factorial+16.  This instruction corresponds
> to the "if" statement (line 96), but is before the end of the prologue.
> 
> So... even though we have a failure here where we had none before,
> I think the breakpoint placement is preferable.
> 
> Case 2: gdb.base/sizeof.exp: check sizeof char == 1
> 
> In this case, a breakpoint is placed on main.  (b) and (c) PASS this
> test whereas (a) FAILs it.
> 
> The code in question looks like this:
> 
> 0x100004a4 <main>:      lis     r3,4096
> 0x100004a8 <main+4>:    stwu    r1,-16(r1)
> 0x100004ac <main+8>:    mflr    r0
> 0x100004b0 <main+12>:   li      r4,1
> 0x100004b4 <main+16>:   addi    r3,r3,1616
> 0x100004b8 <main+20>:   stw     r0,20(r1)
> 0x100004bc <main+24>:   crclr   4*cr1+eq
> 0x100004c0 <main+28>:   bl      0x100108a0 <printf>
> ...
> 
> Note that the very first instruction of the function *is not*
> part of the prologue!
> 
> That being the case, (a) puts its breakpoint on 0x100004a4 <main>,
> whereas (b) and (c) put the breakpoint on 0x100004bc <main+24> which
> is just after the last prologue instruction.  Even if you disagree
> with me on case 1, this breakpoint is clearly preferable for case 2.
> 
> [...]
> > On the other hand we have prove that the patch doesn't introduce any
> > testsuite regressions, even with -O2.
> 
> Hmm.  I don't see how this is going to be possible.  I examined
> quite a few of the test cases and there were some where it was
> shear luck that (a) passed some of the tests that (b) and (c) had
> failed upon.
> 
> I do agree though that there should be no regressions when we
> take the optimizer out of the picture.  I've tested for such
> and have found none.
> 
> Here's a combined patch:
> 
> 	* rs6000-tdep.c (refine_prolog_limit): New function.
> 	(skip_prologue): Adjust lim_pc by calling refine_prologue_limit.
> 	Also, adjust fencepost error regarding the limit in the loop.
> 
> 	From Peter Schauer:
> 	* rs6000-tdep.c (skip_prologue):  Handle optimizer code motions into
> 	the prologue by continuing the prologue search, if we have no valid
> 	frame yet or if the return address is not yet saved in the frame.
> 
> Index: rs6000-tdep.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/rs6000-tdep.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 rs6000-tdep.c
> --- rs6000-tdep.c	2000/11/09 09:49:00	1.16
> +++ rs6000-tdep.c	2000/11/14 02:32:59
> @@ -380,11 +380,63 @@ rs6000_software_single_step (unsigned in
>  
>  #define GET_SRC_REG(x) (((x) >> 21) & 0x1f)
>  
> +/* Limit the number of skipped non-prologue instructions, as the examining
> +   of the prologue is expensive.  */
> +static int max_skip_non_prologue_insns = 10;
> +
> +/* Given PC representing the starting address of a function, and
> +   LIM_PC which is the (sloppy) limit to which to scan when looking
> +   for a prologue, attempt to further refine this limit by using
> +   the line data in the symbol table.  If successful, a better guess
> +   on where the prologue ends is returned, otherwise the previous
> +   value of lim_pc is returned.  */
>  static CORE_ADDR
> +refine_prologue_limit (CORE_ADDR pc, CORE_ADDR lim_pc)
> +{
> +  struct symtab_and_line prologue_sal;
> +
> +  prologue_sal = find_pc_line (pc, 0);
> +  if (prologue_sal.line != 0)
> +    {
> +      int i;
> +      CORE_ADDR addr = prologue_sal.end;
> +
> +      /* Handle the case in which compiler's optimizer/scheduler
> +         has moved instructions into the prologue.  We scan ahead
> +	 in the function looking for address ranges whose corresponding
> +	 line number is less than or equal to the first one that we
> +	 found for the function.  (It can be less than when the
> +	 scheduler puts a body instruction before the first prologue
> +	 instruction.)  */
> +      for (i = 2 * max_skip_non_prologue_insns; 
> +           i > 0 && (lim_pc == 0 || addr < lim_pc);
> +	   i--)
> +        {
> +	  struct symtab_and_line sal;
> +
> +	  sal = find_pc_line (addr, 0);
> +	  if (sal.line == 0)
> +	    break;
> +	  if (sal.line <= prologue_sal.line 
> +	      && sal.symtab == prologue_sal.symtab)
> +	    {
> +	      prologue_sal = sal;
> +	    }
> +	  addr = sal.end;
> +	}
> +
> +      if (lim_pc == 0 || prologue_sal.end < lim_pc)
> +	lim_pc = prologue_sal.end;
> +    }
> +  return lim_pc;
> +}
> +
> +
> +static CORE_ADDR
>  skip_prologue (CORE_ADDR pc, CORE_ADDR lim_pc, struct rs6000_framedata *fdata)
>  {
>    CORE_ADDR orig_pc = pc;
> -  CORE_ADDR last_prologue_pc;
> +  CORE_ADDR last_prologue_pc = pc;
>    char buf[4];
>    unsigned long op;
>    long offset = 0;
> @@ -394,6 +446,9 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
>    int framep = 0;
>    int minimal_toc_loaded = 0;
>    int prev_insn_was_prologue_insn = 1;
> +  int num_skip_non_prologue_insns = 0;
> +
> +  lim_pc = refine_prologue_limit (pc, lim_pc);
>  
>    memset (fdata, 0, sizeof (struct rs6000_framedata));
>    fdata->saved_gpr = -1;
> @@ -402,19 +457,22 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
>    fdata->frameless = 1;
>    fdata->nosavedpc = 1;
>  
> -  pc -= 4;
> -  while (lim_pc == 0 || pc < lim_pc - 4)
> +  for (;; pc += 4)
>      {
> -      pc += 4;
> -
>        /* Sometimes it isn't clear if an instruction is a prologue
>           instruction or not.  When we encounter one of these ambiguous
>  	 cases, we'll set prev_insn_was_prologue_insn to 0 (false).
>  	 Otherwise, we'll assume that it really is a prologue instruction. */
>        if (prev_insn_was_prologue_insn)
>  	last_prologue_pc = pc;
> +
> +      /* Stop scanning if we've hit the limit.  */
> +      if (lim_pc != 0 && pc >= lim_pc)
> +	break;
> +
>        prev_insn_was_prologue_insn = 1;
>  
> +      /* Fetch the instruction and convert it to an integer.  */
>        if (target_read_memory (pc, buf, 4))
>  	break;
>        op = extract_signed_integer (buf, 4);
> @@ -628,7 +686,31 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
>  	}
>        else
>  	{
> -	  break;
> +	  /* Not a recognized prologue instruction.
> +	     Handle optimizer code motions into the prologue by continuing
> +	     the search if we have no valid frame yet or if the return
> +	     address is not yet saved in the frame.  */
> +	  if (fdata->frameless == 0
> +	      && (lr_reg == -1 || fdata->nosavedpc == 0))
> +	    break;
> +
> +	  if (op == 0x4e800020		/* blr */
> +	      || op == 0x4e800420)	/* bctr */
> +	    /* Do not scan past epilogue in frameless functions or
> +	       trampolines.  */
> +	    break;
> +	  if ((op & 0xf4000000) == 0x40000000) /* bxx */
> +	    /* Never skip branches. */
> +	    break;
> +
> +	  if (num_skip_non_prologue_insns++ > max_skip_non_prologue_insns)
> +	    /* Do not scan too many insns, scanning insns is expensive with
> +	       remote targets.  */
> +	    break;
> +
> +	  /* Continue scanning.  */
> +	  prev_insn_was_prologue_insn = 0;
> +	  continue;
>  	}
>      }
>  
> 
> 
> 


-- 
Peter Schauer			pes@regent.e-technik.tu-muenchen.de

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]