This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: [RFA] rs6000-tdep.c: Improve prologue handling with code motion.
- To: kevinb at cygnus dot com (Kevin Buettner)
- Subject: Re: [RFA] rs6000-tdep.c: Improve prologue handling with code motion.
- From: "Peter.Schauer" <Peter dot Schauer at regent dot e-technik dot tu-muenchen dot de>
- Date: Tue, 14 Nov 2000 19:12:21 MET
- Cc: gdb-patches at sourceware dot cygnus dot com
Thank you very much for doing this.
I agree that both breakpoint placements are better with (b) or (c),
and with your examples it is quite obvious that a dumb find_pc_line test
produces worse results with future versions of gcc.
> > On the other hand we have prove that the patch doesn't introduce any
> > testsuite regressions, even with -O2.
>
> Hmm. I don't see how this is going to be possible. I examined
> quite a few of the test cases and there were some where it was
> shear luck that (a) passed some of the tests that (b) and (c) had
> failed upon.
As I said, I was using stock gcc-2.95.2 when running the testsuite.
It seems that gcc-2.95.2 does not yet perform code motion into the prologue,
to whit (with gcc-2.95.2 -O2):
0x100004cc <factorial>: mflr r0
0x100004d0 <factorial+4>: stw r31,-4(r1)
0x100004d4 <factorial+8>: stw r0,8(r1)
0x100004d8 <factorial+12>: stwu r1,-64(r1)
0x100004dc <factorial+16>: mr r31,r3
0x100004e0 <factorial+20>: cmpwi r31,1
0x100004e4 <factorial+24>: ble 0x100004f4 <factorial+40>
0x10000340 <main>: mflr r0
0x10000344 <main+4>: stw r0,8(r1)
0x10000348 <main+8>: stwu r1,-56(r1)
0x1000034c <main+12>: bl 0x10000590 <__main>
0x10000350 <main+16>: nop
0x10000354 <main+20>: lwz r3,80(r2)
0x10000358 <main+24>: li r4,1
0x1000035c <main+28>: bl 0x1000060c <printf>
I like your refine_prologue_limit approach, but I have one objection, I think
that we never should modify lim_pc if it is non-zero.
In these cases lim_pc is obtained from the current state of the program,
and it is questionable if we really want to modify it.
I'd really like to see an example where modifying a non-zero lim_pc
is necessary, and even then we should think about the consequences for other
cases.
>
> On Nov 10, 1:26pm, Peter.Schauer wrote:
>
> > I was also trying to minimize the risk of overscan and performance impacts.
> [...]
> > I don't know if we should include the find_pc_line test or not.
>
> I have an alternate proposal (that uses find_pc_line) which attempts
> to limit the possibility of overscan. (See below.)
>
> > Would you be willing to run the GDB testsuite with your version of the
> > compiler, using -O2 ?
> > It will almost certainly cause many failures, but it would be interesting,
> > if you get better results with my original patch, or still even better
> > results when you leave out the find_pc_line test.
>
> This was certainly an interesting exercise, but the results are
> inconclusive. I do get slightly better results when I use your
> patch *with* the find_pc_line test. Here are the numerical
> results:
>
> (a) FAILs w/ patch with find_pc_line: 421
> (b) FAILs w/ patch without find_pc_line: 422
> (c) FAILs w/ limit refinement patch (below): 422
>
> (There's some non-determinism in one of the threads tests; Sometimes
> one additional failure would appear.)
>
> When I compare the specific FAILures, I find that the FAILures for
> (b) and (c) to be identical. However, I think (c) performs better
> and I would expect there to be cases where (c) would result in a
> better prologue analysis due to the fact that it will prevent an
> overscan.
>
> The results are interesting when you compare the specific FAILs in (a)
> and (b). These tests do have some FAILs in common, but a large number
> of them are different. I.e, there were a lot of tests which got a
> PASS for (a) and a FAIL for (b) and vice versa.
>
> I will now describe two of the more interesting cases.
>
> Case 1: gdb.base/break.exp: run until file:function(6) breakpoint
>
> This FAILs with (b) and (c), but passes with (a). The reason for
> (b) and (c) FAILing is that the breakpoint gets set on an instruction
> corresponding to line 97 in break.c instead of line 96 which is
> expected by the test suite.
>
> The generated code for factorial looks like this:
>
> 0x100005f8 <factorial>: stwu r1,-16(r1)
> 0x100005fc <factorial+4>: mflr r0
> 0x10000600 <factorial+8>: stw r31,12(r1)
> 0x10000604 <factorial+12>: mr r31,r3
> 0x10000608 <factorial+16>: cmpwi r31,1
> 0x1000060c <factorial+20>: stw r0,20(r1)
> 0x10000610 <factorial+24>: addi r3,r31,-1
> 0x10000614 <factorial+28>: ble 0x10000624 <factorial+44>
> 0x10000618 <factorial+32>: crclr 4*cr1+eq
> 0x1000061c <factorial+36>: bl 0x100005f8 <factorial>
> 0x10000620 <factorial+40>: mullw r31,r31,r3
> 0x10000624 <factorial+44>: lwz r0,20(r1)
> 0x10000628 <factorial+48>: mr r3,r31
> 0x1000062c <factorial+52>: lwz r31,12(r1)
> 0x10000630 <factorial+56>: mtlr r0
> 0x10000634 <factorial+60>: addi r1,r1,16
> 0x10000638 <factorial+64>: blr
>
> (b) and (c) put the breakpoint for "break break.c:factorial" on
> factorial+24. In my opinion, this is correct since the stw
> instruction at factorial+20 is actually the last instruction in the
> prologue. Note also that factorial+24 corresponds to line 97, but
> that factorial+28 corresponds to line 96.
>
> (a) puts the breakpoint on factorial+16. This instruction corresponds
> to the "if" statement (line 96), but is before the end of the prologue.
>
> So... even though we have a failure here where we had none before,
> I think the breakpoint placement is preferable.
>
> Case 2: gdb.base/sizeof.exp: check sizeof char == 1
>
> In this case, a breakpoint is placed on main. (b) and (c) PASS this
> test whereas (a) FAILs it.
>
> The code in question looks like this:
>
> 0x100004a4 <main>: lis r3,4096
> 0x100004a8 <main+4>: stwu r1,-16(r1)
> 0x100004ac <main+8>: mflr r0
> 0x100004b0 <main+12>: li r4,1
> 0x100004b4 <main+16>: addi r3,r3,1616
> 0x100004b8 <main+20>: stw r0,20(r1)
> 0x100004bc <main+24>: crclr 4*cr1+eq
> 0x100004c0 <main+28>: bl 0x100108a0 <printf>
> ...
>
> Note that the very first instruction of the function *is not*
> part of the prologue!
>
> That being the case, (a) puts its breakpoint on 0x100004a4 <main>,
> whereas (b) and (c) put the breakpoint on 0x100004bc <main+24> which
> is just after the last prologue instruction. Even if you disagree
> with me on case 1, this breakpoint is clearly preferable for case 2.
>
> [...]
> > On the other hand we have prove that the patch doesn't introduce any
> > testsuite regressions, even with -O2.
>
> Hmm. I don't see how this is going to be possible. I examined
> quite a few of the test cases and there were some where it was
> shear luck that (a) passed some of the tests that (b) and (c) had
> failed upon.
>
> I do agree though that there should be no regressions when we
> take the optimizer out of the picture. I've tested for such
> and have found none.
>
> Here's a combined patch:
>
> * rs6000-tdep.c (refine_prolog_limit): New function.
> (skip_prologue): Adjust lim_pc by calling refine_prologue_limit.
> Also, adjust fencepost error regarding the limit in the loop.
>
> From Peter Schauer:
> * rs6000-tdep.c (skip_prologue): Handle optimizer code motions into
> the prologue by continuing the prologue search, if we have no valid
> frame yet or if the return address is not yet saved in the frame.
>
> Index: rs6000-tdep.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/rs6000-tdep.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 rs6000-tdep.c
> --- rs6000-tdep.c 2000/11/09 09:49:00 1.16
> +++ rs6000-tdep.c 2000/11/14 02:32:59
> @@ -380,11 +380,63 @@ rs6000_software_single_step (unsigned in
>
> #define GET_SRC_REG(x) (((x) >> 21) & 0x1f)
>
> +/* Limit the number of skipped non-prologue instructions, as the examining
> + of the prologue is expensive. */
> +static int max_skip_non_prologue_insns = 10;
> +
> +/* Given PC representing the starting address of a function, and
> + LIM_PC which is the (sloppy) limit to which to scan when looking
> + for a prologue, attempt to further refine this limit by using
> + the line data in the symbol table. If successful, a better guess
> + on where the prologue ends is returned, otherwise the previous
> + value of lim_pc is returned. */
> static CORE_ADDR
> +refine_prologue_limit (CORE_ADDR pc, CORE_ADDR lim_pc)
> +{
> + struct symtab_and_line prologue_sal;
> +
> + prologue_sal = find_pc_line (pc, 0);
> + if (prologue_sal.line != 0)
> + {
> + int i;
> + CORE_ADDR addr = prologue_sal.end;
> +
> + /* Handle the case in which compiler's optimizer/scheduler
> + has moved instructions into the prologue. We scan ahead
> + in the function looking for address ranges whose corresponding
> + line number is less than or equal to the first one that we
> + found for the function. (It can be less than when the
> + scheduler puts a body instruction before the first prologue
> + instruction.) */
> + for (i = 2 * max_skip_non_prologue_insns;
> + i > 0 && (lim_pc == 0 || addr < lim_pc);
> + i--)
> + {
> + struct symtab_and_line sal;
> +
> + sal = find_pc_line (addr, 0);
> + if (sal.line == 0)
> + break;
> + if (sal.line <= prologue_sal.line
> + && sal.symtab == prologue_sal.symtab)
> + {
> + prologue_sal = sal;
> + }
> + addr = sal.end;
> + }
> +
> + if (lim_pc == 0 || prologue_sal.end < lim_pc)
> + lim_pc = prologue_sal.end;
> + }
> + return lim_pc;
> +}
> +
> +
> +static CORE_ADDR
> skip_prologue (CORE_ADDR pc, CORE_ADDR lim_pc, struct rs6000_framedata *fdata)
> {
> CORE_ADDR orig_pc = pc;
> - CORE_ADDR last_prologue_pc;
> + CORE_ADDR last_prologue_pc = pc;
> char buf[4];
> unsigned long op;
> long offset = 0;
> @@ -394,6 +446,9 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
> int framep = 0;
> int minimal_toc_loaded = 0;
> int prev_insn_was_prologue_insn = 1;
> + int num_skip_non_prologue_insns = 0;
> +
> + lim_pc = refine_prologue_limit (pc, lim_pc);
>
> memset (fdata, 0, sizeof (struct rs6000_framedata));
> fdata->saved_gpr = -1;
> @@ -402,19 +457,22 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
> fdata->frameless = 1;
> fdata->nosavedpc = 1;
>
> - pc -= 4;
> - while (lim_pc == 0 || pc < lim_pc - 4)
> + for (;; pc += 4)
> {
> - pc += 4;
> -
> /* Sometimes it isn't clear if an instruction is a prologue
> instruction or not. When we encounter one of these ambiguous
> cases, we'll set prev_insn_was_prologue_insn to 0 (false).
> Otherwise, we'll assume that it really is a prologue instruction. */
> if (prev_insn_was_prologue_insn)
> last_prologue_pc = pc;
> +
> + /* Stop scanning if we've hit the limit. */
> + if (lim_pc != 0 && pc >= lim_pc)
> + break;
> +
> prev_insn_was_prologue_insn = 1;
>
> + /* Fetch the instruction and convert it to an integer. */
> if (target_read_memory (pc, buf, 4))
> break;
> op = extract_signed_integer (buf, 4);
> @@ -628,7 +686,31 @@ skip_prologue (CORE_ADDR pc, CORE_ADDR l
> }
> else
> {
> - break;
> + /* Not a recognized prologue instruction.
> + Handle optimizer code motions into the prologue by continuing
> + the search if we have no valid frame yet or if the return
> + address is not yet saved in the frame. */
> + if (fdata->frameless == 0
> + && (lr_reg == -1 || fdata->nosavedpc == 0))
> + break;
> +
> + if (op == 0x4e800020 /* blr */
> + || op == 0x4e800420) /* bctr */
> + /* Do not scan past epilogue in frameless functions or
> + trampolines. */
> + break;
> + if ((op & 0xf4000000) == 0x40000000) /* bxx */
> + /* Never skip branches. */
> + break;
> +
> + if (num_skip_non_prologue_insns++ > max_skip_non_prologue_insns)
> + /* Do not scan too many insns, scanning insns is expensive with
> + remote targets. */
> + break;
> +
> + /* Continue scanning. */
> + prev_insn_was_prologue_insn = 0;
> + continue;
> }
> }
>
>
>
>
--
Peter Schauer pes@regent.e-technik.tu-muenchen.de