This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [PATCH] Displaced stepping (non-stop debugging) support for ARM Linux


Here's a new version of the ARM displaced-stepping patch, together with
a new version of the patch to always use displaced stepping if it is
enabled:

Pedro wrote:
> It would be nice to have that fixed, for sure, so yes to the
> we should fix that question.  However, it seems to me that this
> is something that can be worked on mostly independently of the ARM
> bits as it's a general software single-step issue, not really ARM
> specific.  Unless someone wants to (and has time to) tackle it
> right now, I'd say go with the always displace-step version.  If
> nothing else, helps in stressing the displaced stepping
> implementation.  :-)

As suggested here.

Dan wrote:
> Pedro wrote:
> > Care must be taken to keep  
> 
> Thanks for the plan.  I suspect this is too much to insist on before
> this patch goes in :-)

The current patch still uses a target round trip with a NOP
instruction, rather than fiddling with infrun.c to handle
fully-emulated instructions more cleanly (and/or faster). Something for
future improvement, perhaps.
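
To make "fully-emulated" concrete: as I read the patch, for a branch the scratch copy is just the NOP, and GDB computes the real effect itself during cleanup (cleanup_branch in the arm-tdep.c hunk does this for the taken case). A minimal sketch of that arithmetic for ARM-state B/BL follows. emulate_b_bl and struct branch_effect are hypothetical names, not part of the patch, and condition-code evaluation (condition_true in the patch) is assumed to have already passed.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of an ARM-state B/BL when GDB performs it itself instead of
   executing it on the target (hypothetical helper type).  */
struct branch_effect
{
  int link;         /* Nonzero for BL.  */
  uint32_t dest;    /* New PC (the branch target).  */
  uint32_t new_lr;  /* New r14: return address for BL, else unchanged.  */
};

/* Emulate a B or BL at address FROM whose condition is assumed to have
   passed already.  LR is the current value of r14.  */
static struct branch_effect
emulate_b_bl (uint32_t insn, uint32_t from, uint32_t lr)
{
  struct branch_effect e;
  /* The instruction reads PC as its own address plus 8.  */
  uint32_t pc = from + 8;
  /* Sign-extend the 24-bit word offset, then scale it to bytes below.  */
  uint32_t imm24 = insn & 0x00ffffffu;
  int32_t offset = (imm24 & 0x00800000u)
                   ? (int32_t) imm24 - 0x01000000 : (int32_t) imm24;

  /* Bits 27-25 must be 0b101 for B/BL.  */
  assert (((insn >> 25) & 0x7u) == 0x5u);

  e.link = (insn >> 24) & 1;
  e.dest = pc + (uint32_t) (offset * 4);
  /* BL saves the address of the next instruction, i.e. PC - 4, which is
     what cleanup_branch in the patch writes to r14.  */
  e.new_lr = e.link ? pc - 4 : lr;
  return e;
}
```

For example, "bl .+8" at 0x8000 (0xeb000000) yields dest 0x8008 and r14 0x8004, while "b ." (0xeafffffe) branches back to its own address.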

Dan wrote:
> [a Linux signal handling explanation]

Thanks for that -- I think signal handling for displaced stepping now
works reasonably well, including stepping over sigreturn/rt_sigreturn
syscalls (for EABI). AFAICT the scratch space address never leaks into
the signal trampoline frame, so the potentially-disastrous results of
that happening are avoided already.
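
The SVC handling relies on two small checks in arm_linux_copy_svc and arm_linux_cleanup_svc: whether r7 holds an EABI sigreturn (119) or rt_sigreturn (173) syscall number, and whether the PC still lies inside the scratch area after the step. A standalone sketch of both, for illustration only: the function and macro names are hypothetical, and scratch_insns stands in for DISPLACED_MODIFIED_INSNS, whose value is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* EABI syscall numbers (passed in r7) tested by arm_linux_copy_svc.  */
#define SKETCH_NR_SIGRETURN     119
#define SKETCH_NR_RT_SIGRETURN  173

static int
is_eabi_sigreturn (uint32_t r7)
{
  return r7 == SKETCH_NR_SIGRETURN || r7 == SKETCH_NR_RT_SIGRETURN;
}

/* Mirror of the PC fixup in arm_linux_cleanup_svc.  SCRATCH_INSNS plays
   the role of DISPLACED_MODIFIED_INSNS; the extra 4 bytes cover the
   breakpoint placed after the copied instructions.  Returns the PC to
   resume at.  */
static uint32_t
svc_fixup_pc (uint32_t apparent_pc, uint32_t from,
              uint32_t scratch_base, uint32_t scratch_insns)
{
  uint32_t scratch_end = scratch_base + scratch_insns * 4 + 4;

  assert (scratch_insns > 0);

  if (apparent_pc >= scratch_base && apparent_pc < scratch_end)
    return from + 4;    /* Still in the copy: continue after the SVC.  */
  return apparent_pc;   /* Somewhere else (e.g. after sigreturn): keep it.  */
}
```

A PC outside the scratch range, for example one restored by sigreturn, is left untouched.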

One possibly dubious part, though, is the positioning of the
insert_breakpoints() call in arm-linux-tdep.c:arm_linux_copy_svc().
Without it, the momentary breakpoint used to regain control after a
sigreturn syscall never actually gets inserted into the debugged
program, because the displaced-step copy function is called after
breakpoints have normally been inserted. It should be safe AFAICT, but I
may have overlooked something.

Other issues raised in the previous review should now be fixed.

Test results look reasonable, I think. "mi-nonstop.exp" tests fail in
Thumb mode, since this patch doesn't support Thumb. There's some noise
in threading results, but that's probably just bad luck.
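
For reference, the patch detects Thumb state through the CPSR T bit (displaced_in_arm_mode), and interworking writes to the PC go through bx_write_pc, which picks the new state from the low bits of the target address. A standalone model of that case split, assuming the T bit is CPSR bit 5 (0x20); bx_write, in_arm_mode and struct pc_state are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CPSR_T 0x20u   /* Thumb state bit (bit 5) of the CPSR.  */

struct pc_state
{
  uint32_t cpsr;
  uint32_t pc;
  int unpredictable;   /* Set for a BX to a non-word-aligned ARM address.  */
};

static int
in_arm_mode (uint32_t cpsr)
{
  return (cpsr & SKETCH_CPSR_T) == 0;
}

/* Same case split as bx_write_pc in the patch.  */
static struct pc_state
bx_write (uint32_t cpsr, uint32_t val)
{
  struct pc_state s;

  s.unpredictable = 0;
  if (val & 1)
    {
      /* Bit 0 set: switch to Thumb state and clear the bit.  */
      s.cpsr = cpsr | SKETCH_CPSR_T;
      s.pc = val & ~1u;
    }
  else if ((val & 2) == 0)
    {
      /* Word-aligned: ARM state.  */
      s.cpsr = cpsr & ~SKETCH_CPSR_T;
      s.pc = val;
    }
  else
    {
      /* Unpredictable; like the patch, fall back to ARM and align.  */
      s.cpsr = cpsr & ~SKETCH_CPSR_T;
      s.pc = val & ~3u;
      s.unpredictable = 1;
    }

  /* Whatever the case, the resulting PC is at least halfword aligned.  */
  assert ((s.pc & 1) == 0);
  return s;
}
```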

OK to apply?

Cheers,

Julian

ChangeLog (displaced-stepping-always)

    * infrun.c (displaced_step_fixup): If this is a software
    single-stepping arch, don't tell the target to single-step.
    (maybe_software_singlestep): Return 0 if we're using displaced
    stepping.
    (resume): If this is a software single-stepping arch, and
    displaced-stepping is enabled, use it for all single-step
    requests.
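
The policy described by these entries, for a plain single-step request, can be summarized as a small decision function. This is a hypothetical model of the ChangeLog text, not code from infrun.c: plan_single_step and struct step_plan are invented names, and the real infrun also handles stepping over breakpoints, signals and so on.

```c
#include <assert.h>

/* How a single-step request gets carried out (hypothetical model).  */
struct step_plan
{
  int displaced;       /* Step a copy of the insn in the scratch area.  */
  int sw_breakpoints;  /* Insert software single-step breakpoints.  */
  int target_step;     /* Ask the target to hardware single-step.  */
};

static struct step_plan
plan_single_step (int sw_single_step_arch, int displaced_enabled)
{
  struct step_plan p = { 0, 0, 0 };

  if (sw_single_step_arch && displaced_enabled)
    /* The scratch copy is followed by a breakpoint, so the inferior is
       simply resumed: no software single-step breakpoints and no
       target single-step.  */
    p.displaced = 1;
  else if (sw_single_step_arch)
    p.sw_breakpoints = 1;
  else
    p.target_step = 1;

  /* Exactly one mechanism is chosen.  */
  assert (p.displaced + p.sw_breakpoints + p.target_step == 1);
  return p;
}
```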

ChangeLog (displaced-stepping)

    gdb/
    * arm-linux-tdep.c: Include arch-utils.h, inferior.h, gdbthread.h
    and symfile.h.
    (arm_linux_cleanup_svc, arm_linux_copy_svc): New.
    (cleanup_kernel_helper_return, arm_catch_kernel_helper_return): New.
    (arm_linux_displaced_step_copy_insn): New.
    (arm_linux_init_abi): Initialise displaced stepping callbacks.
    * arm-tdep.c (DISPLACED_STEPPING_ARCH_VERSION): New macro.
    (ARM_NOP): New.
    (displaced_read_reg, displaced_in_arm_mode, branch_write_pc)
    (bx_write_pc, load_write_pc, alu_write_pc, displaced_write_reg)
    (insn_references_pc, copy_unmodified, cleanup_preload, copy_preload)
    (copy_preload_reg, cleanup_copro_load_store, copy_copro_load_store)
    (cleanup_branch, copy_b_bl_blx, copy_bx_blx_reg, cleanup_alu_imm)
    (copy_alu_imm, cleanup_alu_reg, copy_alu_reg)
    (cleanup_alu_shifted_reg, copy_alu_shifted_reg, cleanup_load)
    (cleanup_store, copy_extra_ld_st, copy_ldr_str_ldrb_strb)
    (cleanup_block_load_all, cleanup_block_store_pc)
    (cleanup_block_load_pc, copy_block_xfer, cleanup_svc, copy_svc)
    (copy_undef, copy_unpred): New.
    (decode_misc_memhint_neon, decode_unconditional)
    (decode_miscellaneous, decode_dp_misc, decode_ld_st_word_ubyte)
    (decode_media, decode_b_bl_ldmstm, decode_ext_reg_ld_st)
    (decode_svc_copro, arm_process_displaced_insn)
    (arm_displaced_init_closure, arm_displaced_step_copy_insn)
    (arm_displaced_step_fixup): New.
    (arm_gdbarch_init): Initialise max insn length field.
    * arm-tdep.h (DISPLACED_TEMPS, DISPLACED_MODIFIED_INSNS): New
    macros.
    (displaced_step_closure, pc_write_style): New.
    (arm_displaced_init_closure, displaced_read_reg)
    (arm_process_displaced_insn, displaced_write_reg)
    (arm_displaced_step_copy_insn, arm_displaced_step_fixup): Add
    prototypes.
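
Much of the arm-tdep.c part of the patch follows one pattern, shown here for copy_preload: insn_references_pc checks whether any register field of interest is r15, and if so that field is rewritten to use a scratch register (r0) loaded with the PC value the original instruction would have read, namely its address plus 8 (see displaced_read_reg). A standalone sketch of the pattern: references_pc mirrors insn_references_pc, rewrite_preload is a hypothetical name, and the cleanup that restores r0 is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Same logic as insn_references_pc: every 4-bit register field selected
   by BITMASK is compared against 0b1111 (r15).  */
static int
references_pc (uint32_t insn, uint32_t bitmask)
{
  uint32_t lowbit = 1;

  while (bitmask != 0)
    {
      uint32_t mask;

      /* Find the lowest set bit of the remaining mask.  */
      for (; lowbit && (bitmask & lowbit) == 0; lowbit <<= 1)
        ;
      if (!lowbit)
        break;

      mask = lowbit * 0xf;
      if ((insn & mask) == mask)
        return 1;
      bitmask &= ~mask;
    }
  return 0;
}

/* Rewrite "pld [pc, #imm]" at address FROM to "pld [r0, #imm]", as
   copy_preload does.  *R0_VALUE receives the PC value the original
   instruction would have used.  */
static uint32_t
rewrite_preload (uint32_t insn, uint32_t from, uint32_t *r0_value)
{
  assert (references_pc (insn, 0x000f0000u));
  *r0_value = from + 8;
  return insn & 0xfff0ffffu;   /* Rn field (bits 16-19) := r0.  */
}
```

For example, pld [pc, #16] at 0x8000 (0xf5dff010) becomes pld [r0, #16] (0xf5d0f010) with r0 = 0x8008.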
--- .pc/displaced-stepping/gdb/arm-linux-tdep.c	2009-07-15 11:14:33.000000000 -0700
+++ gdb/arm-linux-tdep.c	2009-07-15 11:15:02.000000000 -0700
@@ -38,6 +38,10 @@
 #include "arm-linux-tdep.h"
 #include "linux-tdep.h"
 #include "glibc-tdep.h"
+#include "arch-utils.h"
+#include "inferior.h"
+#include "gdbthread.h"
+#include "symfile.h"
 
 #include "gdb_string.h"
 
@@ -590,6 +594,205 @@ arm_linux_software_single_step (struct f
   return 1;
 }
 
+/* Support for displaced stepping of Linux SVC instructions.  */
+
+static void
+arm_linux_cleanup_svc (struct regcache *regs,
+		       struct displaced_step_closure *dsc)
+{
+  CORE_ADDR from = dsc->insn_addr;
+  ULONGEST apparent_pc;
+  int within_scratch;
+
+  regcache_cooked_read_unsigned (regs, ARM_PC_REGNUM, &apparent_pc);
+
+  within_scratch = (apparent_pc >= dsc->scratch_base
+		    && apparent_pc < (dsc->scratch_base
+				      + DISPLACED_MODIFIED_INSNS * 4 + 4));
+
+  if (debug_displaced)
+    {
+      fprintf_unfiltered (gdb_stdlog, "displaced: PC is apparently %.8lx after "
+			  "SVC step ", (unsigned long) apparent_pc);
+      if (within_scratch)
+        fprintf_unfiltered (gdb_stdlog, "(within scratch space)\n");
+      else
+        fprintf_unfiltered (gdb_stdlog, "(outside scratch space)\n");
+    }
+
+  if (within_scratch)
+    displaced_write_reg (regs, dsc, ARM_PC_REGNUM, from + 4, BRANCH_WRITE_PC);
+}
+
+static int
+arm_linux_copy_svc (uint32_t insn, CORE_ADDR to, struct regcache *regs,
+		    struct displaced_step_closure *dsc)
+{
+  CORE_ADDR from = dsc->insn_addr;
+  struct frame_info *frame;
+  unsigned int svc_number = displaced_read_reg (regs, from, 7);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying Linux svc insn %.8lx\n",
+			(unsigned long) insn);
+
+  frame = get_current_frame ();
+
+  /* Is this a sigreturn or rt_sigreturn syscall?  Note: these are only useful
+     for EABI.  */
+  if (svc_number == 119 || svc_number == 173)
+    {
+      if (get_frame_type (frame) == SIGTRAMP_FRAME)
+	{
+	  CORE_ADDR return_to;
+	  struct symtab_and_line sal;
+
+	  if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, "displaced: found "
+	      "sigreturn/rt_sigreturn SVC call. PC in frame = %lx\n",
+	      (unsigned long) get_frame_pc (frame));
+
+	  return_to = frame_pc_unwind (frame);
+	  if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, "displaced: unwind pc = %lx. "
+	      "Setting momentary breakpoint.\n", (unsigned long) return_to);
+
+	  gdb_assert (inferior_thread ()->step_resume_breakpoint == NULL);
+
+	  sal = find_pc_line (return_to, 0);
+	  sal.pc = return_to;
+	  sal.section = find_pc_overlay (return_to);
+	  sal.explicit_pc = 1;
+
+	  frame = get_prev_frame (frame);
+
+	  if (frame)
+	    {
+	      inferior_thread ()->step_resume_breakpoint
+        	= set_momentary_breakpoint (sal, get_frame_id (frame),
+					    bp_step_resume);
+
+	      /* We need to make sure we actually insert the momentary
+	         breakpoint set above.  */
+	      insert_breakpoints ();
+	    }
+	  else if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, "displaced: couldn't find previous "
+				"frame to set momentary breakpoint for "
+				"sigreturn/rt_sigreturn\n");
+	}
+      else if (debug_displaced)
+	fprintf_unfiltered (gdb_stdlog, "displaced: sigreturn/rt_sigreturn "
+			    "SVC call not in signal trampoline frame\n");
+    }
+
+  /* Preparation: If we detect sigreturn, set momentary breakpoint at resume
+		  location, else nothing.
+     Insn: unmodified svc.
+     Cleanup: if pc lands in scratch space, pc <- insn_addr + 4
+              else leave pc alone.  */
+
+  dsc->modinsn[0] = insn;
+
+  dsc->cleanup = &arm_linux_cleanup_svc;
+  /* Pretend we wrote to the PC, so cleanup doesn't set PC to the next
+     instruction.  */
+  dsc->wrote_to_pc = 1;
+
+  return 0;
+}
+
+
+/* The following two functions implement single-stepping over calls to Linux
+   kernel helper routines, which perform e.g. atomic operations on architecture
+   variants which don't support them natively.
+
+   When this function is called, the PC will be pointing at the kernel helper
+   (at an address inaccessible to GDB), and r14 will point to the return
+   address.  Displaced stepping always executes code in the copy area:
+   so, make the copy-area instruction branch back to the kernel helper (the
+   "from" address), and make r14 point to the breakpoint in the copy area.  In
+   that way, we regain control once the kernel helper returns, and can clean
+   up appropriately (as if we had just returned from the kernel helper as it
+   would have been called from the non-displaced location).  */
+
+static void
+cleanup_kernel_helper_return (struct regcache *regs,
+			      struct displaced_step_closure *dsc)
+{
+  displaced_write_reg (regs, dsc, ARM_LR_REGNUM, dsc->tmp[0], CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, ARM_PC_REGNUM, dsc->tmp[0], BRANCH_WRITE_PC);
+}
+
+static void
+arm_catch_kernel_helper_return (CORE_ADDR from, CORE_ADDR to,
+				struct regcache *regs,
+				struct displaced_step_closure *dsc)
+{
+  dsc->numinsns = 1;
+  dsc->insn_addr = from;
+  dsc->cleanup = &cleanup_kernel_helper_return;
+  /* Say we wrote to the PC, else cleanup will set PC to the next
+     instruction in the helper, which isn't helpful.  */
+  dsc->wrote_to_pc = 1;
+
+  /* Preparation: tmp[0] <- r14
+                  r14 <- <scratch space>+4
+		  *(<scratch space>+8) <- from
+     Insn: ldr pc, [r14, #4]
+     Cleanup: r14 <- tmp[0], pc <- tmp[0].  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, ARM_LR_REGNUM);
+  displaced_write_reg (regs, dsc, ARM_LR_REGNUM, (ULONGEST) to + 4,
+		       CANNOT_WRITE_PC);
+  write_memory_unsigned_integer (to + 8, 4, from);
+
+  dsc->modinsn[0] = 0xe59ef004;  /* ldr pc, [lr, #4].  */
+}
+
+/* Linux-specific displaced step instruction copying function.  Detects when
+   the program has stepped into a Linux kernel helper routine (which must be
+   handled as a special case), falling back to arm_displaced_step_copy_insn()
+   if it hasn't.  */
+
+static struct displaced_step_closure *
+arm_linux_displaced_step_copy_insn (struct gdbarch *gdbarch,
+				    CORE_ADDR from, CORE_ADDR to,
+				    struct regcache *regs)
+{
+  struct displaced_step_closure *dsc
+    = xmalloc (sizeof (struct displaced_step_closure));
+
+  /* Detect when we enter an (inaccessible by GDB) Linux kernel helper, and
+     stop at the return location.  */
+  if (from > 0xffff0000)
+    {
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: detected kernel helper "
+			    "at %.8lx\n", (unsigned long) from);
+
+      arm_catch_kernel_helper_return (from, to, regs, dsc);
+    }
+  else
+    {
+      uint32_t insn = read_memory_unsigned_integer (from, 4);
+
+      if (debug_displaced)
+	fprintf_unfiltered (gdb_stdlog, "displaced: stepping insn %.8lx "
+			    "at %.8lx\n", (unsigned long) insn,
+			    (unsigned long) from);
+
+      /* Override the default handling of SVC instructions.  */
+      dsc->u.svc.copy_svc_os = arm_linux_copy_svc;
+
+      arm_process_displaced_insn (insn, from, to, regs, dsc);
+    }
+
+  arm_displaced_init_closure (gdbarch, from, to, dsc);
+
+  return dsc;
+}
+
 static void
 arm_linux_init_abi (struct gdbarch_info info,
 		    struct gdbarch *gdbarch)
@@ -650,6 +853,14 @@ arm_linux_init_abi (struct gdbarch_info 
 					arm_linux_regset_from_core_section);
 
   set_gdbarch_get_siginfo_type (gdbarch, linux_get_siginfo_type);
+
+  /* Displaced stepping.  */
+  set_gdbarch_displaced_step_copy_insn (gdbarch,
+					arm_linux_displaced_step_copy_insn);
+  set_gdbarch_displaced_step_fixup (gdbarch, arm_displaced_step_fixup);
+  set_gdbarch_displaced_step_free_closure (gdbarch,
+					   simple_displaced_step_free_closure);
+  set_gdbarch_displaced_step_location (gdbarch, displaced_step_at_entry_point);
 }
 
 /* Provide a prototype to silence -Wmissing-prototypes.  */
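
The kernel-helper redirection in arm_catch_kernel_helper_return can be checked with a small model of the scratch area. A sketch for illustration only: the struct and function names are hypothetical, TO and FROM have the same meaning as in the patch, and the encoder simply confirms that 0xe59ef004 is ldr pc, [lr, #4].

```c
#include <assert.h>
#include <stdint.h>

/* Encode ARM-state "ldr rd, [rn, #+imm12]": condition AL, pre-indexed,
   offset added, no writeback, word load.  */
static uint32_t
encode_ldr_imm (unsigned rd, unsigned rn, uint32_t imm12)
{
  assert (rd < 16 && rn < 16 && imm12 < 4096);
  return 0xe5900000u | (rn << 16) | (rd << 12) | imm12;
}

/* Model of the state arm_catch_kernel_helper_return sets up.  scratch[0]
   is the copied insn, scratch[1] the breakpoint GDB places after it (not
   modelled), scratch[2] the word at TO + 8.  */
struct helper_redirect
{
  uint32_t scratch[3];
  uint32_t lr;        /* r14 while the kernel helper runs.  */
  uint32_t saved_lr;  /* dsc->tmp[0] in the patch.  */
};

static void
prepare_redirect (struct helper_redirect *r, uint32_t from, uint32_t to,
                  uint32_t lr)
{
  r->saved_lr = lr;                             /* tmp[0] <- r14 */
  r->lr = to + 4;                               /* r14 <- to + 4 */
  r->scratch[0] = encode_ldr_imm (15, 14, 4);   /* ldr pc, [lr, #4] */
  r->scratch[1] = 0;                            /* Breakpoint placeholder.  */
  r->scratch[2] = from;                         /* *(to + 8) <- from */
}

/* Executing the copied ldr loads from lr + 4 == to + 8, i.e. FROM, so the
   helper runs; it later returns to lr (the breakpoint at TO + 4).  */
static uint32_t
run_copied_ldr (const struct helper_redirect *r, uint32_t to)
{
  uint32_t addr = r->lr + 4;
  return r->scratch[(addr - to) / 4];
}

/* cleanup_kernel_helper_return: r14 <- tmp[0], pc <- tmp[0].  */
static uint32_t
finish_redirect (struct helper_redirect *r)
{
  r->lr = r->saved_lr;
  return r->saved_lr;
}
```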
--- .pc/displaced-stepping/gdb/arm-tdep.c	2009-07-15 11:14:33.000000000 -0700
+++ gdb/arm-tdep.c	2009-07-15 11:15:02.000000000 -0700
@@ -241,6 +241,11 @@ struct arm_prologue_cache
   struct trad_frame_saved_reg *saved_regs;
 };
 
+/* Architecture version for displaced stepping.  This affects the behaviour of
+   certain instructions, and really should not be hard-wired.  */
+
+#define DISPLACED_STEPPING_ARCH_VERSION		5
+
 /* Addresses for calling Thumb functions have the bit 0 set.
    Here are some macros to test, set, or clear bit 0 of addresses.  */
 #define IS_THUMB_ADDR(addr)	((addr) & 1)
@@ -2175,280 +2180,2099 @@ arm_software_single_step (struct frame_i
   return 1;
 }
 
-#include "bfd-in2.h"
-#include "libcoff.h"
+/* ARM displaced stepping support.
+
+   Generally ARM displaced stepping works as follows:
+
+   1. When an instruction is to be single-stepped, it is first decoded by
+      arm_process_displaced_insn (called from arm_displaced_step_copy_insn).
+      Depending on the type of instruction, it is then copied to a scratch
+      location, possibly in a modified form.  The copy_* set of functions
+      performs such modification, as necessary. A breakpoint is placed after
+      the modified instruction in the scratch space to return control to GDB.
+      Note in particular that instructions which modify the PC will no longer
+      do so after modification.
+
+   2. The instruction is single-stepped, by setting the PC to the scratch
+      location address, and resuming.  Control returns to GDB when the
+      breakpoint is hit.
+
+   3. A cleanup function (cleanup_*) is called corresponding to the copy_*
+      function used for the current instruction.  This function's job is to
+      put the CPU/memory state back to what it would have been if the
+      instruction had been executed unmodified in its original location.  */
+
+/* NOP instruction (mov r0, r0).  */
+#define ARM_NOP				0xe1a00000
+
+/* Helper for register reads for displaced stepping.  In particular, this
+   returns the PC as it would be seen by the instruction at its original
+   location.  */
+
+ULONGEST
+displaced_read_reg (struct regcache *regs, CORE_ADDR from, int regno)
+{
+  ULONGEST ret;
+
+  if (regno == 15)
+    {
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: read pc value %.8lx\n",
+			    (unsigned long) from + 8);
+      return (ULONGEST) from + 8;  /* Pipeline offset.  */
+    }
+  else
+    {
+      regcache_cooked_read_unsigned (regs, regno, &ret);
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: read r%d value %.8lx\n",
+			    regno, (unsigned long) ret);
+      return ret;
+    }
+}
 
 static int
-gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
+displaced_in_arm_mode (struct regcache *regs)
 {
-  if (arm_pc_is_thumb (memaddr))
-    {
-      static asymbol *asym;
-      static combined_entry_type ce;
-      static struct coff_symbol_struct csym;
-      static struct bfd fake_bfd;
-      static bfd_target fake_target;
+  ULONGEST ps;
 
-      if (csym.native == NULL)
-	{
-	  /* Create a fake symbol vector containing a Thumb symbol.
-	     This is solely so that the code in print_insn_little_arm() 
-	     and print_insn_big_arm() in opcodes/arm-dis.c will detect
-	     the presence of a Thumb symbol and switch to decoding
-	     Thumb instructions.  */
+  regcache_cooked_read_unsigned (regs, ARM_PS_REGNUM, &ps);
 
-	  fake_target.flavour = bfd_target_coff_flavour;
-	  fake_bfd.xvec = &fake_target;
-	  ce.u.syment.n_sclass = C_THUMBEXTFUNC;
-	  csym.native = &ce;
-	  csym.symbol.the_bfd = &fake_bfd;
-	  csym.symbol.name = "fake";
-	  asym = (asymbol *) & csym;
-	}
+  return (ps & CPSR_T) == 0;
+}
 
-      memaddr = UNMAKE_THUMB_ADDR (memaddr);
-      info->symbols = &asym;
-    }
-  else
-    info->symbols = NULL;
+/* Write to the PC as from a branch instruction.  */
 
-  if (info->endian == BFD_ENDIAN_BIG)
-    return print_insn_big_arm (memaddr, info);
+static void
+branch_write_pc (struct regcache *regs, ULONGEST val)
+{
+  if (displaced_in_arm_mode (regs))
+    /* Note: If bits 0/1 are set, this branch would be unpredictable for
+       architecture versions < 6.  */
+    regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, val & ~(ULONGEST) 0x3);
   else
-    return print_insn_little_arm (memaddr, info);
+    regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, val & ~(ULONGEST) 0x1);
 }
 
-/* The following define instruction sequences that will cause ARM
-   cpu's to take an undefined instruction trap.  These are used to
-   signal a breakpoint to GDB.
-   
-   The newer ARMv4T cpu's are capable of operating in ARM or Thumb
-   modes.  A different instruction is required for each mode.  The ARM
-   cpu's can also be big or little endian.  Thus four different
-   instructions are needed to support all cases.
-   
-   Note: ARMv4 defines several new instructions that will take the
-   undefined instruction trap.  ARM7TDMI is nominally ARMv4T, but does
-   not in fact add the new instructions.  The new undefined
-   instructions in ARMv4 are all instructions that had no defined
-   behaviour in earlier chips.  There is no guarantee that they will
-   raise an exception, but may be treated as NOP's.  In practice, it
-   may only safe to rely on instructions matching:
-   
-   3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 
-   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
-   C C C C 0 1 1 x x x x x x x x x x x x x x x x x x x x 1 x x x x
-   
-   Even this may only true if the condition predicate is true. The
-   following use a condition predicate of ALWAYS so it is always TRUE.
-   
-   There are other ways of forcing a breakpoint.  GNU/Linux, RISC iX,
-   and NetBSD all use a software interrupt rather than an undefined
-   instruction to force a trap.  This can be handled by by the
-   abi-specific code during establishment of the gdbarch vector.  */
-
-#define ARM_LE_BREAKPOINT {0xFE,0xDE,0xFF,0xE7}
-#define ARM_BE_BREAKPOINT {0xE7,0xFF,0xDE,0xFE}
-#define THUMB_LE_BREAKPOINT {0xbe,0xbe}
-#define THUMB_BE_BREAKPOINT {0xbe,0xbe}
-
-static const char arm_default_arm_le_breakpoint[] = ARM_LE_BREAKPOINT;
-static const char arm_default_arm_be_breakpoint[] = ARM_BE_BREAKPOINT;
-static const char arm_default_thumb_le_breakpoint[] = THUMB_LE_BREAKPOINT;
-static const char arm_default_thumb_be_breakpoint[] = THUMB_BE_BREAKPOINT;
-
-/* Determine the type and size of breakpoint to insert at PCPTR.  Uses
-   the program counter value to determine whether a 16-bit or 32-bit
-   breakpoint should be used.  It returns a pointer to a string of
-   bytes that encode a breakpoint instruction, stores the length of
-   the string to *lenptr, and adjusts the program counter (if
-   necessary) to point to the actual memory location where the
-   breakpoint should be inserted.  */
+/* Write to the PC as from a branch-exchange instruction.  */
 
-static const unsigned char *
-arm_breakpoint_from_pc (struct gdbarch *gdbarch, CORE_ADDR *pcptr, int *lenptr)
+static void
+bx_write_pc (struct regcache *regs, ULONGEST val)
 {
-  struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
+  ULONGEST ps;
 
-  if (arm_pc_is_thumb (*pcptr))
+  regcache_cooked_read_unsigned (regs, ARM_PS_REGNUM, &ps);
+
+  if ((val & 1) == 1)
     {
-      *pcptr = UNMAKE_THUMB_ADDR (*pcptr);
-      *lenptr = tdep->thumb_breakpoint_size;
-      return tdep->thumb_breakpoint;
+      regcache_cooked_write_unsigned (regs, ARM_PS_REGNUM, ps | CPSR_T);
+      regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, val & 0xfffffffe);
+    }
+  else if ((val & 2) == 0)
+    {
+      regcache_cooked_write_unsigned (regs, ARM_PS_REGNUM,
+				      ps & ~(ULONGEST) CPSR_T);
+      regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, val);
     }
   else
     {
-      *lenptr = tdep->arm_breakpoint_size;
-      return tdep->arm_breakpoint;
+      /* Unpredictable behaviour.  Try to do something sensible (switch to ARM
+         mode, align dest to 4 bytes).  */
+      warning (_("Single-stepping BX to non-word-aligned ARM instruction."));
+      regcache_cooked_write_unsigned (regs, ARM_PS_REGNUM,
+				      ps & ~(ULONGEST) CPSR_T);
+      regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, val & 0xfffffffc);
     }
 }
 
-/* Extract from an array REGBUF containing the (raw) register state a
-   function return value of type TYPE, and copy that, in virtual
-   format, into VALBUF.  */
+/* Write to the PC as if from a load instruction.  */
 
 static void
-arm_extract_return_value (struct type *type, struct regcache *regs,
-			  gdb_byte *valbuf)
+load_write_pc (struct regcache *regs, ULONGEST val)
 {
-  struct gdbarch *gdbarch = get_regcache_arch (regs);
+  if (DISPLACED_STEPPING_ARCH_VERSION >= 5)
+    bx_write_pc (regs, val);
+  else
+    branch_write_pc (regs, val);
+}
 
-  if (TYPE_CODE_FLT == TYPE_CODE (type))
+/* Write to the PC as if from an ALU instruction.  */
+
+static void
+alu_write_pc (struct regcache *regs, ULONGEST val)
+{
+  if (DISPLACED_STEPPING_ARCH_VERSION >= 7 && displaced_in_arm_mode (regs))
+    bx_write_pc (regs, val);
+  else
+    branch_write_pc (regs, val);
+}
+
+/* Helper for writing to registers for displaced stepping.  Writing to the PC
+   has varying effects depending on the instruction which does the write:
+   this is controlled by the WRITE_PC argument.  */
+
+void
+displaced_write_reg (struct regcache *regs, struct displaced_step_closure *dsc,
+		     int regno, ULONGEST val, enum pc_write_style write_pc)
+{
+  if (regno == 15)
     {
-      switch (gdbarch_tdep (gdbarch)->fp_model)
-	{
-	case ARM_FLOAT_FPA:
-	  {
-	    /* The value is in register F0 in internal format.  We need to
-	       extract the raw value and then convert it to the desired
-	       internal type.  */
-	    bfd_byte tmpbuf[FP_REGISTER_SIZE];
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: writing pc %.8lx\n",
+			    (unsigned long) val);
+      switch (write_pc)
+        {
+	case BRANCH_WRITE_PC:
+	  branch_write_pc (regs, val);
+	  break;
 
-	    regcache_cooked_read (regs, ARM_F0_REGNUM, tmpbuf);
-	    convert_from_extended (floatformat_from_type (type), tmpbuf,
-				   valbuf, gdbarch_byte_order (gdbarch));
-	  }
+	case BX_WRITE_PC:
+	  bx_write_pc (regs, val);
 	  break;
 
-	case ARM_FLOAT_SOFT_FPA:
-	case ARM_FLOAT_SOFT_VFP:
-	  regcache_cooked_read (regs, ARM_A1_REGNUM, valbuf);
-	  if (TYPE_LENGTH (type) > 4)
-	    regcache_cooked_read (regs, ARM_A1_REGNUM + 1,
-				  valbuf + INT_REGISTER_SIZE);
+	case LOAD_WRITE_PC:
+	  load_write_pc (regs, val);
 	  break;
 
-	default:
-	  internal_error
-	    (__FILE__, __LINE__,
-	     _("arm_extract_return_value: Floating point model not supported"));
+	case ALU_WRITE_PC:
+	  alu_write_pc (regs, val);
 	  break;
-	}
-    }
-  else if (TYPE_CODE (type) == TYPE_CODE_INT
-	   || TYPE_CODE (type) == TYPE_CODE_CHAR
-	   || TYPE_CODE (type) == TYPE_CODE_BOOL
-	   || TYPE_CODE (type) == TYPE_CODE_PTR
-	   || TYPE_CODE (type) == TYPE_CODE_REF
-	   || TYPE_CODE (type) == TYPE_CODE_ENUM)
-    {
-      /* If the the type is a plain integer, then the access is
-	 straight-forward.  Otherwise we have to play around a bit more.  */
-      int len = TYPE_LENGTH (type);
-      int regno = ARM_A1_REGNUM;
-      ULONGEST tmp;
 
-      while (len > 0)
-	{
-	  /* By using store_unsigned_integer we avoid having to do
-	     anything special for small big-endian values.  */
-	  regcache_cooked_read_unsigned (regs, regno++, &tmp);
-	  store_unsigned_integer (valbuf, 
-				  (len > INT_REGISTER_SIZE
-				   ? INT_REGISTER_SIZE : len),
-				  tmp);
-	  len -= INT_REGISTER_SIZE;
-	  valbuf += INT_REGISTER_SIZE;
+	case CANNOT_WRITE_PC:
+	  warning (_("Instruction wrote to PC in an unexpected way when "
+		     "single-stepping"));
+	  break;
+
+	default:
+	  abort ();
 	}
+
+      dsc->wrote_to_pc = 1;
     }
   else
     {
-      /* For a structure or union the behaviour is as if the value had
-         been stored to word-aligned memory and then loaded into 
-         registers with 32-bit load instruction(s).  */
-      int len = TYPE_LENGTH (type);
-      int regno = ARM_A1_REGNUM;
-      bfd_byte tmpbuf[INT_REGISTER_SIZE];
-
-      while (len > 0)
-	{
-	  regcache_cooked_read (regs, regno++, tmpbuf);
-	  memcpy (valbuf, tmpbuf,
-		  len > INT_REGISTER_SIZE ? INT_REGISTER_SIZE : len);
-	  len -= INT_REGISTER_SIZE;
-	  valbuf += INT_REGISTER_SIZE;
-	}
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: writing r%d value %.8lx\n",
+			    regno, (unsigned long) val);
+      regcache_cooked_write_unsigned (regs, regno, val);
     }
 }
 
-
-/* Will a function return an aggregate type in memory or in a
-   register?  Return 0 if an aggregate type can be returned in a
-   register, 1 if it must be returned in memory.  */
+/* This function is used to concisely determine if an instruction INSN
+   references PC.  Register fields of interest in INSN should have the
+   corresponding fields of BITMASK set to 0b1111.  The function returns 1
+   if any of these fields in INSN reference the PC (also 0b1111, r15), else it
+   returns 0.  */
 
 static int
-arm_return_in_memory (struct gdbarch *gdbarch, struct type *type)
+insn_references_pc (uint32_t insn, uint32_t bitmask)
 {
-  int nRc;
-  enum type_code code;
+  uint32_t lowbit = 1;
 
-  CHECK_TYPEDEF (type);
+  while (bitmask != 0)
+    {
+      uint32_t mask;
 
-  /* In the ARM ABI, "integer" like aggregate types are returned in
-     registers.  For an aggregate type to be integer like, its size
-     must be less than or equal to INT_REGISTER_SIZE and the
-     offset of each addressable subfield must be zero.  Note that bit
-     fields are not addressable, and all addressable subfields of
-     unions always start at offset zero.
+      for (; lowbit && (bitmask & lowbit) == 0; lowbit <<= 1)
+        ;
 
-     This function is based on the behaviour of GCC 2.95.1.
-     See: gcc/arm.c: arm_return_in_memory() for details.
+      if (!lowbit)
+        break;
 
-     Note: All versions of GCC before GCC 2.95.2 do not set up the
-     parameters correctly for a function returning the following
-     structure: struct { float f;}; This should be returned in memory,
-     not a register.  Richard Earnshaw sent me a patch, but I do not
-     know of any way to detect if a function like the above has been
-     compiled with the correct calling convention.  */
+      mask = lowbit * 0xf;
 
-  /* All aggregate types that won't fit in a register must be returned
-     in memory.  */
-  if (TYPE_LENGTH (type) > INT_REGISTER_SIZE)
-    {
-      return 1;
+      if ((insn & mask) == mask)
+        return 1;
+
+      bitmask &= ~mask;
     }
 
-  /* The AAPCS says all aggregates not larger than a word are returned
-     in a register.  */
-  if (gdbarch_tdep (gdbarch)->arm_abi != ARM_ABI_APCS)
-    return 0;
+  return 0;
+}
 
-  /* The only aggregate types that can be returned in a register are
-     structs and unions.  Arrays must be returned in memory.  */
-  code = TYPE_CODE (type);
-  if ((TYPE_CODE_STRUCT != code) && (TYPE_CODE_UNION != code))
-    {
-      return 1;
-    }
+/* The simplest copy function.  Many instructions have the same effect no
+   matter what address they are executed at: in those cases, use this.  */
 
-  /* Assume all other aggregate types can be returned in a register.
-     Run a check for structures, unions and arrays.  */
-  nRc = 0;
+static int
+copy_unmodified (uint32_t insn, const char *iname,
+		 struct displaced_step_closure *dsc)
+{
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying insn %.8lx, "
+			"opcode/class '%s' unmodified\n", (unsigned long) insn,
+			iname);
 
-  if ((TYPE_CODE_STRUCT == code) || (TYPE_CODE_UNION == code))
-    {
-      int i;
-      /* Need to check if this struct/union is "integer" like.  For
-         this to be true, its size must be less than or equal to
-         INT_REGISTER_SIZE and the offset of each addressable
-         subfield must be zero.  Note that bit fields are not
-         addressable, and unions always start at offset zero.  If any
-         of the subfields is a floating point type, the struct/union
-         cannot be an integer type.  */
+  dsc->modinsn[0] = insn;
 
-      /* For each field in the object, check:
-         1) Is it FP? --> yes, nRc = 1;
-         2) Is it addressable (bitpos != 0) and
-         not packed (bitsize == 0)?
-         --> yes, nRc = 1  
-       */
+  return 0;
+}
 
-      for (i = 0; i < TYPE_NFIELDS (type); i++)
-	{
-	  enum type_code field_type_code;
-	  field_type_code = TYPE_CODE (check_typedef (TYPE_FIELD_TYPE (type, i)));
+/* Preload instructions with immediate offset.  */
 
-	  /* Is it a floating point type field?  */
+static void
+cleanup_preload (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  displaced_write_reg (regs, dsc, 0, dsc->tmp[0], CANNOT_WRITE_PC);
+  if (!dsc->u.preload.immed)
+    displaced_write_reg (regs, dsc, 1, dsc->tmp[1], CANNOT_WRITE_PC);
+}
+
+static int
+copy_preload (uint32_t insn, struct regcache *regs,
+	      struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  ULONGEST rn_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000f0000ul))
+    return copy_unmodified (insn, "preload", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying preload insn %.8lx\n",
+			(unsigned long) insn);
+
+  /* Preload instructions:
+
+     {pli/pld} [rn, #+/-imm]
+     ->
+     {pli/pld} [r0, #+/-imm].  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  rn_val = displaced_read_reg (regs, from, rn);
+  displaced_write_reg (regs, dsc, 0, rn_val, CANNOT_WRITE_PC);
+
+  dsc->u.preload.immed = 1;
+
+  dsc->modinsn[0] = insn & 0xfff0ffff;
+
+  dsc->cleanup = &cleanup_preload;
+
+  return 0;
+}
+
+/* Preload instructions with register offset.  */
+
+static int
+copy_preload_reg (uint32_t insn, struct regcache *regs,
+		  struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rm = bits (insn, 0, 3);
+  ULONGEST rn_val, rm_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000f000ful))
+    return copy_unmodified (insn, "preload reg", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying preload insn %.8lx\n",
+			(unsigned long) insn);
+
+  /* Preload register-offset instructions:
+
+     {pli/pld} [rn, rm {, shift}]
+     ->
+     {pli/pld} [r0, r1 {, shift}].  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  dsc->tmp[1] = displaced_read_reg (regs, from, 1);
+  rn_val = displaced_read_reg (regs, from, rn);
+  rm_val = displaced_read_reg (regs, from, rm);
+  displaced_write_reg (regs, dsc, 0, rn_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 1, rm_val, CANNOT_WRITE_PC);
+
+  dsc->u.preload.immed = 0;
+
+  dsc->modinsn[0] = (insn & 0xfff0fff0) | 0x1;
+
+  dsc->cleanup = &cleanup_preload;
+
+  return 0;
+}
+
+/* Copy/cleanup coprocessor load and store instructions.  */
+
+static void
+cleanup_copro_load_store (struct regcache *regs,
+			  struct displaced_step_closure *dsc)
+{
+  ULONGEST rn_val = displaced_read_reg (regs, dsc->insn_addr, 0);
+
+  displaced_write_reg (regs, dsc, 0, dsc->tmp[0], CANNOT_WRITE_PC);
+
+  if (dsc->u.ldst.writeback)
+    displaced_write_reg (regs, dsc, dsc->u.ldst.rn, rn_val, LOAD_WRITE_PC);
+}
+
+static int
+copy_copro_load_store (uint32_t insn, struct regcache *regs,
+		       struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  ULONGEST rn_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000f0000ul))
+    return copy_unmodified (insn, "copro load/store", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying coprocessor "
+			"load/store insn %.8lx\n", (unsigned long) insn);
+
+  /* Coprocessor load/store instructions:
+
+     {stc/stc2} [<Rn>, #+/-imm]  (and other immediate addressing modes)
+     ->
+     {stc/stc2} [r0, #+/-imm].
+
+     ldc/ldc2 are handled identically.  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  rn_val = displaced_read_reg (regs, from, rn);
+  displaced_write_reg (regs, dsc, 0, rn_val, CANNOT_WRITE_PC);
+
+  dsc->u.ldst.writeback = bit (insn, 21);
+  dsc->u.ldst.rn = rn;
+
+  dsc->modinsn[0] = insn & 0xfff0ffff;
+
+  dsc->cleanup = &cleanup_copro_load_store;
+
+  return 0;
+}
+
+/* Clean up branch instructions (actually perform the branch, by setting
+   PC).  */
+
+static void
+cleanup_branch (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  ULONGEST from = dsc->insn_addr;
+  uint32_t status = displaced_read_reg (regs, from, ARM_PS_REGNUM);
+  int branch_taken = condition_true (dsc->u.branch.cond, status);
+  enum pc_write_style write_pc = dsc->u.branch.exchange
+				 ? BX_WRITE_PC : BRANCH_WRITE_PC;
+
+  if (!branch_taken)
+    return;
+
+  if (dsc->u.branch.link)
+    {
+      ULONGEST pc = displaced_read_reg (regs, from, 15);
+      displaced_write_reg (regs, dsc, 14, pc - 4, CANNOT_WRITE_PC);
+    }
+
+  displaced_write_reg (regs, dsc, 15, dsc->u.branch.dest, write_pc);
+}
+
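cleanup_branch transfers control only when the instruction's condition passes for the current CPSR flags, and the patch checks this with condition_true. The sketch below writes out the standard ARM condition table. `cond_passes` is a hypothetical stand-in, not the patch's function. It assumes the N, Z, C and V flags are CPSR bits 31-28, and it treats 0xf as always passing, which the BLX-immediate path needs because it records cond == 0xf.

```c
#include <stdint.h>

/* Hypothetical stand-in for condition_true(): does ARM condition code
   COND (0x0-0xf) pass for the flags held in CPSR?  */
static int
cond_passes (unsigned cond, uint32_t cpsr)
{
  int n = (cpsr >> 31) & 1;
  int z = (cpsr >> 30) & 1;
  int c = (cpsr >> 29) & 1;
  int v = (cpsr >> 28) & 1;

  switch (cond)
    {
    case 0x0: return z;                 /* EQ */
    case 0x1: return !z;                /* NE */
    case 0x2: return c;                 /* CS/HS */
    case 0x3: return !c;                /* CC/LO */
    case 0x4: return n;                 /* MI */
    case 0x5: return !n;                /* PL */
    case 0x6: return v;                 /* VS */
    case 0x7: return !v;                /* VC */
    case 0x8: return c && !z;           /* HI */
    case 0x9: return !c || z;           /* LS */
    case 0xa: return n == v;            /* GE */
    case 0xb: return n != v;            /* LT */
    case 0xc: return !z && n == v;      /* GT */
    case 0xd: return z || n != v;       /* LE */
    default:  return 1;                 /* AL, and 0xf (unconditional).  */
    }
}
```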
+/* Copy B/BL/BLX instructions with immediate destinations.  */
+
+static int
+copy_b_bl_blx (uint32_t insn, struct regcache *regs,
+	       struct displaced_step_closure *dsc)
+{
+  unsigned int cond = bits (insn, 28, 31);
+  int exchange = (cond == 0xf);
+  int link = exchange || bit (insn, 24);
+  CORE_ADDR from = dsc->insn_addr;
+  long offset;
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying %s immediate insn "
+			"%.8lx\n", (exchange) ? "blx" : (link) ? "bl" : "b",
+			(unsigned long) insn);
+
+  /* Implement "BL<cond> <label>" as:
+
+     Preparation: cond <- instruction condition
+     Insn: mov r0, r0  (nop)
+     Cleanup: if (condition true) { r14 <- pc; pc <- label }.
+
+     B<cond> similar, but don't set r14 in cleanup.  */
+
+  if (exchange)
+    /* For BLX, set bit 0 of the destination.  The cleanup_branch function will
+       then arrange the switch into Thumb mode.  */
+    offset = (bits (insn, 0, 23) << 2) | (bit (insn, 24) << 1) | 1;
+  else
+    offset = bits (insn, 0, 23) << 2;
+
+  if (bit (offset, 25))
+    offset = offset | ~0x3ffffff;
+
+  dsc->u.branch.cond = cond;
+  dsc->u.branch.link = link;
+  dsc->u.branch.exchange = exchange;
+  dsc->u.branch.dest = from + 8 + offset;
+
+  dsc->modinsn[0] = ARM_NOP;
+
+  dsc->cleanup = &cleanup_branch;
+
+  return 0;
+}
+
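The destination arithmetic in copy_b_bl_blx can be tested on its own. This sketch uses a hypothetical name, `branch_dest`, and plain `uint32_t` instead of GDB's types. It scales imm24 by 4 and, for BLX, folds in the H bit and the Thumb bit. It then sign-extends from bit 25 and adds the ARM-state PC bias of 8.

```c
#include <stdint.h>

/* Sketch of the destination computation in copy_b_bl_blx.  FROM is the
   address of the original instruction.  For BLX (cond == 0xf) the H bit
   (bit 24) supplies bit 1 of the offset, and bit 0 is set to request
   Thumb state.  */
static uint32_t
branch_dest (uint32_t insn, uint32_t from)
{
  int exchange = (insn >> 28) == 0xf;
  uint32_t offset = (insn & 0x00ffffffu) << 2;   /* imm24 * 4: 26 bits.  */

  if (exchange)
    offset |= (((insn >> 24) & 1u) << 1) | 1u;

  /* Sign-extend the 26-bit offset from bit 25.  */
  if (offset & (1u << 25))
    offset |= ~0x3ffffffu;

  /* In ARM state the PC reads as the instruction address + 8.  Unsigned
     wraparound makes adding a "negative" offset work.  */
  return from + 8 + offset;
}
```

At 0x8000, 0xeafffffe (`b .`) branches to itself. A BLX with H = 1 yields an odd address, and bit 0 of that address requests Thumb state.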
+/* Copy BX/BLX with register-specified destinations.  */
+
+static int
+copy_bx_blx_reg (uint32_t insn, struct regcache *regs,
+		 struct displaced_step_closure *dsc)
+{
+  unsigned int cond = bits (insn, 28, 31);
+  /* BX:  x12xxx1x
+     BLX: x12xxx3x.  */
+  int link = bit (insn, 5);
+  unsigned int rm = bits (insn, 0, 3);
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying %s register insn "
+			"%.8lx\n", (link) ? "blx" : "bx", (unsigned long) insn);
+
+  /* Implement "{BX,BLX}<cond> <reg>" as:
+
+     Preparation: cond <- instruction condition
+     Insn: mov r0, r0 (nop)
+     Cleanup: if (condition true) { r14 <- pc; pc <- dest; }.
+
+     Don't set r14 in cleanup for BX.  */
+
+  dsc->u.branch.dest = displaced_read_reg (regs, from, rm);
+
+  dsc->u.branch.cond = cond;
+  dsc->u.branch.link = link;
+  dsc->u.branch.exchange = 1;
+
+  dsc->modinsn[0] = ARM_NOP;
+
+  dsc->cleanup = &cleanup_branch;
+
+  return 0;
+}
+
+/* Copy/cleanup arithmetic/logic instructions with immediate RHS.  */
+
+static void
+cleanup_alu_imm (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  ULONGEST rd_val = displaced_read_reg (regs, dsc->insn_addr, 0);
+  displaced_write_reg (regs, dsc, 0, dsc->tmp[0], CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 1, dsc->tmp[1], CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, dsc->rd, rd_val, ALU_WRITE_PC);
+}
+
+static int
+copy_alu_imm (uint32_t insn, struct regcache *regs,
+	      struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rd = bits (insn, 12, 15);
+  unsigned int op = bits (insn, 21, 24);
+  int is_mov = (op == 0xd);
+  ULONGEST rd_val, rn_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000ff000ul))
+    return copy_unmodified (insn, "ALU immediate", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying immediate %s insn "
+			"%.8lx\n", is_mov ? "move" : "ALU",
+			(unsigned long) insn);
+
+  /* Instruction is of form:
+
+     <op><cond> rd, [rn,] #imm
+
+     Rewrite as:
+
+     Preparation: tmp1, tmp2 <- r0, r1;
+		  r0, r1 <- rd, rn
+     Insn: <op><cond> r0, r1, #imm
+     Cleanup: rd <- r0; r0 <- tmp1; r1 <- tmp2
+  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  dsc->tmp[1] = displaced_read_reg (regs, from, 1);
+  rn_val = displaced_read_reg (regs, from, rn);
+  rd_val = displaced_read_reg (regs, from, rd);
+  displaced_write_reg (regs, dsc, 0, rd_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 1, rn_val, CANNOT_WRITE_PC);
+  dsc->rd = rd;
+
+  if (is_mov)
+    dsc->modinsn[0] = insn & 0xfff00fff;
+  else
+    dsc->modinsn[0] = (insn & 0xfff00fff) | 0x10000;
+
+  dsc->cleanup = &cleanup_alu_imm;
+
+  return 0;
+}
+
+/* Copy/cleanup arithmetic/logic insns with register RHS.  */
+
+static void
+cleanup_alu_reg (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  ULONGEST rd_val;
+  int i;
+
+  rd_val = displaced_read_reg (regs, dsc->insn_addr, 0);
+
+  for (i = 0; i < 3; i++)
+    displaced_write_reg (regs, dsc, i, dsc->tmp[i], CANNOT_WRITE_PC);
+
+  displaced_write_reg (regs, dsc, dsc->rd, rd_val, ALU_WRITE_PC);
+}
+
+static int
+copy_alu_reg (uint32_t insn, struct regcache *regs,
+	      struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rm = bits (insn, 0, 3);
+  unsigned int rd = bits (insn, 12, 15);
+  unsigned int op = bits (insn, 21, 24);
+  int is_mov = (op == 0xd);
+  ULONGEST rd_val, rn_val, rm_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000ff00ful))
+    return copy_unmodified (insn, "ALU reg", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying reg %s insn %.8lx\n",
+			is_mov ? "move" : "ALU", (unsigned long) insn);
+
+  /* Instruction is of form:
+
+     <op><cond> rd, [rn,] rm [, <shift>]
+
+     Rewrite as:
+
+     Preparation: tmp1, tmp2, tmp3 <- r0, r1, r2;
+		  r0, r1, r2 <- rd, rn, rm
+     Insn: <op><cond> r0, r1, r2 [, <shift>]
+     Cleanup: rd <- r0; r0, r1, r2 <- tmp1, tmp2, tmp3
+  */
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  dsc->tmp[1] = displaced_read_reg (regs, from, 1);
+  dsc->tmp[2] = displaced_read_reg (regs, from, 2);
+  rd_val = displaced_read_reg (regs, from, rd);
+  rn_val = displaced_read_reg (regs, from, rn);
+  rm_val = displaced_read_reg (regs, from, rm);
+  displaced_write_reg (regs, dsc, 0, rd_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 1, rn_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, rm_val, CANNOT_WRITE_PC);
+  dsc->rd = rd;
+
+  if (is_mov)
+    dsc->modinsn[0] = (insn & 0xfff00ff0) | 0x2;
+  else
+    dsc->modinsn[0] = (insn & 0xfff00ff0) | 0x10002;
+
+  dsc->cleanup = &cleanup_alu_reg;
+
+  return 0;
+}
+
+/* Cleanup/copy arithmetic/logic insns with shifted register RHS.  */
+
+static void
+cleanup_alu_shifted_reg (struct regcache *regs,
+			 struct displaced_step_closure *dsc)
+{
+  ULONGEST rd_val = displaced_read_reg (regs, dsc->insn_addr, 0);
+  int i;
+
+  for (i = 0; i < 4; i++)
+    displaced_write_reg (regs, dsc, i, dsc->tmp[i], CANNOT_WRITE_PC);
+
+  displaced_write_reg (regs, dsc, dsc->rd, rd_val, ALU_WRITE_PC);
+}
+
+static int
+copy_alu_shifted_reg (uint32_t insn, struct regcache *regs,
+		      struct displaced_step_closure *dsc)
+{
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rm = bits (insn, 0, 3);
+  unsigned int rd = bits (insn, 12, 15);
+  unsigned int rs = bits (insn, 8, 11);
+  unsigned int op = bits (insn, 21, 24);
+  int is_mov = (op == 0xd), i;
+  ULONGEST rd_val, rn_val, rm_val, rs_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000fff0ful))
+    return copy_unmodified (insn, "ALU shifted reg", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying shifted reg %s insn "
+			"%.8lx\n", is_mov ? "move" : "ALU",
+			(unsigned long) insn);
+
+  /* Instruction is of form:
+
+     <op><cond> rd, [rn,] rm, <shift> rs
+
+     Rewrite as:
+
+     Preparation: tmp1, tmp2, tmp3, tmp4 <- r0, r1, r2, r3
+		  r0, r1, r2, r3 <- rd, rn, rm, rs
+     Insn: <op><cond> r0, r1, r2, <shift> r3
+     Cleanup: tmp5 <- r0
+	      r0, r1, r2, r3 <- tmp1, tmp2, tmp3, tmp4
+	      rd <- tmp5
+  */
+
+  for (i = 0; i < 4; i++)
+    dsc->tmp[i] = displaced_read_reg (regs, from, i);
+
+  rd_val = displaced_read_reg (regs, from, rd);
+  rn_val = displaced_read_reg (regs, from, rn);
+  rm_val = displaced_read_reg (regs, from, rm);
+  rs_val = displaced_read_reg (regs, from, rs);
+  displaced_write_reg (regs, dsc, 0, rd_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 1, rn_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, rm_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 3, rs_val, CANNOT_WRITE_PC);
+  dsc->rd = rd;
+
+  if (is_mov)
+    dsc->modinsn[0] = (insn & 0xfff000f0) | 0x302;
+  else
+    dsc->modinsn[0] = (insn & 0xfff000f0) | 0x10302;
+
+  dsc->cleanup = &cleanup_alu_shifted_reg;
+
+  return 0;
+}
+
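The three ALU copy routines differ only in which register fields they clear and refill. The hypothetical helper below reproduces the mask-and-OR in copy_alu_shifted_reg, as a standalone check of the register-shifted-register case.

```c
#include <stdint.h>

/* Sketch of the operand rewrite in copy_alu_shifted_reg:
     <op><cond> rd, rn, rm, <shift> rs -> <op><cond> r0, r1, r2, <shift> r3
   Fields: Rn 16-19, Rd 12-15, Rs 8-11, Rm 0-3.  Bits 4-7 (shift type and
   register-shift marker) are kept.  For MOV (IS_MOV) the Rn field is
   unused and is left as 0.  */
static uint32_t
rewrite_alu_shifted_reg (uint32_t insn, int is_mov)
{
  uint32_t out = insn & 0xfff000f0u;    /* Clear Rn, Rd, Rs and Rm.  */

  out |= (2u << 0) | (3u << 8);         /* Rm = r2, Rs = r3; Rd = r0.  */
  if (!is_mov)
    out |= 1u << 16;                    /* Rn = r1.  */
  return out;
}
```

For 0xe08f3514 (`add r3, pc, r4, lsl r5`) the helper yields 0xe0810312 (`add r0, r1, r2, lsl r3`). The result goes to r0, and the cleanup routine copies it back to r3.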
+/* Clean up load instructions.  */
+
+static void
+cleanup_load (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  ULONGEST rt_val, rt_val2 = 0, rn_val;
+  CORE_ADDR from = dsc->insn_addr;
+
+  rt_val = displaced_read_reg (regs, from, 0);
+  if (dsc->u.ldst.xfersize == 8)
+    rt_val2 = displaced_read_reg (regs, from, 1);
+  rn_val = displaced_read_reg (regs, from, 2);
+
+  displaced_write_reg (regs, dsc, 0, dsc->tmp[0], CANNOT_WRITE_PC);
+  if (dsc->u.ldst.xfersize > 4)
+    displaced_write_reg (regs, dsc, 1, dsc->tmp[1], CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, dsc->tmp[2], CANNOT_WRITE_PC);
+  if (!dsc->u.ldst.immed)
+    displaced_write_reg (regs, dsc, 3, dsc->tmp[3], CANNOT_WRITE_PC);
+
+  /* Handle register writeback.  */
+  if (dsc->u.ldst.writeback)
+    displaced_write_reg (regs, dsc, dsc->u.ldst.rn, rn_val, CANNOT_WRITE_PC);
+  /* Put result in right place.  */
+  displaced_write_reg (regs, dsc, dsc->rd, rt_val, LOAD_WRITE_PC);
+  if (dsc->u.ldst.xfersize == 8)
+    displaced_write_reg (regs, dsc, dsc->rd + 1, rt_val2, LOAD_WRITE_PC);
+}
+
+/* Clean up store instructions.  */
+
+static void
+cleanup_store (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  CORE_ADDR from = dsc->insn_addr;
+  ULONGEST rn_val = displaced_read_reg (regs, from, 2);
+
+  displaced_write_reg (regs, dsc, 0, dsc->tmp[0], CANNOT_WRITE_PC);
+  if (dsc->u.ldst.xfersize > 4)
+    displaced_write_reg (regs, dsc, 1, dsc->tmp[1], CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, dsc->tmp[2], CANNOT_WRITE_PC);
+  if (!dsc->u.ldst.immed)
+    displaced_write_reg (regs, dsc, 3, dsc->tmp[3], CANNOT_WRITE_PC);
+  if (dsc->u.ldst.restore_r4)
+    displaced_write_reg (regs, dsc, 4, dsc->tmp[4], CANNOT_WRITE_PC);
+
+  /* Writeback.  */
+  if (dsc->u.ldst.writeback)
+    displaced_write_reg (regs, dsc, dsc->u.ldst.rn, rn_val, CANNOT_WRITE_PC);
+}
+
+/* Copy "extra" load/store instructions.  These are halfword/doubleword
+   transfers, which have a different encoding to byte/word transfers.  */
+
+static int
+copy_extra_ld_st (uint32_t insn, int unprivileged, struct regcache *regs,
+		  struct displaced_step_closure *dsc)
+{
+  unsigned int op1 = bits (insn, 20, 24);
+  unsigned int op2 = bits (insn, 5, 6);
+  unsigned int rt = bits (insn, 12, 15);
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rm = bits (insn, 0, 3);
+  char load[12]     = {0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1};
+  char bytesize[12] = {2, 2, 2, 2, 8, 1, 8, 1, 8, 2, 8, 2};
+  int immed = (op1 & 0x4) != 0;
+  int opcode;
+  ULONGEST rt_val, rt_val2 = 0, rn_val, rm_val = 0;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000ff00ful))
+    return copy_unmodified (insn, "extra load/store", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying %sextra load/store "
+			"insn %.8lx\n", unprivileged ? "unprivileged " : "",
+			(unsigned long) insn);
+
+  opcode = ((op2 << 2) | (op1 & 0x1) | ((op1 & 0x4) >> 1)) - 4;
+
+  if (opcode < 0)
+    internal_error (__FILE__, __LINE__,
+		    _("copy_extra_ld_st: instruction decode error"));
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  dsc->tmp[1] = displaced_read_reg (regs, from, 1);
+  dsc->tmp[2] = displaced_read_reg (regs, from, 2);
+  if (!immed)
+    dsc->tmp[3] = displaced_read_reg (regs, from, 3);
+
+  rt_val = displaced_read_reg (regs, from, rt);
+  if (bytesize[opcode] == 8)
+    rt_val2 = displaced_read_reg (regs, from, rt + 1);
+  rn_val = displaced_read_reg (regs, from, rn);
+  if (!immed)
+    rm_val = displaced_read_reg (regs, from, rm);
+
+  displaced_write_reg (regs, dsc, 0, rt_val, CANNOT_WRITE_PC);
+  if (bytesize[opcode] == 8)
+    displaced_write_reg (regs, dsc, 1, rt_val2, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, rn_val, CANNOT_WRITE_PC);
+  if (!immed)
+    displaced_write_reg (regs, dsc, 3, rm_val, CANNOT_WRITE_PC);
+
+  dsc->rd = rt;
+  dsc->u.ldst.xfersize = bytesize[opcode];
+  dsc->u.ldst.rn = rn;
+  dsc->u.ldst.immed = immed;
+  dsc->u.ldst.writeback = bit (insn, 24) == 0 || bit (insn, 21) != 0;
+  dsc->u.ldst.restore_r4 = 0;
+
+  if (immed)
+    /* {ldr,str}<width><cond> rt, [rt2,] [rn, #imm]
+       ->
+       {ldr,str}<width><cond> r0, [r1,] [r2, #imm].  */
+    dsc->modinsn[0] = (insn & 0xfff00fff) | 0x20000;
+  else
+    /* {ldr,str}<width><cond> rt, [rt2,] [rn, +/-rm]
+       ->
+       {ldr,str}<width><cond> r0, [r1,] [r2, +/-r3].  */
+    dsc->modinsn[0] = (insn & 0xfff00ff0) | 0x20003;
+
+  dsc->cleanup = load[opcode] ? &cleanup_load : &cleanup_store;
+
+  return 0;
+}
+
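The `load` and `bytesize` tables in copy_extra_ld_st are indexed by a 4-bit value built from op2, the immediate bit and the load bit. Below is a standalone sketch of that decode, using the same tables as the patch. The function name and its out-parameters are hypothetical.

```c
#include <stdint.h>

/* Sketch of the table lookup in copy_extra_ld_st.  The index is
   (op2 << 2 | immediate << 1 | load) - 4, where op2 is bits 5-6
   (1 = halfword, 2 = LDRD/LDRSB, 3 = STRD/LDRSH), "immediate" is bit 22
   and "load" is bit 20.  Returns 0 on success, or -1 if op2 is 0 (not an
   extra load/store).  */
static int
decode_extra_ldst (uint32_t insn, int *is_load, int *xfersize, int *writeback)
{
  static const char load[12]     = {0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1};
  static const char bytesize[12] = {2, 2, 2, 2, 8, 1, 8, 1, 8, 2, 8, 2};
  unsigned op1 = (insn >> 20) & 0x1f;
  unsigned op2 = (insn >> 5) & 0x3;
  int opcode = (int) ((op2 << 2) | (op1 & 0x1) | ((op1 & 0x4) >> 1)) - 4;

  if (opcode < 0)
    return -1;

  *is_load = load[opcode];
  *xfersize = bytesize[opcode];
  /* Post-indexed (P == 0), or pre-indexed with W == 1.  */
  *writeback = ((insn >> 24) & 1) == 0 || ((insn >> 21) & 1) != 0;
  return 0;
}
```

With this decode, 0xe1cf40d8 (`ldrd r4, [pc, #8]`) is an 8-byte load, and 0xe18f10b2 (`strh r1, [pc, r2]`) is a 2-byte store.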
+/* Copy byte/word loads and stores.  */
+
+static int
+copy_ldr_str_ldrb_strb (uint32_t insn, struct regcache *regs,
+			struct displaced_step_closure *dsc, int load, int byte,
+			int usermode)
+{
+  int immed = !bit (insn, 25);
+  unsigned int rt = bits (insn, 12, 15);
+  unsigned int rn = bits (insn, 16, 19);
+  unsigned int rm = bits (insn, 0, 3);  /* Only valid if !immed.  */
+  ULONGEST rt_val, rn_val, rm_val = 0;
+  CORE_ADDR from = dsc->insn_addr;
+
+  if (!insn_references_pc (insn, 0x000ff00ful))
+    return copy_unmodified (insn, "load/store", dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying %s%s insn %.8lx\n",
+			load ? (byte ? "ldrb" : "ldr")
+			     : (byte ? "strb" : "str"), usermode ? "t" : "",
+			(unsigned long) insn);
+
+  dsc->tmp[0] = displaced_read_reg (regs, from, 0);
+  dsc->tmp[2] = displaced_read_reg (regs, from, 2);
+  if (!immed)
+    dsc->tmp[3] = displaced_read_reg (regs, from, 3);
+  if (!load)
+    dsc->tmp[4] = displaced_read_reg (regs, from, 4);
+
+  rt_val = displaced_read_reg (regs, from, rt);
+  rn_val = displaced_read_reg (regs, from, rn);
+  if (!immed)
+    rm_val = displaced_read_reg (regs, from, rm);
+
+  displaced_write_reg (regs, dsc, 0, rt_val, CANNOT_WRITE_PC);
+  displaced_write_reg (regs, dsc, 2, rn_val, CANNOT_WRITE_PC);
+  if (!immed)
+    displaced_write_reg (regs, dsc, 3, rm_val, CANNOT_WRITE_PC);
+
+  dsc->rd = rt;
+  dsc->u.ldst.xfersize = byte ? 1 : 4;
+  dsc->u.ldst.rn = rn;
+  dsc->u.ldst.immed = immed;
+  dsc->u.ldst.writeback = bit (insn, 24) == 0 || bit (insn, 21) != 0;
+
+  /* To write PC we can do:
+
+     scratch+0:  str pc, temp  (*temp = scratch + 8 + offset)
+     scratch+4:  ldr r4, temp
+     scratch+8:  sub r4, r4, pc  (r4 = scratch + 8 + offset - scratch - 8 - 8)
+     scratch+12: add r4, r4, #8  (r4 = offset)
+     scratch+16: add r0, r0, r4
+     scratch+20: str r0, [r2, #imm] (or str r0, [r2, r3])
+     scratch+24: <breakpoint>
+     scratch+28: <temp>
+
+     Otherwise we don't know what value to write for PC, since the offset is
+     architecture-dependent (sometimes PC+8, sometimes PC+12).  */
+
+  if (load || rt != 15)
+    {
+      dsc->u.ldst.restore_r4 = 0;
+
+      if (immed)
+	/* {ldr,str}[b]<cond> rt, [rn, #imm], etc.
+	   ->
+	   {ldr,str}[b]<cond> r0, [r2, #imm].  */
+	dsc->modinsn[0] = (insn & 0xfff00fff) | 0x20000;
+      else
+	/* {ldr,str}[b]<cond> rt, [rn, rm], etc.
+	   ->
+	   {ldr,str}[b]<cond> r0, [r2, r3].  */
+	dsc->modinsn[0] = (insn & 0xfff00ff0) | 0x20003;
+    }
+  else
+    {
+      /* We need to use r4 as scratch.  Make sure it's restored afterwards.  */
+      dsc->u.ldst.restore_r4 = 1;
+
+      dsc->modinsn[0] = 0xe58ff014;  /* str pc, [pc, #20].  */
+      dsc->modinsn[1] = 0xe59f4010;  /* ldr r4, [pc, #16].  */
+      dsc->modinsn[2] = 0xe044400f;  /* sub r4, r4, pc.  */
+      dsc->modinsn[3] = 0xe2844008;  /* add r4, r4, #8.  */
+      dsc->modinsn[4] = 0xe0800004;  /* add r0, r0, r4.  */
+
+      /* As above.  */
+      if (immed)
+	dsc->modinsn[5] = (insn & 0xfff00fff) | 0x20000;
+      else
+	dsc->modinsn[5] = (insn & 0xfff00ff0) | 0x20003;
+
+      dsc->modinsn[6] = 0x0;  /* breakpoint location.  */
+      dsc->modinsn[7] = 0x0;  /* scratch space.  */
+
+      dsc->numinsns = 6;
+    }
+
+  dsc->cleanup = load ? &cleanup_load : &cleanup_store;
+
+  return 0;
+}
+
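The arithmetic in the scratch-space sequence for `str pc, ...` can be checked by simulating the register values step by step. This sketch makes one assumption, which the formulas in the comment imply: r0 is preloaded with the original instruction address + 8. It takes the implementation-defined PC store offset (8 or 12) as a parameter. It models values only, not memory or instruction encodings, and the function name is hypothetical.

```c
#include <stdint.h>

/* FROM: address of the original "str pc, ..." instruction.
   SCRATCH: address of the out-of-line copy.
   PC_STORE_OFFSET: how far past the storing instruction the stored PC
   value points (8 or 12, depending on the implementation).
   Returns the value written by the final "str r0, ..." at scratch+20.  */
static uint32_t
simulate_str_pc_sequence (uint32_t from, uint32_t scratch,
                          uint32_t pc_store_offset)
{
  uint32_t temp, r0, r4;

  r0 = from + 8;                        /* Preparation: r0 <- PC value.  */
  temp = scratch + pc_store_offset;     /* scratch+0:  str pc, [pc, #20]  */
  r4 = temp;                            /* scratch+4:  ldr r4, [pc, #16]  */
  r4 = r4 - (scratch + 8 + 8);          /* scratch+8:  sub r4, r4, pc  */
  r4 = r4 + 8;                          /* scratch+12: add r4, r4, #8  */
  r0 = r0 + r4;                         /* scratch+16: add r0, r0, r4  */
  return r0;                            /* scratch+20: str r0, [...]  */
}
```

Whichever store offset the CPU uses, the value written matches what the original instruction would have stored at its own address.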
+/* Clean up LDM instructions with fully-populated register list.  This is an
+   unfortunate corner case: it's impossible to implement correctly by modifying
+   the instruction.  The issue is as follows: we have an instruction,
+
+   ldm rN, {r0-r15}
+
+   which we must rewrite to avoid loading PC.  A possible solution would be to
+   do the load in two halves, something like (with suitable cleanup
+   afterwards):
+
+   mov r8, rN
+   ldm[id][ab] r8!, {r0-r7}
+   str r7, <temp>
+   ldm[id][ab] r8, {r7-r14}
+   <bkpt>
+
+   but at present there's no suitable place for <temp>, since the scratch space
+   is overwritten before the cleanup routine is called.  For now, we simply
+   emulate the instruction.  */
+
+static void
+cleanup_block_load_all (struct regcache *regs,
+			struct displaced_step_closure *dsc)
+{
+  ULONGEST from = dsc->insn_addr;
+  int inc = dsc->u.block.increment;
+  int bump_before = dsc->u.block.before ? (inc ? 4 : -4) : 0;
+  int bump_after = dsc->u.block.before ? 0 : (inc ? 4 : -4);
+  uint32_t regmask = dsc->u.block.regmask;
+  int regno = inc ? 0 : 15;
+  CORE_ADDR xfer_addr = dsc->u.block.xfer_addr;
+  int exception_return = dsc->u.block.load && dsc->u.block.user
+			 && (regmask & 0x8000) != 0;
+  uint32_t status = displaced_read_reg (regs, from, ARM_PS_REGNUM);
+  int do_transfer = condition_true (dsc->u.block.cond, status);
+
+  if (!do_transfer)
+    return;
+
+  /* If the instruction is ldm rN, {...pc}^, I don't think there's anything
+     sensible we can do here.  Complain loudly.  */
+  if (exception_return)
+    error (_("Cannot single-step exception return"));
+
+  /* We don't handle any stores here for now.  */
+  gdb_assert (dsc->u.block.load != 0);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: emulating block transfer: "
+			"%s %s %s\n", dsc->u.block.load ? "ldm" : "stm",
+			dsc->u.block.increment ? "inc" : "dec",
+			dsc->u.block.before ? "before" : "after");
+
+  while (regmask)
+    {
+      uint32_t memword;
+
+      if (inc)
+	while (regno <= 15 && (regmask & (1 << regno)) == 0)
+	  regno++;
+      else
+        while (regno >= 0 && (regmask & (1 << regno)) == 0)
+	  regno--;
+
+      xfer_addr += bump_before;
+
+      memword = read_memory_unsigned_integer (xfer_addr, 4);
+      displaced_write_reg (regs, dsc, regno, memword, LOAD_WRITE_PC);
+
+      xfer_addr += bump_after;
+
+      regmask &= ~(1 << regno);
+    }
+
+  if (dsc->u.block.writeback)
+    displaced_write_reg (regs, dsc, dsc->u.block.rn, xfer_addr,
+			 CANNOT_WRITE_PC);
+}
+
+/* Clean up an STM which included the PC in the register list.  */
+
+static void
+cleanup_block_store_pc (struct regcache *regs,
+			struct displaced_step_closure *dsc)
+{
+  ULONGEST from = dsc->insn_addr;
+  uint32_t status = displaced_read_reg (regs, from, ARM_PS_REGNUM);
+  int store_executed = condition_true (dsc->u.block.cond, status);
+  CORE_ADDR pc_stored_at, transferred_regs = bitcount (dsc->u.block.regmask);
+  CORE_ADDR stm_insn_addr;
+  uint32_t pc_val;
+  long offset;
+
+  /* If condition code fails, there's nothing else to do.  */
+  if (!store_executed)
+    return;
+
+  if (dsc->u.block.increment)
+    {
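+      /* The PC is the highest-numbered register, so it occupies the last
+	 (highest-addressed) word of the block.  */
+      pc_stored_at = dsc->u.block.xfer_addr + 4 * (transferred_regs - 1);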
+
+      if (dsc->u.block.before)
+        pc_stored_at += 4;
+    }
+  else
+    {
+      pc_stored_at = dsc->u.block.xfer_addr;
+
+      if (dsc->u.block.before)
+        pc_stored_at -= 4;
+    }
+
+  pc_val = read_memory_unsigned_integer (pc_stored_at, 4);
+  stm_insn_addr = dsc->scratch_base;
+  offset = pc_val - stm_insn_addr;
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: detected PC offset %.8lx for "
+			"STM instruction\n", (unsigned long) offset);
+
+  /* Rewrite the stored PC to the proper value for the non-displaced original
+     instruction.  */
+  write_memory_unsigned_integer (pc_stored_at, 4, dsc->insn_addr + offset);
+}
+
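cleanup_block_store_pc relies on an ARM rule: LDM/STM transfer registers in ascending order at ascending addresses. The PC (r15) therefore always lands in the highest-addressed word of the block. Below is a standalone sketch of the per-register address computation for all four addressing modes. The helper names are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in M.  */
static unsigned
count_bits (uint32_t m)
{
  unsigned n = 0;

  for (; m != 0; m &= m - 1)
    n++;
  return n;
}

/* Address at which register REGNO is transferred by an LDM/STM with base
   value BASE and register list REGMASK.  Only the lowest address depends
   on the addressing mode; registers then follow in ascending order.  */
static uint32_t
block_xfer_slot (uint32_t base, uint32_t regmask, unsigned regno,
                 int increment, int before)
{
  unsigned n = count_bits (regmask);
  unsigned below = count_bits (regmask & ((1u << regno) - 1));
  uint32_t lowest;

  if (increment)
    lowest = before ? base + 4 : base;                  /* IB : IA */
  else
    lowest = before ? base - 4 * n : base - 4 * n + 4;  /* DB : DA */

  return lowest + 4 * below;
}
```

For example, with `stmia r0, {r1, pc}` and r0 = 0x1000, r1 is stored at 0x1000 and the PC at 0x1004.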
+/* Clean up an LDM which includes the PC in the register list.  We clumped all
+   the registers in the transferred list into a contiguous range r0...rX (to
+   avoid loading PC directly and losing control of the debugged program), so we
+   must undo that here.  */
+
+static void
+cleanup_block_load_pc (struct regcache *regs,
+		       struct displaced_step_closure *dsc)
+{
+  ULONGEST from = dsc->insn_addr;
+  uint32_t status = displaced_read_reg (regs, from, ARM_PS_REGNUM);
+  int load_executed = condition_true (dsc->u.block.cond, status);
+  unsigned int mask = dsc->u.block.regmask, write_reg = 15;
+  unsigned int regs_loaded = bitcount (mask);
+  unsigned int num_to_shuffle = regs_loaded, clobbered;
+
+  /* The method employed here will fail if the register list is fully populated
+     (we need to avoid loading PC directly).  */
+  gdb_assert (num_to_shuffle < 16);
+
+  if (!load_executed)
+    return;
+
+  clobbered = (1 << num_to_shuffle) - 1;
+
+  while (num_to_shuffle > 0)
+    {
+      if ((mask & (1 << write_reg)) != 0)
+        {
+	  unsigned int read_reg = num_to_shuffle - 1;
+
+	  if (read_reg != write_reg)
+	    {
+	      ULONGEST rval = displaced_read_reg (regs, from, read_reg);
+	      displaced_write_reg (regs, dsc, write_reg, rval, LOAD_WRITE_PC);
+	      if (debug_displaced)
+	        fprintf_unfiltered (gdb_stdlog, _("displaced: LDM: move "
+				    "loaded register r%d to r%d\n"), read_reg,
+				    write_reg);
+	    }
+	  else if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, _("displaced: LDM: register "
+				"r%d already in the right place\n"),
+				write_reg);
+
+	  clobbered &= ~(1 << write_reg);
+
+	  num_to_shuffle--;
+	}
+
+      write_reg--;
+    }
+
+  /* Restore any registers we scribbled over.  */
+  for (write_reg = 0; clobbered != 0; write_reg++)
+    {
+      if ((clobbered & (1 << write_reg)) != 0)
+        {
+	  displaced_write_reg (regs, dsc, write_reg, dsc->tmp[write_reg],
+			       CANNOT_WRITE_PC);
+	  if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, _("displaced: LDM: restored "
+				"clobbered register r%d\n"), write_reg);
+	  clobbered &= ~(1 << write_reg);
+	}
+    }
+
+  /* Perform register writeback manually.  */
+  if (dsc->u.block.writeback)
+    {
+      ULONGEST new_rn_val = dsc->u.block.xfer_addr;
+
+      if (dsc->u.block.increment)
+        new_rn_val += regs_loaded * 4;
+      else
+	new_rn_val -= regs_loaded * 4;
+
+      displaced_write_reg (regs, dsc, dsc->u.block.rn, new_rn_val,
+			   CANNOT_WRITE_PC);
+    }
+}
+
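The shuffle in cleanup_block_load_pc is easier to follow when it runs on a plain array. This sketch mirrors its two loops: it moves loaded words to their real destinations, then restores scratch registers. A 16-entry array stands in for the regcache, and the function names are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in M.  */
static unsigned
count_bits (uint32_t m)
{
  unsigned n = 0;

  for (; m != 0; m &= m - 1)
    n++;
  return n;
}

/* On entry, REGS[] is the state after the rewritten
   "ldm rN, {r0-r(n-1)}" executed.  SAVED[] holds r0..r(n-1) from before
   the step, and MASK is the original (not fully-populated) list.  */
static void
shuffle_ldm (uint32_t regs[16], const uint32_t saved[16], uint32_t mask)
{
  unsigned n = count_bits (mask);
  uint32_t clobbered = (1u << n) - 1;   /* r0..r(n-1) were overwritten.  */
  int write_reg = 15;
  int r;

  /* Walk the original list from the top.  The highest listed register
     gets the highest loaded word.  A source never sits above its
     destination, so no word is overwritten before it has been moved.  */
  while (n > 0)
    {
      if (mask & (1u << write_reg))
        {
          regs[write_reg] = regs[n - 1];
          clobbered &= ~(1u << write_reg);
          n--;
        }
      write_reg--;
    }

  /* Restore low registers that served only as scratch.  */
  for (r = 0; clobbered != 0; r++)
    if (clobbered & (1u << r))
      {
        regs[r] = saved[r];
        clobbered &= ~(1u << r);
      }
}
```

Take `ldm r8, {r1, r4, pc}` (mask 0x8012). The three loaded words first land in r0-r2. After the shuffle they are in r1, r4 and pc, and r0 and r2 get their old values back.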
+/* Handle ldm/stm, apart from some tricky cases which are unlikely to occur
+   in user-level code (in particular exception return, ldm rn, {...pc}^).  */
+
+static int
+copy_block_xfer (uint32_t insn, struct regcache *regs,
+		 struct displaced_step_closure *dsc)
+{
+  int load = bit (insn, 20);
+  int user = bit (insn, 22);
+  int increment = bit (insn, 23);
+  int before = bit (insn, 24);
+  int writeback = bit (insn, 21);
+  int rn = bits (insn, 16, 19);
+  CORE_ADDR from = dsc->insn_addr;
+
+  /* Block transfers which don't mention PC can be run directly out-of-line.  */
+  if (rn != 15 && (insn & 0x8000) == 0)
+    return copy_unmodified (insn, "ldm/stm", dsc);
+
+  if (rn == 15)
+    {
+      warning (_("displaced: Unpredictable LDM or STM with base register r15"));
+      return copy_unmodified (insn, "unpredictable ldm/stm", dsc);
+    }
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying block transfer insn "
+			"%.8lx\n", (unsigned long) insn);
+
+  dsc->u.block.xfer_addr = displaced_read_reg (regs, from, rn);
+  dsc->u.block.rn = rn;
+
+  dsc->u.block.load = load;
+  dsc->u.block.user = user;
+  dsc->u.block.increment = increment;
+  dsc->u.block.before = before;
+  dsc->u.block.writeback = writeback;
+  dsc->u.block.cond = bits (insn, 28, 31);
+
+  dsc->u.block.regmask = insn & 0xffff;
+
+  if (load)
+    {
+      if ((insn & 0xffff) == 0xffff)
+	{
+	  /* LDM with a fully-populated register list.  This case is
+	     particularly tricky.  Implement for now by fully emulating the
+	     instruction (which might not behave perfectly in all cases, but
+	     these instructions should be rare enough for that not to matter
+	     too much).  */
+	  dsc->modinsn[0] = ARM_NOP;
+
+	  dsc->cleanup = &cleanup_block_load_all;
+	}
+      else
+	{
+	  /* LDM of a list of registers which includes PC.  Implement by
+	     rewriting the list of registers to be transferred into a
+	     contiguous chunk r0...rX before doing the transfer, then shuffling
+	     registers into the correct places in the cleanup routine.  */
+	  unsigned int regmask = insn & 0xffff;
+	  unsigned int num_in_list = bitcount (regmask), new_regmask, i;
+
+	  for (i = 0; i < num_in_list; i++)
+	    dsc->tmp[i] = displaced_read_reg (regs, from, i);
+
+	  /* Writeback makes things complicated.  We need to avoid clobbering
+	     the base register with one of the registers in our modified
+	     register list, but just using a different register can't work in
+	     all cases, e.g.:
+
+	       ldm r14!, {r0-r13,pc}
+
+	     which would need to be rewritten as:
+
+	       ldm rN!, {r0-r14}
+
+	     but that can't work, because there's no free register for N.
+
+	     Solve this by turning off the writeback bit, and emulating
+	     writeback manually in the cleanup routine.  */
+
+	  if (writeback)
+	    insn &= ~(1 << 21);
+
+	  new_regmask = (1 << num_in_list) - 1;
+
+	  if (debug_displaced)
+	    fprintf_unfiltered (gdb_stdlog, _("displaced: LDM r%d%s, "
+				"{..., pc}: original reg list %.4x, modified "
+				"list %.4x\n"), rn, writeback ? "!" : "",
+				(int) insn & 0xffff, new_regmask);
+
+	  dsc->modinsn[0] = (insn & ~0xffff) | (new_regmask & 0xffff);
+
+	  dsc->cleanup = &cleanup_block_load_pc;
+	}
+    }
+  else
+    {
+      /* STM of a list of registers which includes PC.  Run the instruction
+	 as-is, but out of line: this will store the wrong value for the PC,
+	 so we must manually fix up the memory in the cleanup routine.
+	 Doing things this way has the advantage that we can auto-detect
+	 the offset of the PC write (which is architecture-dependent) in
+	 the cleanup routine.  */
+      dsc->modinsn[0] = insn;
+
+      dsc->cleanup = &cleanup_block_store_pc;
+    }
+
+  return 0;
+}
+
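For an LDM that loads the PC, copy_block_xfer changes only two things: the register list and the writeback bit. Below is a standalone sketch of that transformation. `compact_ldm_list` is a hypothetical name.

```c
#include <stdint.h>

/* Number of set bits in M.  */
static unsigned
count_bits (uint32_t m)
{
  unsigned n = 0;

  for (; m != 0; m &= m - 1)
    n++;
  return n;
}

/* Rewrite an LDM whose list includes the PC (but is not r0-r15) so that
   it loads the same number of words into r0...r(n-1).  The writeback bit
   (21) is cleared so that the cleanup routine can do writeback by hand.  */
static uint32_t
compact_ldm_list (uint32_t insn)
{
  unsigned n = count_bits (insn & 0xffffu);
  uint32_t new_regmask = (1u << n) - 1;

  insn &= ~(1u << 21);                  /* Drop writeback.  */
  return (insn & ~0xffffu) | new_regmask;
}
```

For example, 0xe8928010 (`ldm r2, {r4, pc}`) becomes 0xe8920003 (`ldm r2, {r0, r1}`). The case from the comment, 0xe8bebfff (`ldm r14!, {r0-r13, pc}`), becomes 0xe89e7fff.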
+/* Cleanup/copy SVC (SWI) instructions.  These two functions are overridden
+   for Linux, where some SVC instructions must be treated specially.  */
+
+static void
+cleanup_svc (struct regcache *regs, struct displaced_step_closure *dsc)
+{
+  CORE_ADDR from = dsc->insn_addr;
+  CORE_ADDR resume_addr = from + 4;
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: cleanup for svc, resume at "
+			"%.8lx\n", (unsigned long) resume_addr);
+
+  displaced_write_reg (regs, dsc, ARM_PC_REGNUM, resume_addr, BRANCH_WRITE_PC);
+}
+
+static int
+copy_svc (uint32_t insn, CORE_ADDR to, struct regcache *regs,
+	  struct displaced_step_closure *dsc)
+{
+  CORE_ADDR from = dsc->insn_addr;
+
+  /* Allow OS-specific code to override SVC handling.  */
+  if (dsc->u.svc.copy_svc_os)
+    return dsc->u.svc.copy_svc_os (insn, to, regs, dsc);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying svc insn %.8lx\n",
+			(unsigned long) insn);
+
+  /* Preparation: none.
+     Insn: unmodified svc.
+     Cleanup: pc <- insn_addr + 4.  */
+
+  dsc->modinsn[0] = insn;
+
+  dsc->cleanup = &cleanup_svc;
+  /* Pretend we wrote to the PC, so cleanup doesn't set PC to the next
+     instruction.  */
+  dsc->wrote_to_pc = 1;
+
+  return 0;
+}
+
+/* Copy undefined instructions.  */
+
+static int
+copy_undef (uint32_t insn, struct displaced_step_closure *dsc)
+{
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying undefined insn %.8lx\n",
+			(unsigned long) insn);
+
+  dsc->modinsn[0] = insn;
+
+  return 0;
+}
+
+/* Copy unpredictable instructions.  */
+
+static int
+copy_unpred (uint32_t insn, struct displaced_step_closure *dsc)
+{
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copying unpredictable insn "
+			"%.8lx\n", (unsigned long) insn);
+
+  dsc->modinsn[0] = insn;
+
+  return 0;
+}
+
+/* The decode_* functions are instruction decoding helpers.  They mostly follow
+   the presentation in the ARM ARM.  */
+
+static int
+decode_misc_memhint_neon (uint32_t insn, struct regcache *regs,
+			  struct displaced_step_closure *dsc)
+{
+  unsigned int op1 = bits (insn, 20, 26), op2 = bits (insn, 4, 7);
+  unsigned int rn = bits (insn, 16, 19);
+
+  if (op1 == 0x10 && (op2 & 0x2) == 0x0 && (rn & 0x1) == 0x0)
+    return copy_unmodified (insn, "cps", dsc);
+  else if (op1 == 0x10 && op2 == 0x0 && (rn & 0x1) == 0x1)
+    return copy_unmodified (insn, "setend", dsc);
+  else if ((op1 & 0x60) == 0x20)
+    return copy_unmodified (insn, "neon dataproc", dsc);
+  else if ((op1 & 0x71) == 0x40)
+    return copy_unmodified (insn, "neon elt/struct load/store", dsc);
+  else if ((op1 & 0x77) == 0x41)
+    return copy_unmodified (insn, "unallocated mem hint", dsc);
+  else if ((op1 & 0x77) == 0x45)
+    return copy_preload (insn, regs, dsc);  /* pli.  */
+  else if ((op1 & 0x77) == 0x51)
+    {
+      if (rn != 0xf)
+        return copy_preload (insn, regs, dsc);  /* pld/pldw.  */
+      else
+        return copy_unpred (insn, dsc);
+    }
+  else if ((op1 & 0x77) == 0x55)
+    return copy_preload (insn, regs, dsc);  /* pld/pldw.  */
+  else if (op1 == 0x57)
+    switch (op2)
+      {
+      case 0x1: return copy_unmodified (insn, "clrex", dsc);
+      case 0x4: return copy_unmodified (insn, "dsb", dsc);
+      case 0x5: return copy_unmodified (insn, "dmb", dsc);
+      case 0x6: return copy_unmodified (insn, "isb", dsc);
+      default: return copy_unpred (insn, dsc);
+      }
+  else if ((op1 & 0x63) == 0x43)
+    return copy_unpred (insn, dsc);
+  else if ((op2 & 0x1) == 0x0)
+    switch (op1 & ~0x80)
+      {
+      case 0x61:
+	return copy_unmodified (insn, "unallocated mem hint", dsc);
+      case 0x65:
+	return copy_preload_reg (insn, regs, dsc);  /* pli reg.  */
+      case 0x71: case 0x75:
+	return copy_preload_reg (insn, regs, dsc);  /* pld/pldw reg.  */
+      case 0x63: case 0x67: case 0x73: case 0x77:
+	return copy_unpred (insn, dsc);
+      default:
+	return copy_undef (insn, dsc);
+      }
+  else
+    return copy_undef (insn, dsc);  /* Probably unreachable.  */
+}
+
+static int
+decode_unconditional (uint32_t insn, struct regcache *regs,
+		      struct displaced_step_closure *dsc)
+{
+  if (bit (insn, 27) == 0)
+    return decode_misc_memhint_neon (insn, regs, dsc);
+  /* Switch on bits: 0bxxxxx321xxx0xxxxxxxxxxxxxxxxxxxx.  */
+  else switch (((insn & 0x7000000) >> 23) | ((insn & 0x100000) >> 20))
+    {
+    case 0x0: case 0x2:
+      return copy_unmodified (insn, "srs", dsc);
+
+    case 0x1: case 0x3:
+      return copy_unmodified (insn, "rfe", dsc);
+
+    case 0x4: case 0x5: case 0x6: case 0x7:
+      return copy_b_bl_blx (insn, regs, dsc);
+
+    case 0x8:
+      switch ((insn & 0xe00000) >> 21)
+	{
+	case 0x1: case 0x3: case 0x4: case 0x5: case 0x6: case 0x7:
+	  return copy_copro_load_store (insn, regs, dsc); /* stc/stc2.  */
+
+	case 0x2:
+	  return copy_unmodified (insn, "mcrr/mcrr2", dsc);
+
+	default:
+	  return copy_undef (insn, dsc);
+	}
+
+    case 0x9:
+      {
+        int rn_f = (bits (insn, 16, 19) == 0xf);
+	switch ((insn & 0xe00000) >> 21)
+	  {
+	  case 0x1: case 0x3:
+	    /* ldc/ldc2 imm (undefined for rn == pc).  */
+	    return rn_f ? copy_undef (insn, dsc)
+			: copy_copro_load_store (insn, regs, dsc);
+
+	  case 0x2:
+	    return copy_unmodified (insn, "mrrc/mrrc2", dsc);
+
+	  case 0x4: case 0x5: case 0x6: case 0x7:
+	    /* ldc/ldc2 lit (undefined for rn != pc).  */
+	    return rn_f ? copy_copro_load_store (insn, regs, dsc)
+			: copy_undef (insn, dsc);
+
+	  default:
+	    return copy_undef (insn, dsc);
+	  }
+      }
+
+    case 0xa:
+      return copy_unmodified (insn, "stc/stc2", dsc);
+
+    case 0xb:
+      if (bits (insn, 16, 19) == 0xf)
+        return copy_copro_load_store (insn, regs, dsc);  /* ldc/ldc2 lit.  */
+      else
+        return copy_undef (insn, dsc);
+
+    case 0xc:
+      if (bit (insn, 4))
+	return copy_unmodified (insn, "mcr/mcr2", dsc);
+      else
+	return copy_unmodified (insn, "cdp/cdp2", dsc);
+
+    case 0xd:
+      if (bit (insn, 4))
+        return copy_unmodified (insn, "mrc/mrc2", dsc);
+      else
+	return copy_unmodified (insn, "cdp/cdp2", dsc);
+
+    default:
+      return copy_undef (insn, dsc);
+    }
+}
+
+/* Decode miscellaneous instructions in dp/misc encoding space.  */
+
+static int
+decode_miscellaneous (uint32_t insn, struct regcache *regs,
+		      struct displaced_step_closure *dsc)
+{
+  unsigned int op2 = bits (insn, 4, 6);
+  unsigned int op = bits (insn, 21, 22);
+  unsigned int op1 = bits (insn, 16, 19);
+
+  switch (op2)
+    {
+    case 0x0:
+      return copy_unmodified (insn, "mrs/msr", dsc);
+
+    case 0x1:
+      if (op == 0x1)  /* bx.  */
+        return copy_bx_blx_reg (insn, regs, dsc);
+      else if (op == 0x3)
+        return copy_unmodified (insn, "clz", dsc);
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x2:
+      if (op == 0x1)
+        return copy_unmodified (insn, "bxj", dsc);  /* Not really supported.  */
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x3:
+      if (op == 0x1)
+        return copy_bx_blx_reg (insn, regs, dsc);  /* blx register.  */
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x5:
+      return copy_unmodified (insn, "saturating add/sub", dsc);
+
+    case 0x7:
+      if (op == 0x1)
+        return copy_unmodified (insn, "bkpt", dsc);
+      else if (op == 0x3)
+        return copy_unmodified (insn, "smc", dsc);  /* Not really supported.  */
+      /* Fall through.  */
+
+    default:
+      return copy_undef (insn, dsc);
+    }
+}
+
+static int
+decode_dp_misc (uint32_t insn, struct regcache *regs,
+		struct displaced_step_closure *dsc)
+{
+  if (bit (insn, 25))
+    switch (bits (insn, 20, 24))
+      {
+      case 0x10:
+        return copy_unmodified (insn, "movw", dsc);
+
+      case 0x14:
+        return copy_unmodified (insn, "movt", dsc);
+
+      case 0x12: case 0x16:
+        return copy_unmodified (insn, "msr imm", dsc);
+
+      default:
+        return copy_alu_imm (insn, regs, dsc);
+      }
+  else
+    {
+      uint32_t op1 = bits (insn, 20, 24), op2 = bits (insn, 4, 7);
+
+      if ((op1 & 0x19) != 0x10 && (op2 & 0x1) == 0x0)
+        return copy_alu_reg (insn, regs, dsc);
+      else if ((op1 & 0x19) != 0x10 && (op2 & 0x9) == 0x1)
+        return copy_alu_shifted_reg (insn, regs, dsc);
+      else if ((op1 & 0x19) == 0x10 && (op2 & 0x8) == 0x0)
+        return decode_miscellaneous (insn, regs, dsc);
+      else if ((op1 & 0x19) == 0x10 && (op2 & 0x9) == 0x8)
+        return copy_unmodified (insn, "halfword mul/mla", dsc);
+      else if ((op1 & 0x10) == 0x00 && op2 == 0x9)
+        return copy_unmodified (insn, "mul/mla", dsc);
+      else if ((op1 & 0x10) == 0x10 && op2 == 0x9)
+        return copy_unmodified (insn, "synch", dsc);
+      else if (op2 == 0xb || (op2 & 0xd) == 0xd)
+        /* 2nd arg means "unprivileged".  */
+        return copy_extra_ld_st (insn, (op1 & 0x12) == 0x02, regs, dsc);
+    }
+
+  /* Should be unreachable.  */
+  return 1;
+}
+
+static int
+decode_ld_st_word_ubyte (uint32_t insn, struct regcache *regs,
+			 struct displaced_step_closure *dsc)
+{
+  int a = bit (insn, 25), b = bit (insn, 4);
+  uint32_t op1 = bits (insn, 20, 24);
+  int rn_f = bits (insn, 16, 19) == 0xf;
+
+  if ((!a && (op1 & 0x05) == 0x00 && (op1 & 0x17) != 0x02)
+      || (a && (op1 & 0x05) == 0x00 && (op1 & 0x17) != 0x02 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 0, 0, 0);
+  else if ((!a && (op1 & 0x17) == 0x02)
+           || (a && (op1 & 0x17) == 0x02 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 0, 0, 1);
+  else if ((!a && (op1 & 0x05) == 0x01 && (op1 & 0x17) != 0x03)
+           || (a && (op1 & 0x05) == 0x01 && (op1 & 0x17) != 0x03 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 1, 0, 0);
+  else if ((!a && (op1 & 0x17) == 0x03)
+	   || (a && (op1 & 0x17) == 0x03 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 1, 0, 1);
+  else if ((!a && (op1 & 0x05) == 0x04 && (op1 & 0x17) != 0x06)
+           || (a && (op1 & 0x05) == 0x04 && (op1 & 0x17) != 0x06 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 0, 1, 0);
+  else if ((!a && (op1 & 0x17) == 0x06)
+	   || (a && (op1 & 0x17) == 0x06 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 0, 1, 1);
+  else if ((!a && (op1 & 0x05) == 0x05 && (op1 & 0x17) != 0x07)
+	   || (a && (op1 & 0x05) == 0x05 && (op1 & 0x17) != 0x07 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 1, 1, 0);
+  else if ((!a && (op1 & 0x17) == 0x07)
+	   || (a && (op1 & 0x17) == 0x07 && !b))
+    return copy_ldr_str_ldrb_strb (insn, regs, dsc, 1, 1, 1);
+
+  /* Should be unreachable.  */
+  return 1;
+}
+
+static int
+decode_media (uint32_t insn, struct displaced_step_closure *dsc)
+{
+  switch (bits (insn, 20, 24))
+    {
+    case 0x00: case 0x01: case 0x02: case 0x03:
+      return copy_unmodified (insn, "parallel add/sub signed", dsc);
+
+    case 0x04: case 0x05: case 0x06: case 0x07:
+      return copy_unmodified (insn, "parallel add/sub unsigned", dsc);
+
+    case 0x08: case 0x09: case 0x0a: case 0x0b:
+    case 0x0c: case 0x0d: case 0x0e: case 0x0f:
+      return copy_unmodified (insn, "decode/pack/unpack/saturate/reverse", dsc);
+
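+    case 0x10: case 0x11: case 0x12: case 0x13:
+    case 0x14: case 0x15: case 0x16: case 0x17:
+      return copy_unmodified (insn, "signed multiplies", dsc);
+
+    case 0x18: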
+      if (bits (insn, 5, 7) == 0)  /* op2.  */
+        {
+	  if (bits (insn, 12, 15) == 0xf)
+	    return copy_unmodified (insn, "usad8", dsc);
+	  else
+	    return copy_unmodified (insn, "usada8", dsc);
+	}
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x1a: case 0x1b:
+      if (bits (insn, 5, 6) == 0x2)  /* op2[1:0].  */
+	return copy_unmodified (insn, "sbfx", dsc);
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x1c: case 0x1d:
+      if (bits (insn, 5, 6) == 0x0)  /* op2[1:0].  */
+        {
+	  if (bits (insn, 0, 3) == 0xf)
+	    return copy_unmodified (insn, "bfc", dsc);
+	  else
+	    return copy_unmodified (insn, "bfi", dsc);
+	}
+      else
+        return copy_undef (insn, dsc);
+
+    case 0x1e: case 0x1f:
+      if (bits (insn, 5, 6) == 0x2)  /* op2[1:0].  */
+        return copy_unmodified (insn, "ubfx", dsc);
+      else
+        return copy_undef (insn, dsc);
+    }
+
+  /* Encodings not handled above are treated as undefined.  */
+  return copy_undef (insn, dsc);
+}
+
+static int
+decode_b_bl_ldmstm (uint32_t insn, struct regcache *regs,
+		    struct displaced_step_closure *dsc)
+{
+  if (bit (insn, 25))
+    return copy_b_bl_blx (insn, regs, dsc);
+  else
+    return copy_block_xfer (insn, regs, dsc);
+}
+
+static int
+decode_ext_reg_ld_st (uint32_t insn, struct regcache *regs,
+		      struct displaced_step_closure *dsc)
+{
+  unsigned int opcode = bits (insn, 20, 24);
+
+  switch (opcode)
+    {
+    case 0x04: case 0x05:  /* VFP/Neon mrrc/mcrr.  */
+      return copy_unmodified (insn, "vfp/neon mrrc/mcrr", dsc);
+
+    case 0x08: case 0x0a: case 0x0c: case 0x0e:
+    case 0x12: case 0x16:
+      return copy_unmodified (insn, "vfp/neon vstm/vpush", dsc);
+
+    case 0x09: case 0x0b: case 0x0d: case 0x0f:
+    case 0x13: case 0x17:
+      return copy_unmodified (insn, "vfp/neon vldm/vpop", dsc);
+
+    case 0x10: case 0x14: case 0x18: case 0x1c:  /* vstr.  */
+    case 0x11: case 0x15: case 0x19: case 0x1d:  /* vldr.  */
+      /* Note: no writeback for these instructions.  Bit 25 will always be
+	 zero though (via caller), so the following works OK.  */
+      return copy_copro_load_store (insn, regs, dsc);
+    }
+
+  /* Remaining opcodes (e.g. 0x02, 0x1a) are undefined.  */
+  return copy_undef (insn, dsc);
+}
+
+static int
+decode_svc_copro (uint32_t insn, CORE_ADDR to, struct regcache *regs,
+		  struct displaced_step_closure *dsc)
+{
+  unsigned int op1 = bits (insn, 20, 25);
+  int op = bit (insn, 4);
+  unsigned int coproc = bits (insn, 8, 11);
+  unsigned int rn = bits (insn, 16, 19);
+
+  if ((op1 & 0x20) == 0x00 && (op1 & 0x3a) != 0x00 && (coproc & 0xe) == 0xa)
+    return decode_ext_reg_ld_st (insn, regs, dsc);
+  else if ((op1 & 0x21) == 0x00 && (op1 & 0x3a) != 0x00
+	   && (coproc & 0xe) != 0xa)
+    return copy_copro_load_store (insn, regs, dsc);  /* stc/stc2.  */
+  else if ((op1 & 0x21) == 0x01 && (op1 & 0x3a) != 0x00
+	   && (coproc & 0xe) != 0xa)
+    return copy_copro_load_store (insn, regs, dsc);  /* ldc/ldc2 imm/lit.  */
+  else if ((op1 & 0x3e) == 0x00)
+    return copy_undef (insn, dsc);
+  else if ((op1 & 0x3e) == 0x04 && (coproc & 0xe) == 0xa)
+    return copy_unmodified (insn, "neon 64bit xfer", dsc);
+  else if (op1 == 0x04 && (coproc & 0xe) != 0xa)
+    return copy_unmodified (insn, "mcrr/mcrr2", dsc);
+  else if (op1 == 0x05 && (coproc & 0xe) != 0xa)
+    return copy_unmodified (insn, "mrrc/mrrc2", dsc);
+  else if ((op1 & 0x30) == 0x20 && !op)
+    {
+      if ((coproc & 0xe) == 0xa)
+	return copy_unmodified (insn, "vfp dataproc", dsc);
+      else
+        return copy_unmodified (insn, "cdp/cdp2", dsc);
+    }
+  else if ((op1 & 0x30) == 0x20 && op && (coproc & 0xe) == 0xa)
+    return copy_unmodified (insn, "neon 8/16/32 bit xfer", dsc);
+  else if ((op1 & 0x31) == 0x20 && op && (coproc & 0xe) != 0xa)
+    return copy_unmodified (insn, "mcr/mcr2", dsc);
+  else if ((op1 & 0x31) == 0x21 && op && (coproc & 0xe) != 0xa)
+    return copy_unmodified (insn, "mrc/mrc2", dsc);
+  else if ((op1 & 0x30) == 0x30)
+    return copy_svc (insn, to, regs, dsc);
+  else
+    return copy_undef (insn, dsc);  /* Possibly unreachable.  */
+}
+
+void
+arm_process_displaced_insn (uint32_t insn, CORE_ADDR from, CORE_ADDR to,
+			    struct regcache *regs,
+			    struct displaced_step_closure *dsc)
+{
+  int err = 0;
+
+  if (!displaced_in_arm_mode (regs))
+    error (_("Displaced stepping is only supported in ARM mode"));
+
+  /* Most displaced instructions use a 1-instruction scratch space, so set this
+     here and override below if/when necessary.  */
+  dsc->numinsns = 1;
+  dsc->insn_addr = from;
+  dsc->scratch_base = to;
+  dsc->cleanup = NULL;
+  dsc->wrote_to_pc = 0;
+
+  if ((insn & 0xf0000000) == 0xf0000000)
+    err = decode_unconditional (insn, regs, dsc);
+  else switch (((insn & 0x10) >> 4) | ((insn & 0xe000000) >> 24))
+    {
+    case 0x0: case 0x1: case 0x2: case 0x3:
+      err = decode_dp_misc (insn, regs, dsc);
+      break;
+
+    case 0x4: case 0x5: case 0x6:
+      err = decode_ld_st_word_ubyte (insn, regs, dsc);
+      break;
+
+    case 0x7:
+      err = decode_media (insn, dsc);
+      break;
+
+    case 0x8: case 0x9: case 0xa: case 0xb:
+      err = decode_b_bl_ldmstm (insn, regs, dsc);
+      break;
+
+    case 0xc: case 0xd: case 0xe: case 0xf:
+      err = decode_svc_copro (insn, to, regs, dsc);
+      break;
+    }
+
+  if (err)
+    internal_error (__FILE__, __LINE__,
+		    _("arm_process_displaced_insn: Instruction decode error"));
+}
+
+/* Actually set up the scratch space for a displaced instruction.  */
+
+void
+arm_displaced_init_closure (struct gdbarch *gdbarch, CORE_ADDR from,
+			    CORE_ADDR to, struct displaced_step_closure *dsc)
+{
+  struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
+  unsigned int i;
+
+  /* Poke modified instruction(s).  */
+  for (i = 0; i < dsc->numinsns; i++)
+    {
+      if (debug_displaced)
+        fprintf_unfiltered (gdb_stdlog, "displaced: writing insn %.8lx at "
+			    "%.8lx\n", (unsigned long) dsc->modinsn[i],
+			    (unsigned long) to + i * 4);
+      write_memory_unsigned_integer (to + i * 4, 4, dsc->modinsn[i]);
+    }
+
+  /* Put breakpoint afterwards.  */
+  write_memory (to + dsc->numinsns * 4, tdep->arm_breakpoint,
+		tdep->arm_breakpoint_size);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: copy 0x%s->0x%s: ",
+			paddr_nz (from), paddr_nz (to));
+}
+
+/* Entry point for copying an instruction into scratch space for displaced
+   stepping.  */
+
+struct displaced_step_closure *
+arm_displaced_step_copy_insn (struct gdbarch *gdbarch,
+			      CORE_ADDR from, CORE_ADDR to,
+			      struct regcache *regs)
+{
+  struct displaced_step_closure *dsc
+    = xmalloc (sizeof (struct displaced_step_closure));
+  uint32_t insn = read_memory_unsigned_integer (from, 4);
+
+  if (debug_displaced)
+    fprintf_unfiltered (gdb_stdlog, "displaced: stepping insn %.8lx "
+			"at %.8lx\n", (unsigned long) insn,
+			(unsigned long) from);
+
+  arm_process_displaced_insn (insn, from, to, regs, dsc);
+  arm_displaced_init_closure (gdbarch, from, to, dsc);
+
+  return dsc;
+}
+
+/* Entry point for cleaning things up after a displaced instruction has been
+   single-stepped.  */
+
+void
+arm_displaced_step_fixup (struct gdbarch *gdbarch,
+			  struct displaced_step_closure *dsc,
+			  CORE_ADDR from, CORE_ADDR to,
+			  struct regcache *regs)
+{
+  if (dsc->cleanup)
+    dsc->cleanup (regs, dsc);
+
+  if (!dsc->wrote_to_pc)
+    regcache_cooked_write_unsigned (regs, ARM_PC_REGNUM, dsc->insn_addr + 4);
+}
+
+
+#include "bfd-in2.h"
+#include "libcoff.h"
+
+static int
+gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
+{
+  if (arm_pc_is_thumb (memaddr))
+    {
+      static asymbol *asym;
+      static combined_entry_type ce;
+      static struct coff_symbol_struct csym;
+      static struct bfd fake_bfd;
+      static bfd_target fake_target;
+
+      if (csym.native == NULL)
+	{
+	  /* Create a fake symbol vector containing a Thumb symbol.
+	     This is solely so that the code in print_insn_little_arm()
+	     and print_insn_big_arm() in opcodes/arm-dis.c will detect
+	     the presence of a Thumb symbol and switch to decoding
+	     Thumb instructions.  */
+
+	  fake_target.flavour = bfd_target_coff_flavour;
+	  fake_bfd.xvec = &fake_target;
+	  ce.u.syment.n_sclass = C_THUMBEXTFUNC;
+	  csym.native = &ce;
+	  csym.symbol.the_bfd = &fake_bfd;
+	  csym.symbol.name = "fake";
+	  asym = (asymbol *) & csym;
+	}
+
+      memaddr = UNMAKE_THUMB_ADDR (memaddr);
+      info->symbols = &asym;
+    }
+  else
+    info->symbols = NULL;
+
+  if (info->endian == BFD_ENDIAN_BIG)
+    return print_insn_big_arm (memaddr, info);
+  else
+    return print_insn_little_arm (memaddr, info);
+}
+
+/* The following define instruction sequences that will cause ARM
+   CPUs to take an undefined instruction trap.  These are used to
+   signal a breakpoint to GDB.
+
+   The newer ARMv4T CPUs are capable of operating in ARM or Thumb
+   modes.  A different instruction is required for each mode.  The ARM
+   CPUs can also be big or little endian.  Thus four different
+   instructions are needed to support all cases.
+
+   Note: ARMv4 defines several new instructions that will take the
+   undefined instruction trap.  ARM7TDMI is nominally ARMv4T, but does
+   not in fact add the new instructions.  The new undefined
+   instructions in ARMv4 are all instructions that had no defined
+   behaviour in earlier chips.  There is no guarantee that they will
+   raise an exception; they may be treated as NOPs.  In practice, it
+   may only be safe to rely on instructions matching:
+
+   3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
+   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+   C C C C 0 1 1 x x x x x x x x x x x x x x x x x x x x 1 x x x x
+
+   Even this may only be true if the condition predicate is true.  The
+   following use a condition predicate of ALWAYS so it is always TRUE.
+
+   There are other ways of forcing a breakpoint.  GNU/Linux, RISC iX,
+   and NetBSD all use a software interrupt rather than an undefined
+   instruction to force a trap.  This can be handled by the
+   ABI-specific code during establishment of the gdbarch vector.  */
+
+#define ARM_LE_BREAKPOINT {0xFE,0xDE,0xFF,0xE7}
+#define ARM_BE_BREAKPOINT {0xE7,0xFF,0xDE,0xFE}
+#define THUMB_LE_BREAKPOINT {0xbe,0xbe}
+#define THUMB_BE_BREAKPOINT {0xbe,0xbe}
+
+static const char arm_default_arm_le_breakpoint[] = ARM_LE_BREAKPOINT;
+static const char arm_default_arm_be_breakpoint[] = ARM_BE_BREAKPOINT;
+static const char arm_default_thumb_le_breakpoint[] = THUMB_LE_BREAKPOINT;
+static const char arm_default_thumb_be_breakpoint[] = THUMB_BE_BREAKPOINT;
+
+/* Determine the type and size of breakpoint to insert at PCPTR.  Uses
+   the program counter value to determine whether a 16-bit or 32-bit
+   breakpoint should be used.  It returns a pointer to a string of
+   bytes that encode a breakpoint instruction, stores the length of
+   the string to *lenptr, and adjusts the program counter (if
+   necessary) to point to the actual memory location where the
+   breakpoint should be inserted.  */
+
+static const unsigned char *
+arm_breakpoint_from_pc (struct gdbarch *gdbarch, CORE_ADDR *pcptr, int *lenptr)
+{
+  struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
+
+  if (arm_pc_is_thumb (*pcptr))
+    {
+      *pcptr = UNMAKE_THUMB_ADDR (*pcptr);
+      *lenptr = tdep->thumb_breakpoint_size;
+      return tdep->thumb_breakpoint;
+    }
+  else
+    {
+      *lenptr = tdep->arm_breakpoint_size;
+      return tdep->arm_breakpoint;
+    }
+}
+
+/* Extract from the register cache REGS a function return value of
+   type TYPE, and copy that, in virtual format, into VALBUF.  */
+
+static void
+arm_extract_return_value (struct type *type, struct regcache *regs,
+			  gdb_byte *valbuf)
+{
+  struct gdbarch *gdbarch = get_regcache_arch (regs);
+
+  if (TYPE_CODE_FLT == TYPE_CODE (type))
+    {
+      switch (gdbarch_tdep (gdbarch)->fp_model)
+	{
+	case ARM_FLOAT_FPA:
+	  {
+	    /* The value is in register F0 in internal format.  We need to
+	       extract the raw value and then convert it to the desired
+	       internal type.  */
+	    bfd_byte tmpbuf[FP_REGISTER_SIZE];
+
+	    regcache_cooked_read (regs, ARM_F0_REGNUM, tmpbuf);
+	    convert_from_extended (floatformat_from_type (type), tmpbuf,
+				   valbuf, gdbarch_byte_order (gdbarch));
+	  }
+	  break;
+
+	case ARM_FLOAT_SOFT_FPA:
+	case ARM_FLOAT_SOFT_VFP:
+	  regcache_cooked_read (regs, ARM_A1_REGNUM, valbuf);
+	  if (TYPE_LENGTH (type) > 4)
+	    regcache_cooked_read (regs, ARM_A1_REGNUM + 1,
+				  valbuf + INT_REGISTER_SIZE);
+	  break;
+
+	default:
+	  internal_error
+	    (__FILE__, __LINE__,
+	     _("arm_extract_return_value: Floating point model not supported"));
+	  break;
+	}
+    }
+  else if (TYPE_CODE (type) == TYPE_CODE_INT
+	   || TYPE_CODE (type) == TYPE_CODE_CHAR
+	   || TYPE_CODE (type) == TYPE_CODE_BOOL
+	   || TYPE_CODE (type) == TYPE_CODE_PTR
+	   || TYPE_CODE (type) == TYPE_CODE_REF
+	   || TYPE_CODE (type) == TYPE_CODE_ENUM)
+    {
+      /* If the type is a plain integer, then the access is
+	 straightforward.  Otherwise we have to play around a bit more.  */
+      int len = TYPE_LENGTH (type);
+      int regno = ARM_A1_REGNUM;
+      ULONGEST tmp;
+
+      while (len > 0)
+	{
+	  /* By using store_unsigned_integer we avoid having to do
+	     anything special for small big-endian values.  */
+	  regcache_cooked_read_unsigned (regs, regno++, &tmp);
+	  store_unsigned_integer (valbuf,
+				  (len > INT_REGISTER_SIZE
+				   ? INT_REGISTER_SIZE : len),
+				  tmp);
+	  len -= INT_REGISTER_SIZE;
+	  valbuf += INT_REGISTER_SIZE;
+	}
+    }
+  else
+    {
+      /* For a structure or union the behaviour is as if the value had
+         been stored to word-aligned memory and then loaded into
+         registers with 32-bit load instruction(s).  */
+      int len = TYPE_LENGTH (type);
+      int regno = ARM_A1_REGNUM;
+      bfd_byte tmpbuf[INT_REGISTER_SIZE];
+
+      while (len > 0)
+	{
+	  regcache_cooked_read (regs, regno++, tmpbuf);
+	  memcpy (valbuf, tmpbuf,
+		  len > INT_REGISTER_SIZE ? INT_REGISTER_SIZE : len);
+	  len -= INT_REGISTER_SIZE;
+	  valbuf += INT_REGISTER_SIZE;
+	}
+    }
+}
+
+
+/* Will a function return an aggregate type in memory or in a
+   register?  Return 0 if an aggregate type can be returned in a
+   register, 1 if it must be returned in memory.  */
+
+static int
+arm_return_in_memory (struct gdbarch *gdbarch, struct type *type)
+{
+  int nRc;
+  enum type_code code;
+
+  CHECK_TYPEDEF (type);
+
+  /* In the ARM ABI, "integer" like aggregate types are returned in
+     registers.  For an aggregate type to be integer like, its size
+     must be less than or equal to INT_REGISTER_SIZE and the
+     offset of each addressable subfield must be zero.  Note that bit
+     fields are not addressable, and all addressable subfields of
+     unions always start at offset zero.
+
+     This function is based on the behaviour of GCC 2.95.1.
+     See: gcc/arm.c: arm_return_in_memory() for details.
+
+     Note: All versions of GCC before GCC 2.95.2 do not set up the
+     parameters correctly for a function returning the following
+     structure: struct { float f;}; This should be returned in memory,
+     not a register.  Richard Earnshaw sent me a patch, but I do not
+     know of any way to detect if a function like the above has been
+     compiled with the correct calling convention.  */
+
+  /* All aggregate types that won't fit in a register must be returned
+     in memory.  */
+  if (TYPE_LENGTH (type) > INT_REGISTER_SIZE)
+    {
+      return 1;
+    }
+
+  /* The AAPCS says all aggregates not larger than a word are returned
+     in a register.  */
+  if (gdbarch_tdep (gdbarch)->arm_abi != ARM_ABI_APCS)
+    return 0;
+
+  /* The only aggregate types that can be returned in a register are
+     structs and unions.  Arrays must be returned in memory.  */
+  code = TYPE_CODE (type);
+  if ((TYPE_CODE_STRUCT != code) && (TYPE_CODE_UNION != code))
+    {
+      return 1;
+    }
+
+  /* Assume all other aggregate types can be returned in a register.
+     Run a check for structures and unions.  */
+  nRc = 0;
+
+  if ((TYPE_CODE_STRUCT == code) || (TYPE_CODE_UNION == code))
+    {
+      int i;
+      /* Need to check if this struct/union is "integer" like.  For
+         this to be true, its size must be less than or equal to
+         INT_REGISTER_SIZE and the offset of each addressable
+         subfield must be zero.  Note that bit fields are not
+         addressable, and unions always start at offset zero.  If any
+         of the subfields is a floating point type, the struct/union
+         cannot be an integer type.  */
+
+      /* For each field in the object, check:
+         1) Is it FP? --> yes, nRc = 1;
+         2) Is it addressable (bitpos != 0) and
+         not packed (bitsize == 0)?
+         --> yes, nRc = 1
+       */
+
+      for (i = 0; i < TYPE_NFIELDS (type); i++)
+	{
+	  enum type_code field_type_code;
+	  field_type_code = TYPE_CODE (check_typedef (TYPE_FIELD_TYPE (type, i)));
+
+	  /* Is it a floating point type field?  */
 	  if (field_type_code == TYPE_CODE_FLT)
 	    {
 	      nRc = 1;
@@ -3252,6 +5076,11 @@ arm_gdbarch_init (struct gdbarch_info in
   /* On ARM targets char defaults to unsigned.  */
   set_gdbarch_char_signed (gdbarch, 0);
 
+  /* Note: for displaced stepping, this includes the breakpoint, and one word
+     of additional scratch space.  This setting isn't used for anything
+     besides displaced stepping at present.  */
+  set_gdbarch_max_insn_length (gdbarch, 4 * DISPLACED_MODIFIED_INSNS);
+
   /* This should be low enough for everything.  */
   tdep->lowest_pc = 0x20;
   tdep->jb_pc = -1;	/* Longjump support not enabled by default.  */
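
A note for reviewers on the decoder entry point: arm_process_displaced_insn sends condition code 0xf to decode_unconditional, and otherwise switches on a 4-bit key built from instruction bits 27:25 (key bits 3:1) and bit 4 (key bit 0). The C sketch below restates that dispatch outside GDB so the encodings can be checked in isolation. It is not code from the patch. field_bits, classify_arm_insn, patch_dispatch_key, is_le_undef_breakpoint and enum insn_group are hypothetical names, and field_bits only stands in for the bits () helper used in arm-tdep.c. The last function checks that ARM_LE_BREAKPOINT, read as a little-endian word (0xe7ffdefe), has the "cond 011x ... 1 xxxx" shape described in the breakpoint comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bits () helper used in arm-tdep.c:
   extract bits LO..HI (inclusive) of INSN.  */
static uint32_t
field_bits (uint32_t insn, int lo, int hi)
{
  assert (0 <= lo && lo <= hi && hi < 32);
  return (insn >> lo) & (uint32_t) ((1ULL << (hi - lo + 1)) - 1);
}

/* Decode groups, named after the decode_* helpers in the patch.  */
enum insn_group
{
  GROUP_UNCONDITIONAL,   /* decode_unconditional */
  GROUP_DP_MISC,         /* decode_dp_misc */
  GROUP_LD_ST_WORD,      /* decode_ld_st_word_ubyte */
  GROUP_MEDIA,           /* decode_media */
  GROUP_B_BL_LDMSTM,     /* decode_b_bl_ldmstm */
  GROUP_SVC_COPRO        /* decode_svc_copro */
};

/* The key computed with the same masks as arm_process_displaced_insn.  */
static unsigned int
patch_dispatch_key (uint32_t insn)
{
  return ((insn & 0x10) >> 4) | ((insn & 0xe000000) >> 24);
}

static enum insn_group
classify_arm_insn (uint32_t insn)
{
  /* Condition field 0b1111: the unconditional instruction space.  */
  if (field_bits (insn, 28, 31) == 0xf)
    return GROUP_UNCONDITIONAL;

  /* Key bits 3:1 are insn bits 27:25; key bit 0 is insn bit 4.  */
  switch ((field_bits (insn, 25, 27) << 1) | field_bits (insn, 4, 4))
    {
    case 0x0: case 0x1: case 0x2: case 0x3:
      return GROUP_DP_MISC;
    case 0x4: case 0x5: case 0x6:
      return GROUP_LD_ST_WORD;
    case 0x7:
      return GROUP_MEDIA;
    case 0x8: case 0x9: case 0xa: case 0xb:
      return GROUP_B_BL_LDMSTM;
    default:  /* 0xc .. 0xf */
      return GROUP_SVC_COPRO;
    }
}

/* Nonzero if the four bytes, read as a little-endian word, match
   "cccc 011x xxxx xxxx xxxx xxxx xxx1 xxxx" with cond == 0xe (ALWAYS).  */
static int
is_le_undef_breakpoint (const unsigned char b[4])
{
  uint32_t word = (uint32_t) b[0] | ((uint32_t) b[1] << 8)
		  | ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);

  return field_bits (word, 28, 31) == 0xe
	 && field_bits (word, 25, 27) == 0x3
	 && field_bits (word, 4, 4) == 1;
}
```

For example, 0xe1a00001 (mov r0, r1) lands in the dp/misc group, 0xef000000 (svc 0) in the svc/coprocessor group, and the breakpoint word itself produces key 7, the media group.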
--- .pc/displaced-stepping/gdb/arm-tdep.h	2009-07-15 11:14:33.000000000 -0700
+++ gdb/arm-tdep.h	2009-07-15 11:15:02.000000000 -0700
@@ -172,11 +172,110 @@ struct gdbarch_tdep
   struct regset *gregset, *fpregset;
 };
 
+/* Structures used for displaced stepping.  */
+
+/* The maximum number of temporaries available for displaced instructions.  */
+#define DISPLACED_TEMPS			16
+/* The maximum number of modified instructions generated for one single-stepped
+   instruction, including the breakpoint (usually at the end of the instruction
+   sequence) and any scratch words, etc.  */
+#define DISPLACED_MODIFIED_INSNS	8
+
+struct displaced_step_closure
+{
+  ULONGEST tmp[DISPLACED_TEMPS];
+  int rd;
+  int wrote_to_pc;
+  union
+  {
+    struct
+    {
+      int xfersize;
+      int rn;			   /* Writeback register.  */
+      unsigned int immed : 1;      /* Offset is immediate.  */
+      unsigned int writeback : 1;  /* Perform base-register writeback.  */
+      unsigned int restore_r4 : 1; /* Used r4 as scratch.  */
+    } ldst;
+
+    struct
+    {
+      unsigned long dest;
+      unsigned int link : 1;
+      unsigned int exchange : 1;
+      unsigned int cond : 4;
+    } branch;
+
+    struct
+    {
+      unsigned int regmask;
+      int rn;
+      CORE_ADDR xfer_addr;
+      unsigned int load : 1;
+      unsigned int user : 1;
+      unsigned int increment : 1;
+      unsigned int before : 1;
+      unsigned int writeback : 1;
+      unsigned int cond : 4;
+    } block;
+
+    struct
+    {
+      unsigned int immed : 1;
+    } preload;
+
+    struct
+    {
+      /* If non-NULL, override generic SVC handling (e.g. for a particular
+         OS).  */
+      int (*copy_svc_os) (uint32_t insn, CORE_ADDR to, struct regcache *regs,
+			  struct displaced_step_closure *dsc);
+    } svc;
+  } u;
+  unsigned long modinsn[DISPLACED_MODIFIED_INSNS];
+  int numinsns;
+  CORE_ADDR insn_addr;
+  CORE_ADDR scratch_base;
+  void (*cleanup) (struct regcache *, struct displaced_step_closure *);
+};
+
+/* Values for the WRITE_PC argument to displaced_write_reg.  If the write
+   may target the PC, this specifies how the instruction modifies the CPSR
+   T bit, etc.  */
+
+enum pc_write_style
+{
+  BRANCH_WRITE_PC,
+  BX_WRITE_PC,
+  LOAD_WRITE_PC,
+  ALU_WRITE_PC,
+  CANNOT_WRITE_PC
+};
+
+extern void
+  arm_process_displaced_insn (uint32_t insn, CORE_ADDR from, CORE_ADDR to,
+			      struct regcache *regs,
+			      struct displaced_step_closure *dsc);
+extern void
+  arm_displaced_init_closure (struct gdbarch *gdbarch, CORE_ADDR from,
+			      CORE_ADDR to, struct displaced_step_closure *dsc);
+extern ULONGEST
+  displaced_read_reg (struct regcache *regs, CORE_ADDR from, int regno);
+extern void
+  displaced_write_reg (struct regcache *regs,
+		       struct displaced_step_closure *dsc, int regno,
+		       ULONGEST val, enum pc_write_style write_pc);
 
 CORE_ADDR arm_skip_stub (struct frame_info *, CORE_ADDR);
 CORE_ADDR arm_get_next_pc (struct frame_info *, CORE_ADDR);
 int arm_software_single_step (struct frame_info *);
 
+extern struct displaced_step_closure *
+  arm_displaced_step_copy_insn (struct gdbarch *, CORE_ADDR, CORE_ADDR,
+				struct regcache *);
+extern void arm_displaced_step_fixup (struct gdbarch *,
+				      struct displaced_step_closure *,
+				      CORE_ADDR, CORE_ADDR, struct regcache *);
+
 /* Functions exported from armbsd-tdep.h.  */
 
 /* Return the appropriate register set for the core section identified
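
The closure fields above drive a simple scratch-pad layout. arm_displaced_init_closure writes numinsns words starting at TO, then the ARM breakpoint at TO + numinsns * 4. arm_displaced_step_fixup runs the cleanup hook and, unless wrote_to_pc is set, resets the PC to insn_addr + 4. The sketch below models that with a byte array instead of inferior memory. struct toy_closure, put_le32, toy_init_scratch and toy_fixup_pc are hypothetical, and the little-endian byte order is an assumption for the example. TOY_MAX_INSNS simply mirrors DISPLACED_MODIFIED_INSNS: 8 words with the breakpoint included, which is also why max_insn_length is set to 4 * DISPLACED_MODIFIED_INSNS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_INSNS 8   /* mirrors DISPLACED_MODIFIED_INSNS */

/* Hypothetical subset of struct displaced_step_closure.  */
struct toy_closure
{
  uint32_t modinsn[TOY_MAX_INSNS];
  int numinsns;
  uint32_t insn_addr;   /* FROM: address of the original instruction */
  int wrote_to_pc;
};

/* Store VAL at P in little-endian order (assumed target byte order).  */
static void
put_le32 (unsigned char *p, uint32_t val)
{
  p[0] = val & 0xff;
  p[1] = (val >> 8) & 0xff;
  p[2] = (val >> 16) & 0xff;
  p[3] = (val >> 24) & 0xff;
}

/* Lay out SCRATCH like arm_displaced_init_closure: the modified
   instructions first, then the breakpoint straight after them.
   Returns the byte offset of the breakpoint.  */
static int
toy_init_scratch (unsigned char scratch[4 * TOY_MAX_INSNS],
		  const struct toy_closure *dsc,
		  const unsigned char bkpt[4])
{
  int i;

  /* The breakpoint counts against the same limit as the copied words.  */
  assert (dsc->numinsns >= 1 && dsc->numinsns + 1 <= TOY_MAX_INSNS);

  for (i = 0; i < dsc->numinsns; i++)
    put_le32 (scratch + i * 4, dsc->modinsn[i]);
  memcpy (scratch + dsc->numinsns * 4, bkpt, 4);
  return dsc->numinsns * 4;
}

/* The PC rule in arm_displaced_step_fixup.  CURRENT_PC is the PC as left
   by the cleanup routine; it is kept only if the copy wrote the PC.  */
static uint32_t
toy_fixup_pc (const struct toy_closure *dsc, uint32_t current_pc)
{
  return dsc->wrote_to_pc ? current_pc : dsc->insn_addr + 4;
}
```

With a single copied instruction the breakpoint sits at offset 4, and a non-branching instruction at 0x8000 resumes at 0x8004.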
--- .pc/displaced-stepping-always/gdb/infrun.c	2009-07-15 00:36:51.000000000 -0700
+++ gdb/infrun.c	2009-07-15 11:16:42.000000000 -0700
@@ -825,6 +825,9 @@ displaced_step_fixup (ptid_t event_ptid,
      one now.  */
   while (displaced_step_request_queue)
     {
+      struct regcache *regcache;
+      struct gdbarch *gdbarch;
+
       struct displaced_step_request *head;
       ptid_t ptid;
       CORE_ADDR actual_pc;
@@ -847,8 +850,12 @@ displaced_step_fixup (ptid_t event_ptid,
 
 	  displaced_step_prepare (ptid);
 
+	  regcache = get_thread_regcache (ptid);
+	  gdbarch = get_regcache_arch (regcache);
+
 	  if (debug_displaced)
 	    {
+	      CORE_ADDR actual_pc = regcache_read_pc (regcache);
 	      gdb_byte buf[4];
 
 	      fprintf_unfiltered (gdb_stdlog, "displaced: run 0x%s: ",
@@ -857,7 +864,10 @@ displaced_step_fixup (ptid_t event_ptid,
 	      displaced_step_dump_bytes (gdb_stdlog, buf, sizeof (buf));
 	    }
 
-	  target_resume (ptid, 1, TARGET_SIGNAL_0);
+	  if (gdbarch_software_single_step_p (gdbarch))
+	    target_resume (ptid, 0, TARGET_SIGNAL_0);
+	  else
+	    target_resume (ptid, 1, TARGET_SIGNAL_0);
 
 	  /* Done, we're stepping a thread.  */
 	  break;
@@ -961,15 +971,19 @@ maybe_software_singlestep (struct gdbarc
 {
   int hw_step = 1;
 
-  if (gdbarch_software_single_step_p (gdbarch)
-      && gdbarch_software_single_step (gdbarch, get_current_frame ()))
+  if (gdbarch_software_single_step_p (gdbarch))
     {
-      hw_step = 0;
-      /* Do not pull these breakpoints until after a `wait' in
-	 `wait_for_inferior' */
-      singlestep_breakpoints_inserted_p = 1;
-      singlestep_ptid = inferior_ptid;
-      singlestep_pc = pc;
+      if (use_displaced_stepping (gdbarch))
+        hw_step = 0;
+      else if (gdbarch_software_single_step (gdbarch, get_current_frame ()))
+	{
+	  hw_step = 0;
+	  /* Do not pull these breakpoints until after a `wait' in
+	     `wait_for_inferior' */
+	  singlestep_breakpoints_inserted_p = 1;
+	  singlestep_ptid = inferior_ptid;
+	  singlestep_pc = pc;
+	}
     }
   return hw_step;
 }
@@ -1037,7 +1051,8 @@ a command like `return' or `jump' to con
      comments in the handle_inferior event for dealing with 'random
      signals' explain what we do instead.  */
   if (use_displaced_stepping (gdbarch)
-      && tp->trap_expected
+      && (tp->trap_expected
+	  || (step && gdbarch_software_single_step_p (gdbarch)))
       && sig == TARGET_SIGNAL_0)
     {
       if (!displaced_step_prepare (inferior_ptid))
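
Taken together, the infrun.c hunks amount to a small decision. On an architecture with a software single-step hook, once displaced stepping is in use, maybe_software_singlestep no longer plants single-step breakpoints, and queued displaced steps are resumed with step == 0. Control is regained through the breakpoint placed after the copied instruction. The sketch below models only that choice. struct resume_plan and plan_resume are hypothetical names; the three flags stand in for gdbarch_software_single_step_p, use_displaced_stepping and the return value of gdbarch_software_single_step.

```c
#include <assert.h>
#include <stdbool.h>

/* What one resume should look like (hypothetical model).  */
struct resume_plan
{
  bool hw_step;                 /* STEP argument passed to target_resume */
  bool sw_step_bkpts_pending;   /* models singlestep_breakpoints_inserted_p */
};

/* HAS_SW_SS models gdbarch_software_single_step_p, USE_DISPLACED models
   use_displaced_stepping, and SW_SS_INSERTED models the result of
   gdbarch_software_single_step, consulted only when it would be called.  */
static struct resume_plan
plan_resume (bool has_sw_ss, bool use_displaced, bool sw_ss_inserted)
{
  struct resume_plan plan = { true, false };

  if (has_sw_ss)
    {
      if (use_displaced)
	/* The scratch pad already ends in a breakpoint; just continue.  */
	plan.hw_step = false;
      else if (sw_ss_inserted)
	{
	  plan.hw_step = false;
	  plan.sw_step_bkpts_pending = true;
	}
    }
  return plan;
}
```

Without a software single-step hook the plan is unchanged from before the patch: hardware single-step, no pending breakpoints.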
