This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: double fault


On Mon, 2005-11-21 at 17:12 -0800, Stone, Joshua I wrote: 
> I am seeing sporadic double-faults when running tests on systemtap.  I
> am trying to run systemtap.base/lt.exp, though others fail as well.  It
> doesn't always fail, but if I run it four or five times in succession
> that's usually enough to trigger the fault.  Below are manual copies of
> a couple of the faults dumped to the console:

Sorry I didn't respond sooner. I've been a bit slow the last couple days
due to the flu.

This looks like the same double-fault I've been seeing sporadically on
my laptop running RHEL4 (and nowhere else).  I tried a couple of ways to
track it down but it isn't easy.  I never did get my laptop working with
netdump either.

It appeared to me that the faults were originating in kprobes. In fact,
the same OS on the same hardware with the scalability patches applied
does not have this problem.

I stripped down the generated C file to something very small that still
demonstrates the problem. Basically it has the giant context array and
sets a single kprobe on sys_open whose handler simply returns.

Changing the kprobe to other functions does not always trigger the bug.

The problem also has something to do with the size of the context array.
Changing NR_CPUS to 128 (which makes the array really huge) was enough
to cause the double fault to happen on all my RHEL machines (including
x86_64) except for ones running under vmware. I changed the code to use
vmalloc (we really want vmalloc_node() but RHEL4 doesn't have it) and
all the crashes stopped on every machine.
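
To give a sense of scale for the context array, here is a rough userspace sketch. It is not the attached module: the struct names (locals_sketch, context_sketch) and helper functions are made up for illustration, atomic_t is replaced by a plain int, the empty probe_0 member is dropped, and calloc/free stand in for vmalloc/vfree, which only exist in the kernel. It mirrors the member layout of the attachment and allocates the array at run time instead of declaring it statically.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXNESTING 30
#define MAXSTRINGLEN 128

typedef char string_t[MAXSTRINGLEN];

/* Same members as function_my_sys_open_mode_str_locals in the attachment:
 * bs, f, __tmp0..__tmp35 (36 strings), __retvalue. */
struct locals_sketch {
  string_t bs;
  int64_t f;
  string_t tmp[36];
  string_t retvalue;
};

struct pt_regs; /* opaque here; only a pointer to it is stored */

/* Hypothetical userspace stand-in for struct context. */
struct context_sketch {
  int busy; /* stand-in for atomic_t */
  const char *probe_point;
  unsigned actioncount;
  unsigned nesting;
  const char *last_error;
  const char *last_stmt;
  struct pt_regs *regs;
  struct locals_sketch locals[MAXNESTING];
};

/* Size of one nesting level's locals. */
size_t locals_bytes(void) {
  return sizeof(struct locals_sketch);
}

/* Total size of an array of n contexts. */
size_t context_array_bytes(size_t n) {
  return n * sizeof(struct context_sketch);
}

/* Allocate n zeroed contexts at run time (calloc standing in for vmalloc
 * plus memset). Returns NULL on failure. */
struct context_sketch *alloc_contexts(size_t n) {
  return calloc(n, sizeof(struct context_sketch));
}

/* Release the array (free standing in for vfree). */
void free_contexts(struct context_sketch *c) {
  free(c);
}
```

With this layout each locals frame is 4872 bytes and each context is a little over 146 KB, so 128 contexts come to about 18.7 MB. Declared statically, all of that lives in the module's own data; allocated at init time, the module image stays small and the allocation can fail cleanly with an error code.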

Confused yet? I've attached my simple C file that triggers the bug. But
I'm not sure it's worth pursuing further because it does not appear to
happen with the newer version of kprobes.

Martin


#define MAXNESTING 30
#define MAXSTRINGLEN 128
#define STP_STRING_SIZE MAXSTRINGLEN
#include "runtime.h"
#include <linux/string.h>
#include <linux/timer.h>
#include <linux/delay.h>  /* ssleep() */
#include "loc2c-runtime.h"
typedef char string_t[MAXSTRINGLEN];

struct context {
  atomic_t busy;
  const char *probe_point;
  unsigned actioncount;
  unsigned nesting;
  const char *last_error;
  const char *last_stmt;
  struct pt_regs *regs;
  union {
    struct probe_0_locals {
    } probe_0;
    struct function_my_sys_open_mode_str_locals {
      string_t bs;
      int64_t f;
      string_t __tmp0;
      string_t __tmp1;
      string_t __tmp2;
      string_t __tmp3;
      string_t __tmp4;
      string_t __tmp5;
      string_t __tmp6;
      string_t __tmp7;
      string_t __tmp8;
      string_t __tmp9;
      string_t __tmp10;
      string_t __tmp11;
      string_t __tmp12;
      string_t __tmp13;
      string_t __tmp14;
      string_t __tmp15;
      string_t __tmp16;
      string_t __tmp17;
      string_t __tmp18;
      string_t __tmp19;
      string_t __tmp20;
      string_t __tmp21;
      string_t __tmp22;
      string_t __tmp23;
      string_t __tmp24;
      string_t __tmp25;
      string_t __tmp26;
      string_t __tmp27;
      string_t __tmp28;
      string_t __tmp29;
      string_t __tmp30;
      string_t __tmp31;
      string_t __tmp32;
      string_t __tmp33;
      string_t __tmp34;
      string_t __tmp35;
      string_t __retvalue;
    } function_my_sys_open_mode_str;
  } locals [MAXNESTING];
} contexts [128];


static struct kprobe dwarf_kprobe_0[1]= {
  {.addr= (void *) 0xc016765e}
};

char const * dwarf_kprobe_0_location_names[1] = {
  "kernel.function(\"sys_open@fs/open.c:947\")"
};

static int 
dwarf_kprobe_0_enter (struct kprobe *probe_instance, struct pt_regs *regs) {
  return 0;
}

static int systemtap_module_init (void);
int systemtap_module_init () {
  int rc = 0;
  const char *probe_point = "";
  /* register probe #0, 1 location(s) */
  probe_point = "kernel.function(\"sys_open@fs/open.c:947\")";
  {
    int i;
    /* sizeof yields size_t, so cast it to match the format specifier */
    printk("in module_init() contexts = %lu\n", (unsigned long) sizeof(contexts));
    for (i = 0; i < 1; i++) {
      ssleep(5);
      dwarf_kprobe_0[i].pre_handler = &dwarf_kprobe_0_enter;
      rc = rc || register_kprobe (&(dwarf_kprobe_0[i]));
      if (unlikely (rc)) {
        probe_point = dwarf_kprobe_0_location_names[i];
        break;
      }
      printk("probe registered\n");
    }
    if (unlikely (rc)) while (--i >= 0)
      unregister_kprobe (&(dwarf_kprobe_0[i]));
  }
  
  printk("DONE rc=%d\n", rc);
  ssleep(5);
  return rc;
}

void systemtap_module_exit (void) {
  int i;
  for (i = 0; i < 1; i++)
    unregister_kprobe (&(dwarf_kprobe_0[i]));
}

int probe_start () {
  return systemtap_module_init () ? -1 : 0;
}

void probe_exit () {
  systemtap_module_exit ();
}

MODULE_DESCRIPTION("systemtap probe");
MODULE_LICENSE("GPL");
