This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] Systemtap translator support for hw breakpoints on x86

From: Prerna Saxena <prerna at linux dot vnet dot ibm dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: systemtap at sourceware dot org
Date: Fri, 08 Jan 2010 20:05:02 +0530
Subject: Re: [RFC] Systemtap translator support for hw breakpoints on x86
References: <4B459CC8.2030402@linux.vnet.ibm.com> <y0m8wc9eqxg.fsf@fche.csb>

Hi Frank,

Thanks for taking time to look. I'll respond to suggestions on my patch here. I'm sending a separate email to address valuable suggestions from you/ Roland on how hwbkpts could be utilized better by systemtap.

On 01/07/2010 11:30 PM, Frank Ch. Eigler wrote:

Prerna Saxena<prerna@linux.vnet.ibm.com> writes:

[...]
This patch enables systemtap translator to probe hardware breakpoints
on x86 and x86_64 (based on mainline kernel).

Looks like a good start.

In order to merge this, it is essential to have a testsuite, posted
results on a variety of architectures and kernel versions (kfail for
older kernels), and documentation.

This patch was meant to be an RFC. I'll add all this when this gets checked into the git tree.

Syntax :
probe kernel.data(ADDRESS).write
probe kernel.data(ADDRESS).rw
probe kernel.data(ADDRESS).length(LEN).write
probe kernel.data(ADDRESS).length(LEN).rw
probe kernel.data("SYMBOL_NAME").write
probe kernel.data("SYMBOL_NAME").rw


(I wonder if simple .read or .write would be sufficient as suffixes,
and let a script render both as

probe kernel.data(ADDRESS).write, probe kernel.data(ADDRESS).read

and let the translator collapse such duplication into the internal RW case.)

There are issues here, as Roland pointed out. I'll explain in my next note.

+  // Warn of misconfigured kernels
+  s.op->newline()<<  "#ifndef CONFIG_HAVE_HW_BREAKPOINT";
+  s.op->newline()<<  "#error \"Need CONFIG_HAVE_HW_BREAKPOINT!\"";
+  s.op->newline()<<  "#endif";


Instead or in addition, the translator could make the whole probe point
type conditional on session.kernel_config("CONFIG_HAVE_HW_BREAKPOINT").

I'll add this in addition to the pass 4 check. Will this check take care of cross compilation for a different kernel ? (I think it should)

+// Define macros for access type
+  s.op->newline()<<  "#define HWBKPT_READ 0";
+  s.op->newline()<<  "#define HWBKPT_WRITE 1";
+  s.op->newline()<<  "#define HWBKPT_RW 2";


(Are these necessary?  Could the translator not emit symbols like
HW_BREAKPOINT_W directly?)

This had to be defined because some types are not defined on all architectures. X86 supports only HW_BREAKPOINT_W and HW_BREAKPOINT_RW, while ppc supports HW_BREAKPOINT_W and HW_BREAKPOINT_R. So, while a request to probe only reads to a location is acceptable on ppc, it cannot be employed on x86 for now (details in my next email ). So, I had to use a common mechanism to include all the 3 cases.

+  s.op->newline()<<  "static struct stap_hwbkpt_probe {";
+  s.op->newline()<<  "int registered_p:1;";
+// registered_p = -1 signifies an invalid probe that must not be registered
+// registered_p =  0 signifies a probe that failed registration
+// registered_p =  1 signifies a probe that got registered successfully

A one-bit value can't represent three different values.

Wonder how such a blunder was missed. I'll fix this. :-)

+      if (p->symbol_name.size())
+      s.op->line()<<  " .address=(unsigned long)0x0"<<  "ULL,";

(You probably don't need explicit initialization to zero.)

+  s.op->newline()<<  "static int enter_hwbkpt_probe (struct perf_event *bp,";
[...]
+  s.op->newline()<<  "for ( i=0; i<"<<  hwbkpt_probes_vector.size()<<  "; i++) {";
+  s.op->newline()<<  "struct perf_event_attr *hp =&  stap_hwbkpt_probe_array[i];";
+  s.op->newline()<<  "if (bp->attr.bp_addr==hp->bp_addr&&  bp->attr.bp_type==hp->bp_type&&  bp->attr.bp_len==hp->bp_len ) {";
[...]

To what extent does this handle multiple hw breakpoints on the same address?

For now, the kernel API doesnt allow chaining handlers for identical hardware breakpoints. The kernel API registers each hardware breakpoint as a separate case, whether its for identical or different addresses. So I had to include a crude check to map a hwbkpt perf_event (returned by the API when a breakpoint is hit) to the perf_event attribute for which it was registered.

[...]
+void
+hwbkpt_derived_probe_group::emit_module_init (systemtap_session&  s)
+{ [...]
+  s.op->newline()<<  "const char *hwbkpt_symbol_name = addr ? NULL : sdp->symbol;";
+  s.op->newline()<<  "	hw_breakpoint_init(hp);";


Normally we use newline(1) or (-1) to indent instead of explicit
spaces like this.

I'll remember that, thanks :)

+  s.op->newline()<<  "	if (addr) ";
+  s.op->newline()<<  "	    hp->bp_addr = (unsigned long) addr;";
+  s.op->newline()<<  "	else ";
+  s.op->newline()<<  "	   hp->bp_addr = kallsyms_lookup_name(hwbkpt_symbol_name);";


Is kallsyms_lookup_name actually module-exported these days?  Should the
translator try to resolve symbols at translate time?  Wildcards?

I just checked ! kallsyms_lookup_name is exported in mainline kernel via commit f60d24d2ad04977b0bd9e3eb35dba2d2fa569af9 ... diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 8b6b8b6..8e5288a 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -181,6 +181,7 @@ unsigned long kallsyms_lookup_name(const char *name) } return module_kallsyms_lookup_name(name); } +EXPORT_SYMBOL_GPL(kallsyms_lookup_name); ...

+  s.op->newline()<<  "#ifdef CONFIG_X86";
+  s.op->newline()<<  "  switch(sdp->len) {";
+  s.op->newline()<<  "	case HW_BREAKPOINT_LEN_1:";
+  s.op->newline()<<  "    hp->bp_len = HW_BREAKPOINT_LEN_1;";
[...]

Is this sort of switch really necessary? Why not just set

hp->bp_len = sdp->len;

I'll trim this.

+  s.op->newline()<<  "	default:";
+  s.op->newline()<<  "  	_stp_warn(\"Probe %s registration skipped : unsupported length. Supported lengths for x86 are 1,2,4 {8 for x86_84 only} \",sdp->pp);";
+  s.op->newline()<<  "		sdp->registered_p = -1;";
+  s.op->newline()<<  "		stap_hwbkpt_ret_array[i] = ERR_PTR(-EINVAL);";

Why is this necessary ...

For now, I had designed the probes to error out if a user defined a length that was not supported by architecture (eg a length 3 on x86, as this supports lengths like 1,2,4 bytes only) However, I think your suggestion ( to let the translator override this with the closest length that the arch supports) is nice. I'll try implementing it.

+  s.op->newline()<<  "    case HWBKPT_READ:";
+  s.op->newline()<<  "  	_stp_warn(\"Probe %s registration skipped : READ-ONLY hardware breakpoints not supported on x86\",sdp->pp);";
+  s.op->newline()<<  "		sdp->registered_p = -1;";
+  s.op->newline()<<  "		break;";

... and this too ...

+  s.op->newline()<<  "if (sdp->registered_p>= 0) {";
+  s.op->newline()<<  " stap_hwbkpt_ret_array[i] = register_wide_hw_breakpoint( hp, (void *)&enter_hwbkpt_probe );";


... considering that register_wide_hw_breakpoint() would surely
check its inputs and reject requests deemed inappropriate for
the current architecture?

As explained earlier, there is no HW_BREAKPOINT_R flag defined for x86. I cannot define a read-only hardware breakpoint for x86 directly using the hw breakpoint api. The only way to do it using two debug registers, and setting up 2 separate breakpoints for the same address -- one with an RW flag, and the other with a write flag. A read would happen when exactly one of these is hit and the other is not.

Note also that registered_p always>= 0 for your actual int:1 data
type.  Consider simply == 0 for unregistered, == 1 for registered.

I'll fix this.

+  s.op->newline()<<  " if (IS_ERR(stap_hwbkpt_ret_array[i])) {";
+  s.op->newline()<<  "     rc = PTR_ERR(stap_hwbkpt_ret_array[i]);";
+  s.op->newline()<<  "      sdp->registered_p = 0;";
+  s.op->newline(1)<<  "	    _stp_warn(\" Hwbkpt Probe %s : Registration error rc= %d, Addr = %p, name = %s\", probe_point, rc, addr, hwbkpt_symbol_name);";
+  s.op->newline(-1)<<  "  }";
+  s.op->newline()<<  "    else sdp->registered_p = 1;";
+  s.op->newline()<<  "}";
+  s.op->newline(-1)<<  "}"; // for loop


Not so simple: if you get a failure on the sixth registration, you have to
unregister the prior ones.

There is a glitch here. As Roland pointed out, debug registers are a scarce resource (1 on PPC, 4 on x86 ) I can modify the translator to throw an error in a script that tries to register more breakpoints than what that architecture allows. Eg, a script will not compile if it tries to probe 2 hw breakpoints on a ppc system, which can just have one breakpoint at a time.

However, in case less resources are available at runtime (say, only 3 out of 4 debug registers are free on x86 when the stap-generated module is running), and the script has 4 hardware breakpoints -- the first 3 will get registered while _only_ the 4th request will fail at registration. I thought this was an acceptable behaviour. Pls let me know your views.

- FChE

Regards,

--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India

Follow-Ups:
- Re: [RFC] Systemtap translator support for hw breakpoints on x86
  - From: Frank Ch. Eigler

References:
- [RFC] Systemtap translator support for hardware breakpoints on
  - From: Prerna Saxena
- Re: [RFC] Systemtap translator support for hardware breakpoints on
  - From: Frank Ch. Eigler

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]