This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Enhancements to block IO & IO scheduler tapset


Hi,

Reworked the patch per suggestions..

On 10/16/2009 08:56 PM, Frank Ch. Eigler wrote:
Prerna Saxena<prerna@linux.vnet.ibm.com> writes:

Here is a patch to add tracepoint-based probes to block IO and IO
scheduler tapsets.

Thank you.


I had to add new probe aliases instead of adding fallbacks to
existing probes because the tracepoint that flags an event, say ,
elv_add_request, is not defined at function entry but somewhere in
the interior. [...]

We don't have to have an exact match. Likely a function-entry or function-return probe would do, filtered with predicates that skip inapplicable calls.

For example, your elv_add_request = kernel.trace("block_rq_insert")
seems to match recent kernel.function("elv_insert").

I reworked this probe to have kprobe based fallback. Unfortunately, this was the only place where it was feasible.


trace("block_rq_abort") ~= kernel.function("blk_start_request"),
perhaps with some filtering (cmd_flags | REQ_QUIET?).


In this case, a kprobe based fallback is not feasible because the same function is called from multiple places, and a tracepoint is placed just before a singular instance of this call. So a kprobe based probe in this place will generate far more number of calls than the tracepoint hit.
Its a similar situation for most other tracepoint based probes -- it is unfortunate that most probes cannot have kprobe fallbacks.



Also, the handlers for a lot of probe aliases are duplicated -- they
essentially expose the same set of local variables available in the
probe. Is there some way I could reduce duplication here ? [...]

probe ioscheduler_trace.plug = kernel.trace("block_plug"),
probe ioscheduler_trace.unplug_io = kernel.trace("block_unplug_io")
{
..do something..
}

We don't have a general preprocessor for such stuff. Among the possibilities:

- extending our own little preprocessor
   - so something like %( foo = token token token %) to define,
     %( foo %) to use

- running cpp
   - but that's bad because it'd mess up embedded-C code like #includes/#ifdefs

- running m4
   - possible; could run m4 on tapset/**/*.stp.m4 at systemtap build time
     or at run time
   - could make contextual script errors tricky
   - would not mess with embedded-C
   - m4_define(`foo',`token token token') to define,
     foo   to use
   - could export CONFIG_*, kernel_v, etc. variables for m4 conditionals

- doing nothing
   - so cut&  paste as today


I've just replicated handlers for now !



- FChE

I've added testcases and man page updates. Looking fwd to feedback..


Regards,
--
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India
Index: stap-git-Oct-01/tapset/ioblock.stp
===================================================================
--- stap-git-Oct-01.orig/tapset/ioblock.stp
+++ stap-git-Oct-01/tapset/ioblock.stp
@@ -145,7 +145,7 @@ probe ioblock.request = kernel.function 
  * Variables:
  *  devname   - block device name
  *  ino       - i-node number of the mapped file
- *  byte_done - number of bytes transferred 
+ *  bytes_done - number of bytes transferred
  *  sector    - beginning sector for the entire bio
  *  flags     - see below
  *      BIO_UPTODATE    0       ok after I/O completion
@@ -186,3 +186,161 @@ probe ioblock.end = kernel.function("bio
 %)
         size = $bio->bi_size
 }
+
+/**
+ * probe ioblock_trace.bounce
+ *
+ *	tracepoint based probe.
+ * 	Fires whenever a buffer bounce is needed for at least one page of
+ *	a block IO request.
+ *
+ * Context :
+ *	The process creating a block IO request.
+ *
+ * Variables :
+ *  $bio       struct bio *
+ *  $q         struct request_queue*
+ *  devname    device for which a buffer bounce was needed.
+ *  ino       - i-node number of the mapped file
+ *  bytes_done - number of bytes transferred
+ *  sector    - beginning sector for the entire bio
+ *  flags     - see below
+ *      BIO_UPTODATE    0       ok after I/O completion
+ *      BIO_RW_BLOCK    1       RW_AHEAD set, and read/write would block
+ *      BIO_EOF         2       out-out-bounds error
+ *      BIO_SEG_VALID   3       nr_hw_seg valid
+ *      BIO_CLONED      4       doesn't own data
+ *      BIO_BOUNCED     5       bio is a bounce bio
+ *      BIO_USER_MAPPED 6       contains user pages
+ *      BIO_EOPNOTSUPP  7       not supported
+ *  error     - 0 on success
+ *  rw        - binary trace for read/write request
+ *  vcnt      - bio vector count which represents number of array element (page,
+ *              offset, length) which makes up this I/O request
+ *  idx       - offset into the bio vector array
+ *  phys_segments - number of segments in this bio after physical address
+ *                  coalescing is performed.
+ *  size      - total size in bytes
+ */
+probe ioblock_trace.bounce = kernel.trace("block_bio_bounce")
+{
+        devname = __bio_devname($bio)
+        ino = __bio_ino($bio)
+
+        bytes_done = $bio->bi_size
+        sector = $bio->bi_sector
+        flags = $bio->bi_flags
+        rw = $bio->bi_rw
+        vcnt = $bio->bi_vcnt
+        idx = $bio->bi_idx
+        phys_segments = $bio->bi_phys_segments
+        size = $bio->bi_size
+}
+
+/* probe ioblock_trace.request
+ *
+ *  tracepoint based probe.
+ *  Fires just as a generic block I/O request is created for a bio.
+ *
+ * Context:
+ *  The process makes block I/O request
+ *
+ * Variables:
+ *  $bio       struct bio* for which IO request is to be submitted
+ *  $q         struct request_queue* to which the request is to be added
+ *  devname   - block device name
+ *  ino       - i-node number of the mapped file
+ *  sector    - beginning sector for the entire bio
+ *  flags     - see below
+ *	BIO_UPTODATE    0       ok after I/O completion
+ *	BIO_RW_BLOCK    1       RW_AHEAD set, and read/write would block
+ *	BIO_EOF         2       out-out-bounds error
+ *	BIO_SEG_VALID   3       nr_hw_seg valid
+ *	BIO_CLONED      4       doesn't own data
+ *	BIO_BOUNCED     5       bio is a bounce bio
+ *	BIO_USER_MAPPED 6       contains user pages
+ *	BIO_EOPNOTSUPP  7       not supported
+ *
+ *  rw        - binary trace for read/write request
+ *  vcnt      - bio vector count which represents number of array element (page,
+ *              offset, length) which make up this I/O request
+ *  idx       - offset into the bio vector array
+ *  phys_segments - number of segments in this bio after physical address
+ *                  coalescing is performed.
+ *  size      - total size in bytes
+ *  bdev      - target block device
+ *  bdev_contains - points to the device object which contains the
+ *                  partition (when bio structure represents a partition)
+ *  p_start_sect -  points to the start sector of the partition
+ *                  structure of the device
+ */
+
+probe ioblock_trace.request = kernel.trace("block_bio_queue")
+{
+        devname = __bio_devname($bio)
+        ino = __bio_ino($bio)
+
+        bytes_done = $bio->bi_size
+        error = $error
+        sector = $bio->bi_sector
+        flags = $bio->bi_flags
+        rw = $bio->bi_rw
+        vcnt = $bio->bi_vcnt
+        idx = $bio->bi_idx
+        phys_segments = $bio->bi_phys_segments
+        size = $bio->bi_size
+        bdev_contains = $bio->bi_bdev->bd_contains
+        bdev = $bio->bi_bdev
+        p_start_sect = __bio_start_sect($bio)
+}
+
+/* probe ioblock_trace.end
+ *
+ *  Tracepoint based probe.
+ *  Fires whenever a block I/O transfer is complete.
+ *
+ * Context:
+ *  The process signals the transfer is done.
+ *
+ * Variables:
+ *  q	      - request queue on which this bio was queued.
+ *  devname   - block device name
+ *  ino       - i-node number of the mapped file
+ *  bytes_done - number of bytes transferred
+ *  sector    - beginning sector for the entire bio
+ *  flags     - see below
+ *      BIO_UPTODATE    0       ok after I/O completion
+ *      BIO_RW_BLOCK    1       RW_AHEAD set, and read/write would block
+ *      BIO_EOF         2       out-out-bounds error
+ *      BIO_SEG_VALID   3       nr_hw_seg valid
+ *      BIO_CLONED      4       doesn't own data
+ *      BIO_BOUNCED     5       bio is a bounce bio
+ *      BIO_USER_MAPPED 6       contains user pages
+ *      BIO_EOPNOTSUPP  7       not supported
+
+ *  error     - 0 on success
+ *  rw        - binary trace for read/write request
+ *  vcnt      - bio vector count which represents number of array element (page,
+ *              offset, length) which makes up this I/O request
+ *  idx       - offset into the bio vector array
+ *  phys_segments - number of segments in this bio after physical address
+ *                  coalescing is performed.
+ *  size      - total size in bytes
+ */
+probe ioblock_trace.end = kernel.trace("block_bio_complete")
+{
+	q = $q
+        devname = __bio_devname($bio)
+        ino = __bio_ino($bio)
+
+        bytes_done = $bio->bi_size
+        error = $error
+
+        sector = $bio->bi_sector
+        flags = $bio->bi_flags
+        rw = $bio->bi_rw
+        vcnt = $bio->bi_vcnt
+        idx = $bio->bi_idx
+        phys_segments = $bio->bi_phys_segments
+        size = $bio->bi_size
+}
Index: stap-git-Oct-01/tapset/ioscheduler.stp
===================================================================
--- stap-git-Oct-01.orig/tapset/ioscheduler.stp
+++ stap-git-Oct-01/tapset/ioscheduler.stp
@@ -68,22 +68,24 @@ probe ioscheduler.elv_next_request.retur
 }
 
 /**
- * probe ioscheduler.elv_add_request - A request was added to the request queue
+ * probe ioscheduler.elv_add_request.kp - kprobe based probe to indicate that a request was added to the request queue
  * @elevator_name: The type of I/O elevator currently enabled
+ * @q: pointer to request queue
  * @req: Address of the request
  * @req_flags: Request flags
  * @disk_major: Disk major number of the request
  * @disk_minor: Disk minor number of the request
  */
 // when a request is added to the request queue
-probe ioscheduler.elv_add_request
-	= kernel.function("__elv_add_request")
+probe ioscheduler.elv_add_request.kp
+	= kernel.function("elv_insert")
 {
 %( kernel_v >= "2.6.10" %?
 	elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
 %:
 	elevator_name = kernel_string($q->elevator->elevator_name)
 %)
+	q = $q
 	if($rq == 0)  {
 		disk_major = -1
 		disk_minor = -1
@@ -142,6 +144,189 @@ probe ioscheduler.elv_completed_request
 %)
 }
 
+/**
+ * probe ioscheduler.elv_add_request.tp : tracepoint based probe to indicate a request is added to the request queue.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @q			: Pointer to request queue.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler.elv_add_request.tp
+			= kernel.trace("block_rq_insert")
+{
+q =  $q
+elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
+rq = $rq
+
+if ($rq == 0 || $rq->rq_disk ==0) {
+	disk_major = -1
+	disk_minor = -1
+} else {
+		disk_major = $rq->rq_disk->major
+		disk_minor = $rq->rq_disk->first_minor
+}
+
+rq_flags = $rq==0? 0:$rq->cmd_flags
+}
+
+/**
+ * probe ioscheduler.elv_add_request : probe to indicate request is added to the request queue.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @q			: Pointer to request queue.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler.elv_add_request =
+	ioscheduler.elv_add_request.tp !, ioscheduler.elv_add_request.kp
+{}
+
+/**
+ * probe ioscheduler_trace.elv_completed_request : Fires when a request is
+ *							completed.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler_trace.elv_completed_request
+			= kernel.trace("block_rq_complete")
+{
+elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
+rq = $rq
+
+if ($rq == 0 || $rq->rq_disk ==0) {
+	disk_major = -1
+	disk_minor = -1
+} else {
+		disk_major = $rq->rq_disk->major
+		disk_minor = $rq->rq_disk->first_minor
+}
+
+rq_flags = $rq==0? 0:$rq->cmd_flags
+}
+
+/**
+ * probe ioscheduler_trace.elv_issue_request : Fires when a request is
+ *							scheduled.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler_trace.elv_issue_request
+			= kernel.trace("block_rq_issue")
+{
+elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
+rq = $rq
+
+if ($rq == 0 || $rq->rq_disk ==0) {
+	disk_major = -1
+	disk_minor = -1
+} else {
+		disk_major = $rq->rq_disk->major
+		disk_minor = $rq->rq_disk->first_minor
+}
+
+rq_flags = $rq==0? 0:$rq->cmd_flags
+}
+
+/**
+ * probe ioscheduler_trace.elv_requeue_request : Fires when a request is
+ *	put back on the queue, when the hadware cannot accept more requests.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler_trace.elv_requeue_request
+			= kernel.trace("block_rq_requeue")
+{
+elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
+rq = $rq
+
+if ($rq == 0 || $rq->rq_disk ==0) {
+	disk_major = -1
+	disk_minor = -1
+} else {
+		disk_major = $rq->rq_disk->major
+		disk_minor = $rq->rq_disk->first_minor
+}
+
+rq_flags = $rq==0? 0:$rq->cmd_flags
+}
+
+/**
+ * probe ioscheduler_trace.elv_abort_request : Fires when a request is aborted.
+ * @elevator_name	: The type of I/O elevator currently enabled.
+ * @rq			: Address of request.
+ * @rq_flags		: Request flags.
+ * @disk_major		: Disk major no of request.
+ * @disk_minor		: Disk minor number of request.
+ *
+ */
+probe ioscheduler_trace.elv_abort_request
+			= kernel.trace("block_rq_abort")
+{
+elevator_name = kernel_string($q->elevator->elevator_type->elevator_name)
+rq = $rq
+
+if ($rq == 0 || $rq->rq_disk ==0) {
+	disk_major = -1
+	disk_minor = -1
+} else {
+		disk_major = $rq->rq_disk->major
+		disk_minor = $rq->rq_disk->first_minor
+}
+
+rq_flags = $rq==0? 0:$rq->cmd_flags
+}
+
+/**
+ * probe ioscheduler_trace.plug - Fires when a request queue is plugged;
+ *		ie, requests in the queue cannot be serviced by block driver.
+ * @rq_queue	: request queue
+ *
+ */
+probe ioscheduler_trace.plug = kernel.trace("block_plug")
+{
+  rq_queue = $q
+}
+
+/**
+ * probe ioscheduler_trace.unplug_io - Fires when a request queue is unplugged;
+ *	Either, when number of pending requests in the queue exceeds threshold
+ *	or, upon expiration of timer that was activated when queue was plugged.
+ * @rq_queue	: request queue
+ *
+ */
+probe ioscheduler_trace.unplug_io = kernel.trace("block_unplug_io")
+{
+  rq_queue = $q
+}
+
+/**
+ * probe ioscheduler_trace.unplug_timer - Fires when unplug timer associated
+ *					with a request queue expires.
+ * @rq_queue	: request queue
+ *
+ */
+probe ioscheduler_trace.unplug_timer = kernel.trace("block_unplug_timer")
+{
+  rq_queue = $q
+}
+
 function disk_major_from_request:long(var_q:long)
 %{ /* pure */
 	struct request_queue *q = (struct request_queue *)((long)THIS->var_q);
Index: stap-git-Oct-01/man/stapprobes.iosched.3stap.in
===================================================================
--- stap-git-Oct-01.orig/man/stapprobes.iosched.3stap.in
+++ stap-git-Oct-01/man/stapprobes.iosched.3stap.in
@@ -54,6 +54,7 @@ Fires when return from retrieving a requ
 .TP 
 .B ioscheduler.elv_add_request
 Fires when add a request into request queue
+Resolves to a kprobe based fallback if the tracepoint probe isnt available.
 
 .B Arguments:
 
@@ -94,6 +95,124 @@ Fires when a request is completed
 .I disk_minor
   disk minor number of the request
 
+.P
+.TP
+.B ioscheduler_trace.elv_completed_request
+Tracepoint based probe that fires when a request is completed
+
+.B Arguments:
+
+.I elevator_name
+  The elevator name
+
+.I req
+  Address of the request
+
+.I req_flags
+  request flags
+
+.I disk_major
+  disk major number of the request
+
+.I disk_minor
+  disk minor number of the request
+
+.P
+.TP
+.B ioscheduler_trace.elv_issue_request
+Tracepoint based probe that fires when a request is scheduled.
+
+.B Arguments:
+
+.I elevator_name
+  The elevator name
+
+.I req
+  Address of the request
+
+.I req_flags
+  request flags
+
+.I disk_major
+  disk major number of the request
+
+.I disk_minor
+  disk minor number of the request
+
+.P
+.TP
+.B ioscheduler_trace.elv_requeue_request
+Tracepoint based probe that fires when a request is put back on the queue, cos the hardware cannot accept more requests.
+
+.B Arguments:
+
+.I elevator_name
+  The elevator name
+
+.I req
+  Address of the request
+
+.I req_flags
+  request flags
+
+.I disk_major
+  disk major number of the request
+
+.I disk_minor
+  disk minor number of the request
+
+.P
+.TP
+.B ioscheduler_trace.elv_abort_request
+Tracepoint based probe that fires when a request is aborted.
+
+.B Arguments:
+
+.I elevator_name
+  The elevator name
+
+.I req
+  Address of the request
+
+.I req_flags
+  request flags
+
+.I disk_major
+  disk major number of the request
+
+.I disk_minor
+  disk minor number of the request
+
+.P
+.TP
+.B ioscheduler_trace.plug
+Tracepoint based probe that fires when a request queue is plugged.
+
+.B Arguments:
+
+.I req_queue
+  Pointer to the request queue.
+
+.P
+.TP
+.B ioscheduler_trace.unplug_io
+Tracepoint based probe that fires when a request queue is unplugged.
+
+.B Arguments:
+
+.I req_queue
+  Pointer to the request queue.
+
+.P
+.TP
+.B ioscheduler_trace.unplug_timer
+Tracepoint based probe that fires when unplug timer associated with a request queue expires.
+
+.B Arguments:
+
+.I req_queue
+  Pointer to the request queue.
+
 .SH SEE ALSO
 .IR stap (1),
 .IR stapprobes (3stap)
Index: stap-git-Oct-01/testsuite/buildok/ioscheduler.stp
===================================================================
--- stap-git-Oct-01.orig/testsuite/buildok/ioscheduler.stp
+++ stap-git-Oct-01/testsuite/buildok/ioscheduler.stp
@@ -7,3 +7,16 @@ probe ioscheduler.*
 	printf("ppname: %s, elv_name: %s, %d, %d", probefunc(), 
 		elevator_name, disk_major, disk_minor)
 }
+
+probe ioscheduler_trace.elv*
+{
+        printf("ppname: %s, request %p, elv_name: %s, %d, %d", probefunc(),
+                rq, elevator_name, disk_major, disk_minor)
+}
+
+probe ioscheduler_trace.plug, ioscheduler_trace.unplug_io, ioscheduler_trace.unplug_timer
+{
+        printf("ppname: %s, request %p, elv_name: %s, %d, %d", probefunc(),
+                rq_queue)
+}
+
Index: stap-git-Oct-01/testsuite/buildok/ioblock_test.stp
===================================================================
--- stap-git-Oct-01.orig/testsuite/buildok/ioblock_test.stp
+++ stap-git-Oct-01/testsuite/buildok/ioblock_test.stp
@@ -25,3 +25,10 @@ probe ioblock.end {
                 devname, sector, flags, rw, bio_rw_str(rw), vcnt, idx, phys_segments, size, bytes_done, error, ino)
 %)
 }
+
+probe ioblock_trace.*
+{
+  log(pp())
+  printf("%s\t%p\t%d\t%d\t%d\t%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
+                devname, q, sector, flags, rw, bio_rw_str(rw), vcnt, idx, phys_segments, size, bytes_done, error, ino)
+}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]