This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Boot-time probing with SystemTap
- From: William Cohen <wcohen at redhat dot com>
- To: SystemTAP <systemtap at sources dot redhat dot com>
- Date: Thu, 18 Jan 2007 16:24:54 -0500
- Subject: Boot-time probing with SystemTap
This past week I have played around with bootchart
(http://www.bootchart.org/), which is a set of Linux scripts designed
to help locate the reasons for slow Linux machine startup. It makes it
easy for a novice user to collect startup information on their machine and
generate a plot of the data that gives a timeline of processes and
CPU/disk usage. Its pluses are:
- very easy to set up
- nice graphics that show where to look for CPU and I/O hogs
Bootchart is not ideal; it has its drawbacks and blind
spots. Bootchart makes use of the existing /proc information. It shows
which processes spawn other processes. However, it shows only
half of the picture for the processes; it doesn't show what event
caused a process to continue, e.g. a process returning from wait4.
Knowing what caused a process to be stopped and restarted would be
useful for finding critical paths in the code.
Bootchart prunes short-lived tasks from the graph. Thus, "death by a
thousand cuts", a script that spawns many short-lived tasks that add
up to a significant amount of time, might not be obvious from the
generated graphs. Bootchart has an option that eliminates the
pruning, but the charts generated with that option have a huge number
of processes on them. There should be a better way to summarize that.
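
One possible way to summarize the short-lived tasks is to group them by
the parent that spawned them and add up their lifetimes. The sketch below
is only an illustration: the record layout (parent command, child command,
lifetime in milliseconds), the sample values, the 500 ms threshold and the
function name summarize_short_lived are all assumptions, not anything
bootchart or SystemTap produces today.

```python
from collections import defaultdict

# Hypothetical records: (parent command, child command, lifetime in ms).
# The values are made up for illustration; they are not measured data.
TASKS = [
    ("rc.sysinit", "grep", 20),
    ("rc.sysinit", "sed", 30),
    ("rc.sysinit", "awk", 25),
    ("init", "rc.sysinit", 9000),
    ("udevd", "udev_helper", 40),
    ("udevd", "udev_helper", 15),
]


def summarize_short_lived(tasks, threshold_ms=500):
    """Group tasks shorter than threshold_ms by parent.

    Returns (parent, task count, total ms) tuples, largest total first.
    """
    totals = defaultdict(lambda: [0, 0])
    for parent, _child, lifetime_ms in tasks:
        if lifetime_ms < threshold_ms:
            entry = totals[parent]
            entry[0] += 1            # number of short-lived children
            entry[1] += lifetime_ms  # accumulated lifetime of those children
    rows = [(parent, count, total) for parent, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for parent, count, total in summarize_short_lived(TASKS):
        print(f"{parent}: {count} short-lived tasks, {total} ms total")
```

A table like this keeps one row per spawning script instead of one bar per
process, so a parent whose children are individually negligible still
shows up if their combined time is large.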
Using the lessons from bootchart, some things can be done to make
SystemTap provide information that quickly focuses attention on problem
areas (BZ#2035). SystemTap boot-time probing should be as easy to use
as bootchart. Key requirements are:
- Simplify the SystemTap boot probe install steps to a simple command line.
- Have the startup show the SystemTap boot probe option as a GRUB entry
- Have a method that automatically shuts down the probe:
  -user-defined test script/function called when the probe is started
  -e.g. stop data collection when a particular process starts
  -when the script/function returns, kill the probe
- Have some scripts that demo the data collection:
  -trace which files are opened, #reads/writes, amount of data;
   look for which processes are opening the same file repeatedly
  -trace fork, exec, exit, wait4, sleep
  -have a format that is easy to parse (LKET format?)
- Have scripts:
  -that make a hit list of what to focus on from the collected data
  -that generate graphs summarizing the problem
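
As a rough illustration of the "hit list" idea, the sketch below counts
how often each command opens the same file and reports the repeats. The
trace format used here ("<time_ms> <execname> <pid> <event> <argument>",
one event per line) is purely hypothetical and is not the LKET format;
the sample lines and the function name repeated_opens are likewise
made up for the example.

```python
from collections import Counter

# Hypothetical trace, one event per line:
#   <time_ms> <execname> <pid> <event> <argument>
# This layout is an assumption for illustration, not a real SystemTap format.
SAMPLE_TRACE = """\
120 hald 412 open /etc/passwd
125 hald 412 open /etc/passwd
131 cupsd 498 open /etc/nsswitch.conf
140 hald 412 fork 413
152 hald 413 open /etc/passwd
160 cupsd 498 open /etc/cups/cupsd.conf
171 cupsd 498 open /etc/nsswitch.conf
"""


def repeated_opens(trace_text, min_count=2):
    """Return (execname, path, count) for files opened at least min_count times.

    Counts are grouped by command name rather than pid, so opens made by
    forked children of the same command are added together.
    """
    counts = Counter()
    for line in trace_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5 or fields[3] != "open":
            continue  # skip malformed lines and non-open events
        _time_ms, execname, _pid, _event, path = fields
        counts[(execname, path)] += 1
    return [
        (execname, path, count)
        for (execname, path), count in counts.most_common()
        if count >= min_count
    ]


if __name__ == "__main__":
    for execname, path, count in repeated_opens(SAMPLE_TRACE):
        print(f"{execname} opened {path} {count} times")
```

The same counting approach could be extended to reads, writes and bytes
transferred once the trace format is settled.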