This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Boot-time probing with SystemTap


This past week I have played around with bootchart
(http://www.bootchart.org/), a set of Linux scripts designed
to help locate the reasons for slow machine startup. It makes it
easy for a novice user to collect startup information on their machine and
generate a plot of the data that gives a timeline of processes and
CPU/disk usage. Its pluses are:

- very easy to set up
- nice graphics show where to look for CPU and IO hogs

Bootchart is not ideal; it has its drawbacks and blind
spots. Bootchart makes use of the existing /proc information. It shows
which processes spawn other processes. However, it shows only
half of the picture for the processes; it doesn't show what event
caused a process to continue, e.g. a process returning from wait4.
Knowing what caused a process to be stopped and restarted would be
useful for finding critical paths in the code.

Bootchart prunes short-lived tasks from the graph. Thus, a "death by a
thousand cuts" problem, such as a script that spawns many short-lived
tasks that add up to a significant amount of time, might not be obvious
from the generated graphs. Bootchart has an option that eliminates the
pruning, but the charts generated with that option have a huge number
of processes on them. There should be a better way to summarize that.
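
One possible way to summarize the short-lived tasks, without either
pruning them or plotting every one, is to roll them up per parent.
The following is only a minimal Python sketch. The record layout
(parent, child, start_ms, end_ms), the function name and the 50 ms
threshold are all assumptions made for illustration; none of them
come from bootchart or SystemTap.

```python
from collections import defaultdict

# Assumed cutoff for "short-lived"; not a bootchart setting.
SHORT_LIVED_MS = 50


def summarize_short_lived(tasks, threshold_ms=SHORT_LIVED_MS):
    """Roll up short-lived tasks by parent.

    tasks: iterable of (parent, child, start_ms, end_ms) tuples
           (a hypothetical record layout).
    Returns a list of (parent, task_count, total_ms), sorted so the
    parent whose short-lived children used the most time comes first.
    """
    totals = defaultdict(lambda: [0, 0])  # parent -> [count, total_ms]
    for parent, _child, start_ms, end_ms in tasks:
        duration = end_ms - start_ms
        if duration < threshold_ms:
            totals[parent][0] += 1
            totals[parent][1] += duration
    rows = [(parent, count, total) for parent, (count, total) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    sample = [
        ("rc.sysinit", "sed", 1000, 1010),
        ("rc.sysinit", "grep", 1010, 1030),
        ("init", "udevd", 500, 4000),  # long-lived, so not counted
    ]
    for parent, count, total in summarize_short_lived(sample):
        print(f"{parent}: {count} short-lived tasks, {total} ms total")
```

A chart could then show one bar per parent instead of one bar per
short-lived task.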

Using the lessons from bootchart, some things can be done to make
SystemTap provide information that quickly focuses attention on problem
areas (BZ#2035).  SystemTap boot-time probing should be as easy to use
as bootchart.  Key requirements are:

- Simplify the SystemTap boot probe install steps to a single command line.
- Have the startup show the SystemTap bootprobe option as a grub entry
- Have a method that automatically shuts down the probe:
	-user-defined test script/function called when the probe is started
		-e.g. stop data collection when a particular process starts
	-when the script/function returns, kill the probe
- Have some scripts that demo the data collection
	-trace which files are opened, #reads/writes, amount of data
		look for which processes are opening the same file repeatedly
	-trace fork, exec, exit, wait4, sleep
	-have a format that is easy to parse (LKET format?)
- Have scripts:
	-that make a hit list of what to focus on from the collected data
	-that generate graphs summarizing the problem
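
As a rough illustration of the "hit list" idea, the sketch below
looks for processes that open the same file repeatedly. It assumes a
made-up, whitespace-separated trace line of the form
"timestamp pid execname event path"; this is not the LKET format or
any existing SystemTap output, and the function name is hypothetical.

```python
from collections import Counter


def repeated_opens(lines, min_count=2):
    """Return [(count, execname, path)] for files that a process name
    opened at least min_count times, most frequent first.

    Each line is assumed to look like:
        "<timestamp> <pid> <execname> <event> <path>"
    which is a hypothetical trace format used only for this sketch.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) != 5:
            continue  # skip lines that carry no path field
        _timestamp, _pid, execname, event, path = fields
        if event == "open":
            counts[(execname, path.strip())] += 1
    hits = [(n, name, path)
            for (name, path), n in counts.items() if n >= min_count]
    # Sort by descending count, then by name and path for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return hits


if __name__ == "__main__":
    trace = [
        "10 101 hald open /etc/passwd",
        "12 101 hald open /etc/passwd",
        "13 102 udevd open /etc/udev/udev.conf",
    ]
    for count, execname, path in repeated_opens(trace):
        print(f"{count:4d}  {execname}  {path}")
```

The same counting approach could be extended to the number of reads
and writes or the amount of data per file, given trace lines that
carry that information.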

