This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Newbie] Help request: understanding slowdowns in a network file system


Hi all,
I'm trying to understand what is causing occasional slowdowns in disk
I/O in a virtual environment I manage.

The disks are stored as image files on a Gluster
(http://www.gluster.org/) FUSE filesystem, and each image file is
stored on two different Gluster servers.  This means that any disk
request from an application on a virtual server goes through something
similar to the following layers:

(1) Linux VFS on guest system
(2) Hypervisor on host system
(3) Linux VFS on host system
(4) Gluster client FUSE module on host system
(5) Network layer on host system
(6) Physical network
(7) Network layer on Gluster server system
(8) Gluster server FUSE module on Gluster server system
(9) Linux VFS on Gluster server system
(10) Filesystem code on Gluster server system
(11) Physical disk on Gluster server system

The question I need to answer is "what do I need to upgrade to fix
this problem?", and I've not been able to find an answer using the usual
troubleshooting tools. So far the only evidence I have is the
behaviour observed on the guest system.

I'm reading the SystemTap Beginners Guide, which has some examples
that will help at certain layers (e.g. iotime.stp), but I'm struggling
to understand how to pull everything together to get helpful
diagnostic information.

The questions I have are:
1) Is SystemTap the right tool to help me get to the bottom of this
problem?  If not, the rest of the questions don't matter...
2) As an administrator rather than a developer I don't really know
which system calls I need to be monitoring.  What is the best way to
work this out?
3) Is there a neat way to tie together requests going out of the
client with requests coming into the server?
4) Are there any hints anyone can give on the best way to approach
troubleshooting across several different processes, layers and
services like this?

Thanks,
Dan

