This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



beaverton meeting minutes


Hi -

Here is my set of rambly minutes from last week's group meetings,
hosted by IBM Beaverton.  Thanks, guys!  This was our second
large-scale face-to-face meeting, so it could be called "2f2f" if
you're into that kind of thing.


2005-11-15

09:00
- meeting startup w/ Jim at the lovely offices of IBM Beaverton (ex Sequent)
- attendees
  Red Hat:
    Martin Hunt, Will Cohen, Elena Zannoni, Frank Eigler, Graydon Hoare
  Hitachi:
    Yumiko Sugita, Satoshi Oshima, Hideo Aoki (?)
  IBM:
    Kevin Stafford, Hien Nguyen, Jim Keniston, Larry Kessler, Vara Prasad,
    (by phone:) Ananth Mavinakayan, Prasanna Panchamukhi
  Intel:
    Josh Stone, Anil Keshavamurthy, Brad Chen

09:10
- jkenisto on kprobes
- kprobes, jprobes, retprobes on i386, x86-64, ia64, ppc64
- no one working on sparc64
- ezannoni: ia64 status?  jkenisto: upstream
- ananth: hugemem-vs-gdb kprobes fixes in queue, bz# 171980
- ananth suspects translator at fault in pr# 1836
- jkenisto reviewing kprobes bugs:
  - 1345/1808: kretprobes, jkenisto etc. still thinking about cure
  - 1813: RCU may fix, have no RH kernel experience; ananth to check
    in with RH kernel guys to check on RCU patch inclusion
  - 1235: blacklist; kprobes-resident blacklisting already upstream, but
    not RH kernels, possibly redundant with translator-side blacklist; 
    potential backport will miss RHEL4U3 deadline
  - RCU patches in U3-candidate kernel, yey; ezannoni to find way to
    distribute this to partners
  - printk not generally safe: kernel-claimed safety not sufficient to us
  - suggest closing 1235, and shove it into translator directly
  - 1776: systemtap probes crash; raw kprobes better; details later
  - 1303: wishlist item for probe handler crash detection
  - re kretprobes stack traceback hygiene
- kprobes future:
  - userspace prototyped
  - userspace return not started
  - "safe" user->kernel data copying still an open issue
  - sysrq emergency disarm key: how bulletproof?  from pessimal
    circumstances?  old code sets just a global flag for deferred
    disarming; IBM experience indicates recovery from death-throes
    very unlikely; hopeless if interrupts disabled anyway
  - need testsuite

10:30
- fche gives talk on translator internals

11:10
- graydon talks about stats implementation
- new syntax meets RAVE reviews, people dancing in the streets for some reason
- some code checked in just before meeting, just shy of code generation

11:35
- brad.chen on checking etc.
- used binary rewriting tool "pin"
- tool had problems ... a pinhead, surely?
- new demo based on objdump feeding a perl script
- usefulness iffy as is
- but would be nice to see a list of kernel symbols used from probe context,
  to detect printk-like problems
- dtrace safety beyond us by virtue of their restrictions (static probe 
  points); add analysis task to brad.chen bug #901
- maybe kprobes become guru-only if enough tapsets / static probes come in
- vara recalls old kernel-resident "tapset" concept from spring
- perhaps as means to provide kernel-side kprobe-candidacy whitelist
- q: how to get kernel developers to want to help us instrument their stuff
- issue: distribution/maintenance of tapsets - when kernel version drifts,
  who should keep the scripts up-to-date?
- varap: kernel developers want to help, wish instrumentation to live in
  kernel source repository
- supply macros for kernel developers to mark up instrument-worthy places
- cost question: how close to zero must dormant probes be to be acceptable?
- to pass data, it'll have to be non-zero cost

11:30
- lunch
- lovely sandwiches, thanks!!
- bug review
- bug 1594: possible kprobes-systemtap adaptation function - kprobe error
- RH compiler bug 169485 still outstanding (gcc4 backporting to gcc3.4)
- bug 1802: use -D MAXINSTANCES
- need %ifarch for e.g. system calls
- 907: would like $userptr->field sort of thing; jkenisto suggests => for
  a single hop
- $target->field rework; bring back ${ptr->field.subfield}
- test 907 on RHEL4U2; claimed to work on 2.6.14
- could translator synthesize jprobes?

13:40
- applications
- implement usbmon in systemtap?
- jfs instrumentation
- block layer tapset - Jens Axboe expressed interest
- iostat
- iotop "based on systemtap" - Red Flag Linux
- hitachi raises issue of binary block/data passing; they have some
  very high performance probing needed, <1us?
- grayche imagines binary tracing into circular mmap'd buffer
- "porting" dtrace providers
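
A rough user-space sketch of the "binary tracing into circular mmap'd
buffer" idea above. This is not systemtap's transport; the record layout
(timestamp, probe id, value), the TraceRing name and the single-writer
assumption are all made up for illustration. A real version would have a
kernel-side writer, per-CPU buffers and some care about concurrency.

```python
import mmap
import struct

# Hypothetical fixed-size record: timestamp_ns (u64), probe_id (u32), value (i64).
RECORD = struct.Struct("<QIq")
# Header holds a running count of records ever written.
HEADER = struct.Struct("<Q")


class TraceRing:
    """Fixed-size binary records in an anonymous mmap; oldest entries get overwritten."""

    def __init__(self, slots):
        self.slots = slots
        self.buf = mmap.mmap(-1, HEADER.size + slots * RECORD.size)
        HEADER.pack_into(self.buf, 0, 0)

    def write(self, timestamp_ns, probe_id, value):
        # Single writer assumed: read count, fill the slot, then publish the new count.
        (count,) = HEADER.unpack_from(self.buf, 0)
        offset = HEADER.size + (count % self.slots) * RECORD.size
        RECORD.pack_into(self.buf, offset, timestamp_ns, probe_id, value)
        HEADER.pack_into(self.buf, 0, count + 1)

    def read_all(self):
        """Return the surviving records as tuples, oldest first."""
        (count,) = HEADER.unpack_from(self.buf, 0)
        first = max(0, count - self.slots)
        return [
            RECORD.unpack_from(self.buf, HEADER.size + (i % self.slots) * RECORD.size)
            for i in range(first, count)
        ]


ring = TraceRing(slots=4)
for i in range(6):
    ring.write(1000 + i, 7, i * 10)
print(ring.read_all())
```

Writing six records into four slots keeps only the last four. Fixed-size
binary records avoid any formatting work at probe time, which is the
point if the per-probe budget really is under a microsecond.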

14:00
- wcohen on testing
- review of existing buckets, test types
- bug 1808, still out there
- RHTS lagging behind systemtap development due to beehive rpm insuckage
- varap et al. will start regular testing of RHEL4 vs systemtap snapshots
- wcohen commits to dejagnuizing kprobes tests
- need arch-specific tests, code coverage analysis
- need volunteers for stress-tests
- decision: team will focus on RCU kprobes instead of classic kprobes

14:30
- systemtap demo by Hien to IBM group
- Mingming Cao (IBM ext3 contributor) excited even by iostats.stp
- probing during kernel boot time w/ statically loaded modules
- translator/runtime should perform NUMA memory allocation at load time
- fault injection interesting ($var or $retvalue writing)
- multiplatform interesting
- IBM's official favorite distro is: <no comment>
- caching systemtap modules could cut down compilation time
- code patching interesting, but how would systemtap be useful?
- remote/boot-time probe module injection; static linking of module,
  operation w/o stpd
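
A minimal sketch of the module-caching idea above, assuming the compiled
module depends only on the script text, the kernel release and the
translator version. That set of inputs is a guess, and cache_key,
ModuleCache and the build callable are hypothetical names, not anything
in systemtap.

```python
import hashlib


def cache_key(script_text, kernel_release, translator_version):
    """Hash everything assumed to affect the generated module."""
    h = hashlib.sha256()
    for part in (script_text, kernel_release, translator_version):
        data = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


class ModuleCache:
    """In-memory stand-in for an on-disk cache of compiled probe modules."""

    def __init__(self, build):
        self.build = build      # hypothetical callable doing the slow compile
        self.store = {}
        self.builds = 0         # how many times the slow path actually ran

    def get(self, script_text, kernel_release, translator_version):
        key = cache_key(script_text, kernel_release, translator_version)
        if key not in self.store:
            self.builds += 1
            self.store[key] = self.build(script_text)
        return self.store[key]


cache = ModuleCache(build=lambda text: "module-for:" + text)
cache.get("probe begin { }", "kernel-A", "translator-1")
cache.get("probe begin { }", "kernel-A", "translator-1")   # served from cache
cache.get("probe begin { }", "kernel-B", "translator-1")   # new kernel, rebuilt
print(cache.builds)
```

Anything else that changes the generated code (tapset contents,
command-line options) would have to go into the key as well, otherwise
the cache hands back stale modules.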

15:40
- djprobes demo
- overhead target: probing at 30 kHz with 1% overhead; 300 ns/probe
- handler must save/restore register state
- copied instructions must be PC-independent / relocatable
- cultural cross-pollination: japanese "foo bar" == "hoge huga"
- gettimeofday benchmark demo: djprobes 40ns, kprobes 520ns (overheads)
- gnuplot formatted output is neato
- they wrote several beautiful tapsets to replicate 10% of lkst trace points
- technique portable to x86-64, problematic for ppc64 (ToC reg?), ok
  for modern ia64 (with atomic-store-16)
- kprobe on top of djprobe latter-day-bytes
- coexistence with kdb
- safety check for jmp+address insertion tricky with preemption/hw interrupts
- eligibility check for PC address tricky; current demos all use function
  entrypc, simplifying the situation
- hitachi needs to work out actual algorithm for general safety checking
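
The overhead numbers above can be checked with a bit of arithmetic. The
sketch below uses only the figures quoted in the demo (30 kHz, 1%, 40 ns,
520 ns); it assumes the per-hit cost measured in the gettimeofday
benchmark stays constant at the target rate, which the minutes do not
claim. The function names are just for illustration.

```python
def budget_ns(rate_hz, overhead_fraction):
    """Time each probe hit may cost so total overhead stays within the fraction."""
    return overhead_fraction / rate_hz * 1e9


def overhead_percent(rate_hz, cost_ns):
    """Share of one CPU spent in probes firing at rate_hz, each costing cost_ns."""
    return rate_hz * cost_ns * 1e-9 * 100


RATE_HZ = 30_000  # target probe rate from the minutes

# 1% of a second spread over 30,000 hits: roughly 333 ns, quoted as 300 ns/probe.
print(round(budget_ns(RATE_HZ, 0.01)))

# Extrapolating the measured per-hit overheads to the target rate.
print(round(overhead_percent(RATE_HZ, 40), 2))    # djprobes figure
print(round(overhead_percent(RATE_HZ, 520), 2))   # kprobes figure
```

Under that assumption djprobes lands well inside the 1% target at 30 kHz
(about 0.12%), while classic kprobes goes over it (about 1.56%).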

18:30
- group dinner at big "road house" style restaurant with wonderful
  post and beam construction 
- thanks Larry!

2005-11-16

09:00
- regrouping, sugary snacks just a few minutes late
- reverting to bug review, wish list, AR
- review of safety: brad.chen to bring over a static checker widget,
  no new major efforts until users suggest specific areas
- graydon foresees translator-resident whitelist for letting us flip
  switch of general dwarf probes to guru mode
- make $val rvalues guru-mode only at that time; no apparent hazard to 
  keeping them safe-mode visible at the present
- jkenisto reminds kprobe originally oriented toward expert users
- varap focus on kernel community engagement
- varap reminds of desire for user-space probes; couple of months of work
  left
- kretprobes bug 1345 patch submitted
- RHEL5 kernel patch deadlines next spring, impacting IBM despite its
  quasi-fictional placement
- jkenisto summarizes user-space probe prototypes
  - inode+offset basis, as per dprobes
- script/java targeting pushed by sun
  - static instrumentation - not necessarily a dwarf/kprobes thing
  - static instrumentation user-space probes could expand to hard-coded
    int3 in systemtap shlib
  - desire compatibility with instrumentation inserted for dtrace's sake
  - RH can investigate this part
- watchpoint probes
  - hw support exists in multiple architectures
  - problem: arbitration/sharing of hw control registers 
  - send info to roland as feature request for ptrace-rewrite 

10:00
- evangelism, Garrett @ IBM joining us
- <gh@us.ibm.com>
- <customer> CTO had sun/linux competition
- we need more publicity
- need more problem state sharing
- there exist some neat kernel-space problems
- systemtap results could be used as evidence to kernel hackers to justify
  fixes, if customer proprietary workloads were not shareable to demonstrate
  the problems
- put tool into hands of pre-sales technical types, to diagnose customers'
  solaris-to-linux porting problems
- IBM has "captured customers"; more understood problem space, relationship,
  less need to dazzle them with linux
- IBM has kernel subsystem experts, offers constructing tapsets
- but: too many individual problems for general problem patterns
- lkml flame du jour: memory fragmentation
- need lkml problem monitoring, offering up systemtap solutions, awareness
- what obstacles exist for wannabe users?
- probe cross-compilation necessary (#1145)
- integrate elfutils builds into cvs/src systemtap build procedure; open-coding
  the rpm bundling logic; include "portability patch"

11:00
- perhaps write IBM Red Book about systemtap
- need "standard demo" of systemtap up on web
- RHEL5: virtualization: xen kprobes
- user-level kretprobes may involve blocking
- djprobes "booster" for kprobes/kretprobes patches coming from Hitachi
- need algorithm to check for safe insertion point
- consensus on not requiring Hitachi to port widget to other architectures
- djprobes eligibility safety checking kernel or more likely 
  user-space computable
- minimizing api sprawl good

11:30
- action item review [follows in separate email]
- manpower commitment [deferred]
- adjourned

