This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: double fault -> PAGE_KERNEL flagged memory


I suspect that your double fault may come from the SystemTap logging code. Do
you have an instrumentation point in any fault handler?

For Tom: can you flag the RelayFS buffer memory PAGE_KERNEL instead of
GFP_KERNEL? Otherwise, those pages take a page fault the first time they are
accessed (seen with LTTng).

For instance, if you log an event from the page fault handler and the logging
code itself generates a page fault, then you get a double fault.
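
The recursion described above can be illustrated with a small user-space
simulation. This is only a sketch under stated assumptions, not SystemTap,
RelayFS or kernel code: every name in it (fault_handler, log_event,
page_present, MAX_DEPTH, run_scenario) is hypothetical, "pages" are modelled
with a boolean array, and the limited kernel stack is modelled with a nesting
limit. It shows a logging call inside the fault handler re-entering the handler
whenever the log buffer has not been made present beforehand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES     4              /* pages backing the simulated log buffer */
#define PAGE_SZ    4096
#define OTHER_PAGE NPAGES         /* an unrelated page that faults first */
#define MAX_DEPTH  8              /* stands in for the limited kernel stack */

static bool   page_present[NPAGES + 1];
static char   log_buf[NPAGES * PAGE_SZ];
static size_t log_pos;
static int    depth, max_depth_seen;
static bool   overflowed;

static void fault_handler(size_t page);

/* Access one byte of the log buffer; "fault" if its page is not present. */
static void touch(size_t off)
{
    size_t page = off / PAGE_SZ;
    if (!page_present[page])
        fault_handler(page);
}

/* Instrumentation: append one byte to the log buffer. */
static void log_event(char c)
{
    size_t off = log_pos % sizeof(log_buf);
    touch(off);
    if (overflowed)
        return;                   /* the simulated stack is already gone */
    log_buf[off] = c;
    log_pos++;
}

/* Simulated fault handler with an instrumentation point at its entry. */
static void fault_handler(size_t page)
{
    depth++;
    if (depth > max_depth_seen)
        max_depth_seen = depth;
    if (depth >= MAX_DEPTH) {     /* nesting limit reached: give up */
        overflowed = true;
        depth--;
        return;
    }
    log_event('F');               /* may fault again on the log buffer */
    if (!overflowed)
        page_present[page] = true;
    depth--;
}

/* Run one fault on OTHER_PAGE; return the deepest handler nesting seen. */
static int run_scenario(bool prefault_log_buffer)
{
    memset(page_present, 0, sizeof(page_present));
    log_pos = 0;
    depth = max_depth_seen = 0;
    overflowed = false;
    if (prefault_log_buffer)
        for (size_t i = 0; i < NPAGES; i++)
            page_present[i] = true;
    fault_handler(OTHER_PAGE);
    return max_depth_seen;
}
```

Calling run_scenario(false) lets the handler nest until it hits MAX_DEPTH,
while run_scenario(true), with the buffer made present up front, keeps the
nesting at a single level.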

The same could apply to unaligned memory accesses.

Make sure that the SystemTap code is _always_ in contiguous memory that cannot
be swapped to disk:

Linux kernel module loading makes sure that all module code is memory locked
(see module.c): it first loads the whole module into a vmap area (which is
swappable) and then copies the code into a region of memory flagged
PAGE_KERNEL_EXEC (see vmalloc.c:vmalloc_exec()).

Furthermore, make sure that all data memory regions are also non-swappable.
That includes the RelayFS buffers.

So:

- Memory in which the SystemTap code is loaded should be allocated with
  vmalloc_exec() (or with __vmalloc() and the GFP_KERNEL | __GFP_HIGHMEM,
  PAGE_KERNEL_EXEC flags).
- SystemTap global data structures should be in memory protected from swap-out,
  with a flag like PAGE_KERNEL.
- RelayFS buffers should be PAGE_KERNEL too (not GFP_KERNEL).


Mathieu


* Stone, Joshua I (joshua.i.stone@intel.com) wrote:
> I am seeing sporadic double-faults when running tests on systemtap.  I
> am trying to run systemtap.base/lt.exp, though others fail as well.  It
> doesn't always fail, but if I run it four or five times in succession
> that's usually enough to trigger the fault.  Below are manual copies of
> a couple of the faults dumped to the console:
> 
> double fault, gdt at c0358000 [255 bytes]
> double fault, tss at c03dc000
> eip = ffffffff, esp = f4b6500c
> eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
> esi = ffffffff, edi = ffffffff, ebp = 00000000
>  
> double fault, gdt at c0358000 [255 bytes]
> double fault, tss at c03dc000
> eip = c011a799, esp = f5bd4f98
> eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
> esi = 00000000, edi = c011a785, ebp = 00000000
> 
> The first dump doesn't tell much, but the edi and eip values in the
> second dump are interesting.  'c011a785' is the beginning of
> do_page_fault, and the instruction at 'c011a799' is a read from the
> stack.  Methinks the stack runneth over?
> 
> This is on RHEL4 U2, i686, kernel 2.6.9-22.EL.  I verified this crash on
> two different machines with this kernel: an IBM T42 laptop (1.7GHz
> Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
> I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel.  I also
> tried the desktop in x86_64 mode, and could not reproduce the problem
> with the UP kernel nor the SMP kernel.
> 
> Please let me know if there's any other information I can provide to
> help track this down...
> 
> Thanks,
> 
> Josh Stone
> 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

