This is the mail archive of the archer@sourceware.org mailing list for the Archer project.



FYI GDB on-disk .debug cache (mmapcache) [Re: Tasks]


On Tue, 29 Jul 2008 21:36:16 +0200, Tom Tromey wrote:
> * Scalability and shared libraries.  Profile some examples.  Figure
>   out what is wrong.  Jan has some information here.

As promised on #tools, here is some info on what I have found.  I am talking
specifically only about the loading of the .debug (separate debug info) files
from disk (I do not know about other runtime performance problems).

vvv background info vvv
-----------------------
.symtab: ELF part - Generated by `gcc' (but not with `gcc -s').
  `readelf -s' output:
  68: 08048b14    61 FUNC    GLOBAL DEFAULT   13 main
.debug_info: DWARF - Generated by `gcc -g'.
  `readelf --debug-dump' output.
<1><105>: Abbrev Number: 2 (DW_TAG_subprogram)
  <106>   DW_AT_external    : 1       
  <107>   DW_AT_name        : main    

struct minimal_symbol *msym;
  Represents .symtab.
  (This part is not of concern here, just do not mix it up with psymtab.)

struct partial_symtab *psymtab;
  Represents the scanned .debug_info - only pointers into the content, to save
  GDB runtime memory.  Each CU (Compilation Unit) has one such partial_symtab.

struct symtab *symtab;
  Represents the loaded .debug_info - the content already loaded into memory.


DWARF is now read in two phases - first the psymtabs are read, and then the
specific CUs we actually need get their symtabs loaded.

psymtabs are now read by dwarf2_build_psymtabs_hard(), which works by
scanning/indexing .debug_info.

dwarf2_build_psymtabs_easy() should have been using the index sections like
`.debug_pubnames' but GDB has it #if 0-ed.  IIRC GDB currently builds its
psymtab entries even for `static' variables across files, although C
semantics says these are not visible there.  `.debug_pubnames' correctly
omits the `static' variables, making dwarf2_build_psymtabs_easy() either
impossible or a change of the current GDB behavior.  IMO GDB should require
`print filename.c:varname' if `varname' is static and `filename.c' is not the
file at the current PC.  Another possibility (which I do not like) is to
change GCC to also emit static variables into `.debug_pubnames'; I find the
DWARF standard unclear in this regard.  The patch does not try to use the
DWARF indexes, as it was intended not to change any behavior.

symtab is read by load_full_comp_unit()/process_full_comp_unit() for an
existing psymtab.

Daniel J. suggested that psymtab+symtab should be encapsulated away from the
GDB core and left as an implementation detail of the debuginfo backend
(DWARF).

GDB has two options:
  --readnow          Fully read symbol files on first access.
    It builds symtabs immediately, not just on demand.  It still reads
    .debug_info in two phases; it would be faster to build it all at once,
    but I do not see an easy change to the current codebase for it.

  --readnever        Do not read symbol files.
    Never touch .debug_info, do not build even psymtabs.
-----------------------
^^^ background info ^^^


The question is what we try to target.  With the .debug files in the kernel
disk cache the current reading is pretty fast:
  4.0s for Firefox and 27s (OK, I admit that is bad) for full OOo.
Another test is with the disk caches flushed:
  sync; echo 3 > /proc/sys/vm/drop_caches
which takes 26s for Firefox now.  I was trying to target even the latter
case - the artificial flush itself has no practical meaning, but the
cold-cache start it simulates is what gives a bad user experience.

My idea was an on-disk cache, created on demand in ~/.gdb_cache .  There
could also be a system-wide cache for rpm debuginfo packages, created in
/usr/lib/debug during rpm %post by `gdb --readnow'.

Something like this already existed between gdb-4.5 and gdb-5.0 (both
inclusive) in src/mmalloc/ .  That allocator associated the allocations with
an `objfile->md' pointer, which is essential to mark all the objfile-related
memory objects for mapping/unmapping from the cache file.  With the default
system memory randomization one fortunately finds very easily any memory
pointer belonging to the objfile but left unassociated (it points to nowhere
later).  One also easily finds, as a SIGSEGV, any needlessly recreated memory
object in the objfile, by mapping the stored cache read-only (without
PROT_WRITE).

The custom allocator in src/mmalloc/ may be worth reviving.  I found it
easier to temporarily write my own, mmapcache-alloc.c, which is clearly
suboptimal and should be replaced.  The primary goal is to reduce random
memory accesses - disk seeks are expensive across the whole area - which my
allocator fails at (while it tries to be frugal with the memory it uses).

I originally planned it would be easy to just hook the obstack allocator,
which is what the DWARF reading mostly uses.  This was wrong.  A lot of DWARF
allocations are now done through regular xmalloc.  A lot of obstacks are
created only temporarily and later freed (thus we need an allocator with
effective reuse of free()d memory, and it gets inefficient anyway).  Some
pointers in the memory areas associated with an objfile cannot be associated
with the same objfile (for example, those coming from bfd).  Hooking
xmalloc/xfree is also not easy, as memory malloc()ed by readline is sometimes
xfree()d, etc.

The patch tried to at least get the cache working without crashes, with
a possibility to incrementally optimize it later - therefore it catches all
xmalloc+obstack allocations but forgives any mistaken memory<->objfile calls,
just losing some performance on these misses.  This goal was reached, but the
performance went from 26s to 28s.  Some later optimizations here and there
dropped it to 21s, at which point I found it should not be done by the
backward-compatible mode but rather by specific updates of all the involved
allocations (to associate each with its objfile).

With my patch the CPU usage while loading the stored cache file dropped from
~60% to about 3%, but it still took ~26s, with disk seeking all the time.
Each mistaken operation (a needless object reallocation updating a pointer in
the objfile area, or just an access rebuilding a memory object which could
have been stored whole in the cache, etc.) was very expensive just due to the
disk access - not due to CPU.  gprof was useless, so I used Ulrich's great
  http://people.redhat.com/drepper/pagein.html
I downgraded to valgrind-3.2.0; I did not find updating pagein for a newer
valgrind easy enough.

bfd/ can be built with --with-mmap but I did not find a noticeable difference.

I tried to make all the possibly failing operations non-fatal, as the whole
goal is just an acceleration.

We need to reserve a memory area for the objfile being loaded, as
a contiguous extension of the area may not be possible later.  It must be
contiguous to get the best performance on the load of the cache file.  The
size of the reserved area is unpredictable(?) without scanning .debug_info,
so I just guessed it (it costs only kernel pagetable entries / virtual
address space, not memory).

Loaded minimal/partial/full symbols contain a `struct general_symbol_info'
containing an `asection *', which comes (uncacheable?) from bfd.  (Hopefully
solved in the patch.)

Currently dwarf2read.c allocates memory and reads the debug sections into
it.  I tried to mmap them instead but it makes no difference.  I also tried
to store just the (offset, size) pairs and later mmap the specific areas from
the original files (not the cache file) - it also makes no difference.  CPU
power is cheap and so is disk read bandwidth; just the disk seek is the
culprit.  binutils now supports compressed sections - I have ignored that so
far, but it may change what is worth mmapping.

obstack now wastes the end of its allocated chunks.  It does not try to
realloc() the memory (growing would not work as the chunk could move, but
shrinking would be useful).  I tried to put an exception into obstack for
better performance.

It will be very hard to create the mapped cache read-only, as all parts of
GDB freely write to the objects belonging to the objfile data structures.  It
may be worth mapping it PROT_WRITE but with MAP_PRIVATE; that would not work,
though, for continuous updating (reading CUs on demand) of the cache file.
What if two GDBs run with the same cache file?  IMO some transactional access
is not realistic.  Force --readnow when we are creating a cache file?  But
that makes the user experience terrible - it starts more slowly than without
the cache.  My hope is: the first GDB maps PROT_WRITE|MAP_SHARED and a second
GDB ignores the cache file; the first GDB will not use --readnow, it will
continuously update the cache.  It will modify it somehow and the changes
will get written out while quitting (see below).

I found the cache close/write/flush slow, so I at least implemented
a forked-off background flush on GDB close.  All the little updates scattered
across the areas mean that the whole cache file always gets written to disk
on munmap().

I find my mmapcache.c useful (but not so much mmapcache-alloc.c).

What I was talking about:
  http://people.redhat.com/jkratoch/mmapcache-wrong.patch
  (just diffed it, have not tried it now)

Valgrind pagein patch to make it usable (display the caller backtraces):
  http://people.redhat.com/jkratoch/valgrind-pagein-1.0-backtrace.patch


Regards,
Jan

