This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: performance of multithreading gets gradually worse under gdb


On Wed, 02 Feb 2011 13:43:50 -0800, Michael Snyder wrote:

It allocates about 100kB per iteration.

Hmmm, and that's for roughly 36 thread start/stops, so it could be
losing roughly 3000 bytes per thread. That's much bigger than my
first guess would have been (a "struct thread_info" is only 336 bytes).
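As a rough check of the arithmetic above, here is a minimal sketch. The function name bytesPerThread is hypothetical, and taking 100 kB as 100 * 1024 bytes is an assumption.

```cpp
#include <cstddef>

// Average number of bytes retained per thread start/stop, given the
// growth in gdb's memory observed over one iteration of the application.
// The caller must pass a non-zero thread count.
constexpr double bytesPerThread(std::size_t bytesPerIteration,
                                std::size_t threadsPerIteration)
{
  return static_cast<double>(bytesPerIteration) /
         static_cast<double>(threadsPerIteration);
}
```

With 100 * 1024 bytes and 36 threads this gives about 2844 bytes per thread, which is roughly eight times the 336 bytes of a "struct thread_info".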


One interesting finding might also be:
I terminated the process when an iteration took about 3 min (instead of 1 sec)
and gdb had about 115MB allocated.

I assume that at this point, your system still had plenty of RAM to spare? It wasn't simply swapping?

The system had about 20 GB to spare.


On starting the application again, it ran alright for a while and the gdb memory allocation
stayed constant. When it finally started to grow again, the application slowed down, and became
slower with every iteration - the usual picture.
I attached a sample file from the application where the computation bifurcates into
the worker threads. This is one of three instances per iteration, but they all follow the
same pattern.

I was really hoping for a stripped-down sample that we could compile and run.

See the attached file. It shows a similar behaviour, although it only allocates 8kB per iteration.
You have to wait some time before this happens.


The machine has 2x6 cores x 3 instances per iteration = 36 worker threads per iteration.

x86 architecture?


On another note, I tried to compile gdb-6.5 on my machine (because it was the release I
used to work with before, without problems), and configure comes back with an error that
it cannot find a termcap lib. There is none on this SuSE system. Which package would I
need to install?

That would be libncurses, I think.


I compiled gdb-6.5 alright and it performs well as usual, without this problem.




On Wed, 02 Feb 2011 12:27:58 -0800, Michael Snyder wrote:
Markus Alber wrote:
Hello,
I have experienced the following problem:
I'm debugging a number-crunching application which spawns a lot (36) of little
worker threads per iteration. The system typically does on the order of 200 iterations.
Although each of them should take about the same amount of time, the performance
gets worse with every iteration and becomes excruciatingly slow.
A system monitor reveals that gdb allocates more memory with every iteration,
i.e. with every 36 threads started and finished. The CPU load of GDB goes up, too.
The CPU usage of the application goes down. Compared to the solo performance, it
gets slower by a factor of 20 or more, if run long enough.
The application behaves perfectly when run by itself. The multi-threaded part is not
compiled with debugging information when this behaviour occurs.
The distribution is SuSE 11.3 / gdb 7.1.
Is there anything I can change about this behaviour, any options of gdb that need to
be set in these circumstances?
Interesting.

By how much does gdb's memory allocation increase?
In total or, if possible, per iteration?  This might
give us a clue as to where to look.
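
One way to get a per-iteration figure on Linux is to sample gdb's resident set size between iterations. This is a minimal sketch, assuming a Linux /proc filesystem. The helper names parseVmRssKb and readVmRssKb are hypothetical and not part of gdb.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the "VmRSS:" value (in kB) from a stream formatted like
// /proc/<pid>/status. Returns -1 if the field is missing or malformed.
long parseVmRssKb(std::istream& status)
{
  std::string line;
  while (std::getline(status, line)) {
    if (line.compare(0, 6, "VmRSS:") == 0) {
      std::istringstream fields(line.substr(6));
      long kb = -1;
      if (fields >> kb)
        return kb;
      return -1;
    }
  }
  return -1;
}

// Reads the resident set size of the process with the given pid
// (for example, the pid of the running gdb). Returns -1 on failure.
long readVmRssKb(long pid)
{
  std::ostringstream path;
  path << "/proc/" << pid << "/status";
  std::ifstream status(path.str().c_str());
  if (!status)
    return -1;
  return parseVmRssKb(status);
}
```

Calling readVmRssKb with gdb's pid once per iteration and printing the difference between successive samples would show the growth per iteration directly.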

Do you think you could write a simple sample program that
allocates threads in a manner similar to your application?

Thanks,
#include <stdlib.h>
#include <iostream>
#include <math.h>

#include <exception>
#include <vector>

#include <boost/thread/thread.hpp>
#include <boost/bind.hpp>

using namespace std;

// Compile with: g++ -g mt_test.cpp -lboost_thread
// (libraries go after the source file; some setups may also need -lboost_system -lpthread)

struct ThreadData {
  ThreadData(const vector<double>& vOne,
             const double factor) :
    m_vOne(vOne),
    m_factor(factor),
    m_vTwo(vOne.size(), m_factor)
    {
      // empty
    }
  
  const vector<double>&   m_vOne;
  double                  m_factor;
  vector<double>          m_vTwo;
};
  
static void* doAdd(void* data)
{
  ThreadData* pData = static_cast<ThreadData*>(data);

  for(unsigned int nAt = 0; nAt < pData->m_vOne.size(); ++nAt)
    pData->m_vTwo[nAt] += pData->m_factor * pData->m_vOne[nAt]; 

  return NULL;
}


int main(void) {

  const int nThreads = 12;
  const int nRepetitions = 200;
  
  vector<double> vOne(1<<24, 1.0);
  
  for(int nAtRepetition = 0; nAtRepetition < nRepetitions; ++nAtRepetition) {

    cout << "Repetition no. " << nAtRepetition << endl;

    vector<ThreadData*> vpData(nThreads, static_cast<ThreadData*>(NULL));  
    boost::thread_group threads;
    
    for (int nAtThread = 0; nAtThread < nThreads; nAtThread++) {
      vpData[nAtThread] = new ThreadData(vOne, static_cast<double>(nAtThread+1) );
      if( threads.create_thread( boost::bind(&doAdd, static_cast<void*> (vpData[nAtThread])) ) == 0 )
        throw std::exception();
    }
    
    threads.join_all();
    
    for (int nAtThread = 0; nAtThread < nThreads; nAtThread++) 
      delete vpData[nAtThread];
  }

  return 0;
}
