This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.

Re: Checking function calls

From: Fredrik Tolf <fredrik at dolda2000 dot cjb dot net>
To: Michael Elizabeth Chastain <mec at shout dot net>
Cc: gdb at sources dot redhat dot com
Date: 06 Dec 2002 17:24:39 +0100
Subject: Re: Checking function calls
References: <200212052240.gB5Mefm16249@duracef.shout.net>

On Thu, 2002-12-05 at 23:40, Michael Elizabeth Chastain wrote:
> Hi Fredrik,
> 
> > The only libraries are libc, libpthread, libdl and libpam. In the
> > affected function, only libc and libpthread are used.
> 
> What operating environment are you running on?  If it is a Linux
> platform, and gcc is the only compiler anywhere in sight,
> then it's likely not an ABI clash.  If it's non-Linux Unix,
> this becomes slightly likely.  If it is Cygwin/Windows then
> it's a common gotcha.
> 

It is a GNU/Linux platform, and, yes, I am using gcc.

> > That would, of course, be a good thing. It's only that I'd have to do
> > that after every single function call... That would take some time.
> > Maybe I'll do it, anyway.
> 
> Yes.

I have now added checks that compare the current value of next against a
saved copy after every function call, and I am currently testing with them.
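
For illustration, such a check could look roughly like the following. This
is only a minimal sketch, not the code from the actual program: the names
ptr_unchanged and CHECK_UNCHANGED are hypothetical, and it assumes the
pointer is copied into a second variable just before each call.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the pointer still matches the copy
 * saved earlier; otherwise reports the call site and returns 0. */
static int ptr_unchanged(const void *current, const void *saved,
                         const char *file, int line)
{
    if (current == saved)
        return 1;
    fprintf(stderr, "%s:%d: pointer changed from %p to %p\n",
            file, line, saved, current);
    return 0;
}

/* Stops at the first call site after which the value differs.
 * Note that assert() is compiled out when NDEBUG is defined. */
#define CHECK_UNCHANGED(current, saved) \
    assert(ptr_unchanged((current), (saved), __FILE__, __LINE__))

/* Typical use around a call that should not touch "next":
 *     saved = next;
 *     some_call();
 *     CHECK_UNCHANGED(next, saved);
 */
```

Because the macro records __FILE__ and __LINE__, the first failing check
identifies the function call after which the value was seen to differ.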

> 
> mec> You could also try forcing your variable to be on the stack instead of a
> mec> register.  Remove the "register" attribute from the declaration of "next"
> mec> if you have one.  Then add a "do_nothing(&next)" call to your function,
> mec> to force "next" to be on the stack instead of in a register.  If the
> mec> symptoms go away then it's more likely to really be a register clobber.
> 
> > That just doesn't feel like a very elegant solution, though.
> 
> Oh, it's not meant to be a solution, it's meant to be a diagnostic tool
> to help figure out the problem.
> 

True, of course. I just don't really understand where it would lead.
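
As far as I understand the suggestion, it would look something like the
sketch below. The struct layout and the count_blocks function are made up
for the example; only do_nothing and next come from the discussion above.
Once the address of next has escaped, the compiler has to give it a real
stack slot, at least around calls it cannot see into. If the crash then
goes away, a clobbered register becomes the more likely explanation. A
variable in memory can usually also be watched with a gdb watchpoint,
which is much harder for a value that lives only in a register.

```c
#include <assert.h>
#include <stddef.h>

struct block {
    struct block *next;
};

/* Hypothetical sink, ideally defined in a separate source file.
 * Storing the address in a volatile global makes it escape, so the
 * compiler has to give the pointed-to variable a stack slot. */
static void *volatile escaped_address;

void do_nothing(void *address)
{
    assert(address != NULL);
    escaped_address = address;
}

/* Counts the blocks on a list while keeping "next" addressable. */
size_t count_blocks(struct block *list)
{
    struct block *cur, *next;
    size_t n = 0;

    do_nothing(&next);          /* "next" now needs a home in memory */
    for (cur = list; cur != NULL; cur = next) {
        next = cur->next;
        n++;
    }
    return n;
}
```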

> > But next isn't stored in memory at any place, so it cannot be that.
> 
> 'next' is initialized from a memory location though, and you have no
> check that it is valid when it is first initialized.  Actually that
> would be a good check to add.
> 

That's true, of course. I have, however, already added such checks
recently.

> > If the list were to be made unstable by a buggy function somewhere, it
> > would have to be restored again by the same function (since it's always
> > consistent when I look at it), and I just don't see that happening.
> 
> Mmmm, that is not true!
> 
> Let us stare at your source code a bit:
> 
>   /* 1 */ for(cur = list; cur != NULL; cur = next)
>   /* 2 */ {
>   /* 3 */      if((next = cur->next) != NULL)
>   /* 4 */          pthread_mutex_lock(&next->mutex);
>   /* 5 */      ... /* next is not mentioned anymore */
>   /* 6 */ }
> 
> Suppose that you have two threads, T1 and T2, and three blocks
> on the list, B0, B1, B2.
> 
>   T1 executes [1], "cur = list", so "cur" holds the address of B0.
>   T1 executes [3], "next = cur->next", so "next" holds the address of B1.
>   T2 is scheduled -- and T1 is holding no mutexes!

Sorry that I didn't mention it, but just above the loop, I actually do
have

if(list != NULL)
    pthread_mutex_lock(&list->mutex);

> > I also suspected that something like that might happen, and therefore I
> > lock the elements one element ahead of the block I'm currently looking
> > at, so that the current block and the next are always locked.
> 
> Err, okay, I see that in the source code.  So in my scenario,
> T1 has a lock on B0, so that T2 cannot delete B0->next.
> 
> Foo.

Exactly. Sorry, again, that I didn't write that.
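
To make the locking order explicit, here is a minimal sketch of the
pattern. It is not the real code: the value field, the sum_list function
and the MUST macro are invented for the example. The unlock of cur at the
end of each iteration is also an assumption about what the elided part of
the loop does, and the sketch assumes that the list head pointer itself
does not change while the walk is starting.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct block {
    struct block *next;
    pthread_mutex_t mutex;
    int value;
};

/* Hypothetical wrapper: pthread calls return 0 on success. */
#define MUST(call) \
    do { int rc_ = (call); assert(rc_ == 0); (void)rc_; } while (0)

/* Walks the list with lock coupling: the head is locked before the
 * loop, and the mutex of "next" is taken before "cur" is released. */
int sum_list(struct block *list)
{
    struct block *cur, *next;
    int sum = 0;

    if (list != NULL)
        MUST(pthread_mutex_lock(&list->mutex));
    for (cur = list; cur != NULL; cur = next) {
        if ((next = cur->next) != NULL)
            MUST(pthread_mutex_lock(&next->mutex));
        sum += cur->value;      /* cur and next (if any) are locked */
        MUST(pthread_mutex_unlock(&cur->mutex));
    }
    return sum;
}
```

With this ordering, a thread that wants to unlink cur->next would first
need the mutex of cur, so it cannot remove the block that next points to
while the walker is still holding cur.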

> 
> But I see so many locks and unlocks in the code that I suspect it is
> a race condition in your code rather than a code generation bug or a
> pthread library bug.  It could still be a scenario where the list
> pointers are okay, but "next" has become a block which is deleted
> from the list somehow.
> 

I know. I didn't plan ahead well enough when I started writing it, and
now I'm stuck with either this or a large rewrite.

> That still leaves the question of how to debug it.
> 
> I would actually start with a book on multi-threaded linked lists,
> and then find a library (or code a library) that implements them,
> and use that.  If you have a separate library then you can write some
> stress test code and provoke failures a lot faster.
> 

I would like to do that, and I have been thinking about it for a while,
but see above.

> > Therefore, when the program crashes, next and cur are equal,
> > and I cannot see what element it was at before.
> 
> Mmmm, throw in a "prev" variable, so that you say "prev = cur, cur = next"
> and then "prev" is available for debugging.

I've been thinking about that, too. Maybe I should just do that.
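
A rough sketch of what that could look like, without the locking and with
a made-up last_block function so that it stands on its own. Declaring prev
as volatile is my own assumption; the idea is to keep it in memory so that
it is still visible in gdb when the code is built with optimization.

```c
#include <assert.h>
#include <stddef.h>

struct block {
    struct block *next;
};

/* Walks to the end of the list and returns the last block, keeping
 * the previously visited block in "prev" for the debugger. */
struct block *last_block(struct block *list)
{
    struct block *volatile prev = NULL;  /* kept in memory for gdb */
    struct block *cur, *next;

    for (cur = list; cur != NULL; prev = cur, cur = next) {
        /* Consistency check: prev must still link to cur. */
        assert(prev == NULL || prev->next == cur);
        next = cur->next;
    }
    return prev;    /* the block visited just before cur became NULL */
}
```

After a crash inside the loop, "print *prev" in gdb would then show the
element that was visited just before the one that went wrong.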
