This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.
On Thu, 2002-12-05 at 23:40, Michael Elizabeth Chastain wrote:
> Hi Fredrik,
>
> > The only libraries are libc, libpthread, libdl and libpam. In the
> > affected function, only libc and libpthread are used.
>
> What operating environment are you running on? If it is a Linux
> platform, and gcc is the only compiler anywhere in sight,
> then it's likely not an ABI clash. If it's non-Linux Unix,
> this becomes slightly likely. If it is Cygwin/Windows then
> it's a common gotcha.
>

It is a GNU/Linux platform, and, yes, I am using gcc.

> > That would, of course, be a good thing. It's only that I'd have to do
> > that after every single function call... That would take some time.
> > Maybe I'll do it, anyway.
>
> Yes.

I have added checks where I compare the current value of next to a saved
buffer after every function call now. I am currently testing with it.

>
> mec> You could also try forcing your variable to be on the stack instead of a
> mec> register. Remove the "register" attribute from the declaration of "next"
> mec> if you have one. Then add a "do_nothing(&next)" call to your function,
> mec> to force "next" to be on the stack instead of in a register. If the
> mec> symptoms go away then it's more likely to really be a register clobber.
>
> > That just doesn't feel like a very elegant solution, though.
>
> Oh, it's not meant to be a solution, it's meant to be a diagnostic tool
> to help figure out the problem.
>

True, of course. I just don't really understand where it would lead.

> > But next isn't stored in memory at any place, so it cannot be that.
>
> 'next' is initialized from a memory location though, and you have no
> check that it is valid when it is first initialized. Actually that
> would be a good check to add.
>

That's true of course.
I have, however, already added such checks recently.

> > If the list was to be made unstable by a buggy function somewhere, it
> > would have to be restored again by the same function (since it's always
> > consistent when I look at it), and I just don't see that happening.
>
> Mmmm, that is not true!
>
> Let us stare at your source code a bit:
>
>   /* 1 */ for(cur = list; cur != NULL; cur = next)
>   /* 2 */ {
>   /* 3 */     if((next = cur->next) != NULL)
>   /* 4 */         pthread_mutex_lock(&next->mutex);
>   /* 5 */     ... /* next is not mentioned anymore */
>   /* 6 */ }
>
> Suppose that you have two threads, T1 and T2, and three blocks
> on the list, B0, B1, B2.
>
> T1 executes [1], "cur = list", so "cur" holds the address of B0.
> T1 executes [3], "next = cur->next", so "next" holds the address of B1.
> T2 is scheduled -- and T1 is holding no mutexes!

Sorry that I didn't mention it, but just above the loop, I actually do have

    if(list != NULL)
        pthread_mutex_lock(&list->mutex);

> > I also suspected that something like that might happen, and therefore I
> > lock the elements one element ahead of the block I'm currently looking
> > at, so that the current block and the next are always locked.
>
> Err, okay, I see that in the source code. So in my scenario,
> T1 has a lock on B0, so that T2 cannot delete B0->next.
>
> Foo.

Exactly. Sorry, again, that I didn't write that.

>
> But I see so many lock's and unlock's in the code that I suspect it is
> a race condition in your code rather than a code generation bug or a
> pthread library bug. It could still be a scenario where the list
> pointers are okay, but "next" has become a block which is deleted
> from the list somehow.
>

I know, I didn't plan ahead well enough when I started writing it, and now
I'm stuck with either this or a large rewrite.

> That still leaves the question of how to debug it.
>
> I would actually start with a book on multi-threaded linked lists,
> and then find a library (or code a library) that implements them,
> and use that. If you have a separate library then you can write some
> stress test code and provoke failures a lot faster.
>

I would like to do that, and I have been thinking about it for a while,
but see above.

> > Therefore, when the program crashes, next and cur are equal,
> > and I cannot see what element it was at before.
>
> Mmmm, throw in a "prev" variable, so that you say "prev = cur, cur = next"
> and then "prev" is available for debugging.

I've been thinking about that, too. Maybe I should just do that.
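
For reference, the traversal discussed above, with the "prev" variable and the saved-copy check added, might look roughly like the sketch below. It is only an illustration under assumptions: the struct layout, the names walk() and visit, and the point where cur is unlocked are all hypothetical, since the body of the real loop was elided in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical node type; the real definition is not shown in the thread. */
struct block {
    pthread_mutex_t mutex;
    struct block *next;
};

/*
 * Walk the list with lock coupling: cur is already locked when an
 * iteration starts, and next is locked before cur is released.
 * Returns the number of blocks visited.
 */
size_t walk(struct block *list, void (*visit)(struct block *))
{
    struct block *prev = NULL, *cur, *next;
    size_t count = 0;

    /* Lock the head before the loop, as described in the reply. */
    if (list != NULL)
        pthread_mutex_lock(&list->mutex);

    for (cur = list; cur != NULL; prev = cur, cur = next) {
        if ((next = cur->next) != NULL)
            pthread_mutex_lock(&next->mutex);

        /* volatile, so the compiler cannot fold the comparison away. */
        struct block *volatile saved_next = next;

        if (visit != NULL)
            visit(cur);              /* stands in for the elided loop body */

        /* Diagnostic: next must not change behind our back. */
        assert(next == saved_next);

        /* Assumption: the real code releases cur somewhere in its body. */
        pthread_mutex_unlock(&cur->mutex);
        count++;
    }

    (void)prev;  /* kept only so a debugger can show the previous block */
    return count;
}
```

If the assert fires, or the program crashes inside the loop, "print prev" and "print *prev" in gdb show which block was handled last. Note that prev has already been unlocked at that point, so another thread may have changed or freed it; it is useful for inspection, not for further list operations.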