This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: 0xDEADBEEF LOCK in spl_any() ;)



"Trenton D. Adams" <tadams@extremeeng.com> writes:
> FYI: Trying to debug WaveLAN pccard driver.
> 
> I'm getting a deadlock in spl_any () in
> "net/tcpip/current/src/ecos/synch.c"
 
> I don't know what the "cyg_mutex_lock( &splx_mutex )" is locking splx_mutex
> for.  Anyone???

Splx() is the network stack mechanism for mutual exclusion and atomic
access to network drivers.

The implementation of splx() is locking a mutex in order to "do" mutual
exclusion; that's what it does.  Otherwise multiple threads might call into
your driver simultaneously.
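
To illustrate the idea on a host machine, here is a minimal sketch of an
"owner may re-enter, everyone else blocks" lock.  It is NOT the eCos source:
it assumes POSIX threads and uses a recursive pthread mutex in place of the
splx_thread bookkeeping, and the names spl_init(), spl_enter() and spl_exit()
are made up for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700        /* for PTHREAD_MUTEX_RECURSIVE */
#endif
#include <assert.h>
#include <pthread.h>

/* Hypothetical host-side analogue of the spl lock; NOT the eCos source. */
static pthread_mutex_t spl_mutex;
static int spl_depth;            /* nesting level, touched only by the owner */

/* Create the mutex as recursive so the owning thread may re-enter. */
int spl_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&spl_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    spl_depth = 0;
    return rc;
}

/* Enter the protected region; returns the previous nesting level. */
int spl_enter(void)
{
    int rc = pthread_mutex_lock(&spl_mutex);   /* other threads block here */
    assert(rc == 0);
    (void)rc;
    return spl_depth++;
}

/* Leave the region, restoring the level returned by spl_enter(). */
void spl_exit(int old_level)
{
    assert(spl_depth == old_level + 1);
    spl_depth = old_level;
    pthread_mutex_unlock(&spl_mutex);
}
```

The point to notice is that a second thread calling spl_enter() simply waits
in pthread_mutex_lock() until the owner has made its last spl_exit() call.
If the owner never gets there, the waiter waits forever.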
 
>     if ( cyg_thread_self() != splx_thread ) {
>         cyg_mutex_lock( &splx_mutex );		// <<<  DEADLOCKS HERE

The thread you are looking at yields because it cannot get the mutex,
because some other thread owns the splx() lock.  The other thread seems not
to run to completion of the locked section of code.  Solve that, and you
have it!

>         old_spl = 0; // Free when we unlock this context
>         CYG_ASSERT( 0 == splx_thread, "Thread still owned" );
>         CYG_ASSERT( 0 == spl_state, "spl still set" );
>         splx_thread = cyg_thread_self();
>     }
> 
> FYI: spl_any () is called some time during the splsoftnet() call in
> "net/tcpip/current/src/sys/net/route.c:608"
> 
> On a last note, I would try and figure this out myself, but I found 2903
> occurrences of splx_mutex throughout the sources.  I imagine there's
> someone out there who understands the net stack better, and could give
> me a hint as to why this might be happening, and how to resolve it!? :)

Use GDB to look at the mutex when the system is "deadlocked"; it contains
an "owner" field.  That's a pointer to a thread, the owner.  See what
thread it is; see where it's executing.  There's your problem!

For example, if your driver code gets stuck in a loop somewhere, that would
do it, because whatever thread enters the driver code must own the splx()
lock and therefore owns that mutex.
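
If a debugger is not handy, the same "who owns it?" question can be answered
in code.  The following is only a sketch under assumptions: it uses POSIX
threads and C11 atomics on a host rather than the eCos kernel API, and
tagged_lock_t, tagged_probe() and the "wavelan_rx" style labels are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock that remembers a label for its current holder. */
typedef struct {
    pthread_mutex_t mutex;
    _Atomic(const char *) holder;   /* label of the owner, NULL when free */
} tagged_lock_t;

void tagged_init(tagged_lock_t *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&l->holder, (const char *)NULL);
}

/* Take the lock and record who took it (e.g. a function name). */
void tagged_lock(tagged_lock_t *l, const char *who)
{
    pthread_mutex_lock(&l->mutex);
    atomic_store(&l->holder, who);
}

/* Clear the label first, then release the lock. */
void tagged_unlock(tagged_lock_t *l)
{
    atomic_store(&l->holder, (const char *)NULL);
    pthread_mutex_unlock(&l->mutex);
}

/* Best-effort probe: returns NULL if the lock could be taken right now,
 * otherwise the label recorded by the holder.  The label may briefly be
 * NULL just after another thread acquires the lock or just before it
 * releases it. */
const char *tagged_probe(tagged_lock_t *l)
{
    if (pthread_mutex_trylock(&l->mutex) == 0) {
        pthread_mutex_unlock(&l->mutex);
        return NULL;
    }
    return atomic_load(&l->holder);
}
```

A watchdog or low-priority thread could call tagged_probe() periodically and
print the label when it stays the same for too long; that label points at the
code that never released the lock.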

Of course, it could be any of the usual causes of odd behaviour, such as
unexpectedly deep recursion: your driver receives a packet and makes the
call to give it to the stack; because of that receive, another call comes in
asking you to transmit; you notice there is a packet ready, so you go to
receive it, make the call to give it to the stack, and so on, leading to
stack overflow.  Or just plain stack overflow anyway...
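
One common way to break that kind of receive/transmit recursion is a
re-entrancy guard: a nested receive request is only noted, and the outermost
call loops until the work is done.  The sketch below is a host-side
simulation, not WaveLAN or eCos driver code; driver_rx(), driver_tx(),
stack_deliver() and the counters are all hypothetical stand-ins.

```c
#include <stdbool.h>

/* All names are hypothetical; this simulates a driver, nothing more. */
static int  packets_ready;    /* simulated frames waiting in the hardware */
static int  delivered;        /* frames handed to the "stack" so far */
static int  depth, max_depth; /* nesting of deliveries, for demonstration */
static bool in_rx;            /* true while driver_rx() is already active */
static bool rx_pending;       /* a nested receive request was noted */

void driver_rx(void);
void driver_tx(void);

/* Stand-in for the stack: every delivery triggers a transmit request. */
static void stack_deliver(void)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    delivered++;
    driver_tx();
    depth--;
}

void driver_tx(void)
{
    /* ... a real driver would queue the outgoing frame here ... */
    if (packets_ready > 0)
        driver_rx();          /* without the guard, this call recurses */
}

void driver_rx(void)
{
    if (in_rx) {              /* nested call: note the request and return */
        rx_pending = true;
        return;
    }
    in_rx = true;
    do {
        rx_pending = false;
        while (packets_ready > 0) {
            packets_ready--;
            stack_deliver();
        }
    } while (rx_pending);     /* pick up anything requested while busy */
    in_rx = false;
}
```

With the guard in place the delivery nesting stays at one level no matter how
many frames are queued; without the in_rx check, each queued frame would add
another set of stack frames.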

HTH,
	- Huge

