This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.

RE: Network code unstable (Solved for real this time).

From: Pieter Truter <ptruter at intrinsyc dot com>
To: 'eCos Disuss' <ecos-discuss at sources dot redhat dot com>
Date: Wed, 6 Mar 2002 09:32:34 -0800
Subject: RE: [ECOS] Network code unstable (Solved for real this time).

The ether chip stopped responding. I did not experience this problem in
RedBoot, mainly because of the lightweight stack used by RedBoot.

I will still do some tests to see how it affects the GDB stubs in RedBoot. I
know it is extremely slow to download an image with GDB/RedBoot.

-----Original Message-----
From: Douglas Bush [mailto:dbush@extremeeng.com]
Sent: Wednesday, March 06, 2002 9:23 AM
To: 'Pieter Truter'; 'Gary Thomas'; 'Andrew Lunn'
Cc: 'eCos Disuss'
Subject: RE: [ECOS] Network code unstable (Solved for real this time).

Would this driver problem affect RedBoot/GDB?  Were the crashes hard
resets, or just continuous loops?

-----Original Message-----
From: ecos-discuss-owner@sources.redhat.com
[mailto:ecos-discuss-owner@sources.redhat.com] On Behalf Of Pieter
Truter
Sent: Wednesday, March 06, 2002 10:12 AM
To: 'Gary Thomas'; Andrew Lunn
Cc: eCos Disuss
Subject: RE: [ECOS] Network code unstable (Solved for real this time).

After a lot of testing and debugging I found out that the CS8900 is losing
interrupts under heavy network load. This is more prominent when running
from flash, which is slower.

Looking at if_cs8900a.c I think I found the cause of my problem. The time
between the interrupt and acknowledge() is too long. I moved the
acknowledge() call from cs8900a_deliver() to cs8900a_isr(), just after the
mask(), and now everything works great.

I am still concerned about masking the interrupt for so long but I
understand that this is probably done to be able to use the BSD stack with a
realtime OS.

The big problem with losing an interrupt from the CS8900a chip is that you
have to clean up all the info in the chip, otherwise it will not generate any
further interrupts. And if you do not know that you missed an interrupt, you
don't know when to clean up. ;-(

This is what I have now:

cs8900a_isr(cyg_vector_t vector, cyg_addrword_t data, HAL_SavedRegisters *regs)
{
    cs8900a_priv_data_t* cpd = (cs8900a_priv_data_t *)data;
    cyg_drv_interrupt_mask(cpd->interrupt);
    // Moved here from cs8900a_deliver()
    cyg_drv_interrupt_acknowledge(cpd->interrupt);
    return (CYG_ISR_HANDLED|CYG_ISR_CALL_DSR);  // Run the DSR
}

cs8900a_deliver(struct eth_drv_sc *sc)
{
    cs8900a_poll(sc);
#ifdef CYGPKG_NET
    {
        cs8900a_priv_data_t *cpd = (cs8900a_priv_data_t *)sc->driver_private;
        // Allow interrupts to happen again
        // cyg_drv_interrupt_acknowledge(cpd->interrupt); // Moved to ISR
        cyg_drv_interrupt_unmask(cpd->interrupt);
    }
#endif
}
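
One possible explanation for why the position of the acknowledge matters can
be shown with a toy model in plain C. This is only a sketch under
assumptions, not eCos code and not a description of the real hardware: it
assumes the interrupt controller latches an edge from the chip and that
acknowledge clears that latch. The names struct sim, chip_event(), service()
and stranded_events() are all made up for illustration. A frame that arrives
after the poll has drained the chip, but before a late acknowledge, has its
edge wiped out and stays stuck in the chip.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the (hypothetical) driver acknowledges the interrupt. */
enum ack_point { ACK_IN_ISR, ACK_IN_DELIVER };

/* Toy model: assumes an edge-latched line that acknowledge clears. */
struct sim {
    bool latched;   /* controller has latched an edge from the chip */
    bool masked;    /* line is masked at the controller */
    int  pending;   /* events waiting inside the chip */
    int  handled;   /* events the driver has processed */
};

/* The chip queues an event and produces an edge on its interrupt line. */
static void chip_event(struct sim *s)
{
    s->pending++;
    s->latched = true;
}

/* One ISR + deliver pass. If late_frame is set, a frame arrives after
 * the poll has drained the chip but before the deliver step finishes. */
static void service(struct sim *s, enum ack_point where, bool late_frame)
{
    assert(!s->masked);             /* only entered while unmasked */

    /* ISR part: mask, and acknowledge here in the early variant. */
    s->masked = true;
    if (where == ACK_IN_ISR)
        s->latched = false;

    /* Deliver part: poll the chip and drain everything it has queued. */
    s->handled += s->pending;
    s->pending = 0;

    if (late_frame)
        chip_event(s);              /* new edge while still masked */

    /* Late acknowledge clears the latch, including the new edge. */
    if (where == ACK_IN_DELIVER)
        s->latched = false;

    s->masked = false;
}

/* Returns how many events are left in the chip with no interrupt pending. */
int stranded_events(enum ack_point where)
{
    struct sim s = { false, false, 0, 0 };
    bool late_frame = true;         /* inject one late frame, first pass only */

    chip_event(&s);
    while (s.latched && !s.masked) {
        service(&s, where, late_frame);
        late_frame = false;
    }
    return s.pending;
}
```

In this model stranded_events(ACK_IN_ISR) returns 0, because the late edge
survives until the unmask and triggers a second pass, while
stranded_events(ACK_IN_DELIVER) returns 1, which matches the symptom of the
chip going quiet with work still queued.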

-----Original Message-----
From: Gary Thomas [mailto:gthomas@redhat.com]
Sent: Tuesday, February 19, 2002 5:47 AM
To: Andrew Lunn
Cc: Pieter Truter; eCos Disuss
Subject: Re: [ECOS] Network code unstable (Still Not Solved).

On Tue, 2002-02-19 at 01:41, Andrew Lunn wrote:
> On Mon, Feb 18, 2002 at 03:07:16PM -0800, Pieter Truter wrote:
> > Here is the promised test app.
> > I am not sure what is working and what is not.
> > Sometimes it crashes after 100 loops, sometimes after the first loop. If I put
> > it in ROM it always crashes in the first loop.
> 
> I cannot see anything obvious wrong.
> 
> Do you have asserts enabled? If not, enable them. They are part of the
> INFRA package.
> 
> How does it crash? Lock solid? Drop in gdb? Throw an exception? Where
> in the cycle? Always at the beginning of the transfer? Always near the
> end? Somewhere in the middle? Does it vary?
> 
> Could it be something else on the network? Use tcpdump to grab all
> broadcast traffic and any unicast traffic to/from the node which is
> not tftp. See if anything happens as it crashes, e.g. DHCP renewal, an
> NTP broadcast, a ping of death from some hacker in your network?

These are very good suggestions.  The problem is probably an elusive
one though.  I've run the test program here continuously for the last
12 hours, including from ROM, with no failures.

-- 
Before posting, please read the FAQ: http://sources.redhat.com/fom/ecos
and search the list archive: http://sources.redhat.com/ml/ecos-discuss

Follow-Ups:
- Re: Network code unstable (Solved for real this time).
  - From: Roland Caßebohm
