This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.



Re: pbuf_alloc failures with LwIP


Hi,
I'm using multiple threads in my application.
SYS_ARCH_PROTECT is not enabled at all.
How do I enable it?
Is it enabled in the current CVS version?
Is the current LwIP in CVS compatible with eCos 3.0? If so, I
would consider upgrading only the LwIP package.

Thanks
Elad
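
For reference, here is a minimal sketch of what a non-empty set of protection macros could look like. It assumes the usual lwIP convention in which SYS_ARCH_DECL_PROTECT, SYS_ARCH_PROTECT and SYS_ARCH_UNPROTECT wrap a pair of port functions, sys_arch_protect() and sys_arch_unprotect(); whether the eCos 3.0 port wires these up is exactly the open question in this thread. The lock is a plain nesting counter standing in for a real primitive (on eCos, the kernel's cyg_scheduler_lock() and cyg_scheduler_unlock() would be one candidate), and lock_depth, shared_counter and bump_shared_counter() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Stand-in for a real lock: a nesting counter. On a real target this
 * would be a scheduler lock or an interrupt mask (assumption). */
typedef int sys_prot_t;
static int lock_depth = 0;

/* Enter the critical region; returns the previous nesting level. */
static sys_prot_t sys_arch_protect(void)
{
    sys_prot_t prev = lock_depth;
    lock_depth++;
    return prev;
}

/* Leave the critical region, restoring the saved nesting level. */
static void sys_arch_unprotect(sys_prot_t prev)
{
    assert(lock_depth == prev + 1); /* catches unbalanced calls */
    lock_depth = prev;
}

/* Non-empty definitions, as opposed to macros that expand to nothing. */
#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      ((lev) = sys_arch_protect())
#define SYS_ARCH_UNPROTECT(lev)    sys_arch_unprotect(lev)

/* Hypothetical shared state touched by several threads. */
static int shared_counter = 0;

static int bump_shared_counter(void)
{
    SYS_ARCH_DECL_PROTECT(old_level);
    int value;

    SYS_ARCH_PROTECT(old_level);
    value = ++shared_counter;   /* the read-modify-write is guarded */
    SYS_ARCH_UNPROTECT(old_level);
    return value;
}
```

If the macros expanded to nothing, bump_shared_counter() would still compile, but two threads could interleave inside the increment. That kind of silent race is what the question above is trying to rule out.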


On Thu, Jun 14, 2012 at 5:38 PM, Michael O'Dowd
<michael.odowd@kuantic.com> wrote:
> Hi Elad,
>
> Hmm, I've had a quick look at the pbuf management in eCos 3.0. It's quite
> different from the CVS version, so I'm not that familiar with it.
>
> Nonetheless, I'm surprised by the PBUF statistics:
>
> PBUF - "each pbuf is 1024 bytes"
>          avail: 30
>          used: 1
>          max: 11
>          err: 2
>          alloc_locked: 0
>          refresh_locked: 0
>
> There's something wrong here. Considering that "alloc_locked = 0", the only
> way for "err" to be incremented is if you run out of pbufs. However, the
> sign that you have run out of pbufs is that "max" equals "avail". Yet, in
> your case, max = 11, while avail = 30. So you didn't run out of pbufs, you
> only used 11 out of 30.
>
> Digging a bit more, it appears that "err" is incremented when
> pbuf_pool_alloc() returns NULL. This happens when the linked-list of
> available pbufs is empty.
>
> So, how come the linked-list of available pbufs is empty when max = 11? In
> my opinion, the linked-list of available pbufs is corrupt or truncated.
>
> Are you sure that you're respecting the thread-safe requirements of lwIP?
> Are you using multiple threads? If so, make sure that the SYS_ARCH_PROTECT
> macro (in lwip/sys.h) is defined to do something useful, rather than being
> an empty definition.
>
> Regards,
>
> Michael.
>
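
The reasoning above can be reproduced with a toy free-list pool. This is only a sketch under stated assumptions: struct pool, pool_alloc(), pool_free() and pool_truncate() are hypothetical stand-ins, not the eCos 3.0 pbuf_pool_alloc() implementation. It shows how a free list that has been cut short (for example by two threads updating the head pointer without protection) makes allocations fail, and "err" rise, while "max" stays well below "avail".

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 30            /* matches "avail: 30" in the stats */

struct buf {
    struct buf *next;           /* link in the free list */
};

struct pool {
    struct buf  storage[POOL_SIZE];
    struct buf *free_list;      /* head of the list of available buffers */
    unsigned    used, max, err; /* same meaning as the stats counters */
};

/* Chain every buffer into the free list. */
static void pool_init(struct pool *p)
{
    size_t i;
    for (i = 0; i + 1 < POOL_SIZE; i++)
        p->storage[i].next = &p->storage[i + 1];
    p->storage[POOL_SIZE - 1].next = NULL;
    p->free_list = &p->storage[0];
    p->used = p->max = p->err = 0;
}

/* Pop one buffer; "err" is bumped only when the list looks empty. */
static struct buf *pool_alloc(struct pool *p)
{
    struct buf *b = p->free_list;
    if (b == NULL) {
        p->err++;
        return NULL;
    }
    p->free_list = b->next;
    b->next = NULL;
    p->used++;
    if (p->used > p->max)
        p->max = p->used;
    return b;
}

/* Push a buffer back onto the free list. */
static void pool_free(struct pool *p, struct buf *b)
{
    assert(p->used > 0);
    b->next = p->free_list;
    p->free_list = b;
    p->used--;
}

/* Simulate the damage an unprotected concurrent update could cause:
 * the list is cut short after "keep" free buffers (keep >= 1). */
static void pool_truncate(struct pool *p, unsigned keep)
{
    struct buf *b = p->free_list;
    while (b != NULL && --keep > 0)
        b = b->next;
    if (b != NULL)
        b->next = NULL;
}
```

Truncating the list to 11 entries and then allocating 13 times leaves max at 11 and err at 2 in this model, even though 30 buffers nominally exist. The same pattern appears in the reported PBUF statistics.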
> On 14/06/2012 06:43, Elad Yosef wrote:
>>
>> Hi Michael,
>> Thanks for the detailed reply.
>>
>> I think I have exactly the same problem that you have - the networking
>> stops working.
>>
>> I collected the LwIP stats after the networking stopped working; see below:
>>
>>
>>
>> LINK
>>         xmit: 0
>>         rexmit: 0
>>         recv: 0
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> IP_FRAG
>>         xmit: 0
>>         rexmit: 0
>>         recv: 0
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> IP
>>         xmit: 17643
>>         rexmit: 0
>>         recv: 63100
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> ICMP
>>         xmit: 2775
>>         rexmit: 0
>>         recv: 2950
>>         fw: 0
>>         drop: 175
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 175
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> UDP
>>         xmit: 4714
>>         rexmit: 0
>>         recv: 53209
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 0
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> TCP
>>         xmit: 6715
>>         rexmit: 0
>>         recv: 6941
>>         fw: 0
>>         drop: 0
>>         chkerr: 0
>>         lenerr: 0
>>         memerr: 2705
>>         rterr: 0
>>         proterr: 0
>>         opterr: 0
>>         err: 0
>>         cachehit: 0
>>
>> PBUF - "each pbuf is 1024 bytes"
>>         avail: 30
>>         used: 1
>>         max: 11
>>         err: 2
>>         alloc_locked: 0
>>         refresh_locked: 0
>>
>>  MEM HEAP
>>         avail: 1024
>>         used: 0
>>         max: 720
>>         err: 0
>>
>>  MEM PBUF
>>         avail: 8
>>         used: 0
>>         max: 2
>>         err: 0
>>
>>  MEM RAW_PCB
>>         avail: 4
>>         used: 0
>>         max: 0
>>         err: 0
>>
>>  MEM UDP_PCB
>>         avail: 3
>>         used: 3
>>         max: 3
>>         err: 0
>>
>>  MEM TCP_PCB
>>         avail: 16
>>         used: 0
>>         max: 8
>>         err: 0
>>
>>  MEM TCP_PCB_LISTEN
>>         avail: 1
>>         used: 1
>>         max: 1
>>         err: 0
>>
>>  MEM TCP_SEG
>>         avail: 6
>>         used: 0
>>         max: 4
>>         err: 0
>>
>>  MEM NETBUF
>>         avail: 10
>>         used: 0
>>         max: 6
>>         err: 0
>>
>>  MEM NETCONN
>>         avail: 12
>>         used: 4
>>         max: 7
>>         err: 0
>>
>>  MEM API_MSG
>>         avail: 6
>>         used: 0
>>         max: 2
>>         err: 0
>>
>>  MEM TCP_MSG
>>         avail: 12
>>         used: 0
>>         max: 7
>>         err: 0
>>
>>  MEM TIMEOUT
>>         avail: 4
>>         used: 2
>>         max: 3
>>         err: 0
>>
>>
>> I would appreciate it if you could take a look.
>>
>> Elad
>>
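
The counters listed above can be checked mechanically with the rule of thumb from this thread: a pool that really ran dry shows "max" equal to "avail", so failed allocations without that sign suggest something else is wrong. The sketch below is illustrative only. struct pool_stats, enum pool_verdict and pool_check() are hypothetical names, and the check assumes "alloc_locked" is 0, as it is in these statistics.

```c
#include <assert.h>
#include <stddef.h>

/* Counters as printed for one pool by the stats output. */
struct pool_stats {
    unsigned avail; /* total number of elements in the pool */
    unsigned used;  /* elements currently allocated */
    unsigned max;   /* high-water mark of "used" */
    unsigned err;   /* failed allocations */
};

enum pool_verdict {
    POOL_OK,            /* no failed allocations */
    POOL_EXHAUSTED,     /* failures, and the pool did fill up */
    POOL_INCONSISTENT   /* failures although the pool never filled up */
};

/* Classify one pool from its counters. */
static enum pool_verdict pool_check(const struct pool_stats *s)
{
    assert(s != NULL);
    if (s->err == 0)
        return POOL_OK;
    if (s->max >= s->avail)
        return POOL_EXHAUSTED;
    return POOL_INCONSISTENT;
}
```

Fed with the PBUF line (avail 30, used 1, max 11, err 2), pool_check() returns POOL_INCONSISTENT, whereas MEM UDP_PCB (avail 3, used 3, max 3, err 0) is fully used but comes out as POOL_OK because no allocation has failed yet.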
>>
>> On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd
>> <michael.odowd@kuantic.com> wrote:
>>>
>>> Hi Elad,
>>>
>>> I ran into a similar problem recently. I'm using a recent CVS checkout
>>> rather than 3.0. Also, I'm probably not using the same ethernet HW, so I
>>> don't know how well my reply corresponds to your case.
>>>
>>> The eth_drv.c file is the glue between lwIP and the underlying ethernet
>>> driver, so the issue that you are encountering may be specific to the
>>> driver. In my case, when under stress, eth_drv.c generates the error
>>> message: "cannot allocate pbuf to receive packet". Soon after that, the
>>> ethernet driver stops receiving traffic permanently, but does not crash.
>>> In your case, if I understand correctly, your system crashes.
>>>
>>> The issue is that when eth_drv_recv() fails to allocate a pbuf, it
>>> returns without calling the ethernet driver recv() function:
>>> (sc->funs->recv)(). In my case, the driver requires that its recv()
>>> function be called, in order to complete the processing of the packet
>>> reception and to free up the receive buffer(s). Failing to call it
>>> apparently causes the receive path to cease functioning (I'm still
>>> investigating the details). In your case, I gather that it crashes the
>>> system.
>>>
>>> Note: I'm running on an NXP 1788 (Cortex-M3), using the
>>> "devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.
>>>
>>> There are two aspects to this problem:
>>>
>>> 1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs
>>> available, then it should at least cause the received packet to be
>>> discarded. Otherwise, the system may fail whenever there is a minor burst
>>> of traffic on the network. It doesn't take much: there are only 16 pbufs
>>> available by default. Whether or not the system fails depends on how the
>>> ethernet driver reacts to the failure to call its recv() function. I
>>> hope to fix this on my platform in the near future.
>>>
>>> 2) You should also keep an eye on your pbuf usage, just to make sure that
>>> you don't have a pbuf memory leak. You could also try to allocate more
>>> pbufs, if you have the available memory.
>>>
>>> If you are using the default lwIP configuration, the pbuf memory
>>> allocation is handled by memp.[hc]. It has a fixed number of pbufs
>>> available. The default is 16 pbufs, and can be changed in the configtool
>>> under: [lwIP networking stack/Memory options/Number of memp struct pbufs].
>>>
>>> Alternatively, if you have lots of memory, you could enable the checkbox:
>>> [lwIP networking stack/Memory options/Use malloc for pool allocations].
>>> This bypasses the memp pools and their static limitations, though it will
>>> make it harder to spot a pbuf memory leak. I haven't tried this personally.
>>>
>>> Finally, (when using memp) the pbuf usage can be monitored with
>>> lwip/stats.h. If you have access to a serial port, try calling
>>> stats_display(). Here is a snippet of the pbuf related output:
>>>
>>>>  MEM PBUF_POOL
>>>>          avail: 16
>>>>          used: 0
>>>>          max: 3
>>>>          err: 0
>>>
>>> The "err" counter increases when pbuf_alloc() fails.
>>>
>>> Hope that helps,
>>>
>>> Regards,
>>>
>>> Michael O'Dowd
>>> Kuantic SAS
>>>
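
The fix proposed in point 1 above can be sketched in isolation. This is not the eCos eth_drv.c code: struct mock_driver, mock_recv() and glue_recv() are hypothetical stand-ins, and the convention that a NULL destination means "discard this frame" is an assumption that a real driver would have to support. The sketch only illustrates the idea that the glue layer calls the driver's receive hook whether or not a buffer could be allocated, so the driver can always release its receive buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver: it holds one received frame in a
 * hardware buffer until its recv hook has been called. */
struct mock_driver {
    int frame_pending;      /* 1 while a frame occupies the RX buffer */
    int frames_delivered;
    int frames_discarded;
};

/* Driver recv hook: hands the frame to dst, or drops it when dst is
 * NULL. Either way the hardware buffer is released. */
static void mock_recv(struct mock_driver *drv, void *dst, size_t len)
{
    assert(drv != NULL);
    (void)len;              /* a real driver would copy len bytes */
    if (dst != NULL)
        drv->frames_delivered++;
    else
        drv->frames_discarded++;
    drv->frame_pending = 0;
}

/* Glue-layer receive. "can_alloc" simulates pbuf_alloc() succeeding
 * or failing. Returns the buffer passed up the stack, or NULL. */
static void *glue_recv(struct mock_driver *drv, size_t len, int can_alloc)
{
    void *buf = can_alloc ? malloc(len) : NULL;

    /* Call the driver even when buf is NULL so that it can discard
     * the frame instead of being left with a stuck RX buffer. */
    mock_recv(drv, buf, len);
    return buf;
}
```

With this shape, an allocation failure costs one dropped packet but leaves frame_pending cleared, so the next frame can still be received.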
>>>
>>> On 12/06/2012 22:40, Elad Yosef wrote:
>>>>
>>>> Hi all,
>>>> I'm using the LwIP stack on my target and experiencing crashes under stress.
>>>>
>>>> The function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c
>>>> calls pbuf_alloc() and this allocation fails.
>>>>
>>>> Is this the result of a bad configuration?
>>>>
>>>> Thanks
>>>> Elad
>>>>
>

--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss

