This is not the right fix, and the problem has nothing to do with the
toolchain.

Freescale have redefined the nop instruction for ColdFires to act as a
barrier instruction. When the CPU executes nop it stalls the
instruction pipeline until all previous instructions have completed.
Also, if the processor has some kind of write buffer between the cache
and external memory then nop will stall until the write buffer has
been emptied. Most of the time you do not need to worry about pipeline
effects like this, but DMA engines and cache interactions nearly
always need very special care.

The operation immediately before starting the transmit is a
HAL_DCACHE_STORE(). Without the nop, that macro may not have finished
executing, so the cache store is still in progress at the point that
the Ethernet engine fetches the buffer descriptors. Hence the Ethernet
engine may see bogus buffer descriptors, and confusion results.

The correct solution is to incorporate the nop into your processor's
HAL_DCACHE_STORE() macro, thus making the macro do precisely what it
is supposed to do. That will fix any other uses of HAL_DCACHE_STORE(),
not just the ethernet driver. It also avoids adding an unnecessary nop
instruction on processors which do not need it, e.g. the mcf5272 which
does not have a data cache at all.
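
As a rough sketch of what I mean, and not the actual HAL code: every
name starting with example_ or EXAMPLE_ below is made up for
illustration, and the cache push routine is only a placeholder for
whatever your processor HAL already does. I am also assuming a gcc
build, where __mcoldfire__ is predefined for ColdFire targets. The
point is just that the barrier lives inside the store macro, so every
caller gets it.

```c
#include <stddef.h>

/* Counters exist only so the sketch can be exercised on a host build. */
static unsigned example_push_count;
static unsigned example_sync_count;

/* Placeholder (hypothetical) for the processor's existing code that
 * pushes dirty data cache lines covering [base, base + size). */
static void example_cache_push(const void *base, size_t size)
{
    (void) base;
    (void) size;
    example_push_count++;
}

/* Barrier issued after the push. On ColdFire the nop stalls until
 * earlier instructions and any write buffer have drained. On other
 * gcc targets this sketch falls back to a compiler-only barrier. */
static void example_sync(void)
{
#if defined(__mcoldfire__)
    __asm__ volatile ("nop" : : : "memory");
#elif defined(__GNUC__)
    __asm__ volatile ("" : : : "memory");
#endif
    example_sync_count++;
}

/* Hypothetical stand-in for the processor's HAL_DCACHE_STORE(): the
 * store is not considered complete until the barrier has executed. */
#define EXAMPLE_DCACHE_STORE(_base_, _size_)            \
    do {                                                \
        example_cache_push((_base_), (_size_));         \
        example_sync();                                 \
    } while (0)

/* Driver-side usage: no extra nop is needed before starting the DMA. */
static int example_start_transmit(const void *descriptors, size_t size)
{
    EXAMPLE_DCACHE_STORE(descriptors, size);
    /* ... here the real driver would tell the Ethernet engine to go ... */
    return 1;
}
```

With this arrangement the Ethernet driver stays free of
processor-specific instructions, and a processor with no data cache
can define the macro as empty.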