This is the mail archive of the
newlib@sourceware.org
mailing list for the newlib project.
Re: [PATCH] more problems with newlib/libc/machine/m68k/memcpy.S
- From: Maxim Kuvyrkov <maxim at codesourcery dot com>
- To: Josef Wolf <jw at raven dot inka dot de>, newlib at sourceware dot org
- Date: Wed, 10 Feb 2010 17:47:04 +0300
- Subject: Re: [PATCH] more problems with newlib/libc/machine/m68k/memcpy.S
- References: <20100209141511.GL12505@raven.wolf.lan>
On 2/9/10 5:15 PM, Josef Wolf wrote:
Hello folks,
I have done a closer look at newlib/libc/machine/m68k/memcpy.S and I think
there are a couple of problems with this code:
1. When the destination pointer is mis-aligned, up to three bytes are copied
before getting into the optimized main loop. But d1, which contains the
number of bytes to copy, is never adjusted to account for this fact. As a
consequence, up to 3 byte too much can be copied by memcpy().
When destination is mis-aligned, D1 is properly adjusted here:
and.l #3,d0 | look for the lower two only
beq 2f | is aligned?
sub.l d0,d1
Or am I missing something?
2. When MISALIGNED_OK==0, no attempt is made to get the pointers to aligned
positions. This results in much slower code than the old code for such
CPUs. IMHO, the cause of the problem reported in the thread starting with
http://sourceware.org/ml/newlib/2009/msg01079.html was mis-interpreted.
The problem was _not_ the attempt to align dest. The problem was that
even after dest was aligned, src was still not aligned. Thus, disabling
alignment completely is the wrong response to that problem.
The problem was not misinterpreted; and it is true that memcpy can be
optimized for CPUs without alignment module by using 2-byte copies. It
is not straightforward to implement a single memcpy routine that will be
optimal for both CPUs that support unaligned accesses and those which
don't -- that is why I posted a simple byte-copy fix.
It seems that your patch lengthens the critical path for CPUs which
support unaligned accesses.
3. -mcpu32 seems to imply -mc68020. So the check for alignment capabilities
gives a wrong result for cpu32. BTW: alignment capabilities depend not
only on the CPU. It is also dependant on bus width and how the memory is
connected.
Thank you for pointing this out.
Regards,
--
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724