This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: Will Newton <will dot newton at linaro dot org>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>, Ondřej Bílka <neleai at seznam dot cz>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Mon, 2 Sep 2013 15:18:28 +0100
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- References: <520894D5 dot 7060207 at linaro dot org> <CANu=DmiBHoymFKTvaW_VsdhWZEYwkfViz1tTeRgj7H80f0FntA at mail dot gmail dot com> <5220D30B dot 9080306 at redhat dot com> <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com> <5220F1F0 dot 80501 at redhat dot com>
On 30 August 2013 20:26, Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/30/2013 02:48 PM, Will Newton wrote:
>> On 30 August 2013 18:14, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> Hi Carlos,
>>
>>>>> A small change to the entry to the aligned copy loop improves
>>>>> performance slightly on A9 and A15 cores for certain copies.
>>>>>
>>>>> ports/ChangeLog.arm:
>>>>>
>>>>> 2013-08-07 Will Newton <will.newton@linaro.org>
>>>>>
>>>>> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
>>>>> on entry to aligned copy loop for improved performance.
>>>>> ---
>>>>> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> Ping?
>>>
>>> How did you test the performance?
>>>
>>> glibc has a performance microbenchmark, did you use that?
>>
>> No, I used the cortex-strings package developed by Linaro for
>> benchmarking various string functions against one another[1].
>>
>> I haven't checked the glibc benchmarks, but I'll look into that. It's
>> quite a specific case that shows the problem, so it may not be obvious
>> which one is better, however.
>
> If it's not obvious, how is someone supposed to review this patch? :-)
>
>> [1] https://launchpad.net/cortex-strings
>
> There are 2 benchmarks. One appears to be Dhrystone 2.1, which isn't a string
> test in and of itself and should not be used for benchmarking or changing
> string functions. The other is called "multi" and appears to run some functions
> in a loop and time them.
>
> e.g.
> http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/benchmarks/multi/harness.c
>
> I would not call `multi' exhaustive. The glibc performance benchmark tests
> are not exhaustive either, but they have received review from the glibc
> community and are our preferred way of demonstrating performance gains when
> posting performance patches.
>
> I would really, really like to see you post the results of running your new
> implementation with the glibc benchmark and show the numbers that support
> the claim that it is faster. Is that possible?
The mailing list server does not seem to accept image attachments, so I
have uploaded the performance graph here:
http://people.linaro.org/~will.newton/glibc_memcpy/sizes-memcpy-08-04-2.5.png
--
Will Newton
Toolchain Working Group, Linaro
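
For readers unfamiliar with the kind of harness discussed in this thread,
a "run the function in a loop and take the time" measurement might look
roughly like the sketch below. It is not the cortex-strings `multi'
harness and not the glibc benchmark code. The names now_ns and
time_memcpy_ns, the 64-byte base alignment, and the offset parameters are
all assumptions made for illustration. The offsets are there only because
the patch concerns the entry check for the aligned copy loop, so source
and destination alignment are the natural things to vary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average nanoseconds per memcpy call for `size`
   bytes, with src and dst placed `src_off` and `dst_off` bytes past a
   64-byte aligned base.  Returns a negative value on failure.  */
double time_memcpy_ns(size_t size, size_t src_off, size_t dst_off, long iters)
{
    assert(iters > 0 && src_off < 64 && dst_off < 64);

    void *src_base = NULL;
    void *dst_base = NULL;
    if (posix_memalign(&src_base, 64, size + 64) != 0)
        return -1.0;
    if (posix_memalign(&dst_base, 64, size + 64) != 0) {
        free(src_base);
        return -1.0;
    }

    unsigned char *src = (unsigned char *)src_base + src_off;
    unsigned char *dst = (unsigned char *)dst_base + dst_off;
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    /* Call through a volatile pointer so the compiler cannot drop or
       inline the copies being measured.  */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    double start = now_ns();
    for (long i = 0; i < iters; i++)
        copy(dst, src, size);
    double elapsed = now_ns() - start;

    /* Cheap sanity check that the copy actually happened.  */
    int ok = size == 0 || (dst[0] == 0xA5 && dst[size - 1] == 0xA5);

    free(src_base);
    free(dst_base);
    return ok ? elapsed / (double)iters : -1.0;
}
```

Calling time_memcpy_ns over a range of sizes and offsets, once per memcpy
build, gives numbers that can be compared side by side. A single run like
this is noisy, so repeated runs on an otherwise idle machine would be
needed before drawing conclusions.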