This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Will Newton <will dot newton at linaro dot org>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>
- Date: Wed, 4 Sep 2013 12:37:33 -0500
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <5220D30B dot 9080306 at redhat dot com> <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com> <5220F1F0 dot 80501 at redhat dot com> <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw at mail dot gmail dot com> <52260BD0 dot 6090805 at redhat dot com> <20130903173710 dot GA2028 at domone dot kolej dot mff dot cuni dot cz> <522621E2 dot 6020903 at redhat dot com> <20130903185721 dot GA3876 at domone dot kolej dot mff dot cuni dot cz> <5226354D dot 8000006 at redhat dot com> <20130904073008 dot GA4306 at spoyarek dot pnq dot redhat dot com> <20130904110333 dot GA6216 at domone dot kolej dot mff dot cuni dot cz>
On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
>> 2. Scale with size
> Not very important, for several reasons. One is that big sizes are cold
> (just look at the oprofile output: the loops are hit less often than the header).
>
> The second reason is that, if we look at the callers, large sizes are unlikely
> to be the bottleneck.
In my experience, extremely large data sizes are not very common.
Optimizing for those yields diminishing returns. I believe that at very
large sizes the pressure is all on the hardware anyway. Prefetching
large amounts of data in a loop takes a fixed amount of time and, given
a large enough amount of data, the overhead introduced by most other
factors is negligible.
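
To put rough numbers on that last point, here is a minimal sketch of the
amortization argument. The cost model (a fixed per-call overhead plus a
constant per-byte cost) and the name overhead_fraction are assumptions made
for illustration only; they are not measurements from any implementation.

```c
#include <stddef.h>

/*
 * Hypothetical cost model (an assumption, not measured data):
 *   total time = fixed_ns + n * per_byte_ns
 * Returns the share of the total time spent in the fixed overhead.
 */
static double overhead_fraction(double fixed_ns, double per_byte_ns, size_t n)
{
    double total_ns = fixed_ns + per_byte_ns * (double)n;

    /* Guard against a zero-cost model to avoid dividing by zero. */
    return total_ns > 0.0 ? fixed_ns / total_ns : 0.0;
}
```

With made-up inputs of 50 ns fixed overhead and 0.1 ns per byte, the fixed
part dominates a 64-byte call but falls well below one percent of the total
for a 1 MiB call.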
>> 4. Measure the effect of dcache pressure on function performance
>> 5. Measure effect of icache pressure on function performance.
>>
> Here you really need to base the weights on function usage patterns.
> A bigger code size is acceptable for functions that are called more
> often. You need to see the distribution of how calls are clustered to get
> the full picture. strcmp is the least sensitive to icache concerns: when it
> is called, it is mostly called 100 times over in a tight loop, so size is not
> a big issue. If the same number of calls is uniformly spread through the
> program, we need stricter criteria.
Icache pressure is probably one of the more difficult things to
measure with a benchmark. I suppose it'd be easier with a pipeline
analyzer.
Can you explain how usage pattern analysis might reveal icache pressure?
I'm not sure how useful 'usage patterns' are when considering dcache
pressure. On Power we have data-cache prefetch instructions and, since
we know that dcache pressure is a reality, we will prefetch if our
data sizes are large enough to outweigh the overhead of prefetching,
e.g., when the data size exceeds the cacheline size.
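
As a rough illustration of that size threshold, here is a minimal C sketch.
It is not the Power implementation: copy_with_prefetch and CACHELINE_SIZE are
hypothetical names, the 128-byte line size is an assumption, and the GCC/Clang
__builtin_prefetch builtin stands in for the hand-written prefetch
instructions a real assembly routine would use.

```c
#include <stddef.h>
#include <string.h>

/* Assumed line size for illustration; real code would use the target's value. */
#define CACHELINE_SIZE 128

/*
 * Hypothetical helper: copy n bytes, issuing prefetch hints only when the
 * copy is larger than one cache line, so small copies pay no prefetch cost.
 */
static void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Small copy: the prefetch overhead would not pay for itself. */
    if (n <= CACHELINE_SIZE)
        return memcpy(dst, src, n);

    while (n >= CACHELINE_SIZE) {
        /* Hint the next line (read access, low temporal locality),
           but only while more source data remains past this line. */
        if (n > CACHELINE_SIZE)
            __builtin_prefetch(s + CACHELINE_SIZE, 0, 0);
        memcpy(d, s, CACHELINE_SIZE);
        d += CACHELINE_SIZE;
        s += CACHELINE_SIZE;
        n -= CACHELINE_SIZE;
    }

    /* Copy the tail that is shorter than one cache line. */
    memcpy(d, s, n);
    return dst;
}
```

The only design point shown is the early return: below the threshold the
function does a plain copy, and above it every line copied is preceded by a
hint for the following line.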
Ryan