This is the mail archive of the
gsl-discuss@sourceware.org
mailing list for the GSL project.
Re: gsl container designs
On Wed, 6 Jan 2010, Rhys Ulerich wrote:
I recall from my benchmarking days that -- depending on compiler --
there is a small dereferencing penalty for packed matrices (vectors
packed into dereferencing **..* pointers) compared to doing the offset
arithmetic via brute force inline or via a macro.
......
I haven't run the benchmark recently and don't know how large it currently is. It
was never so large that it stopped me from using repacked pointers for
code clarity.
Mostly unscientific, but worth tossing into the mix:
Using Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations
of doing the nested pointers approach is neck-and-neck with index arithmetic
on a 10x10 double matrix. For the 100x100 case it takes 1.3 times longer
to iterate using the nested pointers. Work in the inner loop "compute
kernel" is *= against a constant scalar. Optimization flags on -O3. I've seen similar
behavior on recent GNU compilers.
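
For readers who want to see the two access styles side by side, here is a minimal sketch in C. It is not the test code offered in this thread, which was not posted here. The function names (scale_indexed, scale_nested, make_row_pointers) are hypothetical, and the sketch assumes row-major storage with the row pointers laid over one contiguous block.

```c
#include <stddef.h>
#include <stdlib.h>

/* Index arithmetic: compute each element's offset into one flat block. */
static void scale_indexed(double *data, size_t rows, size_t cols, double c)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            data[i * cols + j] *= c;
}

/* Nested pointers: load the row pointer m[i], then index into that row. */
static void scale_nested(double **m, size_t rows, size_t cols, double c)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            m[i][j] *= c;
}

/* Build a row-pointer table over an existing contiguous block (assumed
 * row-major). The caller frees the returned table; it does not own data. */
static double **make_row_pointers(double *data, size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; ++i)
        m[i] = data + i * cols;
    return m;
}
```

Both functions touch the same memory in the same order, so any timing difference comes from the extra row-pointer load and from how well the compiler hoists it out of the inner loop. Timing either one over many repetitions with a wall-clock timer would be needed to reproduce a ratio like the 1.3 quoted above.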
That sounds partly like a cache effect -- 10x10 almost certainly stays
in L1, 100x100 won't fit. My own experience is similar, although I
don't recall the multiplier being as large as 1.3 (but then, I was doing
stream and stream-like tests with very large vectors, which means that
one spends more time in a vector streaming mode and minimizes
cache-thrashing when turning corners). And my memory could be faulty --
I'm an old guy, after all, early Alzheimers...;-)
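
The cache argument is easy to check with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the 64 KiB L1 data cache size is an assumed figure, since the thread does not say which AMD chip was used.

```c
#include <stddef.h>

/* Assumed L1 data cache size in bytes; the thread does not state it. */
#define ASSUMED_L1_BYTES ((size_t)64 * 1024)

/* Bytes occupied by the matrix elements themselves. */
static size_t data_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(double);
}

/* Extra bytes for the row-pointer table used by the nested approach. */
static size_t row_table_bytes(size_t rows)
{
    return rows * sizeof(double *);
}

/* Nonzero if a working set of the given size fits in the given cache. */
static int fits_in_cache(size_t bytes, size_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

With 8-byte doubles, a 10x10 matrix is 800 bytes and fits with plenty of room to spare, while a 100x100 matrix is 80,000 bytes and exceeds the assumed 64 KiB before the row-pointer table is even counted. The same conclusion holds for a 32 KiB cache.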
rgb
I'm happy to provide the test code if anyone's interested.
- Rhys
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu