This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: gsl container designs

From: "Robert G. Brown" <rgb at phy dot duke dot edu>
To: Rhys Ulerich <rhys dot ulerich at gmail dot com>
Cc: gsl-discuss at sourceware dot org
Date: Thu, 7 Jan 2010 08:22:03 -0500 (EST)
Subject: Re: gsl container designs
References: <1259110486.3028.69.camel@manticore.lanl.gov> <m3fx7q4mdr.wl%bjg@network-theory.co.uk> <4B4477B8.50305@iki.fi> <alpine.LFD.2.00.1001061011530.3462@localhost> <1262829020.27244.361.camel@manticore.lanl.gov> <alpine.LFD.2.00.1001062138040.3462@localhost> <4a00655d1001062110m139c0a8tf2eae7de67da8f6f@mail.gmail.com> <4a00655d1001062146g555fd9dfh1e333613d8e3b463@mail.gmail.com>

On Wed, 6 Jan 2010, Rhys Ulerich wrote:

>> I recall from my benchmarking days that -- depending on compiler --
>> there is a small dereferencing penalty for packed matrices (vectors
>> packed into dereferencing **..* pointers) compared to doing the offset
>> arithmetic via brute force inline or via a macro.
>> ......
>> I haven't run the benchmark recently and don't know how large it
>> currently is.  It was never so large that it stopped me from using
>> repacked pointers for code clarity.

> Mostly unscientific, but worth tossing into the mix:
>
> Using Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations
> of the nested-pointers approach are neck-and-neck with index arithmetic
> on a 10x10 double matrix.  For the 100x100 case it takes 1.3 times longer
> to iterate using the nested pointers.  Work in the inner loop "compute
> kernel" is *= against a constant scalar.  Optimization flag was -O3.  I've
> seen similar behavior on recent GNU compilers.


That sounds partly like a cache effect -- 10x10 almost certainly stays
in L1, 100x100 won't fit.  My own experience is similar, although I
don't recall the multiplier being as large as 1.3 (but then, I was doing
stream and stream-like tests with very large vectors, which means that
one spends more time in a vector streaming mode and minimizes
cache-thrashing when turning corners).  And my memory could be faulty --
I'm an old guy, after all, early Alzheimer's... ;-)
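
For anyone following along, here is a minimal sketch of the two access
patterns being compared.  It is not the test code offered in the quoted
message (that was not posted here); the function names scale_flat,
scale_rows and make_row_table are made up for illustration, and the
kernel is simply the "*= against a constant scalar" described above.
Both versions walk the same contiguous block of doubles; the only
difference is whether the element address comes from offset arithmetic
or from an extra load through a table of row pointers.

```c
#include <stdlib.h>

/* Scale every element using explicit offset arithmetic on a flat block. */
static void scale_flat(double *data, size_t rows, size_t cols, double s)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            data[i * cols + j] *= s;
}

/* Scale every element through a table of row pointers ("repacked"
 * pointers): row[i] is loaded first, then indexed by j. */
static void scale_rows(double **row, size_t rows, size_t cols, double s)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            row[i][j] *= s;
}

/* Build a row-pointer table over an existing contiguous block.
 * Returns NULL on allocation failure; the caller frees the table
 * (not the underlying data) with free(). */
static double **make_row_table(double *data, size_t rows, size_t cols)
{
    double **row = malloc(rows * sizeof *row);
    if (row != NULL)
        for (size_t i = 0; i < rows; ++i)
            row[i] = data + i * cols;
    return row;
}
```

To reproduce the comparison one would allocate rows*cols doubles, build
the table once with make_row_table, and time many repetitions of each
function.  For scale: a 10x10 block of doubles is 800 bytes and a
100x100 block is 80,000 bytes, plus the pointer table in the second
case, so the two sizes sit on opposite sides of a typical L1 data
cache.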

rgb

> I'm happy to provide the test code if anyone's interested.
>
> - Rhys


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu
