This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.



Two suggestions...


A couple of consistency-check questions or suggestions as I'm writing a
pretty complicated GSL-based numerical program.

  a) In relation to the recent discussion on multidimensional arrays,
what about adding gsl_tensor?  Something like:

typedef struct
{
  size_t dim;        /* dimension (rank) of the tensor */
  size_t stride;     /* size in bytes of one element, e.g. sizeof(whatever) */
  size_t *size;      /* size[k]: length of tensor component k */
  size_t *lind;      /* lind[k]: lower (initial) index of component k */
  void * data;       /* actual data address */
  gsl_block * block; /* as in gsl_matrix */
  int owner;         /* as in gsl_matrix */
} gsl_tensor;

where dim is the dimension (rank) of the tensor, stride is the size in
bytes of one data element (e.g. sizeof(whatever)), size[] is the size
(length) of each tensor component, lind[] is the lower (initial) index
of each component (so tensor components don't have to be indexed
[0,(size[]-1)]), *data points to the actual data address (declared as a
void * rather than a double * so the same struct can hold any element
type), and the other two fields are the same as in gsl_matrix.

Both gsl_vector and gsl_matrix would then be special cases of
gsl_tensor, and one could call the ODE solver with a double cast of the
data block, while still using the same call interface to generate e.g.
int, char, or even string/struct tensors.  I'm tentatively trying a type
with this name/struct in my own code, as I really want to be able to
make a lattice of structs in arbitrary dimension (lattice coordinates,
with spin degrees of freedom attached to each coordinate).

  b) It is much more efficient numerically to evaluate random numbers a
vector at a time.  This saves time in two ways.  First, by creating an
"empty" random number generator in the gsl format and adding it to my
test/benchmark program, I measure the overhead of the function call
alone to be between 10 and 30 nanoseconds on CPUs that clock from 800
MHz to >2 GHz, and this is anywhere from 1/3 to 2/3 of the total time
required to evaluate e.g. mt19937.  Simply using one function call to
generate a full page of rands at a time would thus yield a speedup of as
much as 3.  Second, many rng's can use pipelines and cache much better
in a vector loop where the CPU, memory type, and compiler permit,
especially if they don't have to give up CPU registers between calls.
This could speed up rng's by another factor of 2, for as much as 6
overall -- from perhaps 30 nanoseconds (or more) per rand to as little
as 5 or 10.  This kind of speedup matters in lots of Monte Carlo
applications.

Is there any thought about creating a vector-rng interface with perhaps
an inline conditional macro instead of a subroutine call?

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu



