


Re: [Fwd: Re: random variate from power exponential distribution:continue]


On Wed, 20 Oct 2004, Olaf Lenz wrote:

> Hello!
> 
> Thanks for the good answers!  I already knew that CPU architecture
> plays a major role in execution speed, but I did not expect the
> differences to be as dramatic as in the case of the P4 and the Athlon
> XP, which I had expected to be quite similar.
> 
> Do any of you know a source where one can find a summary of the
> different CPU types and the characteristics they have, especially from
> a programmer's point of view?  If I have to buy new computers for my
> working group, how could I decide which would bring the best
> performance for our typical problems?

This >>is<< the million dollar question, isn't it?

I've written quite a bit on it on the beowulf list; there are numerous
discussions in the list archives.  Let's see if I can give you a brief
summary here.

First, you can usually find white papers on CPU organization and design
on manufacturers' websites.  They tend to be pretty technical and most
won't be of much use to you unless you are an ubergeek, but there is a
lot of information there, and sometimes there are technical advertising
docs that are more readable.

Second, the better computer magazines (and websites, e.g. Tom's
Hardware) often have articles on CPU architecture, especially when an
architecture is new.  Byte used to be really good about this, but alas,
it went away.

More practically (and third), you can check out the major benchmark
sites, which often have summary results.  Perhaps the most useful for
"typical problems" is the SPEC (Standard Performance Evaluation
Corporation) site (www.spec.org), as the SPEC suite of benchmarks is in
wide use and has been applied to nearly every architecture, often with
several compilers.  It is worth noting that different compilers alone
can sometimes produce the 30% variation you noted, for certain
problems -- for example, when a CPU has some special pipelining feature
that CAN be used but ISN'T used by standard x86 code because it
requires special instructions and/or special structuring of the code.
The SPEC suite IS a suite -- each kind of "benchmark" consists of a
constellation of different applications.  If you look at the results
broken down by the components of this constellation, you can often find
a floating point or integer application "like" your own and get an
entirely apropos comparative result.
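
For concreteness, here is a toy floating point kernel (nothing from
SPEC, just a sketch assuming a reasonably modern gcc) that you can
build with different compilers and option sets to see this kind of
variation for yourself:

  /* kernel.c -- a toy axpy-style loop whose speed depends strongly on
   * how the compiler schedules and vectorizes floating point code.
   * Try, e.g.:  gcc -O1 kernel.c -o k1   versus
   *             gcc -O3 -march=native -ffast-math kernel.c -o k2     */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N 10000000

  int main(void)
  {
      double *x = malloc(N * sizeof *x);
      double *y = malloc(N * sizeof *y);
      for (size_t i = 0; i < N; i++) { x[i] = 1.0 + i * 1e-9; y[i] = 2.0; }

      struct timespec t0, t1;
      double a = 3.14159;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (size_t i = 0; i < N; i++)      /* y = a*x + y over the vector */
          y[i] = a * x[i] + y[i];
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
      printf("%g s, check %g\n", secs, y[N / 2]);  /* printing the check
                                                      value defeats dead
                                                      code elimination */
      free(x); free(y);
      return 0;
  }

Run the variants on the same machine and the spread in times tells you
how much the compiler alone matters on that architecture.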

lmbench (www.bitmover.com) is another benchmark (this time a
microbenchmark suite) with lots of comparative results accumulated over
the years.  You need to be a bit ubergeekish to use it and to
understand the results -- its mailing list tends to be frequented by
the likes of Linus Torvalds, other Linux and BSD kernelvolken, and
compiler developers -- but it is the actual tool used by many kernel
and compiler writers to tune performance and to understand and measure
specific code or operational bottlenecks (such as memory|disk|network
latency and bandwidth, context switch overhead, etc.).  To use it
effectively, it really helps to know your own code well: whether it
tends to be memory bus bound, CPU clock bound, or network bound;
integer intensive vs floating point intensive; filled with
transcendental function calls or not (these can be implemented in
hardware or software with significantly varying speeds).
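
To give the flavor of what a latency microbenchmark looks like (this is
NOT lmbench, just a toy in the same spirit), here is a pointer-chasing
loop: every load depends on the previous one, so the time per step
approximates load-to-use latency for whatever working set size you pick.

  /* chase.c -- serial pointer chasing through a random cycle, so that
   * neither the pipeline nor the hardware prefetcher can help.       */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
      size_t n = 1 << 22;                /* 32 MB of pointers; vary this */
      size_t *chain = malloc(n * sizeof *chain);

      /* Sattolo's algorithm: a random permutation with a single cycle
       * of length n, so the chase visits every element.               */
      for (size_t i = 0; i < n; i++) chain[i] = i;
      srand(12345);
      for (size_t i = n - 1; i > 0; i--) {
          size_t j = (size_t)rand() % i;
          size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
      }

      size_t p = 0, steps = 10000000;
      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (size_t s = 0; s < steps; s++) p = chain[p];  /* serial loads */
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double ns = ((t1.tv_sec - t0.tv_sec) * 1e9
                   + (t1.tv_nsec - t0.tv_nsec)) / steps;
      printf("%.1f ns/load (p=%zu)\n", ns, p);  /* p defeats dead code
                                                   elimination */
      free(chain);
      return 0;
  }

Shrink n until the chain fits in L1 or L2 cache and you will see the
latency drop by an order of magnitude or more -- which is exactly the
sort of thing lmbench measures properly.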

If your code is lots of linear algebra, the venerable linpack benchmark
or the stream benchmark might be useful.  Google for their sites.  To
wrap up benchmarks, let me point out that I've written a couple of
benchmarking tools myself that use the CPU clock cycle counter as a
timer (written using assembler fragments before the pretty "timer()"
interface described earlier in this thread, which is accurate to order
10 nanoseconds, was available -- I actually parse /proc/cpuinfo to get
the CPU clock, for example:-).  One of these (rand_rate) times all the
random number generators in GSL (and applies various tests from diehard
and NIST to them, although I got bored before rewriting and
implementing the entire suite(s)), because timing and testing RNGs
seems to be quite the thing to do if you plan to use a lot of them, and
I do Monte Carlo across large clusters and actually use rather a lot of
uniform deviates...;-)
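
The core timing loop in such a tool is nothing exotic.  Here is a
stripped-down sketch of the idea (NOT the actual rand_rate code -- it
assumes x86 and the gcc/clang __rdtsc intrinsic instead of my
hand-rolled assembler fragments):

  /* rngtime.c -- time one GSL generator in CPU cycles per deviate.
   * Build with:  gcc rngtime.c -lgsl -lgslcblas -o rngtime           */
  #include <stdio.h>
  #include <x86intrin.h>
  #include <gsl/gsl_rng.h>

  int main(void)
  {
      gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);  /* try other types */
      const unsigned long n = 10000000UL;
      double sum = 0.0;

      unsigned long long c0 = __rdtsc();
      for (unsigned long i = 0; i < n; i++)
          sum += gsl_rng_uniform(r);            /* one deviate per pass */
      unsigned long long c1 = __rdtsc();

      printf("%s: %.1f cycles/deviate (sum=%g)\n",
             gsl_rng_name(r), (double)(c1 - c0) / n, sum);
      gsl_rng_free(r);
      return 0;
  }

Swap gsl_rng_mt19937 for any other gsl_rng_type and you can rank the
whole collection on your own CPU in a few minutes.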

The other is a "wrapper" (cpu_rate) into which you can insert code
fragments you want to test/time.  It isn't quite finished (sigh -- too
busy, too busy) but currently wraps up a definition of "bogoflops" that
involves the mean time of the *, /, +, and - operations in a vector
context -- note the presence of division, which is typically much
slower than multiplication.  It also includes stream, savage (an old
transcendental benchmark) and a few others that are "experimental", as
I moved from wrapping a few specific tests to being able to wrap an
arbitrary fragment of code (which required a reorganization of the
code, as one might expect:-).  Its one "clever" feature relative to
stream and lmbench etc. is that it implements a shuffle algorithm to be
able to >>defeat<< pipelines by accessing the elements of a long vector
in random order.  Comparing these rates to those of the straight vector
implementation (as vector sizes sweep across cache boundaries), you can
learn quite a lot about the efficiency of the latter and about
organizing code for optimum per-processor performance.
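
Here is a minimal sketch of that shuffle idea (again, NOT cpu_rate
itself): sum the same vector once in sequential order and once through
a shuffled index array, and watch the two rates diverge as you sweep n
across your cache sizes.

  /* shufsum.c -- streaming versus randomized access to one vector.   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  static double now(void)
  {
      struct timespec t;
      clock_gettime(CLOCK_MONOTONIC, &t);
      return t.tv_sec + t.tv_nsec * 1e-9;
  }

  int main(void)
  {
      size_t n = 1 << 22;               /* sweep across cache boundaries */
      double *v = malloc(n * sizeof *v);
      size_t *idx = malloc(n * sizeof *idx);
      for (size_t i = 0; i < n; i++) { v[i] = 1.0; idx[i] = i; }

      srand(1);                          /* Fisher-Yates shuffle of idx */
      for (size_t i = n - 1; i > 0; i--) {
          size_t j = (size_t)rand() % (i + 1);
          size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
      }

      double t0 = now(), s1 = 0.0;
      for (size_t i = 0; i < n; i++) s1 += v[i];       /* streaming */
      double t1 = now(), s2 = 0.0;
      for (size_t i = 0; i < n; i++) s2 += v[idx[i]];  /* randomized */
      double t2 = now();

      printf("sequential %.3f s, shuffled %.3f s (s1=%g s2=%g)\n",
             t1 - t0, t2 - t1, s1, s2);
      free(v); free(idx);
      return 0;
  }

While the whole thing fits in cache, the two loops run at nearly the
same speed; once it doesn't, the shuffled loop falls off a cliff, which
tells you just how much the streaming pipeline was buying you.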

FINALLY, remember the oldest of computer performance adages: the best,
nay, the ONLY relevant benchmark is your own code.  Screw benchmarks,
really (including my own).  To compare the performance of systems on
your code, run your code on many systems.  Most good systems vendors
will either lend you systems or provide account access so that you can
test your application on their hardware.  Use this, or borrow systems
from the guy down the hall, or whatever, but by hook or by crook try to
run your stuff on lots of hardware, ideally with several compilers.
That way you can REALLY optimize your purchase next time around.

   rgb

(I hope this isn't too much of a digression for a scientific library
list, given that performance IS important...;-)

> 
> Olaf

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb@phy.duke.edu


