This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.



Re: [Fwd: Re: random variate from power exponential distribution:continue]


On Tue, 19 Oct 2004, Linas Vepstas wrote:

> On Tue, Oct 19, 2004 at 01:42:27PM +0200, Olaf Lenz was heard to remark:
> > 
> > What strikes me as most remarkable is that the AMD CPU actually seems to
> > be ~30% faster than the Intel CPU, even though the Athlon has a lower
> > clock rate! 
> 
> This is not supposed to be surprising: this is, after all, 
> what cpu design is all about.   Different processors have a different

This is also >>common<<, the rule rather than the exception.  We are
buying Opterons for our compute clusters because they are the
price/performance winners (at the moment; this changes with time) and,
really, the pure performance winners as well.

Everything Linas points out below is true, and in particular is true of
Opterons.  64-bit CPUs with a vastly improved memory architecture and
instruction pipelining make a real difference in many kinds of code.

I run a large Monte Carlo program on a mix of older P6 Intel CPUs, the
newer P4 (including Xeons), Athlons, and Opterons.  Opterons beat the
pants off of all of them.  An Opteron is the fastest way possible to run
i386 code -- Itaniums (which are one of the few processors out there
that can give it competition for best absolute performance) run i386
code in emulation mode, Opterons in native mode.

Admittedly, I haven't yet had the opportunity to try Nocona -- Intel's
64-bit Xeon might or might not compete in performance or
price/performance.  From what I've tried, dual Opterons are pretty
spectacular, given their relatively pedestrian clocks.  Of course a 64-bit
CPU "should" be more than twice as fast as a 32-bit CPU on 64-bit
(double precision) code, and I see all of that and more.

   rgb

> number of fixed-point and floating-point units, interconnected in
> various ways.  (This is called "superscalar design".)  More execution
> units usually means more work can be done per clock cycle.  Also,
> the goal of instruction dispatch is to keep each unit busy: better
> dispatching designs mean processors that can do more even if the
> clock speed is identical, and the number of execution units in
> the chip is identical.
> 
> Note also, there are usually trade-offs.  One design might run
> one type of algorithm very efficiently, and all other algos very
> poorly.  Your code may have an algo that the AMD Athlon is
> particularly good at;  other algos may run faster on Intel,
> or on POWER.  Don't make buying decisions on clock speed alone.
> SPECmarks are a much better indicator.
> 
> (Showing my age: In the very early days, circa power-1 (1988), 
> the power-1 ran 4-8x faster than a 486, even though the 486 clock 
> speed was 2 or 3x the power-1.  This is because the power-1 had
> lots of execution units, and a much much better dispatcher. 
> And more registers.  And more chip-memory bandwidth. And ...).
> 
> --linas
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


