HPCC Results

- 32 Processor Runs - Kiviat Diagrams
- 64 Processor Runs - Kiviat Diagrams
 
The HPC Challenge benchmark suite is made up of the following components:

 

G-HPL (system performance)

HPL solves a randomly generated dense linear system of equations in double-precision (IEEE 64-bit) floating-point arithmetic using MPI. The system matrix is stored in a two-dimensional block-cyclic layout, and multiple variants of the code are provided for the computational kernels and communication patterns. The solution method is LU factorization by Gaussian elimination with partial row pivoting, followed by backward substitution. Unit: Tflop/s (10^12 floating-point operations per second)
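The solution method can be illustrated on a single process. The following is a minimal serial sketch in pure Python, not the HPL code itself: it omits MPI, the block-cyclic distribution, and blocking, and the function name `lu_solve` is chosen here for illustration only.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial row pivoting,
    followed by backward substitution (serial illustration only)."""
    n = len(a)
    # Work on an augmented copy [A | b] so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Backward substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Small example whose first pivot is zero, so a row swap is required.
A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 0.0]]
b = [7.0, 6.0, 4.0]
print(lu_solve(A, b))
```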

 

G-PTRANS (A=A+B^T, MPI) (system performance)

PTRANS implements a parallel matrix transpose (A = A + B^T) for matrices held in two-dimensional block-cyclic storage. It is an important benchmark because it exercises the machine's communications heavily on a realistic problem in which pairs of processors communicate with each other simultaneously, which makes it a useful test of the total communication capacity of the network. Unit: GB/s (gigabytes per second)
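To see why the transpose forces pairs of processes to exchange data, consider the standard two-dimensional block-cyclic mapping, in which block (I, J) lives on process (I mod P, J mod Q) of a P x Q grid. The sketch below is a serial illustration under that assumption; the helper names `owner` and `add_transpose` are made up for this example and are not part of the benchmark.

```python
def owner(block_row, block_col, p, q):
    """Process-grid coordinates that hold a block under a 2D block-cyclic
    layout (assumes the distribution starts at process (0, 0))."""
    return (block_row % p, block_col % q)


def add_transpose(a, b):
    """Return A + B^T for an (r x c) matrix a and a (c x r) matrix b."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j] + b[j][i] for j in range(cols)] for i in range(rows)]


# Block (0, 1) of A needs block (1, 0) of B, which sits on another process.
P, Q = 2, 2
print(owner(0, 1, P, Q), "needs data from", owner(1, 0, P, Q))

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
print(add_transpose(A, B))
```

In the parallel benchmark every off-diagonal block triggers such an exchange at the same time, which is what loads the network.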

 

G-RandomAccess (system performance)

Global RandomAccess, also called GUPS, measures the rate at which the computer can update pseudo-random locations of its memory. The rate is expressed in billions (giga) of updates per second. Unit: GUP/s (giga-updates per second)
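The update pattern can be sketched serially as follows. This is an illustration only: the shift-register generator, its feedback constant `POLY`, and the seed are assumptions made for the example, and the real benchmark distributes the table across all processes.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # assumed feedback constant for the illustrative generator


def next_random(ran):
    """Advance a 64-bit shift-register pseudo-random sequence by one step."""
    high_bit = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if high_bit else ran


def run_updates(table, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random table slots.
    len(table) must be a power of two so the index mask works."""
    size = len(table)
    ran = seed
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran
    return table


def gups(n_updates, seconds):
    """Convert an update count and elapsed time to giga-updates per second."""
    return n_updates / seconds / 1e9


table = list(range(1024))
run_updates(table, 4096)
run_updates(table, 4096)          # XOR is self-inverse: same stream undoes itself
print(table == list(range(1024)))
```

Because each update touches an unpredictable location, caches and prefetchers help very little, so the figure reflects memory and network latency rather than bandwidth.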

 

EP-STREAM Triad (per process)

The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run in an embarrassingly parallel manner: all computational nodes perform the benchmark at the same time, and the arithmetic mean of their rates is reported. Unit: GB/s (gigabytes per second)
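The Triad kernel itself is a single vector statement, a[i] = b[i] + scalar * c[i]. The sketch below shows the kernel and the usual STREAM bandwidth accounting of three 8-byte words moved per element (two loads and one store); the function names and the array length are illustrative, and interpreted Python will of course not approach the bandwidth of the compiled benchmark.

```python
import time


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_gb_per_s(n, seconds, bytes_per_word=8):
    """Bandwidth in GB/s, counting two loads and one store per element."""
    return 3 * bytes_per_word * n / seconds / 1e9


n = 100_000  # illustrative length; real runs use arrays far larger than cache
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start
print(f"{triad_gb_per_s(n, elapsed):.4f} GB/s")
```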

 

G-STREAM Triad (system performance - derived)

This value is derived from the Embarrassingly Parallel STREAM benchmark, a simple synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. All computational nodes perform the benchmark at the same time, and the arithmetic mean of their rates is multiplied by the number of processes to obtain the derived system value. Unit: GB/s (gigabytes per second)
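The relationship between the per-process and the derived system figure is simple arithmetic, sketched below. The per-process rates are invented inputs for illustration, not measured results, and the function names are hypothetical.

```python
from statistics import mean


def ep_stream(rates_gb_s):
    """Per-process figure: arithmetic mean of the individual rates."""
    return mean(rates_gb_s)


def g_stream(rates_gb_s):
    """Derived system figure: mean rate times the number of processes."""
    return mean(rates_gb_s) * len(rates_gb_s)


# Made-up per-process Triad rates in GB/s, for illustration only.
rates = [2.0, 4.0, 6.0, 4.0]
print(ep_stream(rates), g_stream(rates))
```

Since the derived value is the mean multiplied by the process count, it equals the sum of the per-process rates.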

 

EP-DGEMM (per process)

The Embarrassingly Parallel DGEMM benchmark measures the floating-point execution rate of double-precision real matrix-matrix multiplication performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in an embarrassingly parallel manner: all computational nodes perform the benchmark at the same time, and the arithmetic mean of their rates is reported. Unit: Gflop/s (10^9 floating-point operations per second)
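DGEMM computes C = alpha * A * B + beta * C. The sketch below is a naive triple loop that shows the operation and the conventional 2n^3 operation count for square matrices; it is an illustration, not the tuned BLAS routine the benchmark calls, and the helper names are chosen for this example.

```python
def dgemm(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for lists of lists (naive version)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[beta * c[i][j] for j in range(m)] for i in range(n)]
    for i in range(n):
        for p in range(k):
            aip = alpha * a[i][p]
            for j in range(m):
                out[i][j] += aip * b[p][j]
    return out


def dgemm_gflops(n, seconds):
    """Rate for an n x n multiply using the conventional 2 * n**3 count."""
    return 2 * n ** 3 / seconds / 1e9


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(dgemm(1.0, A, B, 0.0, C))
```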

 

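G-FFTE (system performance)

Global FFTE performs the same test as FFTE, which measures the floating-point execution rate of a double-precision complex one-dimensional discrete Fourier transform (DFT), but runs it across the entire system by distributing the input vector in block fashion across all the nodes. Unit: Gflop/s (10^9 floating-point operations per second)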
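The kernel being timed is a one-dimensional complex discrete Fourier transform. The sketch below is a textbook recursive radix-2 FFT, shown only to make the operation concrete; it is not the FFTE library's algorithm, it requires a power-of-two length, and the rate helper assumes the customary 5 n log2(n) operation count for a length-n complex transform.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_gflops(n, seconds):
    """Rate using the customary 5 * n * log2(n) operation count."""
    return 5 * n * math.log2(n) / seconds / 1e9


print(fft([1.0, 1.0, 1.0, 1.0]))
```

In the global version the input vector is split into contiguous blocks, one per process, so the transform needs all-to-all style data exchange between its stages.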

 

Randomly Ordered Ring Bandwidth (per process)

Randomly Ordered Ring Bandwidth reports the bandwidth achieved in the ring communication pattern. The communicating nodes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: GB/s (gigabytes per second)

 

Randomly Ordered Ring Latency (per process)

Randomly Ordered Ring Latency reports the latency in the ring communication pattern. The communicating nodes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: microseconds
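The random ring used by both ring tests can be pictured as a shuffled list of ranks in which each process exchanges messages with its left and right neighbours. The sketch below builds one such assignment without any MPI calls; the function name `random_ring` and the seeding scheme are assumptions made for this illustration.

```python
import random


def random_ring(n_procs, seed):
    """Map each rank to its (left, right) neighbours in a randomly
    ordered ring built from ranks 0 .. n_procs - 1."""
    rng = random.Random(seed)
    order = list(range(n_procs))
    rng.shuffle(order)
    neighbours = {}
    for pos, rank in enumerate(order):
        left = order[pos - 1]                 # wraps to the last entry at pos 0
        right = order[(pos + 1) % n_procs]
        neighbours[rank] = (left, right)
    return neighbours


# The benchmark averages over several random assignments; three are shown.
for trial in range(3):
    print(random_ring(8, seed=trial))
```

Because neighbours in the shuffled ring are usually far apart in the natural rank order, most messages cross the interconnect rather than staying inside a node.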



HPCC 64 Processor Runs:

Machine Information:

| Machine ID | Processor Type | Proc Speed | Proc Count | Threads | Processes | System Name | Interconnect | MPI |
|---|---|---|---|---|---|---|---|---|
| Cray X1 | Cray X1 MSP | 0.8 GHz | 64 | 1 | 64 | X1 | Cray modified 2D Torus | Cray MPT 2.2 |
| Cray XD1 | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | XD1 | Rapid Array Interconnect System | MPI over rapid array |
| Dalco QsNetII Cluster | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | Opteron / QsNet Linux Cluster | QsNet II | Quadrics qsnetmpi 1.24-39 |
| IBM Power 4 p690 | IBM Power4 | 1.3 GHz | 64 | 1 | 64 | p690 | Colony | POE 3.2 |
| IBM Power 4+ | IBM Power 4+ | 1.7 GHz | 64 | 1 | 64 | e-server pSeries 655 | HPS (IBM High Performance Switch) | PE 4.1 |
| NEC SX-6 | NEC SX6 | 0.5 GHz | 64 | 1 | 64 | SX-6 | Internode Crossbar Switch | MPI/SX 6.7.5 |
| SGI Altix Bx2 | Intel Itanium 2 | 1.6 GHz | 64 | 1 | 64 | Altix 3700 Bx2 | N/A | SGI MPT 1.12 |
| Sun V20z Opteron | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | Sun Fire V20z Cluster | Gigabit Ethernet Cisco 6509 switch | LAM/MPI 7.1.1 |
| Dell EM64T Cluster | Intel Xeon EM64T | 3.4 GHz | 64 | 1 | 64 | Power Edge 1850 Cluster | Infiniband | scali MPI connect, scampi-3.3.4-8.rhel3 |
| Scaliwag GigE | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | Gigabit Ethernet / netgear GS748T | scali MPI 4.3-7 |
| Scaliwag SCI | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | Wulfkit's Scalable Coherent Interface (SCI) | scali MPI 4.3-7 |
| Scaliwag IB | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | Infiniband (mellanox) | scali MPI 4.3-7 |
| Cray XD1 | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | csexd1 | Rapid Array Interconnect System | MPI over rapid array |

[Eight graphs of the 64-processor results appear here on the original page, each linked to its data.]

 

HPCC 32 Processor Runs:

Machine Information:

| Machine ID | Manufacturer | Processor Type | Proc Speed | Proc Count | Threads | Processes | System Name | Interconnect | MPI |
|---|---|---|---|---|---|---|---|---|---|
| Cray X1 | Cray | Cray X1 MSP | 0.8 GHz | 32 | 1 | 32 | X1 | Cray modified 2D Torus | Cray MPT 2.4 |
| SGI Altix Bx2 | SGI | Intel Itanium 2 | 1.6 GHz | 32 | 1 | 32 | Altix 3700 Bx2 | N/A | SGI MPT 1.12 |
| NEC SX-6 | NEC | NEC SX6 | 0.5 GHz | 32 | 1 | 32 | SX-6 | Internode Crossbar Switch | MPI/SX 6.7.5 |
| SGI Altix | SGI | Intel Itanium 2 | 1.3 GHz | 32 | 1 | 32 | Altix 3700 Bx2 | NUMAlink | SGI MPT 1.8-1 |
| Dell Xeon | Dell | Intel Xeon | 2.4 GHz | 32 | 1 | 32 | Power Edge 2650 Cluster | SCI 4x4 2d torus | scali MPI connect, scampi-3.3.6-1.rhel3 |
| Scaliwag GigE | IBM | AMD Opteron | 2.0 GHz | 32 | 1 | 32 | scaliwag | Gigabit Ethernet / netgear GS748T | scali MPI 4.3-7 |
| Scaliwag SCI | IBM | AMD Opteron | 2.0 GHz | 32 | 1 | 32 | scaliwag | Wulfkit's Scalable Coherent Interface (SCI) | scali MPI 4.3-7 |
| Scaliwag IB | IBM | AMD Opteron | 2.0 GHz | 32 | 1 | 32 | scaliwag | Infiniband (mellanox) | scali MPI 4.3-7 |

[Eight graphs of the 32-processor results appear here on the original page, each linked to its raw data.]

For further information on the HPCC benchmark, visit the HPC Challenge website:

http://icl.cs.utk.edu/hpcc/

 

Page last updated 17.01.2007 12:00
© STFC 2007
For more information about DisCo, please contact Igor Kozin.