The HPC Challenge benchmark suite is made up of the following components:
G-HPL (system performance)
HPL solves a randomly generated dense linear system of equations in double-precision (IEEE 64-bit) floating-point arithmetic using MPI. The matrix is stored in a two-dimensional block-cyclic layout, and multiple variants of the computational kernels and communication patterns are provided. The solution method is LU factorization by Gaussian elimination with partial row pivoting, followed by backward substitution. Unit: teraflops (Tflop/s)
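As a rough illustration of the solution method described above, the following serial sketch factors a small dense system with partial row pivoting and then applies backward substitution. It is not the HPL code: there is no MPI and no block-cyclic distribution, and the names `lu_solve` and `hpl_flops` are hypothetical. The operation count 2/3 n^3 + 3/2 n^2 is the conventional figure for this algorithm and is assumed here for converting time to a rate.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by LU factorization with partial row pivoting.

    `a` is a list of n rows of n floats, `b` a list of n floats.
    Both are copied, so the caller's data is left untouched.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal; apply the same row
        # operations to the right-hand side.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = m  # multiplier stored in place (the L factor)
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Backward substitution on the upper-triangular factor.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def hpl_flops(n):
    """Conventional operation count for an n x n solve (assumed here)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


if __name__ == "__main__":
    n = 60  # illustrative size only; real runs use far larger systems
    rng = random.Random(0)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    rate = hpl_flops(n) / elapsed / 1e12
    print(f"max residual {residual:.2e}, {rate:.6f} Tflop/s")
```

Pure Python will of course report a tiny rate; the point is only to show which operations are being counted.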
G-PTRANS (A = A + B^T, MPI) (system performance)
PTRANS (A = A + B^T, MPI) implements a parallel matrix transpose for two-dimensional block-cyclic storage. It exercises the machine's communication system heavily on a realistic problem in which pairs of processors communicate with each other simultaneously, which makes it a useful test of the total communication capacity of the network. Unit: gigabytes per second (GB/s)
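To see why a transpose makes pairs of processors exchange data, it helps to write down who owns what. The sketch below assumes the usual block-cyclic rule, in which block (i, j) lives on process (i mod P, j mod Q) of a P x Q process grid, and assumes A and B share that grid. The function names are hypothetical, and the serial `add_transpose` is only a reference for the arithmetic, not the parallel benchmark.

```python
def owner(block_row, block_col, p_rows, p_cols):
    """Process-grid coordinates owning a block under 2D block-cyclic storage."""
    return (block_row % p_rows, block_col % p_cols)


def transpose_partners(n_blocks, p_rows, p_cols):
    """Set of (receiver, sender) process pairs needed for A = A + B^T.

    Block (i, j) of A needs block (j, i) of B, so the owner of the
    former must receive data from the owner of the latter.
    """
    pairs = set()
    for i in range(n_blocks):
        for j in range(n_blocks):
            dst = owner(i, j, p_rows, p_cols)
            src = owner(j, i, p_rows, p_cols)
            if src != dst:
                pairs.add((dst, src))
    return pairs


def add_transpose(a, b):
    """Serial reference: return A + B^T for square list-of-lists matrices."""
    n = len(a)
    return [[a[i][j] + b[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # On a 2 x 2 grid the two off-diagonal processes swap blocks.
    for dst, src in sorted(transpose_partners(4, 2, 2)):
        print(f"process {dst} receives from process {src}")
```

On a square grid every off-diagonal process is paired with its mirror image, so all of those exchanges can be in flight at once.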
G-RandomAccess (system performance)
Global RandomAccess, also called GUPS, measures the rate at which the computer can update pseudo-random locations in its memory. The rate is expressed in billions (giga) of updates per second (GUP/s). Unit: giga-updates per second (GUP/s)
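The update loop being timed can be sketched in a few lines. This is a single-process illustration under stated assumptions: the table size is a power of two, each update XORs a pseudo-random value into the slot selected by its low bits, and the 64-bit shift-XOR generator with feedback constant `0x7` is chosen for illustration rather than taken from the text above. All names are hypothetical.

```python
import time

MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback constant of the illustrative generator (assumption)


def next_random(r):
    """Advance a 64-bit shift-XOR pseudo-random sequence by one step."""
    high_bit = r >> 63
    return ((r << 1) & MASK64) ^ (POLY if high_bit else 0)


def random_access(log2_size, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots of a table."""
    size = 1 << log2_size
    table = list(range(size))
    r = seed
    for _ in range(n_updates):
        r = next_random(r)
        table[r & (size - 1)] ^= r  # low bits of r select the slot
    return table


def gups(n_updates, seconds):
    """Convert an update count and elapsed time to giga-updates per second."""
    return n_updates / seconds / 1e9


if __name__ == "__main__":
    log2_size = 12
    n_updates = 4 * (1 << log2_size)  # illustrative update count
    start = time.perf_counter()
    random_access(log2_size, n_updates)
    elapsed = time.perf_counter() - start
    print(f"{gups(n_updates, elapsed):.6f} GUP/s")
```

Because consecutive updates land in unrelated slots, caches and prefetchers offer little help, which is what makes the measurement demanding for the memory system.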
EP-STREAM Triad (per process)
The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run in an embarrassingly parallel manner: all computational nodes perform the benchmark at the same time, and the arithmetic average rate is reported. Unit: gigabytes per second (GB/s)
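The Triad kernel itself is a single vector statement, a[i] = b[i] + q * c[i]. The sketch below shows the kernel and the usual bandwidth accounting, which assumes two loads and one store of an 8-byte word per element; the function names are hypothetical, and a pure Python loop measures interpreter overhead rather than memory bandwidth.

```python
import time
from array import array


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_gbytes_per_second(n, seconds, bytes_per_word=8):
    """Bandwidth assuming two loads and one store per element."""
    return 3 * bytes_per_word * n / seconds / 1e9


def average_rate(per_process_rates):
    """Arithmetic average of the rates measured by each process."""
    return sum(per_process_rates) / len(per_process_rates)


if __name__ == "__main__":
    n = 100_000
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    print(f"{triad_gbytes_per_second(n, elapsed):.6f} GB/s")
```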
G-STREAM Triad (system performance, derived)
The Embarrassingly Parallel STREAM benchmark is a simple synthetic benchmark that measures sustainable memory bandwidth and the corresponding computation rate for simple numerical vector kernels. It is run in an embarrassingly parallel manner: all computational nodes perform the benchmark at the same time, and the arithmetic average rate is multiplied by the number of processes to obtain this derived value. Unit: gigabytes per second (GB/s)
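The derivation is a single multiplication, spelled out here for clarity. The function name and the 2.0 GB/s figure are hypothetical; only the rule (per-process average times process count) comes from the description above.

```python
def g_stream_triad(ep_average_gbs, processes):
    """Derived system rate: per-process average multiplied by process count."""
    if processes <= 0:
        raise ValueError("processes must be positive")
    return ep_average_gbs * processes


# Hypothetical example: a 64-process run averaging 2.0 GB/s per process.
print(g_stream_triad(2.0, 64))  # 128.0
```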
EP-DGEMM (per process)
The Embarrassingly Parallel DGEMM benchmark measures the floating-point execution rate of double-precision real matrix-matrix multiplication performed by the DGEMM subroutine from the BLAS (Basic Linear Algebra Subprograms). It is run in an embarrassingly parallel manner: all computational nodes perform the benchmark at the same time, and the arithmetic average rate is reported. Unit: gigaflops (Gflop/s)
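DGEMM computes C = alpha * A * B + beta * C. The sketch below is a pure Python stand-in for square matrices, not a call into a BLAS library, and it assumes the conventional count of 2 n^3 floating-point operations when converting time to a rate. The function names are hypothetical.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """In-place C = alpha * A * B + beta * C for square list-of-lists matrices."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = alpha * s + beta * c[i][j]


def dgemm_gflops(n, seconds):
    """Rate using the conventional 2 * n^3 operation count (assumed here)."""
    return 2.0 * n**3 / seconds / 1e9


if __name__ == "__main__":
    n = 60  # illustrative size only
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    dgemm(1.0, a, b, 0.0, c)
    elapsed = time.perf_counter() - start
    print(f"{dgemm_gflops(n, elapsed):.6f} Gflop/s")
```

A tuned BLAS reorganizes these loops into cache-sized blocks, which is why the benchmark is a good indicator of achievable peak floating-point rate.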
G-FFTE (system performance)
Global FFTE performs the same test as FFTE, but across the entire system, by distributing the input vector in block fashion across all the nodes. Unit: gigaflops (Gflop/s)
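Distributing a vector "in block fashion" means each process holds one contiguous slice. A minimal sketch of that index arithmetic follows; the function names are hypothetical, and giving the remainder to the lowest ranks is an assumption for the case where the process count does not divide the vector length.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous blocks, one per process.

    The first n % p processes get one extra element when p does not
    divide n (an assumption made for this sketch).
    """
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))  # half-open interval [lo, hi)
        start += size
    return ranges


def owner_of(index, n, p):
    """Rank of the process holding a given global index."""
    for rank, (lo, hi) in enumerate(block_ranges(n, p)):
        if lo <= index < hi:
            return rank
    raise IndexError(index)


if __name__ == "__main__":
    for rank, (lo, hi) in enumerate(block_ranges(16, 4)):
        print(f"process {rank} holds elements {lo}..{hi - 1}")
```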
Randomly Ordered Ring Bandwidth (per process)
Randomly Ordered Ring Bandwidth reports the bandwidth achieved in the ring communication pattern. The communicating nodes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: gigabytes per second (GB/s)
Randomly Ordered Ring Latency (per process)
Randomly Ordered Ring Latency reports the latency in the ring communication pattern. The communicating nodes are ordered randomly in the ring (with respect to the natural ordering of the MPI default communicator). The result is averaged over various random assignments of processes in the ring. Unit: microseconds
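The random ring used by the two ring tests can be sketched without MPI: shuffle the ranks, treat the shuffled order as the ring, and give each rank its left and right neighbor. The names below are hypothetical, and `measure` stands in for whatever timed exchange a real run would perform over the ring.

```python
import random


def random_ring(n_procs, rng):
    """Return {rank: (left, right)} for a ring over a random permutation of ranks."""
    order = list(range(n_procs))
    rng.shuffle(order)
    neighbors = {}
    for pos, rank in enumerate(order):
        left = order[(pos - 1) % n_procs]
        right = order[(pos + 1) % n_procs]
        neighbors[rank] = (left, right)
    return neighbors


def averaged_over_rings(measure, n_procs, n_rings, seed=0):
    """Average measure(neighbors) over several random ring assignments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rings):
        total += measure(random_ring(n_procs, rng))
    return total / n_rings


if __name__ == "__main__":
    ring = random_ring(8, random.Random(42))
    for rank in sorted(ring):
        left, right = ring[rank]
        print(f"rank {rank}: left neighbor {left}, right neighbor {right}")
```

Shuffling breaks the correspondence between rank numbers and physical placement, so most neighbors end up on different nodes and the measurement reflects the network rather than shared memory.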
| Machine ID | Processor Type | Processor Speed | Processor Count | Threads | Processes | System Name | Interconnect | MPI |
|---|---|---|---|---|---|---|---|---|
| Cray X1 | Cray X1 MSP | 0.8 GHz | 64 | 1 | 64 | X1 | Cray modified 2D Torus | Cray MPT 2.2 |
| Cray XD1 | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | XD1 | Rapid Array Interconnect System | MPI over rapid array |
| Dalco QsNetII Cluster | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | Opteron / QsNet Linux Cluster | QsNet II | Quadrics qsnetmpi 1.24-39 |
| IBM Power 4 p690 | IBM Power4 | 1.3 GHz | 64 | 1 | 64 | p690 | Colony | POE 3.2 |
| IBM Power 4+ | IBM Power 4+ | 1.7 GHz | 64 | 1 | 64 | e-server pSeries 655 | HPS (IBM High Performance Switch) | PE 4.1 |
| NEC SX-6 | NEC SX6 | 0.5 GHz | 64 | 1 | 64 | SX-6 | Internode Crossbar Switch | MPI/SX 6.7.5 |
| SGI Altix Bx2 | Intel Itanium 2 | 1.6 GHz | 64 | 1 | 64 | Altix 3700 Bx2 | N/A | SGI MPT 1.12 |
| Sun V20z Opteron | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | Sun Fire V20z Cluster | Gigabit Ethernet Cisco 6509 switch | LAM/MPI 7.1.1 |
| Dell EM64T Cluster | Intel Xeon EM64T | 3.4 GHz | 64 | 1 | 64 | PowerEdge 1850 Cluster | InfiniBand | Scali MPI Connect, scampi-3.3.4-8.rhel3 |
| Scaliwag GigE | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | Gigabit Ethernet / Netgear GS748T | Scali MPI 4.3-7 |
| Scaliwag SCI | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | Wulfkit's Scalable Coherent Interface (SCI) | Scali MPI 4.3-7 |
| Scaliwag IB | AMD Opteron | 2.0 GHz | 64 | 1 | 64 | scaliwag | InfiniBand (Mellanox) | Scali MPI 4.3-7 |
| Cray XD1 | AMD Opteron | 2.2 GHz | 64 | 1 | 64 | csexd1 | Rapid Array Interconnect System | MPI over rapid array |