PDNS3D - Benchmarking a Direct Numerical Simulation Code
Mike Ashworth, STFC Daresbury Laboratory
We have investigated the performance of the PDNS3D code across a wide range
of parallel systems. PDNS3D is a finite-difference formulation of Direct
Numerical Simulation of turbulence from the University of Southampton, UK.
Fluid flows encountered in real applications are invariably turbulent.
There is, therefore, an ever-increasing need to understand turbulence
and, more importantly, to be able to model turbulent flows with improved
predictive capabilities. As computing technology continues to improve,
it is becoming more feasible to solve the governing equations of motion —
the Navier-Stokes equations — from first principles. The direct
solution of the equations of motion for a fluid, however, remains a
formidable task and simulations are only possible for flows with small to
modest Reynolds numbers. Within the UK,
the Turbulence Consortium (UKTC)
has been at the forefront of simulating turbulent flows by direct numerical
simulation (DNS). UKTC has developed a parallel version of a code to
solve problems associated with
shock/boundary-layer interaction.

Figure 1: The PDNS3D T3 (360x360x360) benchmark on the IBM p690 and p690+,
the SGI Altix 3700/1300 and 3700/1500, and the Cray T3E/1200E systems
The code (SBLI) was originally developed for the Cray T3E. It is a sophisticated
DNS code that incorporates a number of advanced features: high-order central
differencing; a shock-preserving advection scheme from the total variation
diminishing (TVD) family; entropy splitting of the Euler terms; and a stable
boundary scheme. The code is written in standard Fortran 90 together
with MPI in order to be efficient, scalable and portable across a wide range of
high-performance platforms. The PDNS3D benchmark is a simple turbulent channel flow
benchmark using the SBLI code. Figure 1 shows performance results for the T3
benchmark data case (360x360x360) from the Phase1 system, the initial Phase2 system
and Phase2 following the SP7 upgrade. Scaling is close to ideal on all systems.
The SP7 microcode upgrade clearly improves the scaling at higher processor
counts, where communications become significant, with a
10% improvement at 512 processors and 18% at 768. The Phase2 p690+ system is seen to
outperform the Phase1 p690 by a factor of 1.41 at 512 processors, a factor which is
somewhat greater than the clock speed ratio of 1.31.
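To make the comparison with the clock speed ratio concrete, the short sketch below divides the measured speed-up by the clock ratio. Both numbers are taken from the text above; the function name `residual_speedup` is ours and purely illustrative.

```python
def residual_speedup(measured_ratio, clock_ratio):
    """Speed-up left over once the clock speed increase is factored out."""
    return measured_ratio / clock_ratio

# Figures quoted in the text for 512 processors (p690+ vs p690).
r = residual_speedup(1.41, 1.31)
print(f"{(r - 1) * 100:.1f}% beyond clock scaling")
```

The result is a gain of roughly 8% that the faster clock alone does not account for.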
The most important communications structure within PDNS3D is a halo-exchange
between adjacent computational sub-domains. Provided the problem size is
large enough to give a small surface-area-to-volume ratio for each sub-domain,
the communications costs are small relative to computation and do not constitute
a bottleneck. We see almost linear scaling on all systems and, in the case of
the Phase2 p690+, all the way out to 1280 processors.
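The surface-area-to-volume argument can be illustrated with a rough estimate of how many halo points a sub-domain exchanges per interior point. This is a minimal sketch under stated assumptions, not a description of SBLI itself: we assume the 360x360x360 grid is split into equal blocks in all three directions, that the sub-domain has a neighbour on both sides in every decomposed direction, and that the halo is `halo_width` layers deep. The actual decomposition and halo depth used by the code are not given here, and the function name is hypothetical.

```python
def halo_to_interior_ratio(n, px, py, pz, halo_width=1):
    """Halo points exchanged per interior point for one sub-domain.

    Assumes an n^3 grid split into px*py*pz equal blocks and a sub-domain
    with neighbours on both sides in every decomposed direction.
    """
    sx, sy, sz = n // px, n // py, n // pz  # local block dimensions
    interior = sx * sy * sz
    halo = 0
    # Two faces per decomposed direction, each halo_width layers deep.
    if px > 1:
        halo += 2 * halo_width * sy * sz
    if py > 1:
        halo += 2 * halo_width * sx * sz
    if pz > 1:
        halo += 2 * halo_width * sx * sy
    return halo / interior

# Cubic decompositions of the 360^3 benchmark grid (illustrative only).
for p in (2, 4, 6, 8):
    ratio = halo_to_interior_ratio(360, p, p, p)
    print(f"{p**3:4d} processes: block {360 // p}^3, halo/interior = {ratio:.3f}")
```

Under these assumptions the ratio rises from about 0.03 on 8 processes to about 0.13 on 512, which is why a large problem size keeps communication costs small relative to computation. A wider halo, as a high-order stencil would need, scales the ratio proportionally.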
Hardware profiling studies
of this code have shown that its absolute performance is highly dependent on the
cache utilisation and bandwidth to main memory.
This remains the subject of further study.
References
Application Performance on the High Performance Switch
Mike Ashworth, Ian J. Bush, Martyn F. Guest, Martin Plummer, Andrew G. Sunderland and Joachim Hein, 2004,
HPCx Technical Report HPCxTR0417
Single Node Performance Analysis of Applications on HPCx
Mark Bull, 2007,
HPCx Technical Report HPCxTR0703