STFC Home Page STFC Home Page CSE Home Page CSE Home Page Computational Science & Engineering Department  

 05:38:22 GMT
 Friday
 12 March 2010

 Search the CSE web:
 Enter text and press return

 
  Home
  Support and services
  Research and development
  Advanced research computing
  Atomic and molecular physics
  Band theory
  CCP4 group
  Computational biology
  Computational chemistry
  Computational engineering
  Computational material science
  Numerical analysis
  Software engineering
  Visualization
  Online resources
  Events calendar
  Newsroom
  Site map / index
   

Valid HTML 4.01

Valid CSS!

 

A Parallel Implementation of SPME for DL_POLY 3

I.J. Bush and W. Smith
STFC Daresbury Laboratory

In DL_POLY [1] the evaluation of the Coulomb potential and forces is performed using the smooth particle mesh Ewald (SPME) algorithm [2]. As in all Ewald [3] methods this splits the calculation into two parts, one performed in real space and one in Fourier space. The former only requires evaluation of short ranged functions, which fits in well with the domain decomposition used by DL_POLY 3, and so scales well with increasing processor count. However the Fourier component requires 3 dimensional FFTs to be performed. These are global operations and so a different strategy is required if good scaling is to be achieved.

The original implementation involved replicating the whole FFT grid on all processors and performing the FFTs in serial after which each processor could evaluate the appropriate terms for the atoms that it held. This method has a number of drawbacks

  • The FFT grid may be very large. A 256x256x256 grid would require 256 Mbytes of data to be held on each processor.
  • The replication of such large grids is an expensive operation in itself, the cost of which increases with processor count
  • Doing the FFT in serial is potentially a bottleneck, especially at large numbers of processors

While both open source 3D parallel FFTs (such as FFTW [4]) and proprietary ones (such as Cray's PCCFFT) exist, unfortunately they do not quite address all the points above. The problem is that they impose a data distribution, typically planes of points, that is incompatible with DL_POLY's spatial domain decomposition, so while a complete replication of the data is not required, it is still necessary to perform extensive data redistribution which will limit the scaling of the method.

To address these limitations, a parallel 3D FFT has been written by Ian Bush which maps directly onto DL_POLY's data distribution. This involved parallelizing the individual 1D FFTs in an efficient manner, and while the method will be slower than the proprietary routines for small processor counts, at large numbers it is an attractive method, since:

  • The method, while moving more data in total, requires very much fewer messages. Thus in the latency dominated regime it should perform better.
  • Global operations, such as the all to all operations used in both FFTW and PCCFFT, are totally avoided.

More generally the method is extremely flexible, allowing a much more general data distribution than those of other FFTs, and as such should be useful in other codes which do not map directly onto a by planes distribution.

In Table 1 we compare the times for DL_POLY 3 using both the new method and the original. The system comprises 27,000 atoms of Sodium Chloride run for 200 time steps, where the FFT grid size is 64x64x64. The timings (seconds) are for the Cray T3E/1200E at Manchester; it can be seen that the new method, while not scaling perfectly, is a vast improvement on the original.

Processors Original Method Parallel FFT
4 1851.1 554.8
8 911.8 318.9
16 558.6 116.9
32 351.2 86.1
64 271.7 54.4
128 232.3 33.7
256 216.6 28.1

Table 1.

Table 2 shows the scaling of the new method for a larger NaCl system, this time 216,000 atoms for 10 time steps with a 128x128x128 FFT grid, again on the Cray T3E. For this example, which is more typical of a production run, the scaling can be seen to be excellent.

Processors Time/s
32 53.32
64 24.38
128 12.46
256 6.23

Table 2.

References

[1] W. Smith and T. Forester, J. Molec. Graphics, 14 136 (1996)

[2] Darden et al., J. Chem.Phys. 103 19 (1995)

[3] P.P. Ewald, Ann. Physik. 64 253 (1921)

[4] Frigo and Johnson, ICASSP Conference Proceedings (1998)

 
 
For more information about the Advanced Research Computing Group please contact Dr Mike Ashworth.
 
back to top
 
 ARC Quick links
Link ARC Home Page
Applications:
Link Castep
Link DL-POLY
Link FLITE3D
Link PDNS3D
Link POLCOMS
Link PRMAT
Link SIC-LMTO
Link THOR
Algorithms:
Link BFG
Link CLIPS
Link FFT
Link Eigensolvers
Benchmarking:
Link NWChem
Link JASPA
Link OCCOMM
Link DL-POLY
Languages:
Link Fortran 90
Link Inter-comparison
Link PGAS Languages
Link HPCS Languages
Tools etc.:
Link Vampir
Link Toolkits
Link QA software
Link GUI
People:
Link Mike Ashworth
Link Rob Allan
Link Stephen Pickles
Link Martin Plummer
Link Andrew Porter
Link Andrew Sunderland
Link Ilian Todorov
Past projects:
Link UKHEC Home Page