Computational Science & Engineering Department



SIC-LMTO - Benchmarking an Electronic Structure Code

John Ashby, STFC Rutherford Appleton Laboratory

We have investigated the performance of the SIC-LMTO code, a first-principles electronic band structure code from the Band Theory Group at Daresbury Laboratory, across a wide range of parallel systems.

Many of the interesting physical properties of materials are governed by the arrangement of electrons in the atoms of which they are composed. In crystalline materials these can include the crystalline structure, magnetic and chemical properties, electrical and thermal conductivity and many others. Understanding the interplay between electronic structure and physical properties informs the search for better and more useful materials. The computational solution of the underlying equations, the many-body Schrödinger equation for the electrons, is formidable, but computational and theoretical advances have made progressively better approximations feasible.

Figure 1: Performance of the SIC-LMTO NiFe2O4inv benchmark on HPCx, the SGI Altix, the Cray XD1 and SCARF (AMD Opteron cluster) systems

The SIC-LMTO code of Temmerman and Szotek performs a self-consistent, spin-polarised calculation of the electronic band structure of a crystalline material. It uses the linear muffin-tin orbitals (LMTO) approach with a self-interaction correction (SIC) and is written mostly in Fortran 95, although some legacy code is still in Fortran 77. We had datasets available for a small problem (silver) and a large problem, the magnetic half-metal NiFe2O4 in an inverse spinel structure. In the latter case the program treats 26 “atoms” (2 types of Fe, 1 Ni, 2 O and 4 empty spheres) with 98 bands, leading to a Hamiltonian matrix of dimension 234. This is diagonalised at 512 points within the Brillouin zone.

The essence of the program is the solution of the eigenproblem $H(k)\,\psi_k = E_k\,\psi_k$. The Hamiltonian $H(k)$ depends on all the $\psi_k$ through the electron density $n(r) = \sum_{\text{all occupied states}} |\psi_k(r)|^2$. Initially a guess is made at the electron density, the eigenproblem is solved and a new electron density is generated. This is then fed back until self-consistency is reached. Within this self-consistency loop each k-point can be solved independently. The program is parallelised by farming out the k-points among the available processors and then performing a global broadcast of the results, so that each processor can calculate the electron density to use in the next iteration.
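The structure of this loop can be illustrated with a minimal sketch. It is not the SIC-LMTO algorithm: the 2×2 model Hamiltonian, the parameters `u`, `delta` and `mix`, the round-robin distribution and all function names are assumptions chosen for illustration, and the "processors" are simulated sequentially rather than run under MPI. It shows only the pattern described above: each processor handles its own share of the k-points, the partial results are combined in a global sum, and the resulting density is fed back until it stops changing.

```python
import math


def lowest_band_weight(a, b, t):
    """Weight on the first orbital of the lowest eigenvector of [[a, t], [t, b]]."""
    s = math.sqrt(0.25 * (a - b) ** 2 + t * t)
    e_low = 0.5 * (a + b) - s      # lowest eigenvalue of the 2x2 matrix
    x, y = t, e_low - a            # unnormalised eigenvector (needs t != 0)
    return x * x / (x * x + y * y)


def partial_density(rank, n_procs, n_k, n_a, u, delta):
    """Work of one simulated processor: sum over its own share of k-points."""
    total = 0.0
    for i in range(rank, n_k, n_procs):       # round-robin farming of k-points
        k = 2.0 * math.pi * i / n_k
        t = -(1.0 + 0.5 * math.cos(k))        # toy k-dependent coupling, never zero
        a = u * n_a                           # on-site terms depend on the density,
        b = delta + u * (1.0 - n_a)           # so H(k) changes every iteration
        total += lowest_band_weight(a, b, t)
    return total


def scf(n_procs, n_k=512, u=1.0, delta=0.5, mix=0.5, tol=1e-10, max_iter=200):
    """Iterate the toy density to self-consistency; returns (density, iterations)."""
    n_a = 0.5                                 # initial guess at the density
    for iteration in range(1, max_iter + 1):
        partials = [partial_density(r, n_procs, n_k, n_a, u, delta)
                    for r in range(n_procs)]
        n_new = sum(partials) / n_k           # stands in for the global sum/broadcast
        if abs(n_new - n_a) < tol:
            return n_new, iteration
        n_a = (1.0 - mix) * n_a + mix * n_new  # simple linear mixing (an assumption)
    raise RuntimeError("self-consistency not reached")


if __name__ == "__main__":
    for n_procs in (1, 8, 128):
        density, iterations = scf(n_procs)
        print(n_procs, density, iterations)
```

The converged density does not depend on how many simulated processors share the k-points, because the k-point problems are independent within one iteration; only the global sum couples them.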

Figure 1 shows performance results for the NiFe2O4inv benchmark data case on several systems: the IBM P690 Regatta system HPCx, an SGI Altix, and two similar AMD Opteron clusters, a Cray XD1 and SCARF, a cluster supplied by Streamline that uses Myrinet connection technology. The Altix, XD1 and SCARF have similar scaling behaviour at low processor numbers, although the poor performance of the XD1 on 48 processors is anomalous but repeatable. In contrast, HPCx displays poor scaling and a deterioration in performance at processor numbers above 128. At best, going from 32 to 128 processors, a factor of 4, only doubles the speed. This can be traced back to the communication strategy employed. The global sums over k-points are performed in at best O(N ln N) messages. The message size is O(1/N) (the 512 k-points are divided up between the N processors), so even if there were no start-up cost for a message, the communication cost would grow as ln N. The computational cost decreases as O(1/N), and eventually the increase in communication cost overtakes the decrease in computation. At 128 processors each processor deals with only 4 k-points before the communication phase.
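The trade-off described above can be sketched as a simple cost model: a computation term falling as 1/N plus a communication term growing as ln N. The constants `t_compute` and `t_comm` below are hypothetical and are not fitted to the measurements in Figure 1; only the 512 k-points and the functional forms come from the text.

```python
import math

N_KPOINTS = 512  # number of k-points in the NiFe2O4inv benchmark


def kpoints_per_processor(n_procs):
    """Average number of k-points each processor handles per iteration."""
    return N_KPOINTS / n_procs


def model_time(n_procs, t_compute, t_comm):
    """Toy time per iteration: O(1/N) computation plus O(ln N) communication."""
    return t_compute / n_procs + t_comm * math.log(n_procs)


def best_processor_count(t_compute, t_comm, candidates):
    """Processor count among the candidates that minimises the model time."""
    return min(candidates, key=lambda n: model_time(n, t_compute, t_comm))


if __name__ == "__main__":
    counts = [2 ** p for p in range(10)]          # 1, 2, 4, ..., 512
    for n in counts:
        # Hypothetical constants: serial compute time 64, communication unit 1.
        print(n, kpoints_per_processor(n), model_time(n, 64.0, 1.0))
    print("best:", best_processor_count(64.0, 1.0, counts))
```

In this model the time is minimised at N = t_compute / t_comm; beyond that point adding processors makes each iteration slower, which is the qualitative behaviour reported for HPCx.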

The global communication strategy exacerbates the impact of load imbalance. The global sums have an implicit synchronisation point, since they require all data to be available. Thus if one processor takes longer than the others, the whole program is forced to run at the speed of the slowest. We show this happening in Figure 2. Here we have used Vampir to plot the time spent by SIC-LMTO in one of its major routines and its subsidiaries, shown in grey, and in MPI calls, shown in red. It is clear that some processes spend 50% more time in MPI calls than others, not because they are sending more information but because they are waiting idle for processes whose computation takes longer. The computational load imbalance is shown by the grey histogram, and it is noticeable that this almost exactly mirrors the MPI imbalance. The double structure is an artefact of the use of two frames, or LPARs, in this 64-node run; the same load imbalance is repeated on each LPAR.


Figure 2: Vampir plots of the time SIC-LMTO spent in subroutine bands (grey) and in MPI calls (red).
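The mirroring of the two histograms follows directly from the synchronisation point, as the following sketch shows. The per-process compute times are invented for illustration and are not taken from the Vampir trace; the function names are likewise hypothetical.

```python
def step_time(compute_times):
    """With a global synchronisation, every process finishes with the slowest."""
    return max(compute_times)


def wait_times(compute_times):
    """Idle time each process spends inside the global sum, waiting for the slowest."""
    slowest = step_time(compute_times)
    return [slowest - t for t in compute_times]


def parallel_efficiency(compute_times):
    """Fraction of the total processor time spent on useful computation."""
    return sum(compute_times) / (len(compute_times) * step_time(compute_times))


if __name__ == "__main__":
    compute = [4.0, 5.0, 6.0, 5.0]   # hypothetical seconds per process
    print(step_time(compute))
    print(wait_times(compute))
    print(parallel_efficiency(compute))
```

Because each process waits for exactly the difference between its own compute time and that of the slowest process, the processes that compute least appear to spend the most time in MPI calls.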


 
 
   
 

For more information about the Advanced Research Computing Group please contact Dr Mike Ashworth.
 