
Report on Experiences with Using SESP Tools and Code Conversion Issues

Graham Fletcher and Jens Thomas (2006) Computational Chemistry Group, Daresbury Laboratory

Section 1. Introduction

The Software Engineering Support Program (SESP) was formed in response to SLA Review Recommendation 7 (2004) that 'CSED should consider implementing minimum standards of design, documentation and testing to ensure that software distributed by Daresbury (and Rutherford Appleton) and used by the scientific community acquires a reputation of being of the highest quality'. High on the agenda of this recommendation is the need to address legacy code issues: a generic term for outdated software techniques not suited to code modularity, extensibility ('plug-n-play'), or general ease of maintenance.

In June 2006, members of CSED, including Graham Fletcher and Jens Thomas of the Computational Chemistry Group (CCG), met with Chris Greenough, head of SESP, to discuss how to move forward with the tools identified by SESP for use with CSED codes. The code of relevance to CCG here is GAMESS-UK. GAMESS-UK is a mixture of a number of different FORTRAN dialects (66, 77, 90 & 95), together with some C code. As a result, much of the code is "legacy code" that includes many non-standard language features, and there are also many "machine-specific" pieces of code that were written to take advantage of particular computer architectures that may be long since obsolete.

As GAMESS-UK is still being developed and new features added to it, we are interested in modernising the code to take advantage of new features of the FORTRAN language and remove obsolete and unreliable code. A suitable strategy was discussed for transforming GAMESS-UK which involves tools that can automatically update any code to F90 or F95 once strict compliance with F77 has been achieved. SESP has identified several tools, mostly from the Numerical Algorithms Group (NAG), that can assist in all these areas.

At over a million lines of code, however, the full code was deemed too complex to submit to the tools directly. In addition, GAMESS-UK makes heavy use of pre-processing to include or remove sections of the code for compilation, and none of the available tools is able to cope with code containing pre-processor directives. A smaller "kernel" program was therefore written, called SCFK, which contains some of the core code from GAMESS-UK and uses similar programming practices to the original code. SCFK computes a self-consistent field wave function, so that any alterations can be checked by the correct execution of the program for an example case.

This kernel program was used to test the software transformation tools to see how feasible it would be to use them on GAMESS-UK itself.

Section 2. Experiences Using the Tools.

Here we provide a step-by-step account of using various NAG tools on SCFK. Details on what each tool does can be found on the SESP web site, http://www.sesp.cse.clrc.ac.uk/ .

  1. Starting with the NAG tools, we first tried nag_pfort, which reported an error, rather than a warning, wherever the "!" symbol was used to introduce a comment.
  2. Next we found nag_decs to be useful and understandable, except for a bug where DFLOAT gets declared as EXTERNAL rather than INTRINSIC under the -generify flag.
  3. At first we tried to use spag from the plusFort tools. This did not work, as it did not appear to be able to deal with COMMON blocks in which the size of an array was specified in a PARAMETER statement.
  4. We therefore tried to use the NAG tools, starting with nag_pfort. The problem with the F90 "!" comment syntax was only a minor annoyance, as a line of sed was able to fix it. The tool was then able to detect a number of minor errors, such as unused common blocks and statement ordering that did not adhere to the Fortran standard.

    The tool was only able to get through its first pass, but then failed on the second (where it checks things between subroutines) due to the way the memory allocation works in most electronic structure codes. As you probably know, in GAMESS-UK and its ilk, we allocate a chunk of memory using doubles and then pass this into subroutines sometimes as doubles, sometimes as integers, which seriously confuses the tools.

    It would be interesting to know whether any tool can deal with this, or whether the memory allocation stuff would have to be re-written in order to be able to take the code into F90 land (which would be a major task for GAMESS-UK, which is ultimately where we'd like to go with this).

  5. We then moved on to look at nag_decs, running it with the options: nag_decs -declare -ardicb -generify. We then hit the problem with DFLOAT already mentioned. There was an identical problem with the LOC intrinsic, so we had to edit our own source file to change both of these ourselves.
  6. The next problem that we hit was that the tool declared data types for various quantities that had been implicitly typed, but whose values were specified in PARAMETER statements. The declarations were placed after the PARAMETER statements, and this caused the compiler to complain that there were two declarations. Once this had been fixed by hand, the file finally compiled.
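
The "line of sed" mentioned in step 4 is not reproduced in this report. As a rough illustration only, the same kind of clean-up could be sketched in Python as below. The function name strip_bang_comments is hypothetical, and the sketch assumes fixed-form source in which a "!" in column 6 is a continuation mark rather than a comment.

```python
def strip_bang_comments(line):
    """Return a fixed-form source line with F90-style '!' comments removed.

    Assumptions (not taken from the report): a full-line '!' comment becomes
    an F77 'C' comment in column 1, and a trailing '!' comment is dropped.
    """
    body = line.rstrip("\n")
    first = len(body) - len(body.lstrip())
    # Full-line comment, unless the '!' sits in column 6 (index 5),
    # where fixed-form Fortran treats it as a continuation mark.
    if body[first:first + 1] == "!" and first != 5:
        return "C" + body[first + 1:]
    quote = None
    for i, ch in enumerate(body):
        if quote:
            if ch == quote:          # closing quote (a doubled quote re-opens)
                quote = None
        elif ch in "'\"":
            quote = ch               # entering a character constant
        elif ch == "!" and i != 5:
            return body[:i].rstrip() # trailing comment: keep the statement only
    return body


if __name__ == "__main__":
    for src in ["! set up basis", "      X = 1.0D0 ! init"]:
        print(repr(strip_bang_comments(src)))
```

A "!" inside a character constant is left alone, which is the case most likely to be broken by a one-line substitution.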

Responses from Chris Greenough and David Worth are as follows:

It is clear that the code is an interesting mixture of a variety of Fortrans.

  1. Comments: "!" is not Fortran 77, so one might expect nag_pfort - an f77 tool - to complain. As Jens says, this is easy enough to fix, but there might be problems if the comments are appended to the ends of executable statements.
  2. DFLOAT & LOC: neither of these is in the Fortran 77 standard as an intrinsic function, so nag_decs will treat them as EXTERNALs. All references to DFLOAT should be replaced by DBLE, which is both a Fortran 77 and a Fortran 95 intrinsic. LOC is an aberration: could you send us a short section of code showing how you use it? It clearly needs replacing with some form of interface to some suitable f90 - what - I don't know yet.
  3. spag and COMMON blocks: spag appeared to be fine with the test program - see test_nag_decs.f and test_nag_decs.spg. Are we missing something?
  4. nag_pfort: this checks against a slightly restricted subset of f77 that was thought to be portable, so it is stricter than f77. An example output would be useful. Most tools require reasonable input to start with. The memory management tricks are common in Fortran 77. Tools would need to reduce their type checking level for these constructs to pass through. Converting to Fortran 90/95 memory management is really the only sensible option - however, as you say, time consuming.
  5. nag_decs: The -declare option only adds declarations for implicit variables but doesn't check for declarations via parameters. I would call this a bug in the tool. Without this flag all the declaration statements are rewritten and there is no problem. See test_nag_decs_new.f. DFLOAT and LOC are declared as EXTERNAL as you might expect.
  6. In terms of what to do next: clearly the idea is for you to use the tools in a process of slowly transforming the code to f95. As we might expect tools are not totally automatic. There still needs to be significant investment of people time.
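
The DFLOAT to DBLE substitution recommended in response 2 is the kind of change that string substitution can handle. The following is a minimal sketch, not a tool used in this work: the function name replace_dfloat is hypothetical, only calls of the form DFLOAT(...) are rewritten, and comment lines are skipped. It does not cope with names split by embedded blanks, which fixed-form Fortran permits, nor with DFLOAT appearing inside a character constant.

```python
import re

# Match DFLOAT as a whole name, only where it is followed by "(" (a call).
_DFLOAT_CALL = re.compile(r"\bDFLOAT\b(?=\s*\()", re.IGNORECASE)


def replace_dfloat(line):
    """Replace non-standard DFLOAT(...) calls with the standard DBLE(...)."""
    # Leave fixed-form comment lines (C, c, * or ! in column 1) untouched.
    if line[:1] in ("C", "c", "*", "!"):
        return line
    return _DFLOAT_CALL.sub("DBLE", line)


if __name__ == "__main__":
    print(replace_dfloat("      X = DFLOAT(N) + dfloat (M)"))
```

Because DBLE is shorter than DFLOAT, the substitution cannot push a statement past column 72. Any EXTERNAL DFLOAT declaration added by a tool would still have to be removed separately.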

Section 3. Code Conversion Activities.

Compliance of SCFK with F77 was also monitored by examining the output of the program FORCHECK. Based on the output of FORCHECK, the following types of conversion were found necessary:

Having completed the above conversions, the output from FORCHECK given below is obtained:

      F O R C H E C K  (R)  V13.6.1
 Copyright (c) 1984-2006  h.o. Forcheck. All rights reserved
 Licensed to: Prof C Greenough, CLRC, Rutherford Appleton Laboratory, Chilton, DIDCOT, UK
 PC/Linux (), serial:  962026
 Limited to a single user

    -- LF95 compiler emulation

    -- program unit analysis

    -- file: scfk.f
       - program unit: SCFK
       - program unit: INIT2E
       - program unit: FOCK2E
       - program unit: RHFBLD
       - program unit: NORM
       - program unit: INTS1E
       - program unit: STVINT
       - program unit: RT123
       - program unit: ROOTS4
       - program unit: ROOTS5
       - program unit: ROOTSS
       - program unit: FFUN
       - program unit: ALL
       - program unit: DIAG
       - program unit: JACOBI
       - program unit: SORTQ
       - program unit: SHELLS
       - program unit: IJPRIM
       - program unit: GENRAL
       - program unit: XYZINT
       - program unit: SPDINT
       - program unit: TRNSFM
       - program unit: MULT2
       - program unit: SRTORB
       - program unit: SQ2TR
       - program unit: MEMALL
       - program unit: MEMFR
       - program unit: SETSQR
       - program unit: STTRIG
       - program unit: TRACEP

    -- reference structure

    -- global program analysis
   STTRIG, referenced in SCFK, argument no  2 (ITRG)
 **[573 E] data type inconsistent with specification
   SETSQR, referenced in SCFK, argument no  2 (ISQR)
 **[573 E] data type inconsistent with specification
   DIAG, referenced in SCFK, argument no  5 (TRIANG)
 **[573 E] data type inconsistent with specification
   DIAG, referenced in SCFK, argument no  6 (SQUARE)
 **[573 E] data type inconsistent with specification
   FOCK2E, referenced in SCFK, argument no  4 (TRIANG)
 **[573 E] data type inconsistent with specification
   TRNSFM, referenced in SCFK, argument no  5 (TRIANG)
 **[573 E] data type inconsistent with specification
   TRNSFM, referenced in SCFK, argument no  6 (SQUARE)
 **[573 E] data type inconsistent with specification
   /SHLNOS/, declared in RHFBLD
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   JACOBI, referenced in DIAG, argument no 12 (OMASK)
 **[573 E] data type inconsistent with specification
   JACOBI, referenced in DIAG, argument no 13 (IIPT)
 **[573 E] data type inconsistent with specification
   JACOBI, referenced in DIAG, argument no 14 (IPT)
 **[573 E] data type inconsistent with specification
   /SHLNOS/, declared in SHELLS
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   /SHLNOS/, declared in IJPRIM
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   /SHLNOS/, declared in GENRAL
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   /JUNK/, declared in XYZINT
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   /JUNK/, declared in SPDINT
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   /SHLNOS/, declared in SPDINT
 **[230 I] list of objects in named COMMON inconsistent with first declaration

    -- messages presented:

     7x[230 I] list of objects in named COMMON inconsistent with first declaration
    10x[573 E] data type inconsistent with specification

 number of error messages:            10
 number of informative messages:       7

[END OF OUTPUT]

From the above output it can be seen that the remaining messages are associated with two coding practices: the re-dimensioning of COMMON block arrays (the [230 I] messages) and the re-typing of variables between subroutine call and definition statements (the [573 E] messages). Both practices have become widespread in scientific computing, where they were traditionally regarded as essential measures to overcome serious limitations in strict F77.
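
When the same check is repeated after each round of conversions, it can help to tally the messages automatically. The sketch below is an illustration only: it assumes that message lines look like those in the listing above (two asterisks, then a bracketed code and a one-letter severity), and the function name tally_messages is hypothetical.

```python
import re
from collections import Counter

# A message line such as " **[573 E] data type inconsistent with specification".
MESSAGE = re.compile(r"^\s*\*\*\[\s*(\d+)\s+([A-Z])\]\s+(.*)$")


def tally_messages(report):
    """Count messages in a report, keyed by (code, severity letter)."""
    counts = Counter()
    for line in report.splitlines():
        match = MESSAGE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Three message pairs copied from the listing above.
sample = """\
   STTRIG, referenced in SCFK, argument no  2 (ITRG)
 **[573 E] data type inconsistent with specification
   /SHLNOS/, declared in RHFBLD
 **[230 I] list of objects in named COMMON inconsistent with first declaration
   JACOBI, referenced in DIAG, argument no 12 (OMASK)
 **[573 E] data type inconsistent with specification
"""

counts = tally_messages(sample)

if __name__ == "__main__":
    for (code, severity), n in sorted(counts.items()):
        print(code, severity, n)
```

The summary lines at the end of the report (for example "7x[230 I] ...") do not begin with two asterisks, so they are not counted twice.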

Together with the experiences of two other groups maintaining CSED codes (PDVR3D, from Jonathan Tennyson's group, and THOR (?)), the following conclusions were drawn:

Section 4. Legacy Practices.

So far, two general areas can be identified over which the conversion of scientific programs to strict F77 can encounter difficulties.

1. 'Non-standardizations'

These include the use of non-standard FORTRAN features (some of which were mentioned above) and coding 'techniques' employed by programmers to circumvent the limitations of the language.

  1. Compiler-supported features
    1. re-namings e.g. REAL*8 instead of DOUBLE PRECISION, DFLOAT instead of DBLE
    2. new types e.g. INTEGER*8
    3. intrinsic functions e.g. ISHIFT
    4. other features e.g. more than 9 continuation lines
  2. programming 'tricks'
    1. pre-processing
    2. 'simulating' dynamic memory allocation e.g. 'over-indexing' plus type redefinition (e.g. DBLE passed as INT or CHAR, etc).

    Some of the above issues are discussed in 'Features and Things to Avoid in F95!', L. S. Chin, C. Greenough, and D. J. Worth, Software Engineering Group Note SEG-N-003.
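
The 'simulated' dynamic memory trick listed above can be illustrated by analogy outside Fortran. The sketch below is not GAMESS-UK code: it uses Python's array and memoryview types to show one pool of doubles being handed out by offset and read through both a double view and an integer view, which is the pattern that confuses type-checking tools. All names and sizes are hypothetical.

```python
from array import array

pool = array("d", [0.0] * 8)            # the workspace: 8 doubles
doubles = memoryview(pool)              # typed view of the pool, format 'd'
ints = doubles.cast("B").cast("i")      # the same bytes seen as C ints

# "Driver routine": partition the single pool by offset.
n_int_words = 2                         # first 2 doubles are used for integers
n_ints = n_int_words * doubles.itemsize // ints.itemsize
index_table = ints[:n_ints]             # integer workspace
matrix = doubles[n_int_words:]          # remaining doubles

index_table[0] = 7                      # an integer written into the pool
matrix[0] = 1.5                         # a double written further along

if __name__ == "__main__":
    print(ints[0], pool[2])             # the values as intended
    print(pool[0])                      # the integer's bytes misread as a double
```

Nothing in the pool records which type each region holds. Only the driver's offsets carry that information, so a tool that checks argument types between caller and callee will see a mismatch.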

2. Traditional problem areas in F77

There are numerous coding practices fully supported in F77 that are usually seen as unhelpful in the quest for greater modularity and extensibility. These include the use of GO TOs, alternate RETURNs, multiple loops on the same CONTINUE line, IMPLICIT declarations, and functions appearing indistinguishable from array elements, etc. Many of these can be addressed following conversion of the fully-F77-compliant code to f95 using the available tools (e.g. nag_cbm95, nag_chname95, nag_decs95, nag_struct95, etc). However, at least two legacy practices related to the use of COMMON blocks present a potentially tougher problem in the automatic conversion of code. These are:

  1. Re-naming of COMMON block variables
  2. Re-dimensioning of COMMON block arrays

With luck, the majority of legacy practices could be addressed by string-substitution, while traditional techniques for simulating dynamic memory (over-indexing plus data type redefinition) could be converted in a series of incremental transformations. Re-naming COMMON block variables may affect a relatively minor proportion of many codes and could be moderately automatable (see below), though choices may be required. However, since the re-dimensioning of arrays between different subroutines impacts the loop structures, automatic tools would need to handle modifications to the code semantics of those subroutines.
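
Before either COMMON block practice can be converted, the places where it occurs have to be found. As a rough sketch only, the script below collects the member list of each named COMMON block per program unit and reports the blocks whose lists differ between units. The function names and the sample variable names are hypothetical, and the sketch ignores continuation lines, several blocks in one COMMON statement, and typed FUNCTION statements.

```python
import re
from collections import defaultdict

UNIT = re.compile(r"^\s+(?:PROGRAM|SUBROUTINE|FUNCTION)\s+(\w+)", re.IGNORECASE)
COMMON = re.compile(r"^\s+COMMON\s*/\s*(\w+)\s*/\s*(.*)$", re.IGNORECASE)


def common_layouts(source):
    """Map each named COMMON block to {program unit: member list}."""
    layouts = defaultdict(dict)
    unit = "?"
    for line in source.splitlines():
        if line[:1] in ("C", "c", "*", "!"):
            continue                      # fixed-form comment line
        match = UNIT.match(line)
        if match:
            unit = match.group(1).upper()
            continue
        match = COMMON.match(line)
        if match:
            members = re.sub(r"\s+", "", match.group(2)).upper()
            layouts[match.group(1).upper()][unit] = members
    return layouts


def inconsistent(layouts):
    """Names of blocks declared with more than one distinct member list."""
    return sorted(name for name, by_unit in layouts.items()
                  if len(set(by_unit.values())) > 1)


# Hypothetical source: the member names are invented for illustration.
sample = """\
      SUBROUTINE SHELLS
      COMMON /SHLNOS/ LIT, LJT, LKT
      END
      SUBROUTINE GENRAL
      COMMON /SHLNOS/ QQ4, LIT(3)
      END
      SUBROUTINE XYZINT
      COMMON /JUNK/ A(10)
      END
"""

if __name__ == "__main__":
    print(inconsistent(common_layouts(sample)))
```

A textual comparison of this kind cannot tell a harmless re-naming from a re-dimensioning that changes loop bounds, which is why the second practice is expected to need human decisions.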

Below, the scale of such efforts applied to a widely used and fairly typical scientific software package in this arena is indicated.

Example package: GAMESS (US version)

In GAMESS, only a handful of key COMMON blocks are re-defined with regard to their contents, and there is some minor use of pre-processing and non-standard namings. GAMESS maintains the 'dynamic memory' pool in an array simply called 'X', which is partitioned in driver routines before being passed to subroutines that label the workspace more conveniently and do the computation. Therefore the occurrence of this array passed with an integer offset, or address, such as X(IPOINT), serves as a rough indicator of the scale of the problem associated with measures to simulate dynamic memory in older codes.

Line type                       Approximate number of lines (thousands)
Entire                          770
Comments                        70
Executable                      700
Occurrences of over-indexing    40

Percentage of lines with over-indexing: approximately 6%

Thus, the bulk of legacy coding practices in GAMESS that may demand techniques currently unavailable are accounted for by the occurrence of 'X(I)' type statements, and these comprise somewhat less than 10% of the entire code.
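
The report does not say how the line counts were obtained. A survey of this kind could be sketched as follows; the function name survey, the regular expression, and the sample lines are assumptions made for illustration, and "executable" here simply means any non-blank, non-comment line. The closing arithmetic shows that the quoted 6% is consistent with 40 thousand occurrences out of 700 thousand executable lines.

```python
import re

# The pool array X indexed by a single integer name, e.g. X(IPOINT).
OVER_INDEX = re.compile(r"\bX\s*\(\s*[A-Z][A-Z0-9]*\s*\)", re.IGNORECASE)


def survey(lines):
    """Return (comment lines, other lines, lines containing X(name))."""
    comments = executable = hits = 0
    for line in lines:
        if not line.strip():
            continue                     # skip blank lines
        if line[0] in "Cc*!":
            comments += 1                # fixed-form comment line
            continue
        executable += 1
        if OVER_INDEX.search(line):
            hits += 1
    return comments, executable, hits


# Hypothetical fixed-form lines, for illustration only.
sample = [
    "C     driver partitions X",
    "      CALL FOCK(X(IFOCK),X(IDENS),N)",
    "      N = N + 1",
    "      XX(I) = 0",
]

# Arithmetic behind the table: 40 thousand of 700 thousand executable lines.
share = 100 * 40 / 700

if __name__ == "__main__":
    print(survey(sample))
    print(round(share, 1))
```

The pattern deliberately ignores arrays whose names merely contain X, such as XX(I), and it would miss offsets written as expressions, so the result is a rough indicator in the same spirit as the table.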

While it may be feasible to consider converting key functionality in a piecemeal fashion, the task of converting entire packages such as those found in CSED would demand significant investment in resources.

Conclusion.

Many scientific packages, such as those used in CSE, are probably too complex to be readily converted or analysed by the software tools available to SESP. For instance, codes relying on pre-processing to include or remove sections of the code for compilation will find that none of the available tools is able to cope with pre-processor directives. Furthermore, it is hard to imagine such techniques as the simulation of dynamic memory, the renaming of common block variables, the re-dimensioning of common block arrays, or indeed many of the numerous legacy practices that have been employed by scientists over the decades to circumvent the limitations of the languages then available, ever being converted in an automatic way to a more modern programming paradigm without significant investment in human time.

For more information about the work of the Computational Chemistry Group please contact Paul Sherwood p.sherwood@dl.ac.uk.
