The Software Engineering Support Program (SESP) was formed in response to SLA Review Recommendation 7 (2004) that 'CSED should consider implementing minimum standards of design, documentation and testing to ensure that software distributed by Daresbury (and Rutherford Appleton) and used by the scientific community acquires a reputation of being of the highest quality'. High on the agenda of this recommendation is the need to address 'legacy code' issues, a generic term for outdated software techniques that are not suited to code modularity, extensibility ('plug-n-play'), or general ease of maintenance.
In June 2006, members of CSED, including Graham Fletcher and Jens Thomas of the Computational Chemistry Group (CCG), met with Chris Greenough, head of SESP, to discuss how to move forward with the tools identified by SESP for use with CSED codes. The code of relevance to CCG here is GAMESS-UK, which is written in a mixture of FORTRAN dialects (66, 77, 90 and 95), together with some C code. As a result, much of it is "legacy code" that includes many non-standard language features. There are also many "machine-specific" pieces of code that were written to take advantage of particular computer architectures, some of which may be long since obsolete.
As GAMESS-UK is still being developed and new features added to it, we are interested in modernising the code to take advantage of new features of the FORTRAN language and remove obsolete and unreliable code. A suitable strategy was discussed for transforming GAMESS-UK which involves tools that can automatically update any code to F90 or F95 once strict compliance with F77 has been achieved. SESP has identified several tools, mostly from the Numerical Algorithms Group (NAG), that can assist in all these areas.
At over a million lines of code, however, the full code was deemed too complex to submit to the tools directly. In addition, GAMESS-UK makes heavy use of pre-processing to include or remove sections of the code for compilation, and none of the available tools is able to cope with code containing pre-processor directives. A smaller "kernel" program, called SCFK, was therefore written. It contains some of the core code from GAMESS-UK and uses similar programming practices to the original code. SCFK computes a self-consistent field wave function, so any alteration can be checked by confirming that the program still executes correctly for an example case.
This kernel program was used to test the software transformation tools to see how feasible it would be to use them on GAMESS-UK itself.
Here we provide a step-by-step account of using various NAG tools on SCFK. Details on what each tool does can be found on the SESP web site, http://www.sesp.cse.clrc.ac.uk/ .
The tool was only able to get through its first pass, but then failed on the second (where it checks things between subroutines) due to the way the memory allocation works in most electronic structure codes. As you probably know, in GAMESS-UK and its ilk, we allocate a chunk of memory using doubles and then pass this into subroutines sometimes as doubles, sometimes as integers, which seriously confuses the tools.
It would be interesting to know whether any tool can deal with this, or whether the memory allocation stuff would have to be re-written in order to be able to take the code into F90 land (which would be a major task for GAMESS-UK, which is ultimately where we'd like to go with this).
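The aliasing described above can be illustrated outside Fortran. The following is a minimal sketch in Python, not taken from GAMESS-UK: the names `make_pool` and `view` are hypothetical, and the pool is modelled as a byte buffer sized in double-precision words. It shows one block of memory being handed out partly as integers and partly as doubles, which is the pattern that a tool checking argument types between subroutines cannot reconcile.

```python
import struct

WORD = struct.calcsize("d")                    # bytes per double
INTS_PER_WORD = WORD // struct.calcsize("i")   # native integers per double


def make_pool(n_words):
    """Allocate one zeroed pool sized in double-precision words (hypothetical helper)."""
    return bytearray(WORD * n_words)


def view(pool, offset, n_words, fmt):
    """Return a typed view of n_words words starting at word `offset`.

    fmt is "d" for doubles or "i" for native integers. Every view shares
    the pool's bytes, so the same storage can carry either type.
    """
    window = memoryview(pool)[WORD * offset: WORD * (offset + n_words)]
    return window.cast(fmt)


pool = make_pool(4)

# The first word is handed to a routine that treats it as integer workspace.
ints = view(pool, 0, 1, "i")
for k in range(len(ints)):
    ints[k] = k + 3

# The remaining words are handed to a routine that treats them as doubles.
reals = view(pool, 1, 3, "d")
reals[0] = 2.5

# The driver still sees one array of doubles; word 0 now holds integer bits.
whole = view(pool, 0, 4, "d")
print(list(ints), whole[1])
```

In Fortran 77 the equivalent is passing an element of a DOUBLE PRECISION array to a dummy argument declared INTEGER, which compilers of the period accepted but which a strict interface checker reports as a type mismatch.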
Responses from Chris Greenough and David Worth are as follows:
It is clear that the code is an interesting mixture of a variety of Fortrans.
Compliance of SCFK with F77 was also monitored by examining the output of the program FORCHECK. Based on the output of FORCHECK, the following types of conversion were found necessary:
Having completed the above conversions, the output from FORCHECK given below is obtained:
    F O R C H E C K (R) V13.6.1
    Copyright (c) 1984-2006 h.o. Forcheck. All rights reserved
    Licensed to: Prof C Greenough, CLRC, Rutherford Appleton Laboratory, Chilton, DIDCOT, U K
    PC/Linux (), serial: 962026
    Limited to a single user

    -- LF95 compiler emulation

    -- program unit analysis
    -- file: scfk.f
       - program unit: SCFK
       - program unit: INIT2E
       - program unit: FOCK2E
       - program unit: RHFBLD
       - program unit: NORM
       - program unit: INTS1E
       - program unit: STVINT
       - program unit: RT123
       - program unit: ROOTS4
       - program unit: ROOTS5
       - program unit: ROOTSS
       - program unit: FFUN
       - program unit: ALL
       - program unit: DIAG
       - program unit: JACOBI
       - program unit: SORTQ
       - program unit: SHELLS
       - program unit: IJPRIM
       - program unit: GENRAL
       - program unit: XYZINT
       - program unit: SPDINT
       - program unit: TRNSFM
       - program unit: MULT2
       - program unit: SRTORB
       - program unit: SQ2TR
       - program unit: MEMALL
       - program unit: MEMFR
       - program unit: SETSQR
       - program unit: STTRIG
       - program unit: TRACEP

    -- reference structure

    -- global program analysis
    STTRIG, referenced in SCFK, argument no 2 (ITRG)
    **[573 E] data type inconsistent with specification
    SETSQR, referenced in SCFK, argument no 2 (ISQR)
    **[573 E] data type inconsistent with specification
    DIAG, referenced in SCFK, argument no 5 (TRIANG)
    **[573 E] data type inconsistent with specification
    DIAG, referenced in SCFK, argument no 6 (SQUARE)
    **[573 E] data type inconsistent with specification
    FOCK2E, referenced in SCFK, argument no 4 (TRIANG)
    **[573 E] data type inconsistent with specification
    TRNSFM, referenced in SCFK, argument no 5 (TRIANG)
    **[573 E] data type inconsistent with specification
    TRNSFM, referenced in SCFK, argument no 6 (SQUARE)
    **[573 E] data type inconsistent with specification
    /SHLNOS/, declared in RHFBLD
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    JACOBI, referenced in DIAG, argument no 12 (OMASK)
    **[573 E] data type inconsistent with specification
    JACOBI, referenced in DIAG, argument no 13 (IIPT)
    **[573 E] data type inconsistent with specification
    JACOBI, referenced in DIAG, argument no 14 (IPT)
    **[573 E] data type inconsistent with specification
    /SHLNOS/, declared in SHELLS
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    /SHLNOS/, declared in IJPRIM
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    /SHLNOS/, declared in GENRAL
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    /JUNK/, declared in XYZINT
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    /JUNK/, declared in SPDINT
    **[230 I] list of objects in named COMMON inconsistent with first declaration
    /SHLNOS/, declared in SPDINT
    **[230 I] list of objects in named COMMON inconsistent with first declaration

    -- messages presented:
    7x[230 I] list of objects in named COMMON inconsistent with first declaration
    10x[573 E] data type inconsistent with specification

    number of error messages: 10
    number of informative messages: 7
    [END OF OUTPUT]
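When tracking progress over repeated FORCHECK runs, it can help to tally the message codes automatically. The following is a minimal sketch in Python that assumes only the tag layout visible in the report above (`**[code severity]`). The function name `tally_messages` is hypothetical and the sample text is an excerpt of that report, not a complete one.

```python
import re
from collections import Counter

# Matches message tags such as "**[573 E]" or "**[230 I]".
TAG = re.compile(r"\*\*\[\s*(\d+)\s+([A-Z])\]")


def tally_messages(report):
    """Count (code, severity) pairs in FORCHECK-style report text."""
    return Counter(TAG.findall(report))


sample = """
STTRIG, referenced in SCFK, argument no 2 (ITRG)
**[573 E] data type inconsistent with specification
/SHLNOS/, declared in RHFBLD
**[230 I] list of objects in named COMMON inconsistent with first declaration
JACOBI, referenced in DIAG, argument no 12 (OMASK)
**[573 E] data type inconsistent with specification
"""

counts = tally_messages(sample)
for (code, severity), n in sorted(counts.items()):
    print(f"{n}x[{code} {severity}]")
```

The summary lines that FORCHECK prints itself (for example `10x[573 E]`) carry no leading `**`, so they are not counted twice.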
From the above output it can be seen that the remaining messages are associated with coding practices such as the re-dimensioning of COMMON block arrays and the re-typing of variables between subroutine call and definition statements. These practices became widespread in scientific computing because they were traditionally regarded as essential measures to overcome serious limitations in strict F77.
Together with the experiences of two other groups maintaining CSED codes (PDVR3D, from Walter Tennyson's group, and THOR (?)), the following conclusions were drawn:
So far, two general areas can be identified in which the conversion of scientific programs to strict F77 can encounter difficulties: the use of non-standard FORTRAN features (some of which were mentioned above), and coding 'techniques' employed by programmers to circumvent the limitations of the language.
Some of the above issues are discussed in 'Features and Things to Avoid in F95!', L. S. Chin, C. Greenough, and D. J. Worth, Software Engineering Group Note SEG-N-003.
There are numerous coding practices fully supported in F77 that are usually seen as unhelpful in the quest for greater modularity and extensibility. These include the use of GO TOs, alternate RETURNs, multiple loops ending on the same CONTINUE line, IMPLICIT declarations, and functions appearing indistinguishable from array elements. Many of these can be addressed after converting the fully F77-compliant code to F95 using the available tools (e.g. nag_cbm95, nag_chname95, nag_decs95, nag_struct95, etc). However, at least two legacy practices related to the use of COMMON blocks present a potentially tougher problem in the automatic conversion of code. These are:
With luck, the majority of legacy practices could be addressed by string substitution, while traditional techniques for simulating dynamic memory (over-indexing plus data type redefinition) could be converted in a series of incremental transformations. Re-naming COMMON block variables may affect a relatively minor proportion of many codes and could be moderately automatable (see below), though choices may be required. However, since the re-dimensioning of arrays between different subroutines impacts the loop structures, automatic tools would need to handle modifications to the code semantics of those subroutines.
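As an indication of how the COMMON block cases might be located before any renaming is attempted, the following is a minimal sketch in Python. It is not one of the SESP tools. It assumes fixed-form source with each COMMON statement on a single line (continuation lines and inline comments are ignored), and the function names and the variable names in the sample are hypothetical. It reports blocks whose member lists differ between program units, the same condition FORCHECK flags as message 230.

```python
import re
from collections import defaultdict

# Start of a program unit, optionally preceded by a type (e.g. DOUBLE PRECISION FUNCTION).
UNIT = re.compile(
    r"^\s+(?:[A-Z][A-Z0-9* ]*\s+)?(?:PROGRAM|SUBROUTINE|FUNCTION)\s+(\w+)", re.I)
# A named COMMON statement: COMMON /NAME/ list
COMMON = re.compile(r"^\s+COMMON\s*/\s*(\w+)\s*/\s*(.+)$", re.I)


def split_items(text):
    """Split a COMMON list on commas that are not inside parentheses."""
    items, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append(current)
            current = ""
        else:
            current += ch
    items.append(current)
    return [item.replace(" ", "").upper() for item in items if item.strip()]


def common_layouts(source):
    """Map block name -> {member list: [program units declaring it that way]}."""
    layouts = defaultdict(lambda: defaultdict(list))
    unit = "?"
    for line in source.splitlines():
        if not line.strip() or line[0] in "Cc*!":
            continue  # blank line or fixed-form comment
        m = UNIT.match(line)
        if m:
            unit = m.group(1).upper()
            continue
        m = COMMON.match(line)
        if m:
            layouts[m.group(1).upper()][tuple(split_items(m.group(2)))].append(unit)
    return layouts


def inconsistent_blocks(source):
    """Return the names of COMMON blocks declared with more than one member list."""
    return sorted(name for name, variants in common_layouts(source).items()
                  if len(variants) > 1)


# Illustrative source only: the member names are invented for this example.
sample = """
      SUBROUTINE RHFBLD
      COMMON /SHLNOS/ QQ4, LIT, LJT
      COMMON /JUNK/ A(10), B
      END
      SUBROUTINE SHELLS
      COMMON /SHLNOS/ QQ4, II, JJ
      COMMON /JUNK/ A(10), B
      END
"""

print(inconsistent_blocks(sample))
```

A report of this kind only finds the candidates. Deciding which of the competing names or dimensions should survive is the part where, as noted above, choices may be required.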
Below, the scale of such efforts applied to a widely used and fairly typical scientific software package in this arena is indicated.
In GAMESS, only a handful of key COMMON blocks are re-defined with regard to their contents, and there is some minor use of pre-processing and non-standard naming. GAMESS maintains the 'dynamic memory' pool in an array simply called 'X', which is partitioned in driver routines before being passed to subroutines that label the workspace more conveniently and do the computation. The occurrence of this array passed with an integer offset, or address, such as X(IPOINT), therefore serves as a rough indicator of the scale of the problem associated with measures to simulate dynamic memory in older codes.
|Line type|Approximate number of lines (thousands)|
|---|---|
|Occurrences of over-indexing|40|
Thus, the bulk of legacy coding practices in GAMESS that may demand techniques currently unavailable are accounted for by the occurrence of 'X(I)' type statements, and these comprise somewhat less than 10% of the entire code.
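A count of this kind can be approximated with a simple line scan. The following is a minimal sketch in Python, assuming fixed-form source in which the pool array is literally named X and comment lines begin with C, c, * or ! in column one. The function name `count_pool_lines` and the argument lists in the sample are hypothetical, and this is not the procedure used to obtain the figure quoted above.

```python
import re

# An element of the pool array X used with a variable offset, e.g. X(IPOINT).
# The \b keeps names such as MAX( from matching, and requiring a letter after
# the parenthesis skips declarations such as X(1).
POOL_REF = re.compile(r"\bX\s*\(\s*[A-Z]", re.IGNORECASE)


def count_pool_lines(source):
    """Return (lines referencing the pool with an offset, non-comment lines)."""
    hits = total = 0
    for line in source.splitlines():
        if not line.strip() or line[0] in "Cc*!":
            continue  # skip blank lines and fixed-form comment lines
        total += 1
        if POOL_REF.search(line):
            hits += 1
    return hits, total


# Illustrative source only: the argument lists are invented for this example.
sample = """
C     Driver partitions the pool X and passes pieces on
      CALL FOCK2E(X(IFOCK), X(IDENS), N)
      CALL DIAG(X(IFOCK), X(IVEC), N)
      DIMENSION X(1)
      Y = MAX(I, J)
"""

hits, total = count_pool_lines(sample)
print(f"{hits} of {total} code lines reference the pool with an offset")
```

Such a scan counts lines rather than distinct uses, and it misses offsets written as expressions beginning with a digit, so it gives only a rough indicator in the sense described above.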
While it may be feasible to consider converting key functionality in a piecemeal fashion, the task of converting entire packages such as those found in CSED would demand significant investment in resources.
Many scientific packages, such as those used in CSE, are probably too complex to be readily converted or analysed by the software tools available to SESP. For instance, codes that rely on pre-processing to include or remove sections of the code for compilation cannot be handled, because none of the available tools is able to cope with pre-processor directives. Furthermore, it is hard to imagine techniques such as the simulation of dynamic memory, the renaming of COMMON block variables, the re-dimensioning of COMMON block arrays, or indeed many of the other legacy practices that scientists have employed over the decades to circumvent the limitations of the languages then available, ever being converted automatically to a more modern programming paradigm without significant investment in human time.
For more information about the work of the Computational Chemistry Group please contact Paul Sherwood firstname.lastname@example.org.