CCP4 Study Weekend

NAME: Fei Long

AFFILIATION: University of York, UK

CONTACT: fei@ysbl.york.ac.uk

 

TITLE: "Complete automation of molecular replacement in BALBES"

ABSTRACT: Long F, Vagin AA and Murshudov GN - York Structural Biology Laboratory, Chemistry Department, University of York, York, YO10 5YW, UK

 

The number of entries in the Protein Data Bank (PDB) increases every year. This has many implications, one of which concerns macromolecular crystallography: many structures can now be solved using molecular replacement (MR) techniques. Analysis of the PDB for the last two years shows that more than 67% of all deposited structures were reported to have been solved by this technique. With better algorithms and better organisation of the data bank, this number can be expected to become substantially higher.

 

This talk will describe the complete automation of molecular replacement in BALBES. The system comprises three components that are essential to automatic molecular replacement: 1) a portable knowledgebase designed for efficient search for candidate models, 2) state-of-the-art scientific programs such as MOLREP, REFMAC and SFCHECK, and 3) a manager written in Python that makes decisions based on the data and applies different protocols. One major difference between BALBES and similar automated MR packages is that it is self-contained. Once BALBES is installed (it requires CCP4), it does not depend on the availability of external resources such as an Internet connection. Moreover, BALBES tries to minimise user intervention.

 

Basic components of BALBES are:

1) Organisation of the knowledgebase of proteins. All entries in the PDB have been analysed and only a non-redundant set of protein structures has been stored, i.e. if the sequence identity of two proteins was more than 90% and their 3D structures were close, one of them was removed. All remaining entries were analysed and, where domains are present, information about them was stored. For most entries the oligomeric states of the structures were also recorded. The database is organised hierarchically according to sequence identity, which makes the search for similar structures very fast (less than 10 seconds). Moreover, the search returns all information (domain organisation, tertiary structure etc.) about similar structures.

2) The scientific programs used in the system are constantly improved using tests based on the system itself. We have already added several new molecular replacement techniques to increase the method’s efficiency.

3) The automatic molecular replacement system was designed using a high-level scripting language, Python. The system requires only the available experimental data: the sequence of the protein under study and the reflection data. It searches the knowledgebase and extracts all potential candidate structures. It also analyses the experimental data and makes decisions on matters such as the resolution limit to be used and the presence of pseudo-translation or twinning. Using the prepared candidate structures and the information about the experimental data, the system then starts molecular replacement. All candidate structures are tried step by step using several protocols until a satisfactory solution is found. The current protocols include: 1) simple molecular replacement, 2) use of domains iterated with refinement, 3) use of tertiary structure if available, and 4) completion of the molecular replacement solution using refinement and phased molecular replacement.
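
The manager's control flow, trying each candidate structure with each protocol until a satisfactory solution is found, can be illustrated with a minimal Python sketch. This is not the BALBES code: the names `Candidate`, `Result` and `run_manager`, the ordering of candidates by sequence identity, and the use of an R-free cutoff of 0.45 as the test for a "satisfactory" solution are all assumptions made for illustration. Each protocol is represented by a plain function that returns an R-free-like score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Candidate:
    """Hypothetical search model extracted from the knowledgebase."""
    name: str
    sequence_identity: float  # fraction in [0, 1]


@dataclass
class Result:
    """Outcome of one (candidate, protocol) attempt."""
    candidate: str
    protocol: str
    r_free: float


# A protocol takes a candidate and returns an R-free-like score (lower is better).
Protocol = Callable[[Candidate], float]


def run_manager(
    candidates: Iterable[Candidate],
    protocols: Dict[str, Protocol],
    r_free_cutoff: float = 0.45,  # assumed acceptance threshold, not from the abstract
) -> Optional[Result]:
    """Try candidates one by one with every protocol; stop at the first
    attempt whose score passes the cutoff. Return None if nothing passes."""
    # Assumption: the most similar candidates are tried first.
    ranked = sorted(candidates, key=lambda c: c.sequence_identity, reverse=True)
    for candidate in ranked:
        for protocol_name, protocol in protocols.items():
            score = protocol(candidate)
            if score <= r_free_cutoff:
                return Result(candidate.name, protocol_name, score)
    return None
```

In such a sketch the `protocols` dictionary would hold one entry per protocol listed above (simple MR, domains with refinement, and so on), in the order in which they should be attempted. Each entry would wrap calls to the external programs.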

 

The knowledgebase within BALBES is updated every 15 days (once a week in the future). All newly deposited data are regularly tested using BALBES and the previous version of the knowledgebase. These tests show that the success rate is stable at 75%. All remaining failed cases are regularly analysed and the system is improved accordingly. We expect that in the future more than 80% of structures will be solved completely automatically, without any user intervention.
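
Each knowledgebase update presumably has to re-apply the non-redundancy rule described under component 1: an entry is dropped when it has more than 90% sequence identity to an entry already kept and the two 3D structures are close. The Python sketch below shows one simple greedy way to apply such a rule. It is an illustration only: the function name `non_redundant`, the greedy first-come ordering, and the 1.0 Å RMSD threshold standing in for "close" are assumptions, and the `identity` and `rmsd` functions are supplied by the caller rather than computed here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def non_redundant(
    entries: Sequence[T],
    identity: Callable[[T, T], float],  # sequence identity as a fraction in [0, 1]
    rmsd: Callable[[T, T], float],      # structural distance in Angstrom
    identity_cutoff: float = 0.90,      # "more than 90%" from the text
    rmsd_cutoff: float = 1.0,           # assumed meaning of "close"; not from the text
) -> List[T]:
    """Greedily keep entries that are not redundant with any entry kept so far."""
    kept: List[T] = []
    for entry in entries:
        redundant = any(
            identity(entry, other) > identity_cutoff
            and rmsd(entry, other) < rmsd_cutoff
            for other in kept
        )
        if not redundant:
            kept.append(entry)
    return kept
```

Note that both conditions must hold for an entry to be removed, so two proteins with nearly identical sequences but clearly different conformations would both be retained.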

 

References

 

Murshudov GN, Vagin AA, Lebedev A, Wilson KS, Dodson EJ. Efficient anisotropic refinement of macromolecular structures using FFT. Acta Cryst. 1999;D55:247-255

Vagin AA, Teplyakov A. MOLREP: an automated program for molecular replacement. J. Appl. Cryst. 1997;30:1022-1025