NAME: Lukasz Jaroszewski
AFFILIATION: The Burnham Institute for Medical Research, USA
CONTACT: lukasz@burnham.org
TITLE: "JCSG MR-Pipeline: multiple alignments, models and parallel MR searches"
ABSTRACT: The success rate of molecular replacement (MR) drops when the target protein shares less than 30% sequence identity with its template. It can be significantly improved, however, by using advanced homology recognition methods and by running multiple MR searches with alternative search models and parameters. Models based on fold recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, MR pipelines that integrate phasing and automated refinement into a parallel process can effectively increase the MR success rate. The JCSG MR pipeline has been used to solve more than 25 structures by MR with less than 30% sequence identity to the template. Using several difficult MR problems as examples, we demonstrate that successful MR phasing is possible even when the similarity between the target and the template can only be detected with fold recognition algorithms.
For remotely homologous structures, the preparation of search models starts with an analysis of the structural conservation patterns in the protein family. In the first step, several search models are built from all homologous structures that fold recognition algorithms identify in the PDB. Structurally variable regions of the family are then identified from a multiple alignment of its known structures, and appropriate truncations of the models are proposed. All combinations of the proposed truncations are applied to the search models. The resulting models are used in parallel MR searches with different combinations of input parameters for the MR phasing and refinement programs. Putative solutions are identified from the final values of the free R-factor, the figure of merit, and the deviations from ideal geometry after automated refinement. Finally, crystal packing and electron density maps are checked to identify the correct solution, which is then refined.
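The combinatorial part of the workflow above (every truncation subset applied to every search model, crossed with every parameter combination, run in parallel, then ranked after refinement) can be sketched as follows. This is a minimal illustration, not the JCSG pipeline code. It assumes that each model is represented only by its residue numbers, that variable regions are given as inclusive (start, end) ranges in a numbering shared by all models, that the caller supplies the function that actually runs MR and refinement, and that parameter names, the 0.03 Å bond-RMSD cutoff and the ranking order (lowest free R first, then highest figure of merit) are hypothetical choices. The crystal-packing and electron-density checks are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import combinations, product


def truncation_combinations(regions):
    """Every subset of the proposed truncations, from 'none' to 'all'."""
    return [subset for k in range(len(regions) + 1)
            for subset in combinations(regions, k)]


def apply_truncation(residues, removed):
    """Keep only residues outside every removed (start, end) range, inclusive."""
    return [r for r in residues if not any(s <= r <= e for s, e in removed)]


def build_jobs(models, regions, param_grid):
    """Cross each truncated model variant with each parameter combination.

    models:     {model_name: [residue numbers]}  (hypothetical representation)
    regions:    [(start, end), ...] variable regions in a shared numbering
    param_grid: {parameter_name: [values, ...]}  (hypothetical parameter names)
    """
    keys = sorted(param_grid)
    param_sets = [dict(zip(keys, vals))
                  for vals in product(*(param_grid[k] for k in keys))]
    jobs = []
    for name, residues in models.items():
        for removed in truncation_combinations(regions):
            kept = apply_truncation(residues, removed)
            for params in param_sets:
                jobs.append({"model": name, "removed": removed,
                             "residues": kept, "params": params})
    return jobs


def run_parallel(jobs, run_job, max_workers=4):
    """Run one MR + refinement job per entry; run_job is caller-supplied."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))  # results keep job order


@dataclass
class RefinedResult:
    job_id: int
    r_free: float      # free R-factor after automated refinement
    fom: float         # figure of merit
    rmsd_bonds: float  # RMS deviation of bond lengths from ideal (angstroms)


def rank_solutions(results, max_rmsd_bonds=0.03):
    """Drop results with poor geometry, then sort by R-free (low) and FOM (high)."""
    plausible = [r for r in results if r.rmsd_bonds <= max_rmsd_bonds]
    return sorted(plausible, key=lambda r: (r.r_free, -r.fom))


if __name__ == "__main__":
    models = {"m1": list(range(1, 101)), "m2": list(range(1, 96))}
    regions = [(20, 30), (60, 70)]
    grid = {"resolution_limit": [3.0, 3.5], "rotation_peaks": [10, 50]}
    jobs = build_jobs(models, regions, grid)
    print(len(jobs))  # 2 models x 4 truncation subsets x 4 parameter sets = 32
```

The number of jobs grows as (models) x 2^(variable regions) x (parameter combinations), which is why running the searches in parallel matters; in practice each job would more likely be dispatched to a cluster queue than to local threads.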
We estimate that with the improvements in model building and parallel searching using existing phasing algorithms, MR can be successful for about 50% of recognizable homologues of known structures below the threshold of 30% sequence identity. This implies that about one third of the proteins in a typical bacterial proteome are potential MR targets (see Figure 1).

Figure 1: Current structural coverage of a bacterial proteome: the example of Thermotoga maritima. (orphans – proteins with no homologs in other organisms, transmembrane – proteins with predicted transmembrane helices, low complexity – proteins with long low complexity fragments)