Figure 1. An outline of the ResProx algorithm. ResProx starts by assessing multiple parameters of protein quality using sub-programs such as VADAR (Willard et al. 2003), MolProbity (Chen et al. 2010), RosettaHoles (Sheffler and Baker 2009) and PROSESS (Berjanskii et al. 2010). The resulting quality scores are used to predict an equivalent resolution with a support vector regression model, which was trained on a set of high-quality X-ray structures. Additionally, mean values and standard deviations of the quality parameters for a database of high-resolution structures are used to generate Z-scores, which are subsequently converted to an equivalent resolution value via a Z-Mean protocol. Finally, a decision-making module selects one of the two equivalent resolution values as the final result, based on the difference between the predicted values and the raw scores of protein quality.
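The Z-score step mentioned in the legend above can be illustrated with a minimal sketch. It assumes each quality parameter is standardized against the mean and standard deviation of a reference database and that the Z-scores are then summarized by their mean absolute value. The parameter names and all numbers are hypothetical, and the conversion to an equivalent resolution and the decision-making module are not reproduced here.

```python
from statistics import mean


def z_scores(scores, ref_means, ref_sds):
    """Standardize each quality score against reference statistics."""
    return {
        name: (value - ref_means[name]) / ref_sds[name]
        for name, value in scores.items()
    }


def mean_abs_z(z):
    """Summarize a set of Z-scores by their mean absolute value."""
    return mean(abs(v) for v in z.values())


# Hypothetical reference statistics and model scores (illustration only).
ref_means = {"clash_score": 5.0, "rama_outliers": 0.5}
ref_sds = {"clash_score": 2.0, "rama_outliers": 0.25}
model = {"clash_score": 9.0, "rama_outliers": 1.0}

z = z_scores(model, ref_means, ref_sds)
summary = mean_abs_z(z)
print(z, summary)
```

In this toy case both parameters lie two standard deviations above the reference mean, so the summary value is 2.0.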
Figure 2. Correlation between ResProx equivalent resolution and X-ray experimental resolution for the ResProx training and testing sets. (A) Final ResProx values for the ResProx training set. (B) Final ResProx values for the ResProx testing set. (C) Z-Mean equivalent resolution for the ResProx training set. (D) Z-Mean equivalent resolution for the ResProx testing set. (E) SVR predictions for the ResProx training set. (F) SVR predictions for the ResProx testing set. R and Err denote the Pearson correlation coefficient and the mean absolute error of the resolution prediction, respectively.
Figure 3. Correlation between equivalent resolution and X-ray experimental resolution as calculated by Procheck-NMR, MolProbity, and RosettaHoles2. (A) Procheck-NMR equivalent resolution for the ResProx training set. (B) Procheck-NMR equivalent resolution for the ResProx testing set. (C) RosettaHoles2 SRESL equivalent resolution for the ResProx training set. (D) RosettaHoles2 SRESL for the ResProx testing set. (E) MolProbity score for the ResProx training set. (F) MolProbity score for the ResProx testing set. R and Err denote the Pearson correlation coefficient and the mean absolute error of the resolution prediction, respectively.
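For readers who want to recompute the two statistics named in the legend, the following is a minimal sketch of the Pearson correlation coefficient (R) and the mean absolute error (Err). The resolution values are invented for illustration and are not taken from the ResProx training or testing sets.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_error(predicted, observed):
    """Mean absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical resolutions in angstroms (illustration only).
experimental = [1.0, 1.5, 2.0, 2.5, 3.0]
predicted = [1.2, 1.4, 2.1, 2.4, 3.3]

r = pearson_r(predicted, experimental)
err = mean_abs_error(predicted, experimental)
print(f"R = {r:.2f}, Err = {err:.2f}")
```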
Figure 4. Correlation between completeness of experimental information (distance restraints) and equivalent resolution of ubiquitin. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Different degrees of completeness of the distance restraints were achieved by randomly removing 5 distance restraints from the total restraint set. The distance restraints consisted of NOE-based distance restraints and hydrogen bond distance restraints of the ubiquitin NMR ensemble 1D3Z.
Figure 5. Correlation between equivalent resolution and the ensemble precision of ubiquitin. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Ensemble precision was assessed by calculating the backbone RMSD of ubiquitin NMR ensembles with MolMol (Koradi et al. 1996). The Spearman rank-order correlation coefficients are 0.95, 0.69, 0.84, and 0.90 for ResProx, Procheck-NMR, MolProbity, and RosettaHoles2, respectively.
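The Spearman rank-order correlation coefficient quoted in this legend is the Pearson correlation of the ranks of the two variables. A minimal sketch follows, assuming tied values receive their average rank; the RMSD and resolution values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical backbone RMSDs and equivalent resolutions (illustration only).
rmsd = [0.3, 0.5, 0.9, 1.4]
resolution = [1.1, 1.6, 2.0, 3.5]
rho = spearman_rho(rmsd, resolution)
print(rho)
```

Because the coefficient depends only on rank order, any strictly increasing relationship gives a value of 1 (up to rounding), whether or not it is linear.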
Figure 6. Correlation of equivalent resolution with backbone proton chemical shifts. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. The agreement between ubiquitin models and backbone proton chemical shifts was assessed by predicting the chemical shifts from different NMR models with ShiftX2 (Han et al. 2011) and calculating the mean absolute difference between predicted and experimentally measured chemical shifts. The Spearman rank-order correlation coefficients are 0.95, 0.73, 0.85, and 0.95 for ResProx, Procheck-NMR, MolProbity, and RosettaHoles2, respectively.
Figure 7. Correlation between equivalent resolution of ubiquitin and the number of distance violations. (A) ResProx score. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score.
Figure 8. Correlation between the equivalent resolution of ubiquitin and model accuracy. (A) ResProx resolution. (B) Procheck-NMR equivalent resolution. (C) RosettaHoles2 SRESL. (D) MolProbity score. Model accuracy was measured by calculating the backbone RMSD of ubiquitin models with respect to the ubiquitin X-ray structure 1UBQ. NMR models of ubiquitin with different distance restraint violations were analyzed (see text for details).
Table 1. Correlation coefficients and mean absolute errors of ResProx, Procheck-NMR, MolProbity, and RosettaHoles2 for obsolete and current PDB entries of NMR structures.
Table 2. Improvements in the quality of water-refined models - Comparison between ResProx values and DRESS Z-scores.
Table 3. Structure quality parameters used in the calculation of ResProx's equivalent resolution.
1 - Coefficient of correlation between the score and X-ray resolution for the ResProx training set.
2 - This column specifies whether a score was used in its logarithmic form ("Yes") or not ("No"). A star (*) indicates scores whose logarithm was taken 16 times.
3,4 - Lower and upper bounds indicate the minimal and the maximal values, respectively, that scores were allowed to have in ResProx calculations.
5 - This column specifies whether a score's Z-value was used for Z-Mean calculations and, if so, which Z-values were considered: only positive, only negative, or both positive and negative (see text for more details).
6 - More information about the scores can be found in the corresponding publications and/or on the websites of RosettaHoles (Sheffler and Baker 2009), PROSESS (Berjanskii et al. 2010), GeNMR (Berjanskii et al. 2009), and MolProbity (Chen et al. 2010; Davis et al. 2007).
7 - The percentages of bad bond lengths and bad bond angles are used only when their values exceed 4 standard deviatio
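The per-score settings described in footnotes 2 to 5 (logarithm flag, lower and upper bounds, and the sign of Z-values admitted to the Z-Mean calculation) can be sketched as below. This is only an illustration: the order of operations (bounding first, then a single natural logarithm) and the mode names are assumptions, not details confirmed by the table.

```python
import math


def preprocess(value, lower, upper, use_log):
    """Bound a raw score to [lower, upper], then optionally take its log.

    Assumption: bounds are applied before one natural logarithm.
    """
    bounded = min(max(value, lower), upper)
    return math.log(bounded) if use_log else bounded


def keep_z(z, mode):
    """Decide whether a Z-value enters the Z-Mean calculation.

    mode is a hypothetical label: "both", "positive", "negative" or "none".
    """
    if mode == "both":
        return True
    if mode == "positive":
        return z > 0
    if mode == "negative":
        return z < 0
    return False


# Hypothetical values (illustration only).
print(preprocess(150.0, 0.0, 100.0, use_log=False))  # capped at the upper bound
print(preprocess(0.5, 1.0, 10.0, use_log=True))      # raised to 1.0, log gives 0.0
print(keep_z(-1.3, "positive"))                      # negative Z is excluded
```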
Figure 9. Resolution histogram of the ResProx training/testing set. Proteins were grouped in 0.25 Å bins. At least 100 structures were placed in each resolution bin, spanning the range between 1.0 Å and 3.75 Å.
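The binning described in this legend can be sketched as follows, assuming bins of width 0.25 Å that start at 1.0 Å and are labeled by their lower edge. The resolutions are hypothetical, and the small epsilon is only a guard against floating-point rounding at bin edges.

```python
import math
from collections import Counter


def bin_start(resolution, width=0.25, origin=1.0):
    """Lower edge of the bin that contains the given resolution."""
    index = math.floor((resolution - origin) / width + 1e-9)
    return round(origin + index * width, 2)


def underfilled(counts, minimum=100):
    """Bins holding fewer structures than the required minimum."""
    return sorted(b for b, n in counts.items() if n < minimum)


# Hypothetical resolutions in angstroms (illustration only).
resolutions = [1.0, 1.1, 1.3, 1.49, 2.0, 2.2]
counts = Counter(bin_start(r) for r in resolutions)
print(dict(counts))
print(underfilled(counts))
```

With only six toy structures every bin falls below the 100-structure minimum, so `underfilled` lists all three occupied bins.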
Figure 10. Relationship between X-ray resolution and several ResProx protein quality scores for the ResProx training set. (A) Standard deviation of χ1 pooled from VADAR. (B) Clash Score from MolProbity. (C) Percentage of <1% side-chain rotamer outliers from MolProbity. (D) RAMA score from GeNMR. (E) Ramachandran outliers from MolProbity. (F) RosettaHoles score. (G) Deviation of Kappa angles from PROSESS. (H) Percentage of disallowed Ω angles from VADAR.
Figure 11. Curve-fitting of a plot of X-ray resolution vs. average absolute Z-score. Only the linear part of the plot, spanning the range of mean absolute Z-scores from 0 to 1.2, was used for curve-fitting. The curve-fitting was done with QtiPlot (Vasilief 2011).
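The idea of fitting only the linear part of the plot can be sketched with an ordinary least-squares line restricted to mean absolute Z-scores between 0 and 1.2. This stands in for the QtiPlot fit and is not the authors' procedure; the data points, slope and intercept below are synthetic.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_linear_region(x, y, low=0.0, high=1.2):
    """Fit a line using only the points whose x lies in [low, high]."""
    pairs = [(a, b) for a, b in zip(x, y) if low <= a <= high]
    xs, ys = zip(*pairs)
    return fit_line(xs, ys)


# Synthetic data: linear up to 1.2, then flattening (illustration only).
mean_abs_z = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.6, 2.0]
resolution = [1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.0, 3.1]

slope, intercept = fit_linear_region(mean_abs_z, resolution)
print(f"resolution = {slope:.2f} * mean|Z| + {intercept:.2f}")
```

Restricting the window keeps the two flattened points from pulling the slope down, which is the reason for fitting only the linear region.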
Figure 12. GeNMR-based threshold for detecting poor-quality protein structures. The total GeNMR knowledge-based score, excluding the radius of gyration score, is shown with blue diamonds for 50,000 protein structures from the PDB. The solid line indicates the selected threshold that separates 99.9% of the structures from a few poor-quality outliers.
Figure 13. Equivalent resolution of "intact" and "broken" models of the obsolete NMR ensemble of the E. coli heme chaperone CcmE, 1LIZ. (A) "Intact" model 1 of 1LIZ. (B) "Broken" model 3 of 1LIZ. The misplaced Glu105 residue is colored green. Vectors of broken bonds between Glu105 and adjacent residues are shown with red lines. The figure was generated using MolMol (Koradi et al. 1996).
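A threshold that separates 99.9% of structures from the remaining outliers can be located as an empirical quantile. The sketch below assumes that larger scores indicate poorer quality, which the legend does not state, and uses synthetic scores rather than GeNMR output.

```python
import math


def quantile_threshold(scores, fraction=0.999):
    """Smallest observed score with at least `fraction` of scores at or below it."""
    ordered = sorted(scores)
    k = math.ceil(fraction * len(ordered))
    return ordered[k - 1]


def outliers(scores, threshold):
    """Scores above the threshold (assumed here to be the poor-quality side)."""
    return [s for s in scores if s > threshold]


# Synthetic scores standing in for a large set of structures (illustration only).
scores = [float(i) for i in range(10000)]
threshold = quantile_threshold(scores)
flagged = outliers(scores, threshold)
covered = sum(s <= threshold for s in scores) / len(scores)
print(threshold, len(flagged), covered)
```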
Figure 14. Histogram of ResProx equivalent resolution for NMR models and experimental resolution for X-ray structures. 500 NMR ensembles and 500 X-ray structures were randomly selected from the PDB.
Berjanskii M, Liang Y, Zhou J, Tang P, Stothard P, Zhou Y, Cruz J, MacDonell C, Lin G, Lu P, Wishart DS (2010) PROSESS: a protein structure evaluation suite and server. Nucleic Acids Res 38 (Web Server issue):W633-640
Berjanskii M, Tang P, Liang J, Cruz JA, Zhou J, Zhou Y, Bassett E, MacDonell C, Lu P, Lin G, Wishart DS (2009) GeNMR: a web server for rapid NMR-based protein structure determination. Nucleic Acids Res 37 (Web Server issue):W670-677
Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66 (Pt 1):12-21
Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, Murray LW, Arendall WB 3rd, Snoeyink J, Richardson JS, Richardson DC (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res 35 (Web Server issue):W375-383
Koradi R, Billeter M, Wüthrich K (1996) MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 14 (1):51-55, 29-32
Lovell SC, Word JM, Richardson JS, Richardson DC (2000) The penultimate rotamer library. Proteins 40 (3):389-408
Sheffler W, Baker D (2009) RosettaHoles: rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Protein Sci 18 (1):229-239
Vasilief I (2011) QtiPlot - Data Analysis and Scientific Visualisation. http://soft.proindependent.com/qtiplot.html, 0.9.8.4 edn.
Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, Wishart DS (2003) VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res 31 (13):3316-3319