Transmembrane topology and signal peptide prediction using dynamic Bayesian networks

PLoS Comput Biol. 2008 Nov;4(11):e1000213. doi: 10.1371/journal.pcbi.1000213. Epub 2008 Nov 7.

Abstract

Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper, we extend this work using the more general class of dynamic Bayesian networks (DBNs), of which HMMs are a special case. Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions for all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal peptide and/or one or more transmembrane segments, as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
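The general idea behind a two-stage decoder that pairs posterior decoding with Viterbi-style grammar constraints can be illustrated on a plain HMM. The sketch below is not Philius's decoder, its DBN, or anything from the Graphical Models Toolkit. It assumes a hypothetical three-state topology model (`i` cytoplasmic loop, `M` membrane segment, `o` non-cytoplasmic loop) with made-up parameters over a two-letter alphabet (`h` hydrophobic, `p` polar). Stage 1 computes per-residue posteriors with forward-backward. Stage 2 runs Viterbi over the log posteriors, allowing only transitions with nonzero probability, so the output can never jump directly between `i` and `o`. This approach is sometimes called posterior-Viterbi decoding. The helper names `posteriors`, `posterior_decode`, `posterior_viterbi`, and `obeys_grammar` are illustrative only.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy topology HMM (NOT the Philius or Phobius parameters):
# "i" = cytoplasmic loop, "M" = membrane segment, "o" = non-cytoplasmic loop.
STATES = ("i", "M", "o")
START = {"i": 0.5, "M": 0.0, "o": 0.5}
# Zero entries encode the "grammar": a loop can switch sides only by
# passing through a membrane segment.
TRANS = {
    "i": {"i": 0.8, "M": 0.2, "o": 0.0},
    "M": {"i": 0.1, "M": 0.8, "o": 0.1},
    "o": {"i": 0.0, "M": 0.2, "o": 0.8},
}
# Two-letter alphabet: "h" = hydrophobic residue, "p" = polar residue.
EMIT = {
    "i": {"h": 0.2, "p": 0.8},
    "M": {"h": 0.9, "p": 0.1},
    "o": {"h": 0.2, "p": 0.8},
}


def posteriors(seq: Sequence[str]) -> List[Dict[str, float]]:
    """Stage 1: scaled forward-backward; returns P(state at t | seq)."""
    fwd: List[Dict[str, float]] = []
    scales: List[float] = []
    for t, x in enumerate(seq):
        if t == 0:
            f = {s: START[s] * EMIT[s][x] for s in STATES}
        else:
            f = {s: EMIT[s][x] * sum(fwd[-1][r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
        c = sum(f.values())  # per-position scale factor avoids underflow
        scales.append(c)
        fwd.append({s: v / c for s, v in f.items()})
    bwd: List[Dict[str, float]] = [{s: 1.0 for s in STATES} for _ in seq]
    for t in range(len(seq) - 2, -1, -1):
        x_next = seq[t + 1]
        bwd[t] = {s: sum(TRANS[s][r] * EMIT[r][x_next] * bwd[t + 1][r] for r in STATES)
                  / scales[t + 1] for s in STATES}
    post = []
    for f, b in zip(fwd, bwd):
        p = {s: f[s] * b[s] for s in STATES}
        z = sum(p.values())
        post.append({s: v / z for s, v in p.items()})
    return post


def posterior_decode(seq: Sequence[str]) -> List[str]:
    """Plain posterior decoding: argmax state per residue, grammar ignored."""
    return [max(STATES, key=lambda s: p[s]) for p in posteriors(seq)]


def _log(p: float) -> float:
    return math.log(p) if p > 0.0 else -math.inf


def posterior_viterbi(seq: Sequence[str]) -> List[str]:
    """Stage 2: Viterbi over log posteriors, restricted to allowed transitions."""
    post = posteriors(seq)
    score = {s: _log(post[0][s]) for s in STATES}
    back: List[Dict[str, str]] = []
    for t in range(1, len(seq)):
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in STATES:
            # Only predecessors with a nonzero transition probability are legal.
            best = max((r for r in STATES if TRANS[r][s] > 0.0), key=lambda r: score[r])
            new_score[s] = score[best] + _log(post[t][s])
            ptr[s] = best
        score = new_score
        back.append(ptr)
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def obeys_grammar(path: Sequence[str]) -> bool:
    """True if the path starts in a legal state and uses only allowed transitions."""
    return START[path[0]] > 0.0 and all(TRANS[a][b] > 0.0 for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    seq = list("pppphhhhhhpppphhhhhhpp")
    print("posterior:        ", "".join(posterior_decode(seq)))
    print("posterior-Viterbi:", "".join(posterior_viterbi(seq)))
```

Plain posterior decoding picks the most probable state at each residue independently, so nothing prevents it from emitting an illegal `i`→`o` step. The second stage keeps the per-residue posteriors as scores but searches only over paths the transition grammar permits. A real topology model would add further structure, such as minimum and maximum membrane-segment lengths and a signal peptide submodel, which this toy omits.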

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Artificial Intelligence
  • Bayes Theorem*
  • Computational Biology / methods*
  • Fungal Proteins / ultrastructure
  • Markov Chains
  • Membrane Proteins / ultrastructure*
  • Models, Molecular*
  • Neural Networks, Computer
  • Protein Conformation
  • Protein Sorting Signals / physiology*
  • Reproducibility of Results
  • Yeasts / ultrastructure

Substances

  • Fungal Proteins
  • Membrane Proteins
  • Protein Sorting Signals