Structure Prediction Help Page

Interpreting Protein Structure Prediction Data

Deduced Domains
After the Ginzu domain prediction algorithm has exhausted its analysis of the protein sequence to predict protein domains, remaining stretches of the sequence may be designated as individual domains, with longer stretches being cut into separate domains based on length. This is the least confident of the domain prediction steps.
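
As a rough illustration of cutting a long leftover stretch into separate domains based on length, here is a minimal sketch. The function name `split_by_length` and the `max_len` value of 200 residues are assumptions made for this example. This page does not state the length Ginzu actually uses, nor how the pieces are sized.

```python
from typing import List, Tuple


def split_by_length(start: int, end: int, max_len: int = 200) -> List[Tuple[int, int]]:
    """Split residues start..end (1-based, inclusive) into near-equal pieces.

    max_len is a hypothetical limit chosen for illustration only.
    """
    length = end - start + 1
    n_pieces = -(-length // max_len)      # ceiling division
    base, extra = divmod(length, n_pieces)
    domains = []
    pos = start
    for i in range(n_pieces):
        # The first `extra` pieces absorb one leftover residue each.
        size = base + (1 if i < extra else 0)
        domains.append((pos, pos + size - 1))
        pos += size
    return domains


print(split_by_length(1, 450))
```

A stretch that is already no longer than `max_len` comes back as a single deduced domain.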

Ginzu
Ginzu is a protocol that attempts to determine the regions of a protein chain that will fold into globular units, called "domains". It scans the protein chain sequence with successively less confident detection methods to find homologs with experimentally determined structures, starting with PDB-BLAST (PSI-BLAST against the PDB) and followed by the more remote fold-detection methods ORFeus and Pcons. After any homologs are identified, the remaining regions are searched with HMMER against the Pfam-A protein family database. Lastly, the PSI-BLAST multiple sequence alignment (MSA) is used to identify regions with an increased likelihood of containing a contiguous domain, based on sequence clusters. The final step uses the PSI-BLAST MSA to select cut points between the domains. For any remaining long stretches of sequence that have matched neither a homolog with a structure nor a Pfam-A family, new domains may also be defined at the strongest cut points.
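
The ordering described above, where each method only examines regions that more confident methods left unassigned, can be sketched as follows. This is an illustration of the control flow only, not the Ginzu implementation. The names `cascade`, `uncovered`, `fake_pdb_blast` and `fake_pfam` are hypothetical, and the two stand-in detectors return made-up matches instead of running any real search.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int]        # inclusive, 1-based residue range
Hit = Tuple[int, int, str]      # (start, end, method that found it)
Detector = Callable[[str, Region], List[Region]]


def uncovered(length: int, hits: List[Hit]) -> List[Region]:
    """Return the residue ranges not yet covered by any hit."""
    gaps: List[Region] = []
    pos = 1
    for start, end, _ in sorted(hits):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= length:
        gaps.append((pos, length))
    return gaps


def cascade(sequence: str, detectors: List[Tuple[str, Detector]]) -> List[Hit]:
    """Run detectors in order, each only on regions still unassigned."""
    hits: List[Hit] = []
    for name, detect in detectors:
        for region in uncovered(len(sequence), hits):
            for start, end in detect(sequence, region):
                hits.append((start, end, name))
    return sorted(hits)


# Stand-in detectors (hypothetical): the first "finds" residues 1-100,
# the second claims any remaining region of at least 50 residues.
def fake_pdb_blast(seq: str, region: Region) -> List[Region]:
    start, end = region
    return [(1, 100)] if start <= 1 and end >= 100 else []


def fake_pfam(seq: str, region: Region) -> List[Region]:
    start, end = region
    return [region] if end - start + 1 >= 50 else []


result = cascade("A" * 300, [("PDB-BLAST", fake_pdb_blast), ("Pfam", fake_pfam)])
print(result)  # [(1, 100, 'PDB-BLAST'), (101, 300, 'Pfam')]
```

Putting the most confident method first means a weaker method can never overwrite a region that a stronger one has already assigned.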

MSA
A multiple sequence alignment (MSA) is used in the final and lowest-confidence sequence-based step of the Ginzu domain prediction algorithm. The PSI-BLAST MSA is analyzed to predict domain cut points, based on the density of aligned regions along the sequence. Only confident cut predictions are shown.
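
To give a feel for what "density" in an alignment can mean, the sketch below counts how many aligned sequences cover each query residue and reports interior stretches where that count drops. This is a simplified stand-in and not the Ginzu cut-point algorithm. The function names, the `threshold` parameter and the example ranges are all assumptions made for illustration.

```python
from typing import List, Tuple


def coverage(length: int, aligned_ranges: List[Tuple[int, int]]) -> List[int]:
    """Count how many aligned sequences cover each query residue."""
    depth = [0] * length
    for start, end in aligned_ranges:      # 1-based, inclusive
        for i in range(start - 1, end):
            depth[i] += 1
    return depth


def low_density_runs(depth: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return interior runs (1-based, inclusive) where depth < threshold."""
    runs = []
    run_start = None
    for pos, d in enumerate(depth, start=1):
        if d < threshold and run_start is None:
            run_start = pos
        elif d >= threshold and run_start is not None:
            runs.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(depth)))
    # Dips touching either terminus cannot separate two domains.
    return [(s, e) for s, e in runs if s > 1 and e < len(depth)]


# Hypothetical alignment: two clusters of hits with a sparse gap between them.
depth = coverage(20, [(1, 8), (2, 9), (12, 20), (13, 20)])
runs = low_density_runs(depth, threshold=2)
cuts = [(s + e) // 2 for s, e in runs]     # midpoint of each sparse run
print(runs, cuts)  # [(9, 12)] [10]
```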

ORFEUS
ORFeus is a method for matching protein sequences to likely protein folds based on very remote sequence similarities, and is employed in the fold recognition step of the Ginzu domain prediction algorithm. The sequence profile and predicted secondary structures are searched against a database of sequence profiles and predicted secondary structures for proteins of known structure.

Results from LiveBench, a test of fold recognition methods, suggest that matches with ORFeus scores of 7.5 or greater are almost always correct.

Pcons
Pcons was the first consensus server for fold recognition and is used in the fold recognition step of the Ginzu domain prediction algorithm. For each query sequence, predictions from several fold recognition servers are collected, and a measure related to model quality is calculated for each resulting model. This measure is predicted from structural comparisons between the models together with the score that the originating server assigned to the model. Pcons then selects the best of these predictions. Pcons makes at least 10% more correct predictions than the best single method, and its specificity is significantly better.

Any Pcons score higher than 1.5 should be significant.

PDB
The Protein Data Bank (PDB) is the world's repository of protein structure data. We also use "PDB" to refer to the Protein Data Bank file format used to describe a protein structure.

Pfam
Pfam is a set of families of protein sequences that are represented as hidden Markov models, and may be searched with HMMER.

The confidence of Pfam-matched domains is given by -log10(e-val), where e-val is the e-value returned by the HMMER search of the Pfam database. Values of 3.0 (e = 0.001) or higher are considered significant.
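
The conversion from e-value to confidence is a base-10 logarithm, as the worked value above shows (e = 0.001 gives 3.0). A minimal sketch follows. The function names are hypothetical, and returning infinity for an e-value reported as 0 is a choice made for this example, not documented behavior of this site.

```python
import math


def confidence(e_value: float) -> float:
    """Convert an e-value to the -log10 confidence shown on this page."""
    if e_value <= 0:
        # Assumption for this sketch: treat an e-value reported as 0
        # (too small to represent) as unbounded confidence.
        return math.inf
    return -math.log10(e_value)


def is_significant(e_value: float, threshold: float = 3.0) -> bool:
    """True when the confidence reaches the 3.0 threshold quoted above."""
    return confidence(e_value) >= threshold


print(round(confidence(1e-3), 6))                  # 3.0
print(is_significant(1e-5), is_significant(0.01))  # True False
```

Smaller e-values give larger confidences, so "3.0 or higher" is the same condition as "e-value of 0.001 or lower".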

PSI-BLAST
PSI-BLAST is a method for detecting sequence homologs of a given protein. It searches with a position-specific residue substitution profile appropriate to the family to which the query belongs. This allows for more sensitive detection of remote homologs.

Our domain prediction protocol, called Ginzu, scans the protein chain sequence with successively less confident methods of detection to find homologs with experimentally determined structures, starting with a PSI-BLAST search against the PDB.

The confidence displayed is -log10(e-val), where e-val is the e-value returned by a PSI-BLAST search against the PDB. A confidence of 3.0 (e = 0.001) is considered to be a strong detection threshold.
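
The score thresholds quoted on this page differ by method, so a small lookup can help when reading a set of results. The sketch below records each threshold as stated here: ORFeus "7.5 or greater", Pcons "higher than 1.5", and 3.0 for Pfam and PSI-BLAST. The names `THRESHOLDS` and `meets_threshold` are hypothetical, and treating the PSI-BLAST threshold as inclusive is an assumption.

```python
# (threshold, inclusive) per method, as quoted on this page.
THRESHOLDS = {
    "PSI-BLAST": (3.0, True),   # inclusive is an assumption
    "Pfam": (3.0, True),        # "3.0 or higher"
    "ORFeus": (7.5, True),      # "7.5 or greater"
    "Pcons": (1.5, False),      # "higher than 1.5"
}


def meets_threshold(method: str, score: float) -> bool:
    """Check a score against the threshold quoted for its method."""
    threshold, inclusive = THRESHOLDS[method]
    return score >= threshold if inclusive else score > threshold


print(meets_threshold("ORFeus", 7.5))   # True
print(meets_threshold("Pcons", 1.5))    # False
```

The scores are on different scales, so they should only be compared against their own method's threshold, never against each other.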


YRC Informatics Platform - Version 3.0
Created and Maintained by: Michael Riffle