Download Data

Here you may download, in bulk, published and unpublished data that the Yeast Resource Center has released to the public domain. Questions regarding unpublished data should be directed to the appropriate group within the YRC as described here.

Jump to: Mass Spectrometry, Philius, Protein Complex Predictions, Protein Structure Prediction, Subcellular Localization / Fluorescence Microscopy, Yeast Two-Hybrid, Reference Table

Mass Spectrometry Data

ms_data.txt [47K]
This tab-delimited text file contains the following columns:
  • Run ID: An internal id number that allows for distinguishing between proteins identified in separate MS runs.
  • Bait Protein: The systematic name of the protein used as bait in the affinity purification experiment.
  • Hit Protein: The systematic name of the protein identified by mass spectrometry in the purification experiment.
  • Sequence Coverage: The percentage of the protein's sequence represented by the peptides identified in the MS run.
  • Sequence Count: The number of peptides that were used in the identification of the protein.
  • Spectrum Count: The total number of spectra found to correspond to the peptides used in the identification of the protein.
  • Reference ID: A number corresponding to the publication for which this data was produced, which is found in the reference table below.
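
As a minimal sketch of how this file might be read, the Python below turns tab-delimited rows into records and groups hit proteins by bait. It assumes the columns appear in the order listed above and that coverage is a plain number (optionally followed by "%"); neither is confirmed here. The names MsHit, parse_ms_rows and hits_by_bait are illustrative only and are not part of any YRC tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple, Set


class MsHit(NamedTuple):
    """One row of ms_data.txt (column order is an assumption)."""
    run_id: str
    bait: str
    hit: str
    sequence_coverage: float
    sequence_count: int
    spectrum_count: int
    reference_id: str


def parse_ms_rows(lines: Iterable[str]) -> Iterator[MsHit]:
    """Yield one MsHit per well-formed tab-delimited row.

    Rows with fewer than seven fields, or whose numeric columns do not
    parse (for example a header row, if one exists), are skipped.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 7:
            continue
        try:
            coverage = float(fields[3].rstrip("%"))
            seq_count = int(fields[4])
            spec_count = int(fields[5])
        except ValueError:
            continue
        yield MsHit(fields[0], fields[1], fields[2],
                    coverage, seq_count, spec_count, fields[6])


def hits_by_bait(hits: Iterable[MsHit]) -> Dict[str, Set[str]]:
    """Collect the set of hit proteins observed for each bait protein."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        grouped[hit.bait].add(hit.hit)
    return dict(grouped)
```

To apply this to the download, one would open ms_data.txt in text mode and pass the file object to parse_ms_rows, then pass the result to hits_by_bait.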


Philius Transmembrane Predictions

The data are described in: Reynolds et al. (2008)

allPhiliusReports.tar.gz [206M]
This gzipped tar archive contains the Philius transmembrane and signal peptide predictions for ~6.33M protein sequences from approximately 139,000 distinct NCBI taxonomy ID numbers.
philius_seqs.fasta.gz [1.3G]
FASTA-formatted file that maps YRC sequence IDs (used in allPhiliusReports.tar.gz) to YRC protein IDs and to standard database accession strings from NCBI, Swiss-Prot/UniProt, IPI and more. It also contains the sequences of all proteins.
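
Because the FASTA file is large, it is probably best read as a stream rather than loaded whole. The sketch below shows one way to do that in Python with the standard gzip module. It only separates each header line from its sequence; the internal layout of the headers is not described here, so no attempt is made to split them further. The function names iter_fasta and iter_fasta_gz are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without its leading '>' and is not parsed
    further, since its exact layout is not documented here.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def iter_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a gzipped FASTA file without unpacking it first."""
    with gzip.open(path, "rt") as handle:
        yield from iter_fasta(handle)
```

Iterating over iter_fasta_gz("philius_seqs.fasta.gz") would then yield one record at a time, keeping memory use roughly constant.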


Protein Complex Predictions

Hue, M. et al. (2009) (Submitted):

Presented are all pairs of predicted interacting proteins referenced by their PDB identifiers and the associated confidence values.

These data support the following publication: Hue, M. et al. (2009) (Submitted)

Qiu J, et al. (2008):

Presented are matrices of false discovery rates derived from the likelihood that any two proteins from Saccharomyces cerevisiae (budding yeast) may be found together in a protein complex. The standard identifiers for the open reading frames encoding the proteins are found along the two axes.

These data support the following publication: Qiu J, et al. (2008)

ehyh_sg2M_C10_noDubPseu_fdr.mtx.gz [27M]
The set of predictions from a classifier trained without Gene Ontology terms.
gehyh_sg2M_C10_noDubPseu_fdr.mtx.gz [16M]
The set of predictions from a classifier trained with Gene Ontology terms.


Protein Structure and Domain Predictions

This is the raw, tab-delimited data from Drew K, et al. (2011).

domainParseDump.tar [126.0M]
This file contains the following four data files:
  • pspProteins.fasta.gz - The proteins analyzed by this research project and their sequences, in FASTA format. Each FASTA header contains the internal YRC sequence ID, which is referenced in the other data files. The FASTA header descriptions contain the YRC protein IDs that correspond to the sequence ID.
  • domainTableDump.txt.gz - Information about the domains predicted for each of the proteins. This file contains the following columns:
    • Domain ID - Unique domain identification number.
    • YRC Sequence ID - Unique protein identification number.
    • Domain Number - The position of this domain when domains are ordered by their start from the N to the C terminus (first, second, third, etc.).
    • PDB ID - If this domain was determined by a PSI-BLAST match to a PDB entry, this is the match.
    • Score - The score reported by the method that GINZU used to identify this domain.
    • Method - The method, within the GINZU domain prediction procedure, that predicted this domain.
  • domainRegionTableDump.txt.gz - Information describing how the predicted domains map onto the protein sequences. This file contains the following columns:
    • Region ID - Unique region identification number.
    • Domain ID - The ID of the domain to which this region belongs.
    • Domain Segment - Domains may consist of discontiguous regions in the protein sequence. In this event, the first segment will be labeled as "A", the second as "B" and so forth.
    • Start Residue - The number of the start residue in the protein sequence for this region.
    • End Residue - The number of the end residue in the protein sequence for this region.
    • Description - Brief text describing this region.
  • decoyMatchesDump.txt.gz - Information describing which protein structure prediction was made for particular domains, as well as the quality of that predicted protein structure. This file contains the following columns:
    • Unique identification number - A unique number identifying the match of a domain to a predicted protein structure.
    • Domain ID - The ID of the domain whose structure was predicted.
    • MCM Score - The score describing the quality of this predicted structure's match to the predicted SCOP superfamily (see paper for details).
    • SCOP superfamily - The matching SCOP superfamily.
    • Decoy ID - The unique identifier of the predicted 3D structure for this domain (referencing the structure datafile below).
    • GO integration score - The score obtained after integrating the predicted structure with known Gene Ontology annotations for this protein (see paper for details).
    • GO accession ID - The accession ID of the Gene Ontology term used to derive the GO integration score.
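
Since a domain may span several discontiguous regions, rebuilding a domain from domainRegionTableDump.txt.gz means grouping rows by Domain ID and ordering them by segment letter. The Python sketch below illustrates that step. It assumes the columns are in the order listed above and that Start Residue and End Residue are 1-based and inclusive; both are assumptions, not confirmed by this page. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, int, int]  # (segment letter, start residue, end residue)


def regions_by_domain(region_lines: Iterable[str]) -> Dict[str, List[Segment]]:
    """Group region rows by Domain ID and sort each group by segment letter."""
    grouped: Dict[str, List[Segment]] = defaultdict(list)
    for line in region_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue
        domain_id, segment = fields[1], fields[2]
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # header row or malformed coordinates
        grouped[domain_id].append((segment, start, end))
    for segments in grouped.values():
        segments.sort()
    return dict(grouped)


def domain_length(segments: List[Segment]) -> int:
    """Total residues covered, assuming inclusive start and end positions."""
    return sum(end - start + 1 for _, start, end in segments)


def domain_sequence(protein_sequence: str, segments: List[Segment]) -> str:
    """Concatenate the segment subsequences, assuming 1-based coordinates."""
    return "".join(protein_sequence[start - 1:end] for _, start, end in segments)
```

The protein sequence passed to domain_sequence would come from pspProteins.fasta.gz, looked up through the YRC Sequence ID recorded for the domain in domainTableDump.txt.gz.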
pspProteinDescriptions.txt.gz [138M]
This file contains a mapping of YRC protein IDs to names and descriptions from common databases. Columns are:
  • Protein ID - Unique number identifying this protein.
  • Reference Source - The name of the source reference, such as NCBI or SGD.
  • Accession String - The accession string for this protein for this reference.
  • Description - The description of this protein for this reference.
decoyAtomRecordDump.txt.gz [5.4G]
This file contains the predicted structures for the domains from this project. The two columns are:
  • Decoy ID - Unique number identifying this structure.
  • Structure Data - The PDB file format 3D structure.


Subcellular Localization / Fluorescence Microscopy

These are the original O'Shea image data, in PNG format, from Huh WK, et al. (2003), originally found in the Yeast GFP fusion localization database. All of these data are also incorporated into the PDR and may be viewed in the web site by finding proteins of interest.

yeastGFPImagesChrI.tar [31M]
yeastGFPImagesChrII.tar [172M]
yeastGFPImagesChrIII.tar [57M]
yeastGFPImagesChrIV.tar [352M]
yeastGFPImagesChrV.tar [113M]
yeastGFPImagesChrVI.tar [47M]
yeastGFPImagesChrVII.tar [231M]
yeastGFPImagesChrVIII.tar [112M]
yeastGFPImagesChrIX.tar [90M]
yeastGFPImagesChrX.tar [151M]
yeastGFPImagesChrXI.tar [144M]
yeastGFPImagesChrXII.tar [214M]
yeastGFPImagesChrXIII.tar [212M]
yeastGFPImagesChrXIV.tar [178M]
yeastGFPImagesChrXV.tar [224M]
yeastGFPImagesChrXVI.tar [194M]


Yeast Two-Hybrid Data

y2h_data.txt [9.5K]
This download contains only double positives (yeast two-hybrid guidelines). This tab-delimited text file contains the following columns:
  • Screen ID: An internal id number that allows for distinguishing between interactions identified in separate experiments.
  • Bait Protein: The systematic name of the bait protein.
  • Prey Protein: The systematic name of the prey protein.
  • Reference ID: A number corresponding to the publication for which this data was produced, which is found in the reference table available below.


Reference Table

reference_data.txt [1.7K]
This tab-delimited text file contains the following columns:
  • Reference ID: The unique number that identifies this reference.
  • Citation: The citation of the reference.
  • URL: The URL for accessing the reference online.
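
The Reference ID column in the mass spectrometry and yeast two-hybrid files can be resolved against this table with a simple lookup. The Python sketch below shows the idea for y2h_data.txt, assuming both files use the column order listed above; the function names are hypothetical, and unknown IDs are reported rather than guessed.

```python
from typing import Dict, Iterable, Iterator, Tuple


def load_references(reference_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each Reference ID to its (citation, URL) pair."""
    references: Dict[str, Tuple[str, str]] = {}
    for line in reference_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3:
            references[fields[0]] = (fields[1], fields[2])
    return references


def annotate_y2h(
    y2h_lines: Iterable[str],
    references: Dict[str, Tuple[str, str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (bait, prey, citation) for each yeast two-hybrid row."""
    for line in y2h_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue
        _screen_id, bait, prey, reference_id = fields[:4]
        citation, _url = references.get(reference_id, ("unknown reference", ""))
        yield bait, prey, citation
```

The same load_references dictionary could be reused for ms_data.txt, whose last column holds the same kind of Reference ID.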

YRC Informatics Platform - Version 3.0
Created and Maintained by: Michael Riffle