George Hilliard, February 7, 2008
Identification of Proteins with Proteomics
This module introduces proteomics database searching. The first goal is to show how protein mass spectrometry (MS) is used to generate MS data. The second is to use that data in a few database searches to answer protein identity problems.
Unknown proteins contained in polyacrylamide gel pieces can be identified by proteolysis combined with mass spectrometry. The general strategy is to obtain a few proteolyzed fragments of each protein resolved in SDS-PAGE, determine the molecular weights of those fragments by mass spectrometry, and search consolidated, non-redundant protein sequence databases for a match. This strategy, first described by numerous laboratories (1, 2, 3, 4, 5), has become known as peptide “mass fingerprinting”.
Remember, peptides have a chemical formula and therefore molecular weights for the entire database can be calculated in silico. The search algorithms simply ask which entry in the database matches best to the measured or experimentally derived mass lists. The mass of a peptide is measured with an error compared to the calculated molecular weight. The magnitude of this error is a function of the performance of the mass spectrometer, and has a direct impact on the use of the MS data in a database search. The smaller the error, the more stringent the database search can be. You will see this effect in the database search engine software.
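The idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm used by any particular search engine: it assumes trypsin as the protease (cleavage after K or R, but not before P), singly protonated peptides, no missed cleavages or modifications, and a simple "count the matching masses" score. The sequences and the names `entry_1` and `entry_2` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # charge carrier for a singly protonated ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (assumed protease rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mh(peptide):
    """Calculated monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def ppm_error(measured, calculated):
    """Signed mass error in parts-per-million."""
    return (measured - calculated) / calculated * 1e6


def count_matches(measured_masses, sequence, tol_ppm):
    """Number of measured masses within tol_ppm of any calculated peptide mass."""
    calculated = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(ppm_error(m, c)) <= tol_ppm for c in calculated)
        for m in measured_masses
    )


def rank_database(measured_masses, database, tol_ppm):
    """Sort database entries by how many measured masses they explain."""
    scores = [(name, count_matches(measured_masses, seq, tol_ppm))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database and a simulated mass list: the peptides
# of entry_1, each measured 5 ppm too high.
database = {"entry_1": "AAKGGSRLLEK", "entry_2": "MKWVTFR"}
measured = [peptide_mh(p) * (1 + 5e-6) for p in tryptic_digest(database["entry_1"])]

print(rank_database(measured, database, tol_ppm=10))  # window wider than the error
print(rank_database(measured, database, tol_ppm=1))   # window narrower than the error
```

With a 10 ppm window all three simulated masses match `entry_1`. With a 1 ppm window none do, because the simulated instrument error is 5 ppm. The search tolerance can only be tightened as far as the mass spectrometer's accuracy allows, and a tighter tolerance leaves fewer chance matches to unrelated entries.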
The Proteomics process in my Laboratory is summarized here.
These steps are schematically presented in the following figure. Steps A and B constitute the entry path for protein identification through peptide “mass fingerprinting”, and are the topic for today. This is a high-throughput “batch” method of identifying the majority of a pool of unknown proteins, and it has evolved into the method of choice for first-pass analysis of unknown proteins. With high-quality peptide “mass fingerprinting” data, it is possible to unequivocally match an unknown protein to an entry in a consolidated, non-redundant protein database, including for mammalian systems. This is due to the rapid increase in the mass accuracy of mass spectrometers (very low mass error, measured in parts-per-million) and the ability of search software to sort through mass spectrometry data from single protein spots that may be mixtures of two or more proteins (6, 7). Steps A and B of the scheme depicted in the figure will solve the identities of 62-90% of all unknowns submitted from acrylamide gel protein arrays (cumulative, unpublished data).
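One simple way to picture how a mass list from a mixed spot can be sorted out is an iterative search: find the best-matching entry, remove the masses it explains, and search the remaining masses again. The sketch below illustrates that general idea only. It is not the method of any specific search software, and the entry names, mass values, 20 ppm tolerance, and minimum of three matching masses are all hypothetical choices made for the example.

```python
def matches(measured, theoretical, tol_ppm):
    """Measured masses that lie within tol_ppm of any theoretical mass."""
    return [m for m in measured
            if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)]


def best_hit(measured, database, tol_ppm):
    """Return (name, matched_masses) for the entry explaining the most masses."""
    scored = {name: matches(measured, masses, tol_ppm)
              for name, masses in database.items()}
    name = max(scored, key=lambda n: len(scored[n]))
    return name, scored[name]


def identify_mixture(measured, database, tol_ppm=20.0, min_matches=3,
                     max_components=3):
    """Repeatedly take the best hit, then search again with the leftover masses."""
    remaining = list(measured)
    candidates = dict(database)
    found = []
    while remaining and candidates and len(found) < max_components:
        name, hit = best_hit(remaining, candidates, tol_ppm)
        if len(hit) < min_matches:
            break  # too few masses explained to call another component
        found.append((name, len(hit)))
        remaining = [m for m in remaining if m not in hit]
        del candidates[name]
    return found, remaining


# Hypothetical calculated peptide masses for three database entries.
database = {
    "protein_A": [842.510, 1045.564, 1179.601, 1479.795, 1639.938],
    "protein_B": [927.493, 1163.631, 1305.716, 1567.743],
    "protein_C": [1001.500, 1250.650, 1888.900],
}
# Hypothetical measured mass list from one spot containing two proteins.
measured = [842.512, 1045.560, 1179.610, 1479.800,
            927.495, 1305.720, 1567.740, 2211.100]

found, unexplained = identify_mixture(measured, database)
print(found)        # components called, with the number of masses each explains
print(unexplained)  # masses no entry accounts for
```

In this toy case the first pass assigns four masses to `protein_A`, the second pass assigns three of the leftover masses to `protein_B`, and one mass remains unexplained. The lower the mass error, the tighter `tol_ppm` can be set, which makes the second and later passes less likely to pick up chance matches.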
Example 1 BSA
The first step is to start the search software.
Next, let's look at the mass spectrum of a standard protein, bovine serum albumin (BSA). Input the mass list into the search software and perform a database search. The search result should be BSA, with high confidence. Are there any other proteins in the mass spectrum?
Example 2 Unknown Protein
A gel was run in the proteomics laboratory on a bacterial protein mixture, and the proteins were stained with silver. The prominent band 17 was processed in the laboratory, and the resulting mass spectrum of band 17 was obtained. The mass list from that spectrum is here. What is the identity of the bacterial protein?
Identify the unknown proteins
1. A 1D gel was run in the proteomics laboratory on a bacterial protein mixture, and the proteins were stained with silver in this gel image. The prominent band 8 was processed in the laboratory, and the resulting mass spectrum of band 8 was obtained. The mass list from that spectrum is here. What is the identity of the protein band? Are there other proteins present in the band?
2. A 2D gel was run in the proteomics laboratory on a fungal protein mixture, and the proteins were stained in this gel image. The prominent spot was processed in the laboratory, and the resulting mass spectrum of the unknown spot was obtained. The mass list from that spectrum is here. What is the identity of the protein spot? Are there other proteins present in the spot?
Access my laboratory's web page here: Protein Analysis and Proteomics Laboratory
References for Module 4
1. Henzel, W.J., Billeci, T.M., Stults, J.T., and Wong, S.C., 1993, Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases, Proc. Nat. Acad. Sci., USA 90, 5011-5015.
2. James, P., Quadroni, M., Carafoli, E., and Gonnet, G., 1993, Protein Identification by Mass Profile Fingerprinting, Biochem. Biophys. Res. Commun. 195, 58-64.
3. Mann, M., Hojrup, P., Roepstorff, P., 1993, Use of Mass Spectrometric Molecular Weight Information to Identify Proteins in Sequence Databases, Biol. Mass Spectrom. 22, 338-345.
4. Yates, J.R., Speicher, S., Griffin, P.R., and Hunkapiller, T., 1993, Peptide Mass Maps: A Highly Informative Approach to Protein Identification, Anal. Biochem. 214, 397-408.
5. Pappin, D.J.C., Hojrup, P., Bleasby, A.J., 1993, Rapid Identification of proteins by peptide-mass fingerprinting, Current Biology 3 (6) 327-332.
6. Jensen, O.N., Podtelejnikov, A., Mann, M., 1997, Identification of the Components of Simple Protein Mixtures by High-Accuracy Peptide Mass Mapping and Database Searching, Anal. Chem. 69, 4741-4750.
7. Shevchenko, A., Wilm, M., Vorm, O., and Mann, M., 1996, Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels, Anal. Chem. 68, 850-858.