Data Collection and Analysis for High Throughput Quantitative Proteomics: Current Status and Challenges

Ruedi Aebersold, Ph.D.
Institute for Systems Biology, Seattle, Washington

Proteomics: The systematic (quantitative) analysis of the proteins expressed in a cell at a time
- Enumerate all the components of a proteome. Proteome as database: proteome analyzed once.
- Detect dynamic changes in proteome following external or internal perturbations. Proteomics as biol. or clin. assay: proteome analyzed multiple (infinite) times.

Protein Identification Strategy
[Figure: workflow in stages I, II, III. Labels: protein mixture; peptides; 1D, 2D, 3D peptide separation; Q1, Q2 (collision cell), Q3; tandem mass spectrum; correlative sequence database searching (theoretical vs. acquired); protein identification. Axes: m/z, time (min).]

Accurate Quantitation Using Isotope Dilution
- h/l analytes are chemically identical, identical specific signal in MS
- Sample 1: incorporate stable light isotope
- Sample 2 (reference): incorporate stable heavy isotope
- Combine samples
- Analyze by mass spectrometer
- Ratio of h/l signals indicates ratio of analytes

Isotope Coded Affinity Tags (ICAT)
- Heavy reagent: d8-ICAT (X = deuterium)
- Light reagent: d0-ICAT (X = hydrogen)
- Reagent structure: biotin tag, linker (heavy or light), thiol reactive group
- Detection of Cys containing peptides and accurate quantification using stable isotope dilution

Quantitative proteomics by isotope labeling-LC-MS/MS
[Figure: Mixture 1 and Mixture 2 are isotope-labeled (light, heavy); combine and proteolyze; optional fractionation; avidin affinity enrichment; quantitation and protein identification. Example peptide: NH2-EACDPLR-COOH. Axes: m/z, relative intensity (0 to 100).]
- Compatible with any separation/fractionation method at protein/peptide level.

Stable Isotope Labeling Strategies
[Figure: three strategies across the stages PROTEIN LABELING, DATA COLLECTION, DATA ANALYSIS: metabolic stable isotope labeling; isotope tagging by chemical reaction; stable isotope incorporation via enzyme reaction. Steps shown: digest, label, mass spectrometry. Axes: m/z, intensity.]

Quantitative Proteomics Technology
- Protein identification: automated peptide tandem mass spectrometry of complex peptide mixtures
- Protein quantification: isotope dilution
- Selective chemical reactions: reduction of sample complexity; selective analyte isolation
- Results: identification of proteins in sample and quantitative profiles
- Current capacity: 1000 proteins per day/instrument
- Total yeast lysate: 2000 proteins identified and quantified
- In 1991, all the world's labs combined had identified just about 2000 genes

Current Limitations (and Potential Solutions)
- The efficiency problem
- The validation problem
- The biological inference problem

Standard Method for Complex Peptide Mixture Analysis
- Cation exchange
- RP-HPLC
- ESI-MS/MS

Proteome Analysis: The Analytical Challenges
Yeast proteome
- Expected number of ORFs: 6118
- Expected number of tryptic peptides: 350,000

Synchronous Timepoint Samples Compared to Reference Sample
- Timepoint samples from yeast cells synchronously transiting the cell cycle
- Asynchronous reference sample

Data Summary
- Counts shown on the slide, in extraction order: 1648, 1523, 1448, 1713, 1229, 1095, 1184, 1112, 892, 1055, 1140, 921, 1051, 871, 960
- 2735/6562 proteins quantified across all timepoints (42%)
- 696 proteins quantified in every experiment
- 1513 proteins quantified in at least one timepoint
- 34,400 peptides quantified on average per timepoint
- 1 million mass spectra collected

Pep3D (Xiao-jun Li et al., submitted)
- Features: 2720
- CIDs: 1633
- IDs: 363
- ID/CID: 22%
- ID/feature: 13%

Possible Solutions
- Better separation technology
- Selective peptide isolation
- Smart precursor ion selection

Better separation technology
- Tryptic yeast digest separated by FFE-IEX or SAX
- 30 fractions collected and analyzed by capLC-MS/MS
- Overlap: same peptide identified in adjacent fractions
- Values shown: 92%, 68%

Possible Solutions
- Better separation technology
- Selective peptide isolation
  Zhang H, et al. Curr. Op. Chem. Biol. (2004) 8:66-75.
  Aebersold R. Nature (2003) 422(6928):115-6.
- Smart precursor ion selection
  Griffin T, et al. Anal Chem. (2003) 75:867-74.
  Griffin et al. J Am Soc Mass Spectrom. (2001) 12:1238-46.

Summary: Efficiency Problem
- Only a (small) subset of peptides present is identified
- Current separation strategies do not have sufficient resolving power
- MS/MS of every peptide in every experiment is a bottleneck of current MS based proteomics
- LC-ESI MS/MS wastes a high fraction of MS/MS cycles sequencing precursor ions that do not lead to a positive identification
- Most positive identifications are not informative in profiling experiments
- Smart precursor ion selection is required

Current Limitations (and Potential Solutions): The validation problem

Protein Identification by MS/MS
[Figure: protein sample; peptide mixture; MS/MS spectra; peptide identifications; protein identifications (proteins A, B, C, D). Levels: protein level, peptide level, MS/MS spectrum level.]
Database search tools:
- Sequest
- Mascot
- SpectrumMill
- etc.

Threshold Model
OUTPUT FROM SEARCH ALGORITHM
- Sort by search score; results above the threshold are taken as "correct", those below as incorrect
- SEQUEST: Xcorr, ΔCn
- MASCOT: Score, threshold 47

Difficulty Interpreting Protein Identifications based on MS/MS
- Different search score thresholds used to filter data
- Unknown and variable false positive error rates
- No reliable measures of confidence

Statistical Model
Entire dataset: for each MS/MS spectrum, the best match and its database search score.

Spectrum      Peptide   Score
Spectrum 1    LGEYGH    4.5
Spectrum 2    FQSEEQ    3.4
Spectrum 3    FLYQE     1.3
...
Spectrum N    EIQKKF    2.2

- EM mixture model algorithm learns the most likely distributions among correct and incorrect peptide assignments given the observed data
- Unsupervised learning
- Output: probability that each assignment is correct

Threshold Model: Bad Discrimination and Inconsistency
- Sensitivity: fraction of all correct results passing filter
- Error rate: fraction of all results passing filter that are incorrect
[Figure: sensitivity vs. error rate. Labels: ideal spot; SEQUEST thresholds (from literature). Test data: A. Keller et al. OMICS 6(2), 207 (2002).]

Discriminating Power of Peptide Prophet
- Sensitivity: fraction of all correct results passing filter
- Error rate: fraction of all results passing filter that are incorrect
[Figure: sensitivity vs. error rate. Labels: ideal spot; SEQUEST thresholds (from literature); probability model.]
- Improved discrimination: more identifications (for the same error rate)
Keller et al. Anal. Chem. 2003

Protein Identification
sp|P02754|LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG) (ALLERGEN BOS D 5) - Bos taurus (Bovine).
Sequence (highlighted peptides in brackets):
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEEL[KPTPEGDLEILLQK]WENGECAQKKIIAEKTKIPAVFKIDALNENKLVLDTDYKKYLLFCMENSAEPEQSLACQCLVR[TPEVDDEALEKFDK]ALKALPMHIR[LSFNPTQLEEQCHI]
Peptide probabilities:
- TPEVDDEALEK: p = 0.96
- TPEVDDEALEKFDK: p = 0.96
- KPTPEGDLEILLQK: p = 0.83
- LSFNPTQLEEQCHI: p = 0.65
- LSFNPTQLEEQCHI: p = 0.76
sp|P02754|LACB_BOVIN: Probability = ?
ProteinProphet(TM) software combines probabilities of peptides assigned to MS/MS spectra to compute accurate probabilities that corresponding proteins are present.
Nesvizhskii et al. Anal Chem. (2003) 75:4646-58.

Issues for Protein Identification
- Many peptides are present in more than a single database protein entry. ProteinProphet apportions such peptides among all corresponding proteins to derive the simplest list of proteins that explain observed peptides.
- Peptides corresponding to single-hit proteins are less likely to be correct than those corresponding to multi-hit proteins. ProteinProphet learns by how much peptide probabilities should be adjusted to reflect this protein grouping information.

Amplification of False Positive Error Rate from Peptide to Protein Level
[Figure: 10 peptides, 5 correct (+). Proteins in the sample (enriched for multi-hit proteins): Prot A with Peptides 1 and 2; Prot B with Peptides 3, 4 and 5. Proteins not in the sample (enriched for single hits): five proteins, one each for Peptides 6 to 10.]
- Peptide level: 50% false positives
- Protein level: 71% false positives

Serum Protein Identifications from Large-scale (375 run) Experiment

Data filter                        # ids   # non-single hits   # single hits
Publ. threshold model #1           2257    359                 1898
Publ. threshold model #2           2742    441                 2301
ProteinProphet (p threshold 0.5)   713     511                 202

ProteinProphet predicted error rate: 7%
Reference: H. Zhang et al., in prep

Consistency of Manual Validation of SEQUEST Search Results
[Figure: search results judged by manual authenticators. Categories: correct validation, incorrect validation, validation withheld.]

Data Analysis Pipeline
Tasks for a proteomic analysis pipeline:
- Suitable input
- Peptide assignment
- Validation
- Protein assignment
- Quantitation
- Interpretation
Formats and tools shown: mzXML, COMET, ProbID, Peptide Prophet, Protein Prophet, ASAPRatio, SBEAMS, Cytoscape

Data Analysis Summary
- Processing of data collected from different platforms, samples, experiments, operators requires transparent methods to score data
- Publication and relational database analysis require consistently scored data
- Tools assigning probability based scores are essential
- Openly accessible, transparent (OS) tools bring in new talent and lead to community improved tools
Nesvizhskii and Aebersold (2004) Drug Discov Today. 9:173-81

Current Limitations (and Potential Solutions): The biological inference problem

Interferon experiment (Wei Yan et al.)
- IFN-treated vs. mock-treated samples, ICAT label (C12 / C13), HPLC-MS/MS
- Result table columns: Name, Cellular pathway, Probability, ASAPRatio Mean, ASAPRatio Std.
- Fractions: P3, S100, P100
- Counts at probability thresholds 0.9 / 0.4, in extraction order: 671 / 748, 270 / 330, 523 / 590
- Sum: 1464 / 1668
- Unique ID: 1113 / 1272
- 54 IFN-induced proteins (2-fold): 15 previously reported, 39 novel
- 23 IFN-repressed proteins (0.5-fold)
- Lots of data: what does it mean?

Interferon (IFN) Pathway
Katze et al (2002) 2: 675
[Figure: pathway components PKR, 2,5-OAS, Mx, ADAR, IRFs, MHC, β-2-microglobulin (MHC I), IFI-30 (MHC II). IFN / Mock ratios (mean, std.) in extraction order: 2.215 0.079; 3.963 0.659; 2.460 0.076; 2.359 0.149; 1.398 0.118; not identified; 2.768 0.583; 2.219 0.183.]

GO Analysis of Interferon regulated proteins
[Figure: GO levels 3 to 12. Categories: physiological process; response to stress; response to external stimulus; pathogenesis; metabolism; death; cell growth and/or maintenance; cellular defense response; defense response; immune response; cell death; transport; cell organization; cell growth; cytoplasm organization; nuclear organization; fatty acid metabolism; amino acid metabolism; nitrogen metabolism; DNA metabolism; catabolism.]

Islands of intense knowledge in ocean of unknown
[Figure: hormone responses; cell motility; energy metabolism; transcription.]

Charting the path between landmarks
[Figure: hormone responses; cell mobility; energy metabolism; transcription; unassigned observations.]

Walking down the interaction map
[Figure: interaction map with nodes A to I.]

First round of TAP-tagging: Identification of IGBP1 and TIP41 interactors (Anne-Claude Gingras)
- Baits: IGBP1, TIP41
- Catalytic subunits, PP2A-type phosphatases: PPP2CB, PPP4C, PPP6C, PPP2CA
- CCT complex: CCT4, CCT5, CCT6A, CCT8, CCT7, CCT3, TCP1, CCT2
- Other labels: PPP6R2A*, PPP6R1*, PPP4R2*; uncharacterized proteins

Human phosphatase-interaction network: Segregation into functional modules
[Figure: modules labeled centrosome; meiosis; exit from mitosis; actin cytoskeleton; G1, S transition. Phosphatase labels as extracted: PP2 B; PP2 C; PP2A a; PP4 C; PP6 C.]

Acknowledgements
- Separation strategies: Hookeun Lee, Eugene Yi, Mingliang Yi
- Abundance dependent MS/MS: Tim Griffin, Chris Lock (Sciex)
- Software development and statistical models: Eric Deutsch, Xiao-Jun Li, Jimmy Eng, Alex Nesvizhskii, Andy Keller, Benno Schwikowski, Patrick Pedrioli, Ning Zhang
- Inference of biological function: Wei Yan, Anne-Claude Gingras, Cytoscape project (cytoscape.org)
- Funding: NIH (NCI, NCRR, NIDA, NHLBI), Merck, ABI
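
Worked example: from peptide probabilities to a protein probability

The protein identification slides leave the beta-lactoglobulin result open ("Probability = ?") and quote a jump from 50% false positives at the peptide level to 71% at the protein level. The sketch below reproduces both numbers under simplifying assumptions that are not stated on the slides: peptides are treated as independent evidence, only the best-scoring spectrum per peptide is counted, and the adjustment for single-hit versus multi-hit proteins is ignored. The function names and the protein labels C to G are hypothetical.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that a protein is present, given its peptide probabilities.

    peptide_probs maps a peptide sequence to the probabilities of all spectra
    assigned to it. Assumption: the protein is absent only if every distinct
    peptide assignment is wrong, and only the best spectrum per peptide counts.
    """
    best = [max(probs) for probs in peptide_probs.values()]
    return 1.0 - prod(1.0 - p for p in best)


def false_positive_rates(assignments):
    """Return (peptide-level, protein-level) false positive fractions.

    assignments is a list of (peptide, protein, is_correct) tuples. A protein
    counts as correct if at least one correct peptide supports it.
    """
    wrong_peptides = sum(not ok for _, _, ok in assignments)
    protein_ok = {}
    for _, protein, ok in assignments:
        protein_ok[protein] = protein_ok.get(protein, False) or ok
    wrong_proteins = sum(not ok for ok in protein_ok.values())
    return wrong_peptides / len(assignments), wrong_proteins / len(protein_ok)


# Peptide probabilities listed on the slide for sp|P02754|LACB_BOVIN.
lacb = {
    "TPEVDDEALEK": [0.96],
    "TPEVDDEALEKFDK": [0.96],
    "KPTPEGDLEILLQK": [0.83],
    "LSFNPTQLEEQCHI": [0.65, 0.76],
}
print(round(protein_probability(lacb), 4))  # 0.9999

# Slide example: 5 correct peptides fall on 2 proteins (A, B),
# 5 incorrect peptides fall on 5 different proteins (hypothetical C..G).
example = (
    [(f"Peptide {i}", "Prot A", True) for i in (1, 2)]
    + [(f"Peptide {i}", "Prot B", True) for i in (3, 4, 5)]
    + [(f"Peptide {i}", f"Prot {name}", False) for i, name in zip(range(6, 11), "CDEFG")]
)
pep_fp, prot_fp = false_positive_rates(example)
print(pep_fp, round(prot_fp, 2))  # 0.5 0.71
```

The second calculation shows why the error rate grows: correct peptides cluster on a few proteins, while each incorrect peptide tends to bring in a protein of its own, so 5 of the 7 proteins are wrong.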

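Worked example: an EM mixture model for search scores

The statistical model slide says that an EM mixture model learns the score distributions of correct and incorrect peptide assignments without supervision. A minimal sketch of that idea follows. It is not the Peptide Prophet implementation: it assumes a single search score per spectrum, models both populations as Gaussians for brevity, and fits synthetic scores generated here purely for illustration. Only the four peptide/score pairs at the end come from the slide.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_correct(x, model):
    """P(correct | score x); model is [(weight, mean, sd)] for incorrect, correct."""
    (w0, mu0, s0), (w1, mu1, s1) = model
    bad = w0 * normal_pdf(x, mu0, s0)
    good = w1 * normal_pdf(x, mu1, s1)
    total = bad + good
    return good / total if total > 0.0 else 0.5


def fit_mixture(scores, n_iter=200):
    """Fit a two-component 1D Gaussian mixture to the scores with EM."""
    lo, hi = min(scores), max(scores)
    spread = (hi - lo) or 1.0
    # Initial guess: low-scoring (incorrect) and high-scoring (correct) component.
    model = [(0.5, lo + 0.25 * spread, spread / 4.0),
             (0.5, lo + 0.75 * spread, spread / 4.0)]
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of the "correct" component for each score.
        resp = [posterior_correct(x, model) for x in scores]
        # M-step: re-estimate weight, mean and sd of each component.
        updated = []
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - p for p in resp]
            nk = sum(r)
            if nk < 1e-9:
                updated.append(model[k])
                continue
            mu = sum(ri * x for ri, x in zip(r, scores)) / nk
            var = sum(ri * (x - mu) ** 2 for ri, x in zip(r, scores)) / nk
            updated.append((nk / n, mu, max(math.sqrt(var), 1e-3)))
        # Keep the higher-mean component in the "correct" slot.
        model = sorted(updated, key=lambda c: c[1])
    return model


# Synthetic scores (assumption): 800 incorrect and 200 correct assignments.
random.seed(0)
scores = ([random.gauss(1.5, 0.5) for _ in range(800)]
          + [random.gauss(4.0, 0.6) for _ in range(200)])
model = fit_mixture(scores)

# Peptide/score pairs from the slide, scored under the synthetic fit.
for peptide, score in [("LGEYGH", 4.5), ("FQSEEQ", 3.4), ("FLYQE", 1.3), ("EIQKKF", 2.2)]:
    print(peptide, score, round(posterior_correct(score, model), 3))
```

Unlike a fixed score threshold, the fitted model returns a probability for every assignment, so the expected error rate of any filtered list can be estimated from the data itself.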