Protein Analysis and Proteomics: Lecture Slides

Protein analysis and proteomics
DNA, RNA, protein

Four perspectives on a protein
1 Molecular biology
2 Protein families
3 Protein localization
4 Protein function
Gene Ontology (GO):
- cellular component
- biological process
- molecular function

Introducing perspectives 3 and 4: the Gene Ontology (GO) Consortium

Background to the founding of Gene Ontology
Number of records in GenBank, EMBL, and DDBJ: 602 in 1982; 44,202,133 in 2005.
PubMed: over 15 million citations.

What's in a name?
Glucose synthesis
Glucose biosynthesis
Glucose formation
Glucose anabolism
Gluconeogenesis
All refer to the process of making glucose from simpler components.

What's in a name?
The same name can be used to describe different concepts.
A concept can be described using different names.
Comparison is difficult, in particular across species or across databases.

Ontology
In computer science, an ontology is a formal representation of knowledge about the natural world: structured domain knowledge that a computer can represent, interpret, and use.
It makes a shared understanding of concepts in the living world possible across different perspectives, different term classifications, and different agents (people and machines).
Shared concepts: a specification of a conceptualization.
The Gene Ontology (GO) Consortium is dedicated to compiling a dynamic yet controlled vocabulary that describes different aspects of genes and gene products (mainly proteins).

Ontology structure
Ontologies can be represented as graphs, where the nodes are connected by edges.
Nodes = concepts in the ontology
Edges = relationships between the concepts

What do all these proteins do?
"Function" is too limited a term.
Biologists want to know what each protein does, which cellular pathway it belongs to or why the cell needs this function, and where the process takes place.

The founding of Gene Ontology
Saccharomyces Genome Database (SGD)
Drosophila genome database (FlyBase)
Mouse genome informatics databases (MGD, GXD)
GO is not centered on a database of its own. It relies on external databases, in which genes and their products are annotated with GO-defined terms. GO is therefore an evolving, collaborative effort to unify the way genes and gene products are annotated.
You can visit GO at http://www.geneontology.org.

GO (Gene Ontology) structure
GO isn't just a flat list of biological terms; terms are related within a hierarchy.

Hierarchical structure
is a: the parent concept includes the child concept; the child is an instance of the parent (a pine is a tree).
part of: the child concept is a part of the parent concept (a leaf is part of a tree).

True Path Rule
If a child term can describe a gene product, then its parent terms apply as well.
Example: hexose biosynthesis (child) under hexose metabolism and monosaccharide biosynthesis (parents).

DAG (directed acyclic graph)
Simple hierarchies (trees): single parent
Directed acyclic graphs: one or more parents

How does GO work?
What information might we want to capture about a gene product?
What does the gene product do?
Where and when does it act?
Why does it perform these activities?

GO: Three ontologies (for a gene product)
Molecular Function: What does it do?
Cellular Component: Where does it act?
Biological Process: What processes is it involved in?

Molecular Function
Molecular function describes activity at the molecular level, such as catalytic activity or binding activity.
Sets of functions make up a biological process.
Examples: insulin binding, insulin receptor activity

Cellular Component
Where a gene product acts: the organelle or gene product group in which it is located (for example the rough endoplasmic reticulum, the nucleus, the ribosome, or the proteasome).

Biological Process
A biological process is a multi-step process made up of an ordered assembly of molecular functions (for example cell growth and maintenance, signal transduction, pyrimidine metabolism, or glycoside transport).
Examples: cell division, gluconeogenesis

Figure: lipocalin
Figure: relationships among GO terms displayed as a tree diagram

Perspective 3: Protein localization

Protein localization
Proteins may be localized to intracellular compartments, the cytosol, or the plasma membrane, or they may be secreted.
Many proteins shuttle between multiple compartments.
A variety of algorithms predict localization, but this is essentially a cell biological question.
Many proteins cannot be assigned to a single fixed location in the cell. Annexins and the small G protein family, for example, move between the cytosol and the membrane. This movement depends on the presence of specific cellular signals such as calcium ions.
Prediction servers:
http://psort.nibb.ac.jp
http://www.ch.embnet.org/software/TMPRED.form.html

Localization of 2,900 yeast proteins
Michael Snyder and colleagues incorporated epitope tags into thousands of S. cerevisiae cDNAs and systematically localized proteins (Kumar et al., 2002).
See http://ygac.med.yale.edu for a database including 2,900 fluorescence micrographs.

Perspective 4: Protein function
Function refers to the role of a protein in the cell.
We can consider protein function from a variety of perspectives.

1. Biochemical function (molecular function)
RBP binds retinol, so it could be a carrier.
Examples: enzymes, structural proteins, transport proteins
No protein in the cell is without a function.

2. Functional assignment based on homology
Other lipocalins are carrier proteins, so RBP could be a carrier too.
Odorant-binding protein is a member of the lipocalins and is also thought to be a carrier protein.

3. Function based on structure
RBP forms a calyx.
X-ray crystallography shows that RBP forms a cup-like structure lined by a ring of hydrophobic amino acids, which serves as a ligand-binding site.

4. Function based on ligand binding specificity
RBP binds vitamin A.

5. Function based on cellular process
DNA, RNA
RBP is abundant, soluble, and secreted.

6. Function based on biological process
RBP is essential for vision.

7. Function based on "proteomics" or high throughput "functional genomics"
High throughput analyses show:
- RBP levels are elevated in renal failure
- RBP levels are decreased in liver disease

Functional assignment of enzymes: the EC (Enzyme Commission) system
EC number   Class             Enzymes   Example subclasses
1.-.-.-     Oxidoreductases   1003      1.1.-.- acting on the CH-OH group; 1.2.-.- acting on the aldehyde or oxo group
2.-.-.-     Transferases      1076      2.1.-.- transferring one-carbon groups
3.-.-.-     Hydrolases        1125
4.-.-.-     Lyases            356
5.-.-.-     Isomerases        156
6.-.-.-     Ligases           126

Functional assignment of proteins: Clusters of Orthologous Groups (COGs)

Proteomics: High throughput protein analysis
Proteomics is the study of the entire collection of proteins encoded by a genome.
The "proteome" is all the proteins in a cell and/or all the proteins in an organism.
Large-scale protein analysis:
- 2D protein gels
- Yeast two-hybrid
- Rosetta Stone approach

Classical biochemical approach
Identify an activity
Develop a bioassay
Perform a biochemical purification (strategies: size, charge, hydrophobicity)
Purify protein to homogeneity
Clone cDNA, express recombinant protein
Grow crystals, solve structure

Two-dimensional protein gels
First dimension: isoelectric focusing
Second dimension: SDS-PAGE

Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
- Visualize hundreds to thousands of proteins
- Improved identification of protein spots
Disadvantages:
- Limited number of samples can be processed
- Mostly abundant proteins visualized
- Technically difficult

Affinity chromatography/mass spec
Start with a bait protein fused to GST.
Add yeast extract: protein complexes bind, most proteins do not bind.
Elute, run gel, analyze by MALDI-TOF, identify complexes.
Data on complexes are deposited in databases: http://www.bind.ca

The yeast two-hybrid system
Bait protein (DNA binding), prey protein (DNA activation), reporter gene
Isolate and sequence the cDNA of the binding partner you have found.
We will learn about it later when we study protein interaction networks.
Figure legend: red = cellular role and subcellular localization of interacting proteins are identical; blue = localizations are identical; green = cellular roles are identical.

The Rosetta Stone approach
Marcotte et al. (1999) and other groups hypothesized that some pairs of interacting proteins are encoded by two genes in many genomes, but occasionally they are fused into a single gene.
By scanning many genomes for examples of "fused genes," several thousand protein-protein predictions have been made.
Example: yeast topoisomerase II corresponds to E. coli gyrase B and E. coli gyrase A.
Figure: the Rosetta Stone

Gene fusion (Rosetta Stone method)
E. coli: trpA and trpB (genes G1 and G2)
Yeast: tryptophan synthase subunits A and B, fused in yeast
It is based on the observation that some interacting proteins/domains have homologs in other genomes that are fused into one protein chain, a so-called Rosetta Stone protein.

How many "gene fusions"?
3 genomes: 88 gene fusions
179 genomes: ? fusions
Marcotte: E. coli 6809; yeast 45502

Perspective 2: Protein family, domains and motifs
Why focus on protein families?
Figure: gene duplication

Homologous protein sequences and protein families
Proteins with no homolog in any currently known database: other properties (such as transmembrane domains, phosphorylation sites, or predicted secondary structure) can still give clues to the structure or function of the protein.
Proteins with orthologs or paralogs: at least one homologous sequence can be found, and the two sequences share regions of significant similarity or distinctive features. Such regions go by many names: signature, domain, module, modular element, fold, motif, pattern, or repeat.

Definitions
Signature: a protein category such as a domain or motif
Domain: a region of a protein that can adopt a 3D structure
- a fold
- a family is a group of proteins that share a domain
- examples: zinc finger domain, immunoglobulin domain
Motif (or fingerprint): a short, conserved region of a protein, typically 10 to 20 contiguous amino acid residues

Signature
The concept of a signature is broad. It identifies a protein category and may refer to a domain, a family, or a motif. Looking at a single protein in isolation yields little information about its structure and function, but once it is aligned with related sequences and the conserved parts are found, a great deal can be inferred from the conserved sequence. Signatures fall into two main classes, each identified by its own methods.
A domain is a region of a protein that can fold into a specific three-dimensional structure. A domain can also be called a module. A group of proteins sharing the same domain is called a protein family.
A motif (or fingerprint) is a short conserved region of a protein sequence. Motifs are typically 10 to 20 amino acid residues long, although real motifs may be longer or shorter. Finding a simple, common motif in a group of proteins does not mean that the proteins are homologous; examples are motifs that form transmembrane regions or conserved phosphorylation sites. In other cases a small motif is the hallmark of a protein family (as catalogued in PROSITE).

InterPro definitions
Family: InterPro defines a family as a group of evolutionarily related proteins that share one or more domains.
Domain: a domain in InterPro is an independent structural unit, found alone or joined to other domains. Domains are also evolutionarily related sequences.

SMART definitions
Domain: a conserved structural unit with a distinctive combination of secondary structure and a hydrophobic core. Homologous domains with the same function usually show sequence similarity.
Motif: a sequence motif is a short conserved stretch of polypeptide. Proteins that contain the same motif are not necessarily homologous.

Domains and motifs: examples
Serum albumin (581 amino acids): 3 similar domains of about 180 amino acids each
Collagen contains dozens of repeats of the tripeptide G-X-Y.
The C-terminal domain of the largest subunit of RNA polymerase has 52 repeats of the hexapeptide [T/S]-P-T-S-P-[N/T].
PrP (mad cow disease): four consecutive octapeptide repeats, P-H-G-G-[G/S]-W-G-Q
Many intracellular signal transduction proteins contain an SH2 region (a region that binds phosphorylated tyrosine).

Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs. In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases.

Protein motif: the GxW motif in lipocalins
     EIQDVS GTW YAMTVDREFPEMNLESVTPMTLTTL.GGNLEAKVTM  lipocalin 1
LSFTLEEEDIT GTW YAMVVDKDFPEDRRRKVSPVKVTALGGGNLEATFTF  odorant-binding protein 2a
TKQDLELPKLA GTW HSMAMATNNISLMATLKAPLRVHITSEDNLEIVLHR  progestagen-assoc. endo.
VQENFDVNKYL GRW YEIEKIPTTFENGRCIQANYSLMENGNQELRADGTV  apolipoprotein D
VKENFDKARFS GTW YAMAKDPEGLFLQDNIVAEFSVDETGNWDVCADGTF  retinol-binding protein
LQQNFQDNQFQ GKW YVVGLAGNAI.LREDKDPQKMYATIDKSYNVTSVLF  neutrophil gelatinase-ass.
VQPNFQQDKFL GRW FSAGLASNSSWLREKKAALSMCKSVDGGLNLTSTFL  prostaglandin D2 synthase
VQENFNISRIY GKW YNLAIGSTCPWMDRMTVSTLVLGEGEAEISMTSTRW  alpha-1-microglobulin
PKANFDAQQFA GTW LLVAVGSACRFLQRAEATTLHVAPQGSTFRKLD.    complement component 8

Examples
In the aspartyl protease domain of the HIV-1 pol protein, the aspartate residue (Asp) is essential for catalytic activity. The aspartyl protease motif consists of 12 amino acid residues:
[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]
A short motif, GxW, is found in almost all lipocalins. The conserved lipocalin motif defined in the PROSITE database is:
[DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-[LIVMTA]

Motifs
- Motifs for amino acid residue modification
- Motifs for subcellular localization
- Motifs related to activity
- Other motifs

Motifs for amino acid residue modification
In glycoproteins, an N-glycosylated asparagine (N) always lies in the motif N-{P}-[S/T] (any residue except proline in the middle position).
In some proteins involved in blood coagulation, the hydroxylated aspartate or asparagine lies in the motif C-X-[D/N]-X4-Φ-X-C-X-C, where Φ is an aromatic amino acid and X4 is a tetrapeptide of any amino acids.
Phosphorylated serine and threonine lie in different motifs in different proteins. In histones the motif is S-P-# (# is a positively charged amino acid). For the protein kinases PKA and PKG the motif is #-X-[S/T].

Motifs for subcellular localization
When the four C-terminal amino acids are KDEL or HDEL, the protein is retained in the endoplasmic reticulum.
Peptide chains that can enter the nucleus all carry specific sequence motifs:
1. PKKKRKV or KR-X10-KKKK
2. In protein kinases: KR-X21-RXKXKXK
3. #R-X10-#XX
Figure: motifs and subcellular localization

Motifs related to activity
In many proteolytic enzymes, the catalytic active center is formed by D/E-H-S.
ATP-binding and GTP-binding proteins contain a motif with the sequence GXXXXGK[T/S]:
Rho family              G DGAX GKT
ATP synthase            G GAGV GKTV
Myosin heavy chain      G ESGS GKT
Thymidine kinase        G XXGX GKTT
Thymidylate kinase      G XPGX GKGT
This motif forms a specific structure that binds nucleotides.

Other motifs
Cysteine-containing motifs: some proteins contain specific sequence motifs in which the positions of the cysteines are relatively fixed (zinc fingers).
Motifs of unknown function: some cytokine receptors carry the sequence motifs WKS and WSKWS on the extracellular side close to the membrane, but their function is still unclear.

The significance of motifs
Summary: a motif (or fingerprint) is a short conserved region of a protein sequence, a set of amino acid residues arranged in a particular pattern, generally 10 to 20 residues long.
Finding a simple, common motif in a group of proteins does not mean that the proteins are homologous (transmembrane regions or phosphorylation sites).
In other cases a motif can be the hallmark of a protein family and reflect the kinship within that family. This family crest can be used to look for relatives (the apolipoprotein superfamily).
Since 1986, Trends in Biochemical Sciences, edited by the International Union of Biochemistry, has run a regular column on different types of sequence motifs (and also on domains, modules, and so on).

Domains and motifs: the modular nature of proteins

Origin of the domain concept
From globular proteins to crystal diffraction experiments (lysozyme)
The immunoglobulin example
The protein folding process
The concept of the domain was proposed in the 1960s and 1970s. Proteolysis experiments show that a domain can form a structural unit.
Domains are often encoded by different exons.

Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.

Summary
The concept of the domain began as a long repeated segment in the primary structure and grew into a characteristic three-dimensional structure. Domains have a biological function, correspond to particular exons that encode them, and can fold on their own into a stable structure once the peptide chain is formed. In short, a domain can be regarded as an "entity".
In general, if two proteins share a domain, they have related functions.
A sequence motif is a generalized "framework" that keeps what sequences have in common and sets their differences aside: a pattern of amino acid residues at key positions within a peptide segment. The difference between the two is that a domain carries the meaning of "structure".

Table: the 15 most common domains in humans

Proteins share a domain
Extending along the length of a protein: lipocalin
Occupying a subset of a protein sequence: the family of transcription factors that bind methylated DNA
Occurring one or more times: immunoglobulin domain, fibronectin repeat

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD).
MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett syndrome, a neurological disorder affecting girls primarily.

Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins
These proteins differ greatly in size, and the methylated DNA binding domain appears at different positions in each of them. The BLAST matches show no other region of significant sequence similarity apart from this domain.

Domains in multiple copies
Occurring one or more times: many domains are present in multiple copies within a protein. The two most common examples are the immunoglobulin domain and the fibronectin repeat. These domains are extremely common in the extracellular regions of proteins.

Are proteins that share only a domain homologous?
How is a protein family defined? Can a group of proteins that share only one homologous domain be called a protein family? In the example above, the MBD domains of the 5 proteins are clearly homologous (derived from a common ancestor). Although the proteins have no significantly similar region apart from the MBD domain, the group still constitutes a protein family.

Protein families
Protein families are defined by homology: the proteins in a family are a group of evolutionarily related proteins that share one or more domains.
What is the logic?
1. Comparison of primary structures shows that many proteins are homologous.
2. Similarity is often regional, so some homology can be characterized simply by domains.
3. For convenience (of computer classification), related proteins are grouped into a protein family by domain.
In classifying protein families, a family is sometimes subdivided into subfamilies and sometimes merged upward into a superfamily.

Challenges for family classification
Paralogous proteins: the rhodopsin-like receptor superfamily includes receptors for vision, hearing, smell, hormones, and neurotransmission.
Visual receptors that diverged early in vertebrate evolution are sensitive to different wavelengths.
The receptors of the human visual system include proteins sensitive to long wavelengths such as red and green light. These differ little from one another, with sequence similarity of about 95%. They differ greatly, however, from the short-wavelength receptors for blue light and from non-color receptors such as rhodopsin, with an average sequence similarity of 43%. The sequence complexity produced by the many paralogous and orthologous proteins is therefore a major challenge for protein family classification.

Two "families"
The bovine pancreatic ribonuclease family
The serine protease inhibitor family

Example of a multidomain protein: HIV-1 pol (polymerase)
1003 amino acids long
Cleaved into three proteins with distinct activities:
- aspartyl protease
- reverse transcriptase
- integrase
We will explore HIV-1 pol and other proteins at the Expert Protein Analysis System (ExPASy) server. Visit www.expasy.org/
Figure: the SwissProt entry for HIV-1 pol links to many databases
Figure: the ProDom entry for HIV-1 pol shows many related proteins

Proteins can have both domains and patterns (motifs)
Domain (aspartyl protease)
Domain (reverse transcriptase)
Pattern (several residues)
Pattern (several residues)
A protein may contain relatively large domains as well as patterns (motifs) that usually consist of only a few amino acid residues. Although a pattern or motif does not form a known three-dimensional conformation, its residues may be the characteristic sequence of a protein family.

Question 1
The same domain may appear at the amino terminus of one protein and at the carboxyl terminus of another. True or false?

Question 2
In general, how do protein domains and motifs (also called patterns or fingerprints) compare in size?
A. They are the same length.
B. Motifs are longer than domains.
C. Domains are longer than motifs.
D. They can only be compared for a specific protein.

Proteins, domains, and motifs
Amino acids correspond to letters, folds/motifs to words, domains to phrases, and proteins to sentences.

Question 3
The amino acid sequence [ST]-X-[RK] is the sequence around the conserved phosphorylation site of protein kinase C substrates. This sequence is:
A. a motif that can identify a group of homologous proteins
B. a motif, but not sufficient to identify a group of homologous proteins
C. a domain that can identify a group of homologous proteins
D. a domain, but not sufficient to identify a group of homologous proteins

Divergent and convergent evolution
The basic starting point of sequence analysis: search the databases for functional sites (motifs) made up of a few identical residues, so that a protein that looks completely different at first sight can reveal the function of an unknown protein.
Lysozyme and α-lactalbumin: an example of divergent evolution
The β-barrel: an example of convergent evolution
The conserved phosphorylation site of protein kinase C (PKC) is [ST]-x-[RK] (S or T is the phosphorylation site, x is any amino acid residue; PROSITE document PDOC00005). This simple motif occurs in proteins more than a thousand times.

Protein modularity: integration and reuse of simple building blocks
A single protein contains multiple modules.
The same module appears in proteins with different functions.
The same module performs different functions in different proteins.

A patchwork quilt
Basic modules are combined like building blocks to give proteins their different functions.
Gene shuffling

Protein modularity and evolution
Opossums are exploited in different Goldberg machines, where they perform different functions. Here, we could not predict an opossum sitting in that spot, even with total knowledge of the rest of the machine.
Similarity searches are just like this:
- identifying the presence of a module tells little of the function of the complete system
- knowing most components of a mosaic, we can't predict a missing one
- modules (opossums) in different proteins don't always perform exactly the same function

The complexity and uncertainty of inferring the whole from its parts

Predicting protein function with a decision tree (C4.5) based on protein family classification and signatures

Domains and protein interactions
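The True Path Rule and the directed acyclic graph structure described in the GO slides can be illustrated with a small sketch. It assumes a toy fragment of the ontology built around the hexose biosynthesis example; the shared grandparent term "monosaccharide metabolism", the gene product name "geneX", and the function names are illustrative assumptions, not actual GO identifiers or any GO tool's API.

```python
from collections import deque

# Toy GO-like DAG (assumed for illustration): child term -> list of
# (relationship, parent term). A term may have more than one parent,
# which is what makes this a DAG rather than a tree.
PARENTS = {
    "hexose biosynthesis": [
        ("is_a", "hexose metabolism"),
        ("is_a", "monosaccharide biosynthesis"),
    ],
    "hexose metabolism": [("is_a", "monosaccharide metabolism")],
    "monosaccharide biosynthesis": [("is_a", "monosaccharide metabolism")],
    "monosaccharide metabolism": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges from `term`."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Apply the True Path Rule: a product annotated to a term is also
    annotated to all ancestors of that term."""
    full = {}
    for product, terms in direct_annotations.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[product] = closure
    return full


if __name__ == "__main__":
    # "geneX" is a hypothetical gene product with one direct annotation.
    result = propagate({"geneX": ["hexose biosynthesis"]})
    for term in sorted(result["geneX"]):
        print(term)
```

Because the shared grandparent is reached through both parents, the `seen` set keeps it from being counted twice, which is the practical difference between walking a DAG and walking a tree.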

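The slides describe a PROSITE pattern as a qualitative description: a protein either matches it or not. A minimal sketch of that idea, assuming a simplified subset of the PROSITE pattern syntax (elements joined by "-", x for any residue, [..] for allowed residues, {..} for excluded residues, and (n) or (n,m) repeat counts), is to translate the pattern into a regular expression. The function names are hypothetical, and anchors such as "<" and ">" are deliberately not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        # Split an element such as "x(0,2)" into its core and repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            token = re.escape(core)            # literal residue(s)
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for every hit, including
    overlapping hits, of a PROSITE-style pattern in a protein sequence."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Nucleotide-binding motif from the slides, on a made-up test sequence.
    print(find_motif("G-x(4)-G-K-[ST]", "AAGESGSGKTVV"))    # [(3, 'GESGSGKT')]
    # GxW motif on the start of the retinol-binding protein row above.
    print(find_motif("G-x-W", "VKENFDKARFSGTWYAMAKDPEG"))   # [(12, 'GTW')]
```

A short pattern such as [ST]-x-[RK] will hit a great many unrelated sequences with this kind of scan, which is the point the slides make about simple motifs not implying homology.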