MicroRNA ppt Collection

上传人:沙** 文档编号:243072047 上传时间:2024-09-15 格式:PPT 页数:53 大小:1.81MB
返回 下载 相关 举报
MicroRNA ppt Collection_第1页
第1页 / 共53页
MicroRNA ppt Collection_第2页
第2页 / 共53页
MicroRNA ppt Collection_第3页
第3页 / 共53页
点击查看更多>>
资源描述
Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,MicroRNA,The Computational Challenge,Bioinformatics Seminar, March 9, 2005,By Yaron Levy,Tree of RNA Types,miRNA,Biological Process,Micro RNA Computational Approach,Problem 1: Finding putative,microRNA,from a sequence,Horesh,et al, using suffix trees data structure,Problem 2: Computing secondary structure of a given sequence,Zuker,&,Steigler, minimum free energy, using dynamic programming,Problem 3:,miRNA,predicting algorithms,Lim et al,MiRscan,Problem 4: Predicting,miRNA,target genes,Lewis et al,TargetScan,Problem 1,Find these,Problem 1: Finding putative,microRNA,from a sequence,A nave idea: slide a “window” of size L over the sequence of size N, looking for stems of size S.,Computationally O(NL+NS) too much,A better approach using a suffix tree.,S,= M A L A Y A L A M $,1 2 3 4 5 6 7 8 9 10,$,YALAM$,M,$,ALAYALAM$,$M,YALAM$,$M,YALAM$,$M,YALAM$,A,AL,LA,6,2,8,4,7,3,1,9,5,10,What is a suffix tree?,Suffix tree properties,For a string,S,of length,n, there are,n,leaves and at most,n,internal nodes.,therefore requires only linear space,Each leaf represents a unique suffix.,Concatenation of edge labels from root to a leaf spells out the suffix.,Each,internal node,represents a distinct common,prefix,to at least two suffixes.,Finding a (short),Pattern,in a (long),String,Build a suffix tree of the string.,Starting from the root, traverse a path matching characters of the pattern.,If stuck, pattern not present in string.,Otherwise, each leaf below gives a position of the pattern in the string.,Find “ALA”,$,YALAM$,M,$,ALAYALAM$,M$,YALAM$,M$,YALAM$,M$,YALAM$,A,AL,LA,6,2,8,4,7,3,1,9,5,10,Two matches - at 6 and 2,Finding a Pattern in a String,Generalized Suffix Tree,$,O,ND,W,I,$OG,D,$OGI,OW$,$OG,ND,$OGI,OW$,$OGI,OW$,$W,$,INDOW$,$,(2, 3),(1, 4),(2, 5),(2, 4),(2, 1),(1, 2),(2, 2),(1, 3),(1, 5),(2, 6),(1, 6),(1, 1),(1, 7),(2, 7),WINDOW$ INDIGO$,1234567 1234567,Horesh,et al using a generalized suffix tree for finding putative,microRNAs,Assumptions:,At least a triple repeat is necessary:,2 for the stems of the hairpin close to each other in the sequence, and as inverted repeat of each other,The rest are target genes can be anywhere,The repeats must be fully matched no mismatches are allowed,This is more of a constraint,Horesh,et al the algorithm,Construct a generalized suffix tree of the original sequence and the inverted repeat sequence.,Preprocess the suffix tree for calculating:,Length of suffixes,Number of repeats,Index of suffix in sequence,With these attributes for each node, along with the indices of the suffixes in the sequence, it is possible to find the requested triple (or more) repeats.,Computationally efficient O(N),a,banana,na,na,na,na,1. Build a suffix tree,2. Scan the tree in a,PreOrder,traversal,(all parents are visited before their sons),The length of a prefix a node represent is:,node.len,=,father.len,+,node.Length,of the sequence fragment it carries,(root is 0),0,1,3,2,2,4,1,3,5,6,a,banana,na,na,na,na,3. Scan the tree in a,PostOrder,traversal,(all sons are visited before their parents),The number of repeats of a prefix a node represent is:,node.repeats,= Sum of sons repeats (leaf is 1),6,3,2,2,1,1,1,1,1,1,4. Scan the tree again,For every node that represents a prefix longer than SIZE (22 for,example), and has two repeats or more;,Print its length and repeats and print the indexes of its leaves.,Now every node carries the,length,of the prefix,It represents and the number of leaves below it.,(the,number of repeats,it is their prefix).,All sections are done in linear time !,a,na,na,1,3,1,3,5,Problem 1 conclusions,The problem is not trivial!,Suffix trees are an elegant solution, providing:,No mismatches are allowed (not really biologically realistic),Enough memory to store the large data structure,Problem 2,How do these fold?,Problem 2: Computing secondary structure of a given sequence,Approaches to RNA secondary structure prediction:,comparative sequence analysis,prediction from base sequence,find,minimum free energy (MFE),structure,Free energy model,free energy of structure (at fixed temperature, ionic concentration) = sum of loop energies,standard model,uses experimentally determined thermodynamic parameters where available; extrapolations for long loops,Free energy model,free energy of structure (at fixed temperature, ionic concentration) = sum of loop energies,standard model,uses experimentally determined thermodynamic parameters where available; extrapolations for long loops,On the MFE approach,MFE approach ignores folding pathway, metal ions, nonstandard bonds,“some species can remain kinetically trapped in,nonequilibrium,states we expect that most RNAs exist naturally in their thermodynamically most stable configurations” ,Tinoco,and,Bustamante, J. Mol. Biol. 1999.,Why is MFE secondary structure prediction hard?,MFE structure can be found by calculating free energy of all possible structures,but, number of potential structures grows exponentially with the number, n, of bases,structures can be arbitrarily complex,success for restricted classes of structures,Predicting MFE pseudoknot free structures,Dynamic programming avoids explicit enumeration of all pseudoknot free structures (,Zuker,&,Stiegler,1981,),Suboptimal folds, probabilities of base pairings can also be calculated,software:,mfold, Vienna package,Dynamic programming (,Zuker,&,Steigler,),Based on the “more is less” principle: by calculating more than you need, less work is needed overall,Construct MFE structure for whole strand from MFE structures for,substrands,Running time is O(n,3,),RNA folding with dynamic programming,Assume a function,W(i,j,) which is the MFE for the sequence starting at i and ending at j (ij),Define sigma as the MFE function for the simple cases, where, for example a base pairs score is less than a non-pair,Consider 4 recursion possibilities:,i,j,are a base pair, added to the structure for,i+1.j-1,Define this as,V(i,j,),i is unpaired, added to the structure for,i+1.j,j is unpaired, added to the structure for,i.j-1,i,j,are paired, but not to each other; the structure for,i.j,adds together sub-structures for 2 sub-sequences:,i.k,and,k+1.j,a bifurcation (ik more predicted targets,Membership in large,miRNA,family - more predicted targets,miRNA,Targets - Not the end of the story,Many programs claim to discover,miRNA,targets in mammals:,miRanda,-,Enright,et al, SKI,DIANA-,MicroT,-,Hatzigeorgiou,et al,UPenn,rna22 -,Rigoutsos,et al, IBM,PicTar,-,Rajewsky,et al, NYU,Problem 4 conclusion: algorithms comparison,NYAS Competition (Feb 17, 2005),Task: given 2 miRNAs, find mammalian targets,Widely differing results, from 1 target to 500 targets!,Very little overlap,So whos “right”?,Currently correct targets unknown,Thank You,
展开阅读全文
相关资源
正为您匹配相似的精品文档
相关搜索

最新文档


当前位置:首页 > 图纸专区 > 小学资料


copyright@ 2023-2025  zhuangpeitu.com 装配图网版权所有   联系电话:18123376007

备案号:ICP2024067431-1 川公网安备51140202000466号


本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。装配图网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知装配图网,我们立即给予删除!