MicroRNA ppt Collection

资源描述

Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,MicroRNA,The Computational Challenge,Bioinformatics Seminar, March 9, 2005,By Yaron Levy,Tree of RNA Types,miRNA,Biological Process,Micro RNA Computational Approach,Problem 1: Finding putative,microRNA,from a sequence,Horesh,et al, using suffix trees data structure,Problem 2: Computing secondary structure of a given sequence,Zuker,&,Steigler, minimum free energy, using dynamic programming,Problem 3:,miRNA,predicting algorithms,Lim et al,MiRscan,Problem 4: Predicting,miRNA,target genes,Lewis et al,TargetScan,Problem 1,Find these,Problem 1: Finding putative,microRNA,from a sequence,A nave idea: slide a “window” of size L over the sequence of size N, looking for stems of size S.,Computationally O(NL+NS) too much,A better approach using a suffix tree.,S,= M A L A Y A L A M $,1 2 3 4 5 6 7 8 9 10,$,YALAM$,M,$,ALAYALAM$,$M,YALAM$,$M,YALAM$,$M,YALAM$,A,AL,LA,6,2,8,4,7,3,1,9,5,10,What is a suffix tree?,Suffix tree properties,For a string,S,of length,n, there are,n,leaves and at most,n,internal nodes.,therefore requires only linear space,Each leaf represents a unique suffix.,Concatenation of edge labels from root to a leaf spells out the suffix.,Each,internal node,represents a distinct common,prefix,to at least two suffixes.,Finding a (short),Pattern,in a (long),String,Build a suffix tree of the string.,Starting from the root, traverse a path matching characters of the pattern.,If stuck, pattern not present in string.,Otherwise, each leaf below gives a position of the pattern in the string.,Find “ALA”,$,YALAM$,M,$,ALAYALAM$,M$,YALAM$,M$,YALAM$,M$,YALAM$,A,AL,LA,6,2,8,4,7,3,1,9,5,10,Two matches - at 6 and 2,Finding a Pattern in a String,Generalized Suffix Tree,$,O,ND,W,I,$OG,D,$OGI,OW$,$OG,ND,$OGI,OW$,$OGI,OW$,$W,$,INDOW$,$,(2, 3),(1, 4),(2, 5),(2, 4),(2, 1),(1, 2),(2, 2),(1, 3),(1, 5),(2, 6),(1, 6),(1, 1),(1, 7),(2, 7),WINDOW$ INDIGO$,1234567 1234567,Horesh,et al using a generalized suffix tree for finding putative,microRNAs,Assumptions:,At least a triple repeat is necessary:,2 for the stems of the hairpin close to each other in the sequence, and as inverted repeat of each other,The rest are target genes can be anywhere,The repeats must be fully matched no mismatches are allowed,This is more of a constraint,Horesh,et al the algorithm,Construct a generalized suffix tree of the original sequence and the inverted repeat sequence.,Preprocess the suffix tree for calculating:,Length of suffixes,Number of repeats,Index of suffix in sequence,With these attributes for each node, along with the indices of the suffixes in the sequence, it is possible to find the requested triple (or more) repeats.,Computationally efficient O(N),a,banana,na,na,na,na,1. Build a suffix tree,2. Scan the tree in a,PreOrder,traversal,(all parents are visited before their sons),The length of a prefix a node represent is:,node.len,=,father.len,+,node.Length,of the sequence fragment it carries,(root is 0),0,1,3,2,2,4,1,3,5,6,a,banana,na,na,na,na,3. Scan the tree in a,PostOrder,traversal,(all sons are visited before their parents),The number of repeats of a prefix a node represent is:,node.repeats,= Sum of sons repeats (leaf is 1),6,3,2,2,1,1,1,1,1,1,4. Scan the tree again,For every node that represents a prefix longer than SIZE (22 for,example), and has two repeats or more;,Print its length and repeats and print the indexes of its leaves.,Now every node carries the,length,of the prefix,It represents and the number of leaves below it.,(the,number of repeats,it is their prefix).,All sections are done in linear time !,a,na,na,1,3,1,3,5,Problem 1 conclusions,The problem is not trivial!,Suffix trees are an elegant solution, providing:,No mismatches are allowed (not really biologically realistic),Enough memory to store the large data structure,Problem 2,How do these fold?,Problem 2: Computing secondary structure of a given sequence,Approaches to RNA secondary structure prediction:,comparative sequence analysis,prediction from base sequence,find,minimum free energy (MFE),structure,Free energy model,free energy of structure (at fixed temperature, ionic concentration) = sum of loop energies,standard model,uses experimentally determined thermodynamic parameters where available; extrapolations for long loops,Free energy model,free energy of structure (at fixed temperature, ionic concentration) = sum of loop energies,standard model,uses experimentally determined thermodynamic parameters where available; extrapolations for long loops,On the MFE approach,MFE approach ignores folding pathway, metal ions, nonstandard bonds,“some species can remain kinetically trapped in,nonequilibrium,states we expect that most RNAs exist naturally in their thermodynamically most stable configurations” ,Tinoco,and,Bustamante, J. Mol. Biol. 1999.,Why is MFE secondary structure prediction hard?,MFE structure can be found by calculating free energy of all possible structures,but, number of potential structures grows exponentially with the number, n, of bases,structures can be arbitrarily complex,success for restricted classes of structures,Predicting MFE pseudoknot free structures,Dynamic programming avoids explicit enumeration of all pseudoknot free structures (,Zuker,&,Stiegler,1981,),Suboptimal folds, probabilities of base pairings can also be calculated,software:,mfold, Vienna package,Dynamic programming (,Zuker,&,Steigler,),Based on the “more is less” principle: by calculating more than you need, less work is needed overall,Construct MFE structure for whole strand from MFE structures for,substrands,Running time is O(n,3,),RNA folding with dynamic programming,Assume a function,W(i,j,) which is the MFE for the sequence starting at i and ending at j (ij),Define sigma as the MFE function for the simple cases, where, for example a base pairs score is less than a non-pair,Consider 4 recursion possibilities:,i,j,are a base pair, added to the structure for,i+1.j-1,Define this as,V(i,j,),i is unpaired, added to the structure for,i+1.j,j is unpaired, added to the structure for,i.j-1,i,j,are paired, but not to each other; the structure for,i.j,adds together sub-structures for 2 sub-sequences:,i.k,and,k+1.j,a bifurcation (ik more predicted targets,Membership in large,miRNA,family - more predicted targets,miRNA,Targets - Not the end of the story,Many programs claim to discover,miRNA,targets in mammals:,miRanda,-,Enright,et al, SKI,DIANA-,MicroT,-,Hatzigeorgiou,et al,UPenn,rna22 -,Rigoutsos,et al, IBM,PicTar,-,Rajewsky,et al, NYU,Problem 4 conclusion: algorithms comparison,NYAS Competition (Feb 17, 2005),Task: given 2 miRNAs, find mammalian targets,Widely differing results, from 1 target to 500 targets!,Very little overlap,So whos “right”?,Currently correct targets unknown,Thank You,

展开阅读全文

MicroRNA ppt Collection

最新文档