资源描述
,Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,单击此处编辑母版标题样式,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,*,XML,代价估计方法研究综述,XML Group,XML 代价估计方法研究综述,1,Outline,研究代价估计的必要性,xml,中代价估计研究的难点,背景知识介绍,现有方法分类,Outline研究代价估计的必要性,2,研究代价估计的必要性,查询优化的基础,不同分支对查询目标的选择率不同,如果选择性低的分支先于选择性高的分支执行,就会有效的减少中间结果从而提供查询效率,结构连接在,XML,中是一个很重要的操作,连接的顺序的选择需要计算不同连接顺序的代价,及早给用户提供反馈信息,研究代价估计的必要性查询优化的基础,3,xml,中代价估计研究的难点,XML,数据的不规则性是对传统统计信息方法的重要挑战,具体表现在以下几个方面:,路径依赖性,,如同为,name,结点,在,person,下和在,city,下出现意义就完全不同,结构相关性,嵌套在不同祖先下面的同类结点的个数差别可能很大,如,book,结点下的,author,个数是不确定的,值相关性,/purchase/Itemtype=bookprice100,XML,的有序性制约了转换规则的灵活性,所有这些问题,都使得在,xml,中采用传统的代价估计方法不切实际,会带来很大的误差。针对,xml,数据的特点,我们应该寻求一种新的代价估计方法。,xml中代价估计研究的难点XML数据的不规则性是对传统统计信,4,Background Knowledge,XML Data Model,A large,node-labeled tree T(V,E),d,0,a,1,a,2,a,3,p,4,p,5,n,6,t,14,k,15,y,13,1999,y,16,t,17,k,18,k,19,n,7,b,9,p,8,y,20,t,21,k,22,y,24,t,25,k,26,n,10,b,12,p,9,t,23,t,26,2002,2001,2000,Example XML Document Tree,Background Knowledge XML Data,5,Background Knowledge,XML Data Model,A large,node-labeled tree T(V,E),d,0,a,1,a,2,a,3,p,4,p,5,n,6,t,14,k,15,y,13,1999,y,16,t,17,k,18,k,19,n,7,b,9,p,8,y,20,t,21,k,22,y,24,t,25,k,26,n,10,b,12,p,9,t,23,t,26,2002,2001,2000,Example XML Document,Background Knowledge XML Data,6,Background Knowledge,XML Query Model,A node-labeled query tree TQ,Each node of T,Q,is labeled with a variable name q,i,in Q,Each edge(q,i,q,j,)of T,Q,is annotated with an XPath expression path(q,i,q,j,)that describes the specific structural constraints specified in Q,Query Fragment,:,for$b in doc(“bib.xml”)/bib/book,$a in$b/author,$t in$b/title,bib,book,author,title,root,$b,$a,$t,Twig Query Model,Background KnowledgeXML Query,7,现有估计方法的分类,路径选择性计算,Twig,匹配个数统计,值谓词选择率估计,结构连接代价估计,现有估计方法的分类路径选择性计算,8,Overview,data tree,different summarization methods based on:,path,all,m,-length path,F/B-stability,Lore,McHugh et al,VLDB99,pruning,PathTree,Markov Table,Aboulnaga et al,VLDB01,XSketch,Polyzotis et al,VLDB02,t,ype,XSketches,Polyzotis et al,SIGMOD02,+value,+,twig,XSketches,Polyzotis et al,SIGMOD0,4,Region Code,2D Model,1D Model,StatiX,Freire et al,SIGMOD02,PH Histogram,Wu Y EDBT 2002,Interva&Position Model,H.Lu,SIGMOD 2003,Overviewdata treedifferent sum,9,路径选择性计算,问题描述,/,a/bc/d/eg/e/f/*/*/e/f,路径选择性计算问题描述 /a/bc/d/eg/,10,Path Tree&,Markov Table,Ashraf Aboulnaga,Alaa R.Alameldeen,and Jeffry F.Naughton.Estimating the selectivity of XML path expressions for Internet scale applications.VLDB 2001,Path Tree&Markov Table Ashra,11,Path Tree,Construction,Start from an original path tree,Count of nodes in,absolute paths,Aggregate the tree to fit in the limited space,Sibling-*,Estimation,Match the query against the path tree,matching*as the last resort.,Need to decide whether to use total count or average count.,Path TreeConstruction,12,An XML document and its path tree,A,1,B,2,C,1,D,1,D,1,E,3,Estimation:count(A/B/D)=1,count(A/C/D)=1,An XML document and its path t,13,Aggregate Path Tree,A,1,C,9,B,13,G,10,F,15,H,6,K,12,E,5,D,7,K,11,J,4,I,2,Aggregate Path TreeA1C9B13G10F,14,Aggregate Path Tree,A,1,C,9,B,13,G,10,F,15,H,6,K,12,E,5,D,7,K,11,J,4,I,2,红色结点表示那些不频繁出现的结点,需要删除,从而保证,path,树能够放入内存,Aggregate Path TreeA1C9B13G10F,15,Sibling-*Summarization,A,1,C,9,B,13,*,F,15,*,K,*,f=23,n=2,6,8,3,把查询和,path,树匹配,需要决定用总数还是平均数,估计方法,Estimation:count(/C/H/K)=11,count(/C/*/K)=23,Sibling-*SummarizationA1C9B13,16,Markov Table,构造一个表,存储长度=2。,当查询路径长度=,m,时,可以直接从表中读出结果,否则,用公式计算。,例如,,m3,,查询为/,A/B/C/D,,则,当表不能装入内存的时候,进行剪枝操作。,f(B/C/D),f(B/C),f(A/B/C/D),f(A/B/C),Markov Table构造一个表,存储长度=m的不同路径,17,Markov Table:Example,a,b,b,c,d,d,e,e,e,f,d,a,1,a/b,2,b,2,a/c,1,c,1,b/d,2,d,3,c/d,1,e,3,c/e,3,f,1,c/f,1,a,b,d,d,e,1,2,2,1,3,c,f,1,1,m=2,b,2,c/e,3,d,3,*,3/3,e,3,c/*,2/2,a/b,2,*,/*,1/1,b/d,2,suffix-*,Q1:a/c/e,Markov Table:Exampleabbcdde,18,Twig,匹配个数统计,问题描述,Twig Query(basic building block of declarative query languages for XML),FOR$f IN document(“person.xml”)/department/faculty,FOR$r in$f/RA,FOR$t in$f/TA,Department,Faculty,RA,TA,$f,$r,$t,Twig 匹配个数统计问题描述FOR$f IN docum,19,XSketch,方法,N.Polyzotis,M.Garofalakis.Statistical Synopses for Graph-Structured XML Databases,In Proc.of ACM SIGMOD Conf.2002,N.Polyzotis and M.Garofalakis.Structure and value synopses for xml data graphs.In Proceedings of 28,th,VLDB Conf.,2002.,N.Polyzotis,M.Garofalakis,and Yannis Iosnnidis.Selectivity Estimation for XML Twigs.In Proc.of the 20th Intl.Conf.on Data Engineering,2004,N.Polyzotis and M.Garofalakis and Y.Ioannidis.Approximate XML Query Answers.In Proc.ACM SIGMOD 2004,XSketch方法 N.Polyzotis,M.Garof,20,d,0,a,1,a,2,a,3,p,4,p,5,n,6,t,14,k,15,y,13,1999,y,16,t,17,k,18,k,19,n,7,b,9,p,8,y,20,t,21,k,22,y,24,t,25,k,26,n,10,b,12,p,9,t,23,t,26,2002,2001,2000,D(1),A(3),N(3),P(4),B(2),Y(4),K(6),T(6),H(Y),B/F,B/F,B/F,B/F,B,B/F,F,F,XSketch,方法,d0a1a2a3p4p5n6t14k15y131999y16,21,B/F stability,Definition,:,Let U,V be sets of elements in an XML data Tree T.We say that V is backward-stable(B-stale)with respect to U,iff for each vV there exists a u U such tha
展开阅读全文