Graph Data Management
Instructor: ZOU Lei
Institute of Computer Science and Technology, Peking University

Outline
- Applications and Challenges of Graph Data Management
- Existing Graph Database Systems
- About the course

Graph Data
(a) Protein Network
(b) Social Network

Some Challenges in Large Graph Data Management
An example: consider an SNS website with more than 1 billion active users.
Query: I want to know whether Tom is a friend of Jack, or a friend of one of Jack's friends.
Possible solution:
- (Storage) Store the connections between individuals in a relational table.
- (Query) Perform self-joins recursively (recursive queries).

Network Motifs: Simple Building Blocks of Complex Networks (R. Milo et al., Science, 2002)
Network motifs are patterns (subgraphs) that recur within a network much more often than expected at random. Network motifs typically correspond to functional patterns in different networks.
Questions:
- How can such motifs be found efficiently?
- Given a motif, how can all embeddings of this motif be found efficiently?

Frequent Subgraph Pattern Mining
[Figure: a graph dataset with graphs (A), (B), (C) and the frequent patterns (1), (2) mined from it; min support is 2]

Subgraph Search
[Figure: a query graph and a graph database]
Query: Which compounds contain a "benzene ring"?

Reachability Query
[Figure: a graph with vertices 1 to 15]
Query(1, 11)? Yes
Query(3, 9)? No

Shortest Path Distance Query
What's the distance between two specified individuals?

RDF Data Management
The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model.
- WWW: Web of Pages
- Semantic Web: Web of Data

An RDF Data Example: Yago Project
- Structural Data
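RDF data such as the examples above can be viewed as a set of (subject, predicate, object) triples, that is, a directed graph with labeled edges. The following is a minimal Python sketch of that view, not the representation used by any particular RDF store. The triples are toy data chosen for illustration and are not taken from the slide figures; the identifiers p1 and p2, the predicate names hasName, bornOnDate and diedOnDate, and the helper functions match and subjects_with are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy data: identifiers and predicate names are made up
# for illustration and do not come from the slides.
TRIPLES: List[Triple] = [
    ("p1", "hasName", "Abraham Lincoln"),
    ("p1", "bornOnDate", "1809-02-12"),
    ("p1", "diedOnDate", "1865-04-15"),
    ("p2", "hasName", "Charles Darwin"),
    ("p2", "bornOnDate", "1809-02-12"),
    ("p2", "diedOnDate", "1882-04-19"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every non-None field of the pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def subjects_with(triples: List[Triple],
                  constraints: Dict[str, str]) -> Set[str]:
    """Return subjects that have a matching triple for every (predicate, object) pair."""
    result: Optional[Set[str]] = None
    for pred, obj in constraints.items():
        found = {subj for subj, _, _ in match(triples, p=pred, o=obj)}
        # Intersect with the subjects that satisfied the earlier constraints.
        result = found if result is None else result & found
    # With no constraints at all, every subject qualifies.
    return result if result is not None else {t[0] for t in triples}


# Subjects sharing a birth date, then narrowed by a second constraint.
born_same_day = subjects_with(TRIPLES, {"bornOnDate": "1809-02-12"})
both_dates = subjects_with(
    TRIPLES, {"bornOnDate": "1809-02-12", "diedOnDate": "1865-04-15"})
names = [o for s in sorted(both_dates)
         for _, _, o in match(TRIPLES, s=s, p="hasName")]
print(names)
```

Each constraint is evaluated by a full scan and the results are intersected, which is fine for a handful of triples. On large RDF graphs this step is what indexing and query processing techniques aim to speed up.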
RDF Data Example: SPARQL Query
Query: Find all individuals who were born on Feb. 12, 1809 and died on April 15, 1865.
SPARQL syntax:
    Select ?name Where ?m ?name . ?m "1809-02-12" . ?m "1865-04-15" .
Query Graph

Some Existing Graph Database Systems
The following is a list of several well-known graph database projects:
- HyperGraphDB: an open-source (LGPL) graph database supporting generalized hypergraphs, where edges can point to other edges
- InfoGrid: an open-source/commercial (AGPLv3, free for small entities) graph database with a web front end and configurable storage engines (MySQL, PostgreSQL, Files, Hadoop)
- Neo4j: an open-source/commercial (AGPLv3) graph database
- DEX: a high-performance graph database
- and so on
International Graph Database Workshops: http:/

Example of Neo4j
Neo4j: http://wiki.neo4j.org/content/The_Matrix
Finding the friends of "Thomas Anderson", and the friends of those friends too

Neo4j API: An Example

    import org.neo4j.graphdb.Direction;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.ReturnableEvaluator;
    import org.neo4j.graphdb.StopEvaluator;
    import org.neo4j.graphdb.Traverser;
    import org.neo4j.graphdb.Traverser.Order;

    // MyRelationshipTypes is the application's own RelationshipType enum.
    private void printFriends(Node person) {
        Traverser traverser = person.traverse(
            Order.BREADTH_FIRST,                     // traversal order
            StopEvaluator.END_OF_GRAPH,              // stop condition of the traversal
            ReturnableEvaluator.ALL_BUT_START_NODE,  // which nodes are returned
            MyRelationshipTypes.KNOWS,               // which edges to traverse
            Direction.OUTGOING);                     // direction of the traversal
        for (Node friend : traverser) {
            System.out.println(friend.getProperty("name"));
        }
    }

Course Content
Graph Mining
- frequent subgraph mining
Indexing & Query Processing
- reachability query
- shortest path query
- subgraph query
- keyword search
RDF Data Management
- Indexing & SPARQL Query Processing

Course Website
URL: http:/
1. Han & Micheline Kamber; translated by Fan Ming & Meng Xiaofeng; China Machine Press (2nd edition)
2. Managing and Mining Graph Data, edited by Charu C. Aggarwal and Haixun Wang, Kluwer Academic Publishers, 2009
3. 语义网基础 (Foundations of the Semantic Web), Grigoris Antoniou and Frank van Harmelen, China Machine Press, 2008

Course Assessment
In-class presentation
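(30%): each student presents one top-tier paper from the database field (including related areas such as data mining and information retrieval); 20 minutes plus 5 minutes of questions.
Assignments (30%): three assignments, each on an assigned topic.
Class participation (10%).
Course research report (30%): the report takes one of two forms, and each student chooses one:
1) Literature survey: introduce the research background of the topic and the related existing work, and give your own assessment of the different existing results.
2) Research paper: students are encouraged to carry out original research on a specific topic and write it up as a paper.

Course Objectives
- Master several basic query algorithms and mining algorithms for graph databases.
- Understand how graph database technology is applied in different fields.
- Develop students' ability to think independently and to carry out research.

Let's begin!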