Web Information Retrieval: Retrieval Models Explained (courseware)

Lecture 4: Retrieval models, Part 2

Reference
  James Allan, University of Massachusetts Amherst
  Gerald Benoit, Simmons College

Models we'll consider
  - Boolean (exact match)
  - Statistical language models
  - Vector space
  - Latent Semantic Indexing
  The slide marks which of these are covered in this lecture and which are left to later lectures.

Vector Space Model
  Variations:
  - Vector space retrieval model
  - Latent Semantic Indexing
  Key idea:
  - Everything (documents, queries, terms) is a vector in a high-dimensional space.
  Example systems:
  - SMART: G. Salton and students at Cornell, starting in the 1960s.
  - Lucene: popular open source engine written in Java, still a building block of many commercial search engines.
  - Most Web search engines are similar.

Vector space issues
  - Terms
  - Documents
  - Queries
  - How to select the magnitude along a dimension
  - How to compare objects in the vector space
  - Comparing queries to documents

Vector space and basis vectors
  Formally, a vector space is defined by a set of linearly independent basis vectors.
  Basis vectors:
  - correspond to the dimensions or directions in the vector space;
  - determine what can be described in the vector space; and
  - must be orthogonal, or linearly independent, i.e. a value along one dimension implies nothing about a value along another.
  [Figures: basis vectors for 2 dimensions and for 3 dimensions]

Selection of basis vectors
  What should be the basis vectors for IR?
  Core concepts of discourse?
  - Orthogonal (by definition)
  - A relatively static vector space
  - Probably not too many dimensions
  - But difficult to determine (philosophy? cognitive science?)
  Use terms that appear?
  - Easy to determine
  - But not at all orthogonal (though it may not matter much)
  - A constantly growing vector space (new vocabulary)
  - A huge number of dimensions

Mapping to basis vectors: terms
  How do basis vectors relate to terms?
  Each term is represented as a linear combination of basis vectors.
  Dictionary (basis terms: Active, Independent):

    Term   Active   Independent
    cat    0.25     0.75
    dog    0.75     0.25

  cat = 0.25 Active + 0.75 Independent (or cat = 0.25x + 0.75y)
  dog = 0.75x + 0.25y

Mapping to basis vectors: documents
  How are documents represented?
  A document is represented as the sum of its term vectors.

Mapping to basis vectors: queries
  How are queries represented?
  The same way that documents are.

Vector coefficients
  The coefficients (vector lengths, term weights) represent term presence, importance, or aboutness.
  - They give the magnitude along each dimension.
  - The model gives no guidance on how to set term weights.
  Some common choices:
  - Binary: 1 = term is present in the document, 0 = term is not present.
  - tf: the frequency of the term in the document.
  - tf·idf: idf (inverse document frequency) indicates the discriminatory power of the term.
  tf·idf is far and away the most common choice, with numerous variations.
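The three weighting choices listed under vector coefficients can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not fix an idf formula, so the sketch assumes one common variant, idf = log(N / df). The function names and the toy corpus are hypothetical and do not come from the lecture.

```python
import math
from collections import Counter

def binary_weights(tokens):
    """Binary weighting: 1 for every term that occurs in the document."""
    return {term: 1 for term in set(tokens)}

def tf_weights(tokens):
    """tf weighting: raw count of each term in the document."""
    return dict(Counter(tokens))

def idf(term, docs):
    """Assumed idf variant: log(N / df), where df counts documents containing the term."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_weights(tokens, docs):
    """tf-idf weighting: term frequency scaled by the assumed idf."""
    return {term: freq * idf(term, docs) for term, freq in Counter(tokens).items()}

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["cat", "dog", "cat"],
    ["dog", "bird"],
    ["dog", "fish", "fish"],
]

if __name__ == "__main__":
    print(binary_weights(docs[0]))
    print(tf_weights(docs[0]))
    print(tfidf_weights(docs[0], docs))
```

In this toy corpus "dog" occurs in every document, so its assumed idf is log(3/3) = 0 and it contributes nothing under tf-idf, while "cat" occurs in only one document and keeps a positive weight. That is the sense in which idf reflects a term's discriminatory power.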