A Mixture Model for Expert Finding
Jing Zhang, Jie Tang, Liu Liu, and Juanzi Li
Knowledge Engineering Group, Tsinghua University
2008-5-23

Outline
  - Motivation
  - Related Work
  - Our Approach
  - Experiments
  - Conclusion

Introduction
  - Expert finding aims at answering the question: "Who are the experts on topic X?"
  - The task is very important, because we usually want to:
      - find the important scientists on a research topic
      - find the most appropriate collaborators for a project
      - find an expertise consultant

Motivation
  Three example topics, each shown with two papers and an expert:
  - Semantic web: Timothy W. Finin
      - Integrating ecoinformatics resources on the semantic web. In Proceedings of WWW2006
      - A Semantic Web Services Architecture. IEEE Internet Computing, 2005
  - Support vector machine: Vladimir Vapnik
      - A Support Vector Clustering Method. In Proceedings of ICPR2000
      - Boosting and Other Machine Learning Algorithms. In Proceedings of ICML1994
  - Natural language processing: Dan Roth
      - A Pipeline Framework for Dependency Parsing. In Proceedings of ACL2006
      - Probabilistic Reasoning for Entity Relation Recognition. In Proceedings of COLING2002
  The language model emphasizes the occurrence of query terms in the support documents.
  Questions:
    1. How can we discover the relationships between words at a semantic level?
    2. How can we use these relationships to improve the performance of expert finding?

Related Work
  - Language models for expert finding
      - TREC 2005 and TREC 2006
      - Find the associations between candidates and documents
      - E.g. Cao (2005), Fu (2005), Balog (2006)
  - Advanced model
      - Study expert finding in a sparse data environment
      - E.g. Balog (2007)
  - An overview of most of the models
      - Analyze and compare different models for expert finding
      - The models are probabilistically equivalent; the differences lie in their independence assumptions
      - E.g. Petkova (2007)
  - Probabilistic latent semantic analysis (PLSA)
      - Discovers latent semantic structure
      - Assumes hidden factors underlying the co-occurrences among two sets of objects
  - PLSA applications
      - Information retrieval: Hofmann (1999)
      - Text learning and mining: Brants (2002), Gaussier (2002), Kim (2003), Zhai (2004)
      - Co-citation analysis: Cohn (2000), Cohn (2001)
      - Social annotation analysis: Wu (2006)
      - Web usage mining: Jin (2004)
      - Personalized web search: Lin (2005)

Overview
  (Figure: language model: term, doc. Our approach (PLSA): term, theme, doc.)

Problem Setting
  - What is the task of expert finding?
  - Given e: an expert, and q: a query
  - Estimate p(e|q)
  - Assuming p(q) is uniform: p(e|q) ∝ p(q|e) p(e)
  - We focus on the query-dependent probability p(q|e); p(e) is the query-independent probability

Language Models for Expert Finding
  - Expert finding target: estimate p(q|e)
  - D_e = {d_j}: support documents related to a candidate e
  - Document-candidate association: 1 if e is the author of d_j, 0 otherwise
  - Extended in two ways:
      1. Composite model: co-occurrence of all the query terms in the same document
      2. Hybrid model: co-occurrence of all the query terms in all the support documents of an expert

Language Model for Document Retrieval
  - A language model describes the relevance between a document d and a query q as the generating probability p(q|d)
  - Assuming terms appear independently in the query, p(q|d) is the product of p(t_i|d) over the query terms t_i
  - p(t_i|d) is estimated by maximum likelihood estimation with Dirichlet smoothing

A Mixture Model for Expert Finding
  - Language models need to calculate p(t_i|d_j)
  - We assume k hidden themes m = 1, 2, ..., k between term t_i and document d_j
  (Figure: documents d_1, d_2, ..., d_m and terms t_1, t_2, ..., t_n connected through themes 1, 2, ..., k; edges labeled p(d), p(m|d), p(t|m).)
  - Based on the generative process, we define a joint probability model over d and t in terms of p(d), p(m|d), and p(t|m)
  - With Bayes' formula, the model is rewritten in terms of p(m), p(d|m), and p(t|m)
  - To explain the observations (t, d), we maximize the log-likelihood function with respect to the parameters, where n(d, t) denotes the number of co-occurrences of d and t
  - We use EM to compute the maximum likelihood estimate
      - E-step: compute the posterior probability of the latent theme m based on the current estimates of the parameters
      - M-step: maximize the expectation of the log-likelihood
  - We rank experts based on the estimated parameters p(t|m), p(d|m), and p(m)

Language Models for Expert Finding
  - The composite model works well when a single support document contains all the query terms.
  - The hybrid model is more flexible: it works well when all the query terms appear somewhere in the support documents of an expert.
  - Both models are based on keyword matching, so they cannot work well for support documents that contain no query terms.
  (Figure: the motivating examples again, beginning with the Semantic web papers of Timothy W. Finin and the papers of Vladimir Vapnik.)
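
The composite and hybrid language models described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (dirichlet_lm, composite_score, hybrid_score), the toy documents, and the smoothing weight mu are all hypothetical, and the binary authorship association (1 if the candidate wrote the document, 0 otherwise) is used directly as a weight without normalization.

```python
import math
from collections import Counter


def dirichlet_lm(doc_tokens, coll_counts, coll_len, mu=1000.0):
    """Return a function term -> p(term | d) with Dirichlet smoothing."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    def prob(term):
        p_coll = coll_counts.get(term, 0) / coll_len  # background p(t | collection)
        return (tf.get(term, 0) + mu * p_coll) / (doc_len + mu)

    return prob


def composite_score(query, doc_models, assoc):
    """Sum over documents of a(d, e) * prod_i p(t_i | d)."""
    return sum(a * math.prod(p(t) for t in query)
               for p, a in zip(doc_models, assoc))


def hybrid_score(query, doc_models, assoc):
    """Product over query terms of sum_d a(d, e) * p(t_i | d)."""
    return math.prod(sum(a * p(t) for p, a in zip(doc_models, assoc))
                     for t in query)


# Toy data (hypothetical): two support documents written by one candidate.
docs = [["semantic", "web", "services"], ["ontology", "web"]]
coll_counts = Counter(t for d in docs for t in d)
coll_len = sum(coll_counts.values())
models = [dirichlet_lm(d, coll_counts, coll_len, mu=1.0) for d in docs]
assoc = [1, 1]  # the candidate is an author of both documents

query = ["semantic", "ontology"]  # the two terms never share a document
composite = composite_score(query, models, assoc)
hybrid = hybrid_score(query, models, assoc)
```

In this toy case no single document contains both query terms, so the composite score relies entirely on smoothing mass, while the hybrid score credits the candidate for covering each term somewhere in the support documents. Neither score helps when a relevant document shares no terms with the query, which is the gap the mixture model targets.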
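
The EM procedure described for the mixture model can be sketched as follows. This is an illustrative sketch only, assuming the standard symmetric PLSA parameterization p(d, t) = sum_m p(m) p(d|m) p(t|m), random initialization, and a fixed number of iterations; the function names (plsa_em, log_likelihood, term_given_doc) and the toy count matrix are hypothetical, and the slides' own ranking formula is not reproduced here.

```python
import math
import random


def plsa_em(counts, k, iters=30, seed=0):
    """Fit p(m), p(d|m), p(t|m) by EM; counts[d][t] is n(d, t)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def random_dist(size):
        raw = [rng.random() + 0.1 for _ in range(size)]
        total = sum(raw)
        return [v / total for v in raw]

    p_m = [1.0 / k] * k
    p_d_m = [random_dist(n_docs) for _ in range(k)]
    p_t_m = [random_dist(n_terms) for _ in range(k)]

    for _ in range(iters):
        acc_m = [0.0] * k
        acc_d = [[0.0] * n_docs for _ in range(k)]
        acc_t = [[0.0] * n_terms for _ in range(k)]
        for d in range(n_docs):
            for t in range(n_terms):
                n = counts[d][t]
                if n == 0:
                    continue
                # E-step: posterior p(m | d, t) under the current parameters.
                joint = [p_m[m] * p_d_m[m][d] * p_t_m[m][t] for m in range(k)]
                z = sum(joint)
                if z == 0.0:
                    continue
                for m in range(k):
                    r = n * joint[m] / z  # expected count assigned to theme m
                    acc_m[m] += r
                    acc_d[m][d] += r
                    acc_t[m][t] += r
        # M-step: re-estimate the parameters from the expected counts.
        total = sum(acc_m)
        p_m = [v / total for v in acc_m]
        p_d_m = [[v / max(acc_m[m], 1e-300) for v in acc_d[m]] for m in range(k)]
        p_t_m = [[v / max(acc_m[m], 1e-300) for v in acc_t[m]] for m in range(k)]
    return p_m, p_d_m, p_t_m


def log_likelihood(counts, p_m, p_d_m, p_t_m):
    """Sum over (d, t) of n(d, t) * log p(d, t)."""
    total = 0.0
    for d, row in enumerate(counts):
        for t, n in enumerate(row):
            if n:
                mix = sum(p_m[m] * p_d_m[m][d] * p_t_m[m][t]
                          for m in range(len(p_m)))
                total += n * math.log(mix)
    return total


def term_given_doc(d, t, p_m, p_d_m, p_t_m):
    """Theme-based p(t | d) = sum_m p(m | d) p(t | m)."""
    weights = [p_m[m] * p_d_m[m][d] for m in range(len(p_m))]
    return sum(w * p_t_m[m][t] for m, w in enumerate(weights)) / sum(weights)


# Toy co-occurrence counts (hypothetical): 4 documents x 4 terms.
counts = [[4, 3, 0, 0], [3, 4, 1, 0], [0, 0, 4, 3], [0, 1, 3, 4]]
params_1 = plsa_em(counts, k=2, iters=1)
params_30 = plsa_em(counts, k=2, iters=30)
```

The helper term_given_doc gives a theme-based estimate of p(t_i|d_j) that could stand in for, or be combined with, the keyword-based estimate in the language models; how the two are combined for ranking is left open here.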