Automatic Set Expansion for List Question Answering
Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg
Language Technologies Institute, Carnegie Mellon University
Pittsburgh, PA 15213 USA

Task
- Automatically improve the answers that Question Answering systems generate for list questions by using a Set Expansion system.
- For example: "Name cities that have Starbucks."
  - QA answers: Boston, Seattle, Carnegie-Mellon, Aquafina, Google, Logitech
  - Expanded answers: Seattle, Boston, Chicago, Pittsburgh, Carnegie-Mellon, Google (better!)

Outline
- Introduction
  - Question Answering
  - Set Expansion
- Proposed Approach
  - Aggressive Fetcher
  - Lenient Extractor
  - Hinted Expander
- Experimental Results
  - QA System: Ephyra
  - Other QA Systems
- Conclusion

Question Answering (QA)
- Question Answering task: retrieve answers to natural language questions
- Different question types:
  - Factoid questions
  - List questions
  - Definitional questions
  - Opinion questions
- Major QA evaluations:
  - Text REtrieval Conference (TREC): English
  - NTCIR: Japanese, Chinese
  - CLEF: European languages

Typical QA Pipeline
- Stages: Question Analysis -> Query Generation & Search -> Candidate Generation -> Answer Scoring
- Supporting resource: Knowledge Sources
- Data flow: Question String -> Analyzed Question -> Search Results -> Candidate Answers -> Scored Answers
- Example:
  - Question string: "Who invented the smiley?"
  - Analyzed question: Answer type: Person; Keywords: invented, smiley, ...
  - Search result: "The two original text smileys were invented on September 19, 1982 by Scott E. Fahlman."
  - Candidate answers: smileys; September 19, 1982; Scott E. Fahlman
  - Scored answers:
      Candidate            Score
      Scott E. Fahlman     0.853
      smileys              0.418
      September 19, 1982   0.239

QA System: Ephyra (Schlaefer et al., TREC 2007)
- History:
  - Developed at University of Karlsruhe, Germany and Carnegie Mellon University, USA
  - TREC participations in 2006 (13th out of 27 teams) and 2007 (7th out of 21 teams)
  - Released into open source in 2008
- Different candidate generators:
  - Answer type classification
  - Regular expression matching
  - Semantic parsing
- Available for download at:

Set Expansion (SE)
- For example:
  - Given a query: "survivor", "amazing race"
  - Answer is: "american idol", "big brother", ...
- More formally:
  - Given a small number of seeds $x_1, x_2, \dots, x_k$, where each $x_i \in S_t$
  - Answer is a listing of other probable elements $e_1, e_2, \dots, e_n$, where each $e_i \in S_t$
- A well-known example of a web-based set expansion system is Google Sets

SE System: SEAL (Wang & Cohen, ICDM 2007)
- Features:
  - Independent of human/markup language
    - Supports seeds in English, Chinese, Japanese, Korean, ...
    - Accepts documents in HTML, XML, SGML, TeX, WikiML, ...
  - Does not require pre-annotated training data
    - Utilizes a readily available corpus: the World Wide Web
- Based on two research contributions:
  - Automatically constructs wrappers for extracting candidate items
  - Ranks extracted items using a random graph walk
- Try it out for yourself: /rcwang /seal

SEAL's SE Pipeline
- Fetcher: downloads web pages from the Web
- Extractor: learns wrappers from web pages
- Ranker: ranks entities extracted by wrappers
- Example entities: Canon, Nikon, Olympus, Pentax, Sony, Kodak, Minolta, Panasonic, Casio, Leica, Fuji, Samsung

Challenge
- SE systems require relevant (non-noisy) seeds, but answers produced by QA systems are often noisy.
- How can we integrate the two systems?
- We propose three extensions to SEAL:
  - Aggressive Fetcher
  - Lenient Extractor
  - Hinted Expander

Original Fetcher
- Procedure:
  - Compose a search query by concatenating all seeds
  - Use Google to request the top 100 web pages
  - Fetch the web pages and send them to the Extractor
- Seeds: Boston, Seattle, Carnegie-Mellon
- Query: Boston Seattle Carnegie-Mellon

Proposed Fetcher
- Aggressive Fetcher (AF)
  - Sends a two-seed query for every possible pair of seeds to the search engines
  - More likely to compose queries containing only relevant seeds
- Seeds: Boston, Seattle, Carnegie-Mellon
- Queries:
  - Boston Seattle
  - Boston Carnegie-Mellon
  - Seattle Carnegie-Mellon

Original Extractor
- A wrapper is a pair of left (L) and right (R) context strings
  - Maximally long contextual strings that bracket at least one instance of every seed
  - Extracts the strings between L and R
- Learns wrappers from web pages and seeds on the fly
  - Utilizes semi-structured documents
  - Wrappers are defined at the character level
    - No tokenization required (language-independent)
    - However, very page-specific (page-dependent)

Proposed Extractor
- Lenient Extractor (LE)
  - Maximally long contextual strings that bracket at least one instance ...
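
The fetcher and original extractor slides above can be illustrated with a minimal Python sketch. It is not SEAL's implementation: every function name here is hypothetical, the "web page" is a hand-written string, no search engine is contacted, and wrapper learning is done by brute force over one occurrence of each seed. It only shows how the two query styles differ and what a character-level (L, R) wrapper does on a single page.

```python
from itertools import combinations, product


def original_query(seeds):
    """Original Fetcher: one query that concatenates all seeds."""
    return " ".join(seeds)


def aggressive_queries(seeds):
    """Aggressive Fetcher: one two-seed query per unordered pair of seeds."""
    return [f"{a} {b}" for a, b in combinations(seeds, 2)]


def common_suffix(strings):
    """Longest string that every input ends with (may be empty)."""
    shortest = min(strings, key=len)
    for size in range(len(shortest), 0, -1):
        if all(s.endswith(shortest[-size:]) for s in strings):
            return shortest[-size:]
    return ""


def common_prefix(strings):
    """Longest string that every input starts with (may be empty)."""
    shortest = min(strings, key=len)
    for size in range(len(shortest), 0, -1):
        if all(s.startswith(shortest[:size]) for s in strings):
            return shortest[:size]
    return ""


def occurrences(page, seed):
    """Start offsets of every occurrence of seed in page."""
    found, start = [], page.find(seed)
    while start != -1:
        found.append(start)
        start = page.find(seed, start + 1)
    return found


def learn_wrappers(page, seeds):
    """Return (L, R) pairs that bracket at least one instance of every seed."""
    spots = [occurrences(page, seed) for seed in seeds]
    wrappers = set()
    for combo in product(*spots):  # pick one occurrence of each seed
        lefts = [page[:pos] for pos in combo]
        rights = [page[pos + len(seed):] for pos, seed in zip(combo, seeds)]
        left, right = common_suffix(lefts), common_prefix(rights)
        if left and right:  # keep only wrappers with context on both sides
            wrappers.add((left, right))
    return wrappers


def extract(page, left, right):
    """Extract every string found between the L and R contexts."""
    items, start = [], page.find(left)
    while start != -1:
        begin = start + len(left)
        end = page.find(right, begin)
        if end == -1:
            break
        items.append(page[begin:end])
        start = page.find(left, start + 1)
    return items


if __name__ == "__main__":
    seeds = ["Boston", "Seattle", "Carnegie-Mellon"]
    print(original_query(seeds))
    print(aggressive_queries(seeds))

    # Hand-written stand-in for a fetched, semi-structured page.
    page = "<b>Boston</b> and <b>Chicago</b>, then <b>Seattle</b>; also <b>Pittsburgh</b>."
    for left, right in learn_wrappers(page, ["Boston", "Seattle"]):
        print((left, right), extract(page, left, right))
```

With the noisy seed "Carnegie-Mellon" in the list, the single concatenated query always contains the noise, whereas one of the three pairwise queries ("Boston Seattle") contains only relevant seeds. The wrapper part operates on raw characters with no tokenization, which is why the approach is language-independent but tied to the formatting of one page.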