Contextually-related Entities - Stanford University

Uploaded by: e****s  Document ID: 252769986  Upload date: 2024-11-19  Format: PPT  Pages: 38  Size: 509.50 KB
Linguistics 187: Grammar Engineering
Ron Kaplan, Tracy King, Martin Forst

Administrivia
  - Schedule: Office hours
  - Requirements

Overview

Applications of Language Engineering
  [Chart. Axis labels: Functionality (Low to High), Domain Coverage (Narrow to Broad). Other labels: Deep, Shallow, Synthesis. Plotted items: Semantic Search (Powerset, Hakia); Keyword Search (Google, Yahoo, Microsoft Live); Post-Search Sifting; Autonomous Knowledge Filtering; Natural Dialogue; Microsoft Paperclip; Manually-tagged Keyword Search; Document Base Management; Restricted Dialogue; Useful Summary; Good Translation.]

Grammar engineering for deep processing
  - Draws on theoretical linguistics, software engineering
  - Theoretical linguistics = papers
    - Generalizations, universality, idealization (competence)
  - Software engineering = programs
    - Coverage, interface, QA, maintainability, efficiency, practicality
  - Grammar engineering
    - Grammar : Theory = Program : Programming language
    - Reflect linguistic generalizations
    - Respect special cases of ordinary language
    - Deal with large-scale interactions
    - Theory/practice trade-offs

What is a shallow grammar?
  - often trained automatically from marked up corpora
  - part of speech tagging
  - chunking
  - trees

POS tagging and Chunking
  - Part of speech tagging:
      I/PRP saw/VBD her/PRP duck/VB ./PUNCT
      I/PRP saw/VBD her/PRP$ duck/NN ./PUNCT
  - Chunking:
    - general chunking
      "I begin with an intuition: when I read a sentence, I read it a chunk at a time." (Abney)
    - NP chunking
      [NP President Obama] visited [NP the Hermitage] in [NP Leningrad]

Treebank grammars
  - Phrase structure tree (c-structure)
  - Annotations for heads, grammatical functions
  - Collins parser output

Deep grammars
  - Provide detailed syntactic/semantic analyses
    - LFG (ParGram), HPSG (LinGO, Matrix)
    - Grammatical functions, tense, number, etc.
      Mary wants to leave.
        subj(want1, Mary3)
        comp(want1, leave2)
        subj(leave2, Mary3)
        tense(want1, present)
  - Usually manually constructed
    - linguistically motivated rules

Why would you want one?
  - Meaning sensitive applications
    - overkill for many NLP applications, crucial for others
  - Applications which use shallow methods for English may not work for free word order languages
    - can read many functions off of trees in English
      SUBJ: NP sister to VP         [S [NP Mary] [VP left]]
      OBJ: first NP sister to V     [S [NP Mary] [VP saw [NP John]]]
    - need other information in German, Japanese, etc.

Deep analysis matters... if you care about the answer
  Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
  Question: Who flew to Chicago?
  Candidate answers:
    - division: closest noun
    - head: next closest
    - V.P. Philips: next
      (the three above: shallow but wrong)
    - delegation: furthest away but Subject of "flew" (deep and right)

Search: Keywords to natural language
  Suppose you want to know who Obama criticized.
  - With shallow keyword search engines:
    - Keywords: "Obama criticized"
    - Simple to use, but
      - Precision errors: "Hillary and John criticized Barack" is interesting (maybe) but irrelevant
      - Recall errors: What about denounce, condemn?
    - Advanced search: More expressive, but complex and unused
      ("Obama" (criticize OR condemn OR ...))
    - Compensate with web graph and other ranking features
  - Natural language queries distinguish:
    - Who did Obama criticize?
    - Who criticized Obama?

Semantic search (Powerset)
  Example sentence: Sir Edward Heath died from pneumonia.
  - Parses each sentence on the page
  - Extracts entities & semantic relationships
  - Identifies and expands to similar entities, relationships & abstractions
  - Indexes multiple facts for each sentence
  [Diagram labels: die (verb); subj: Sir Edward Heath (name); from: pneumonia (noun); by; Sir Edward Heath (noun) expands to UK Prime Minister, politician; pneumonia expands to disease; die expands to killed.]

Mapping Queries to Content
  Queries:
    - Edward Heath's death
    - death of Edward Heath
    - disease that killed Edward Heath
    - diseases that killed politicians
    - politicians who died from disease
    - politicians that died from pneumonia
    - politicians killed by pneumonia
    - who died from pneumonia
    - what politicians died from disease
    - which politician died from pneumonia
    - what disease did Edward Heath die from
    - what killed Sir Edward Heath
    - what was Sir Edward Heath killed by
  Content:
    Sir Edward Heath died from pneumonia at 19:30 on 17 July 2005

  [Architecture diagram labels.
   Content Acquisition: Open-text content, XLE parse, Semantic map, Content Semantics, Indexing, Large-scale semantic index.
   User search: NL questions, XLE parse, Semantic map, Question Semantics, Query, Retrieval, Ranking, Result presentation.
   Knowledge Resources: LFG Grammar (Acquisition: manual + ML).
   Example question: Who did IBM acquire in the last 10 years?
   Example content: IBM purchased Lotus in 1998.
   Example results: Doc1: IBM purchased Lotus in 1998. Doc2: List of IBM purchases.]

Traditional Problems
  - Time consuming and expensive to write
  - Robustness
    - want output for any input: real-world applications
  - Ambiguity
  - Efficiency
  - Interfaces to other application components

Why deep analysis is difficult
  - Languages are hard to describe
    - Meaning depends on complex properties of words and sequences
    - Different languages rely on different properties
    - Errors and disfluencies
  - Languages are hard to compute
    - Expensive to recognize complex patterns
    - Sentences are ambiguous
    - Ambiguities multiply: explosion in time and space

How to overcome this?
  - Engineer the deep grammars
    - theoretical vs. practical
    - what is good enough
  - Integrate shallow techniques into deep grammars
  - Experience based on broad-coverage LFG grammars (ParGram project)

Robustness: Sources of Brittleness
  - missing vocabulary
    - you can't list all the proper names in the world
  - missing constructions
    - there are many constructions theoretical linguistics rarely considers (e.g. dates, company names)
    - easy to miss even core constructions
  - ungrammatical input
    - real world text is not always perfect
    - sometimes it is really horrendous

Real world Input
  - Other weak blue-chip issues included Chevron, which went down 2 to 64 7/8 in Big Board composite trading of 1.3 million shares; Goodyear Tire & Rubber, off 1 1/2 to 46 3/4, and American Express, down 3/4 to 37 1/4. (WSJ, section 13)
  - The croakers done gone from the hook (WSJ, section 13)
  - (SOLUTION 27000 20) Without tag P-248 the W7F3 fuse is located in the rear of the machine by the charge power supply (PL3 C14 item 15. (Eureka copier repair tip)

Missing vocabulary
  - Build vocabulary based on the input of shallow methods
    - fast
    - extensive
    - accurate
  - Finite-state morphologies
  - Part of Speech Taggers

LFG and XLE: This course
  - LFG: a theory of grammar
  - XLE: a parsing/generation engine for LFG grammars

Different patterns code same meaning
  Sentence: The small children are chasing the dog.
  - English (group, order):
      the/Det small/Adj children/N are/Aux chasing/V the/Det dog/N
  - Japanese (group, mark):
      tiisai 'small' (Adj), kodomotati 'children' (N), ga (P, Sbj), inu 'dog' (N), o (P, Obj), oikaketeiru 'are chasing' (V)
  - Warlpiri (mark only):
      witajarra-rlu 'small-Sbj', mali-ki 'dog-Obj', kurdujarra-rlu 'children-Sbj', wajilipinyi 'chase' (V), kapala 'Present' (Aux)
  - Shared meaning: Chase(small(children), dog)
  - Shared f-structure:
      Pred   chase
      Tense  Present
      Subj   [ Pred children, Mod small ]
      Obj    [ Pred dog ]
  LFG theory: minor adjustments on universal theme

LFG architecture
  - C-structures and f-structures in piecewise correspondence.
  - C-structure: formal encoding of order and grouping
      [S [NP John] [VP [V likes] [NP Mary]]]
  - F-structure: formal encoding of grammatical relations
      PRED   like
      TENSE  PRESENT
      SUBJ   [ PRED John, NUM SG ]
      OBJ    [ PRED Mary, NUM SG ]
  - Modularity

LFG grammar
  - Rules
      S  → NP          VP
           (↑ SUBJ)=↓  ↑=↓
      VP → V           NP
           ↑=↓         (↑ OBJ)=↓
  - Lexical entries
      John:  NP  (↑ PRED)=John  (↑ NUM)=SG
      likes: V   (↑ PRED)=like  (↑ SUBJ NUM)=SG  (↑ SUBJ PERS)=3
  - Context-free rules define valid c-structures (trees).
  - Annotations are instantiated at tree nodes to give equational constraints that corresponding f-structures must satisfy.
  - Satisfiability of constraints determines grammaticality.
  - F-structure is solution for equations (if satisfied).

Rules as well-formedness conditions
      S → NP          VP
          (↑ SUBJ)=↓  ↑=↓
  A tree containing S over NP - VP is OK if
    - F-unit corresponding to NP node is SUBJ of f-unit corresponding to S node
    - The same f-unit corresponds to both S and VP nodes.

Inconsistent equations = Ungrammatical
  What's wrong with "They walks"?
      S → NP          VP
          (↑ SUBJ)=↓  ↑=↓
      they:  (↑ NUM)=PL
      walks: (↑ SUBJ NUM)=SG
  Let f be the f-structure of the sentence,
      s be the f-structure of the subject,
      v be the f-structure of the verb.
  (f SUBJ) = s and (s NUM)=PL       =>  (f SUBJ NUM)=PL
  f = v and (v SUBJ NUM)=SG         =>  (f SUBJ NUM)=SG
  Then (substituting equals for equals):
  (f SUBJ NUM)=PL and (f SUBJ NUM)=SG  =>  SG=PL  =>  FALSE
  If a valid inference chain yields FALSE, the premises are unsatisfiable.

ParGram project
  - Large-scale LFG grammars for several languages
    - English, German, Japanese (Korean), French, Norwegian, Chinese, Turkish, Arabic, Hungarian
    - Cover real uses of language: newspapers, documents, etc.
  - Parallelism: test LFG universality claims
    - Common c- to f-structure mapping conventions (unless typologically motivated variation)
    - Invariant underlying f-structures
    - Permits shared disambiguation properties, Glue interpretation premises
  - All grammars run on PARC software (XLE)
  - International consortium of linguists
    - PARC, Stuttgart, Fuji Xerox, Konstanz, Bergen, Sabanci, Oxford, Oman
    - Sustained effort: full-week meetings twice a year, 10 years!
    - Contributions to linguistics and computational linguistics: books and papers
    - Each group is self-funded, self-managed

History
  - Started in 1994
    - English (PARC)
    - French (XRCE, now PARC)
    - German (IMS-Stuttgart)
  - Biannual meetings
    - Alternating between Palo Alto and Europe/Japan
  - 1998: Japanese started (Fuji Xerox)
  - 1999: Norwegian started (University of Bergen)
  - 2000: Urdu (Konstanz)
  - 2002: Danish started (Copenhagen)
  - 2003: Korean (PARC) porting experiment
  - 2004: Welsh, Malagasy (Essex, Oxford); Chinese (PARC)
  - 2005: Arabic (Oman), Turkish (Sabanci), Hungarian

Goals
  - Practical
    - Create a capability/platform for NL applications
      - translation, information retrieval, ...
    - Develop discipline of grammar engineering
      - what tools, techniques, conventions make it easy to develop and maintain broad-coverage grammars?
      - how long does it take?
      - how much does it cost?
  - Theoretical
    - Refine and guide LFG theory through broad coverage of multiple languages
    - Refine and guide the algorithms and implementation (XLE)

Parallel f-structures (where possible) but different c-structures

ParGram grammars
                German   English*   French   Japanese (Korean)
  #Rules           251        388      180                  56
  #States        3,239     13,655    1,747                 368
  #Disjuncts    13,294     55,725   12,188               2,012
  * English allows for shallow markup: labeled bracketing, named-entities

Engineering results
  - Grammars and Lexicons
  - Parallel f-structures across languages
  - Grammar writer's cookbook
  - New practical formal devices
    - Complex categories for efficiency
      NPnom vs. NP: (↑ CASE) = NOM
    - Optimality marks for robustness
      enlarge grammar without being overrun by peculiar analyses
    - Lexical priority: merging different lexicons

Theoretical results
  - New theory of agreement features
  - Separate representation of morphosyntactic features
  - Phonology-syntax interface
  - New analysis of nonconstituent coordination
  - Distribution instead of generalization over sets

XLE Demo
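
Sketch: inconsistent equations as failed unification

The "Inconsistent equations = Ungrammatical" slide argues that "They walks" is out because the subject's NUM would have to be both PL and SG. A minimal Python sketch of that idea follows. It is not XLE and not a full LFG solver: f-structures are modeled as nested dicts, PRED values are plain strings, and the names `unify`, `analyze`, and `LEXICON` are hypothetical helpers introduced only for this illustration.

```python
# Toy f-structure unification (illustrative only; not XLE's algorithm).

def unify(a, b):
    """Merge two feature structures; return None if they are inconsistent."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)  # shallow copy so the inputs are not modified
        for key, value in b.items():
            if key in out:
                merged = unify(out[key], value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                out[key] = merged
            else:
                out[key] = value
        return out
    # Atomic values (or atom vs. dict) must be identical to be consistent.
    return a if a == b else None


# Hypothetical mini-lexicon: each entry is the f-structure its equations describe.
LEXICON = {
    "they": {"PRED": "they", "NUM": "PL"},          # (↑ NUM)=PL
    "John": {"PRED": "John", "NUM": "SG"},          # (↑ NUM)=SG
    "walks": {"PRED": "walk", "SUBJ": {"NUM": "SG"}},  # (↑ SUBJ NUM)=SG
}


def analyze(subject, verb):
    """Apply S -> NP VP with (↑ SUBJ)=↓ on NP and ↑=↓ on VP."""
    from_np = {"SUBJ": LEXICON[subject]}  # NP's f-structure is the SUBJ of S
    from_vp = LEXICON[verb]               # VP shares the f-structure of S
    return unify(from_np, from_vp)


print(analyze("John", "walks"))  # a consistent f-structure
print(analyze("they", "walks"))  # None: NUM cannot be both PL and SG
```

Returning None plays the role of FALSE in the slide's inference chain: once one attribute is given two different atomic values, no f-structure satisfies all the premises.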
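
Sketch: role-sensitive lookup versus keyword matching

The search slides contrast keyword queries with queries that respect grammatical roles ("Who did Obama criticize?" versus "Who criticized Obama?") and show a question with "acquire" matching content with "purchased". The Python sketch below illustrates that contrast under strong assumptions: the facts are written by hand rather than produced by a parser, the synonym table is a hypothetical stand-in for the expansion step, and `FACTS`, `SYNONYMS`, and `search` are names invented for this example. It does not describe how Powerset's index works.

```python
# Toy role-based fact index (illustrative only; facts are hand-written).

# One fact per sentence, keyed by grammatical role.
FACTS = [
    # "IBM purchased Lotus in 1998."
    {"pred": "purchase", "subj": "IBM", "obj": "Lotus"},
    # "Sir Edward Heath died from pneumonia."
    {"pred": "die", "subj": "Sir Edward Heath", "from": "pneumonia"},
]

# Hypothetical expansion table: query predicate -> predicates accepted in content.
SYNONYMS = {"acquire": {"acquire", "purchase"}}


def search(pred, roles):
    """Return bindings for roles whose value is None, e.g. {"subj": "IBM", "obj": None}."""
    accepted = SYNONYMS.get(pred, {pred})
    wanted = [role for role, value in roles.items() if value is None]
    answers = []
    for fact in FACTS:
        if fact["pred"] not in accepted:
            continue
        # Every role fixed in the query must match the fact exactly.
        fixed_ok = all(
            fact.get(role) == value
            for role, value in roles.items()
            if value is not None
        )
        if fixed_ok and all(role in fact for role in wanted):
            answers.append(tuple(fact[role] for role in wanted))
    return answers


# "Who did IBM acquire?" -> IBM is the subject, the object is the unknown.
print(search("acquire", {"subj": "IBM", "obj": None}))        # [('Lotus',)]
# "Who acquired IBM?" -> same keywords, different roles, no match.
print(search("acquire", {"subj": None, "obj": "IBM"}))        # []
# "Who died from pneumonia?"
print(search("die", {"subj": None, "from": "pneumonia"}))     # [('Sir Edward Heath',)]
```

A bag-of-keywords match would treat the first two queries identically, since both contain "IBM" and "acquire"; keeping subj and obj apart is what separates them here.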