Integrating Finite-state Morphologies with Deep LFG Grammars
Tracy Holloway King

FST and deep grammars
- Finite state tokenizers and morphologies can be integrated into deep processing systems
- Integrated tokenizers
  - eliminate the need for preprocessing
  - allow the grammar writer more control over the input
- Morphologies
  - eliminate the need to list (multiple) surface forms in the lexicon
  - eliminate the need for lexical entries for words with predictable subcategorization frames

Talk outline
- Basic integrated system
- Integrating morphology FSTs
- Interaction of tokenization and morphology

Basic Architecture
- Input string
- (Shallow markup)
- Tokenizing FSTs
- Morphology FSTs
- LFG grammar and lexicons
- Constituent-structure (tree)
- Functional-structure (AVM)

Example steps through the system
- Input string: Boys appeared.
- Tokenizing: boys TB appeared TB . TB
- Morphology:
    boy +Noun +Pl
    appear +Verb +PastBoth +123SP
    . +Punct
- C-structure/F-structure: next slides

C-structure tree
(figure)

F-structure AVM
(figure)

The wider system: XLE
- Handwritten grammars for various languages
  - Substantial for English, German, Japanese, Norwegian
  - Also: Arabic, Chinese, Urdu, Korean, Welsh, Malagasy, Turkish
- Robustness mechanisms
  - Fragment grammar rules
  - Morphological guessers
  - Skimming when resource limits approached
- Ambiguity management (packing)
  - Compute all analyses (no "aggressive pruning")
  - Propagate packed ambiguities across processing modules
- Stochastic disambiguation
  - MaxEnt models to select from packed (f-)structures
- Other processing available: generation, semantics, transfer/rewriting
- Comparisons to other systems/tasks
  - Parsing WSJ (Riezler et al., ACL 2002)
  - Comparison to Collins model 3 (Riezler et al., NAACL 2004)

FST Morphologies
- Associate surface form with
  - a lemma (stem/canonical form)
  - a set of tags
- Process is non-deterministic
  - can have many analyses for one surface form
  - grammar has to be able to deal with multiple analyses (morphological ambiguity)
- Issue: can the grammar control rampant morphological ambiguity?
  - Arabic vowelless representations

Example Morphology Output
    Surface form   Lemma      Tags
    turnips        turnip     +Noun +Pl
    Mary           Mary       +Prop +Giv +Fem +Sg
    falls          fall       +Noun +Pl
                   fall       +Verb +Pres +3sg
    broken         break      +Verb +PastPerf +123SP
                   broken     +Verb +PastPart +Adj
    New York       New York   +Prop +Place +USAState +Prefer
                   New York   +Prop +Place +City +Prefer
                   plus analyses of New and York

Morphologies and lexicons
- Without a morphology
  - need to list all surface forms in the lexicon
  - bad for English
  - horrible for languages like Finnish and Arabic
- With a morphology
  - one entry for the stem form
      go V XLE (V-INTRANS go).
    for: go, goes, going, gone, went
- With additional integration
  - words with predictable subcategorization frames need no entry

Basic idea
- Run surface forms of words through the morphology to produce stems and tags
  - MorphConfig file specifies which morphologies the grammar uses
- Look up stems, and tags, in the lexicon
- Sublexical phrase structure rules build syntactic nodes covering the stems and tags
- Standard grammar rules build larger phrases

Lexical entries for tags
- boys = boy +Noun +Pl
    boy    N XLE (NOUN boy).
    +Noun  N_SFX XLE (PERS 3) (EXISTS NTYPE).
    +Pl    NNUM_SFX XLE (NUM pl).

Sublexical rules for tags
- Build up lexical nodes from stem plus tags
- Rules are identical to standard phrase structure rules
  - Except display can hide the sublexical information
    N --> N_BASE N_SFX_BASE NNUM_SFX_BASE.
- Tree:
    N
      N_BASE          boy
      N_SFX_BASE      +Noun
      NNUM_SFX_BASE   +Pl

Resulting structures
- C-structure:
    N
      N_BASE          boy
      N_SFX_BASE      +Noun
      NNUM_SFX_BASE   +Pl
- F-structure:
    PRED   boy
    PERS   3
    NUM    pl
    NTYPE  common

Lexical entries
- Stems with unpredictable subcategorization frames need entries
  - verbs
  - adjectives with obliques (proud of her)
  - nouns with that complements (the idea that he laughed)
- Most lexical items have predictable frames determined by part of speech
  - common and proper nouns
  - adjectives
  - adverbs
  - numbers

-unknown lexical entry
- Match any stem to the entry
- Provide desired functional information
  - %stem will pass in the appropriate surface form (i.e., the lemma/stem)
- Constrain application via morphological tag possibilities
    -unknown  N XLE (NOUN %stem);
              A XLE (ADJ %stem);
              ADV XLE (ADVERB %stem).

-unknown example
- The box boxes.
- Lexicon entries:
    box       V XLE (V-INTRANS %stem).
    -unknown  N XLE (NOUN %stem); ADV ...; A ... .
- Morphology output:
    box   = box +Noun +Sg | +Verb +Non3Sg
    boxes = box +Noun +Pl | +Verb +3Sg
- Build up four effective lexical entries
  - 1 noun, 1 verb, 1 adverb, 1 adjective
  - adverb and adjective fail sublexically
  - noun and verb relevant for the sentence

Inflectional morphology summary
- Integrating FST morphologies significantly decreases lexicon development
- Verbs and other unpredictable items are listed only under their stem form
- Predictable items such as nouns are processed via -unknown and never listed in the lexicon

Guessers
- Even large industrial FST morphologies are not complete
- Novel words usually have regular morphology
  - Build an FST guesser based on this
  - Words with capital letters are proper nouns (Saakashvili)
  - Words ending in -ed are past tense verbs
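
The -unknown example ("The box boxes.") can be traced with a small Python sketch. This is only a toy model, not XLE: plain dictionaries stand in for the FST morphology and the lexicon, and the names `effective_entries`, `lexical_readings`, and `POS_TAG`, as well as the `+Adv` tag, are assumptions made for illustration. It shows how one explicit verb entry plus the -unknown entry yields four effective entries for the stem box, and how the adjective and adverb entries fail once the morphological tags are checked.

```python
# Toy stand-in for an FST morphology: surface form -> [(stem, tags), ...].
# The analyses are copied from the "-unknown example" slide.
MORPHOLOGY = {
    "box": [("box", ("+Noun", "+Sg")), ("box", ("+Verb", "+Non3Sg"))],
    "boxes": [("box", ("+Noun", "+Pl")), ("box", ("+Verb", "+3Sg"))],
}

# Explicit entries: only stems with unpredictable subcategorization frames.
LEXICON = {"box": [("V", "V-INTRANS")]}

# The -unknown entry: matches any stem; %stem is filled in at lookup time.
UNKNOWN = [("N", "NOUN"), ("A", "ADJ"), ("ADV", "ADVERB")]

# Assumed sublexical constraint: the part-of-speech tag each category accepts.
# "+Adv" is an assumed tag name; the other three occur in the slides.
POS_TAG = {"N": "+Noun", "V": "+Verb", "A": "+Adj", "ADV": "+Adv"}


def effective_entries(stem):
    """Explicit entries for the stem plus the -unknown entries."""
    return LEXICON.get(stem, []) + UNKNOWN


def lexical_readings(surface):
    """Return (category, template call, tags) for every surviving reading."""
    readings = []
    for stem, tags in MORPHOLOGY.get(surface, []):
        for category, template in effective_entries(stem):
            # A reading fails sublexically when the category cannot
            # consume the part-of-speech tag the morphology produced.
            if POS_TAG[category] == tags[0]:
                readings.append((category, f"({template} {stem})", tags))
    return readings


if __name__ == "__main__":
    print(len(effective_entries("box")), "effective entries for stem 'box'")
    for surface in ("box", "boxes"):
        for reading in lexical_readings(surface):
            print(surface, reading)
```

In this sketch the noun and verb readings survive for both box and boxes, which leaves the choice between them to the standard grammar rules, as on the slide. A real grammar expresses the tag check through sublexical phrase structure rules rather than a lookup table.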