斯坦福大学自然语言处理14章课件

上传人:痛*** 文档编号:241440709 上传时间:2024-06-26 格式:PPTX 页数:59 大小:1.25MB
返回 下载 相关 举报
斯坦福大学自然语言处理14章课件_第1页
第1页 / 共59页
斯坦福大学自然语言处理14章课件_第2页
第2页 / 共59页
斯坦福大学自然语言处理14章课件_第3页
第3页 / 共59页
点击查看更多>>
资源描述
CFGsandPCFGs(Probabilistic)Context-FreeGrammarsChristopherManningA phrase structure grammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPpeople fish tankspeople fish with rodsNpeopleNfishNtanksNrodsVpeopleVfishVtanksPwithChristopherManningPhrase structure grammars=context-free grammars(CFGs)G=(T,N,S,R)TisasetofterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(SN)Risasetofrules/productionsoftheformXXNand(NT)*AgrammarGgeneratesalanguageL.ChristopherManningPhrase structure grammars in NLPG=(T,C,N,S,L,R)TisasetofterminalsymbolsCisasetofpreterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(SN)Listhelexicon,asetofitemsoftheformXxXPandxTRisthegrammar,asetofitemsoftheformXXNand(NC)*Byusualconvention,Sisthestartsymbol,butinstatisticalNLP,weusuallyhaveanextranodeatthetop(ROOT,TOP)Weusuallywritee foranemptysequence,ratherthannothingChristopherManningA phrase structure grammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPpeople fish tankspeople fish with rodsNpeopleNfish NtanksNrodsVpeopleVfishVtanksPwithChristopherManningProbabilistic or stochastic context-free grammars(PCFGs)G=(T,N,S,R,P)TisasetofterminalsymbolsNisasetofnonterminalsymbolsSisthestartsymbol(SN)Risasetofrules/productionsoftheformXPisaprobabilityfunctionP:R0,1AgrammarGgeneratesalanguagemodelL.ChristopherManningA PCFGSNPVP1.0VPVNP0.6VPVNPPP0.4NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0With empty NP removed so less ambiguousChristopherManningThe probability of trees and stringsP(t)The probability of a tree t is the product of the probabilities of the rules used to generate it.P(s)The probability of the string s is the sum of the probabilities of the trees which have that string as their yield P(s)=j P(s,t)where t is a parse of s =j P(t)ChristopherManningChristopherManningChristopherManningTree and String Probabilitiess =people fish tanks with rodsP(t1)=1.00.70.40.50.60.71.00.21.00.70.1=0.0008232P(t2)=1.00.70.60.50.60.20.71.00.21.00.70.1=0.00024696P(s)=P(t1)+P(t2)=0.0008232+0.00024696=0.00107016Verb attachNoun attachChristopherManningChristopherManningCFGs and PCFGs(Probabilistic)Context-FreeGrammarsGrammarTransformsRestrictingthegrammarformforefficientparsingChristopherManningChomsky Normal FormAllrulesareoftheformXYZorXwX,Y,ZNandwTAtransformationtothisformdoesntchangetheweakgenerativecapacityofaCFGThatis,itrecognizesthesamelanguageButmaybewithdifferenttreesEmptiesandunariesareremovedrecursivelyn-aryrulesaredividedbyintroducingnewnonterminals(n2)ChristopherManningA phrase structure grammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPNpeopleNfish NtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomsky Normal Form stepsSNPVPSVPVPVNPVPVVPVNPPPVPVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfish NtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVSVVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfish NtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfish NtanksNrodsVpeopleSpeopleVfishSfishVtanksStanksPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPNPNPPPNPPPNPNPPPNPPPPNpeopleNfish NtanksNrodsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVNPPPSVNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfish NPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVVP_VVP_VNPPPSVS_VS_VNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfish NPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningA phrase structure grammarSNPVPVPVNPVPVNPPPNPNPNPNPNPPPNPNNPePPPNPNpeopleNfish NtanksNrodsVpeopleVfishVtanksPwithChristopherManningChomsky Normal Form stepsSNPVPVPVNPSVNPVPVVP_VVP_VNPPPSVS_VS_VNPPPVPVPPSVPPNPNPNPNPNPPPNPPNPPPPNPNPpeopleNPfish NPtanksNProdsVpeopleSpeopleVPpeopleVfishSfishVPfishVtanksStanksVPtanksPwithPPwithChristopherManningChomsky Normal FormYoushouldthinkofthisasatransformationforefficientparsingWithsomeextrabook-keepinginsymbolnames,youcanevenreconstructthesametreeswithadetransformInpracticefullChomskyNormalFormisapainReconstructingn-ariesiseasyReconstructingunaries/emptiesistrickierBinarizationiscrucialforcubictimeCFGparsingTherestisntnecessary;itjustmakesthealgorithmscleanerandabitquickerChristopherManningROOTSNPVPNpeopleVNPPPPNProdswithtanksfishNNAn example:before binarizationAn example:before binarizationChristopherManningPNProdsNwithNPNpeopletanksfishNVPVNPPPVP_VROOTSAfter binarizationAfter binarizationChristopherManningTreebank:empties and unariesTreebank:empties and unariesROOTS-HLNNP-SUBJVPVB-NONE-eAtonePTB TreeROOTSNPVPVB-NONE-eAtoneNoFuncTagsROOTSVPVBAtoneNoEmptiesROOTSAtoneNoUnariesROOTVBAtoneHighLowChristopherManningUnary rules:Unary rules:alchemy in the land of treebanksalchemy in the land of treebanksChristopherManningSame-Span ReachabilitySame-Span ReachabilityADJP ADVPFRAG INTJ NPPP PRN QP SSBAR UCP VPWHNPTOPLSTCONJPWHADJPWHADVPWHPPNXNoEmptiesNACSBARQSINVRRCSQXPRTGrammarTransformsRestrictingthegrammarformforefficientparsingCKY ParsingExactpolynomialtimeparsingof(P)CFGsChristopherManningConstituency ParsingfishpeoplefishtanksRule Prob iSNPVP0NPNPNP1Nfish42Npeople43Vfish44PCFGPCFGNNVNVPNPNPSChristopherManningCocke-Kasami-Younger(CKY)Constituency ParsingfishpeoplefishtanksChristopherManningViterbi(Max)ScorespeoplefishNP0.35V0.1N0.5VP0.06NP0.14V0.6N0.2SNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP 1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0ChristopherManningExtended CKY parsingUnariescanbeincorporatedintothealgorithmMessy,butdoesntincreasealgorithmiccomplexityEmptiescanbeincorporatedUsefencepostsDoesntincreasecomplexity;essentiallylikeunariesBinarizationisvitalWithoutbinarization,youdontgetparsingcubicinthelengthofthesentenceandinthenumberofnonterminalsinthegrammarBinarizationmaybeanexplicittransformationorimplicitinhowtheparserworks(Early-styledottedrules),butitsalwaysthere.ChristopherManningfunction CKY(words,grammar)returns most_probable_parse,prob score=new double#(words)+1#(words)+1#(nonterms)back=new Pair#(words)+1#(words)+1#nonterms for i=0;i wordsi in grammar scoreii+1A=P(A-wordsi)/handle unaries boolean added=true while added added=false for A,B in nonterms if scoreii+1B 0&A-B in grammar prob=P(A-B)*scoreii+1B if prob scoreii+1A scoreii+1A=prob backii+1A=B added=trueThe CKY algorithm(1960/1965)extended to unariesChristopherManningfor span=2 to#(words)for begin=0 to#(words)-span end=begin+span for split=begin+1 to end-1 for A,B,C in nonterms prob=scorebeginsplitB*scoresplitendC*P(A-BC)if prob scorebeginendA scorebeginendA=prob backbeginendA=new Triple(split,B,C)/handle unaries boolean added=true while added added=false for A,B in nonterms prob=P(A-B)*scorebeginendB;if prob scorebeginendA scorebeginendA=prob backbeginendA=B added=truereturn buildTree(score,back)The CKY algorithm(1960/1965)extended to unariesCKY ParsingExactpolynomialtimeparsingof(P)CFGsCKY ParsingAworkedexampleChristopherManningThe grammar:Binary,no epsilons,SNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP 1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0score01score12score23score34score02score13score24score03score14score04012341234fishpeoplefishtanks012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0 for i=0;i wordsi in grammar scoreii+1A=P(A-wordsi);Nfish0.2Vfish0.6Npeople0.5Vpeople0.1Nfish0.2Vfish0.6Ntanks0.2Vtanks0.1012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0/handle unariesboolean added=true while added added=false for A,B in nonterms if scoreii+1B 0&A-B in grammar prob=P(A-B)*scoreii+1B if(prob scoreii+1A)scoreii+1A=prob backii+1A=B added=trueNfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0prob=scorebeginsplitB*scoresplitendC*P(A-BC)if(prob scorebeginendA)scorebeginendA=prob backbeginendA=new Triple(split,B,C)Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003NPNPNP0.0049VPVNP0.105SNPVP0.00126NPNPNP0.0049VPVNP0.007SNPVP0.0189NPNPNP0.00196VPVNP0.042SNPVP0.00378012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0/handle unariesboolean added=truewhile added added=false for A,B in nonterms prob=P(A-B)*scorebeginendB;if prob scorebeginendA scorebeginendA=prob backbeginendA=B added=trueNfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003NPNPNP0.0049VPVNP0.105SVP0.0105NPNPNP0.0049VPVNP0.007SNPVP0.0189NPNPNP0.00196VPVNP0.042SVP0.0042012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0for split=begin+1 to end-1 for A,B,C in nonterms prob=scorebeginsplitB*scoresplitendC*P(A-BC)if prob scorebeginendA scorebeginendA=prob backbeginendA=new Triple(split,B,C)Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003NPNPNP0.0049VPVNP0.105SVP0.0105NPNPNP0.0049VPVNP0.007SNPVP0.0189NPNPNP0.00196VPVNP0.042SVP0.0042NPNPNP0.0000686VPVNP0.00147SNPVP0.000882012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0for split=begin+1 to end-1 for A,B,C in nonterms prob=scorebeginsplitB*scoresplitendC*P(A-BC)if prob scorebeginendA scorebeginendA=prob backbeginendA=new Triple(split,B,C)Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003NPNPNP0.0049VPVNP0.105SVP0.0105NPNPNP0.0049VPVNP0.007SNPVP0.0189NPNPNP0.00196VPVNP0.042SVP0.0042NPNPNP0.0000686VPVNP0.00147SNPVP0.000882NPNPNP0.0000686VPVNP0.000098SNPVP0.01323012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0for split=begin+1 to end-1 for A,B,C in nonterms prob=scorebeginsplitB*scoresplitendC*P(A-BC)if prob scorebeginendA scorebeginendA=prob backbeginendA=new Triple(split,B,C)Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Npeople0.5Vpeople0.1NPN0.35VPV0.01SVP0.001Nfish0.2Vfish0.6NPN0.14VPV0.06SVP0.006Ntanks0.2Vtanks0.1NPN0.14VPV0.03SVP0.003NPNPNP0.0049VPVNP0.105SVP0.0105NPNPNP0.0049VPVNP0.007SNPVP0.0189NPNPNP0.00196VPVNP0.042SVP0.0042NPNPNP0.0000686VPVNP0.00147SNPVP0.000882NPNPNP0.0000686VPVNP0.000098SNPVP0.01323NPNPNP0.0000009604VPVNP0.00002058SNPVP0.00018522012341234fishpeoplefishtanksSNPVP0.9SVP0.1VPVNP0.5VPV0.1VPVVP_V0.3VPVPP0.1VP_VNPPP1.0NPNPNP0.1NPNPPP0.2NPN0.7PPPNP1.0Npeople0.5Nfish 0.2Ntanks0.2Nrods0.1Vpeople0.1Vfish0.6Vtanks0.3Pwith1.0CallbuildTree(score,back)togetthebestparseCKY ParsingAworkedexampleConstituency Parser EvaluationChristopherManningEvaluating constituency parsingChristopherManningEvaluating constituency parsingGold standard brackets:S-(0:11),NP-(0:2),VP-(2:9),VP-(3:9),NP-(4:6),PP-(6-9),NP-(7,9),NP-(9:10)Candidate brackets:S-(0:11),NP-(0:2),VP-(2:10),VP-(3:10),NP-(4:6),PP-(6-10),NP-(7,10)LabeledPrecision3/7=42.9%LabeledRecall3/8=37.5%LP/LRF140.0%TaggingAccuracy 11/11=100.0%ChristopherManningHow good are PCFGs?How good are PCFGs?Penn WSJ parsing accuracy:about 73%LP/LR F1Robust Usually admit everything,but with low probabilityPartial solution for grammar ambiguity A PCFG gives some idea of the plausibility of a parseBut not so good because the independence assumptions are too strongGive a probabilistic language model But in the simple case it performs worse than a trigram modelThe problem seems to be that PCFGs lack the lexicalization of a trigram modelConstituency Parser Evaluation
展开阅读全文
相关资源
正为您匹配相似的精品文档
相关搜索

最新文档


当前位置:首页 > 管理文书 > 施工组织


copyright@ 2023-2025  zhuangpeitu.com 装配图网版权所有   联系电话:18123376007

备案号:ICP2024067431-1 川公网安备51140202000466号


本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。装配图网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知装配图网,我们立即给予删除!