Corpus Annotation and the Extraction of Syntactic Structures

Uploaded by: ren****ao   Document ID: 245204339   Uploaded: 2024-10-07   Format: PPT   Pages: 29   Size: 378KB
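Part I   Corpus annotation
Part II  Extraction of syntactic structures


Part I: Corpus annotation

1. What is annotation?
2. How to do it?


Annotation of corpora

Annotation: the process of making explicit the linguistic categories that are implicit within a corpus text, for example by adding layers of information on the grammatical classes of words, on the classes of speech acts which have taken place in the course of the transcribed speech, or on the classes of errors learners made in writing (Edwards 1995: 20).

A. Part-of-speech tagging
B. Syntactic annotation
C. Semantic annotation
D. Discourse annotation
E. Pragmatic annotation


POS tagging

- also known as grammatical tagging
- divides words into categories based on how they can be combined to form sentences
- the most commonly used form of corpus annotation

Sample text:

Nowadays, it is fashionable to speak of a generation gap. The parents complain that children are self-centered and do not show them proper respect and obedience, while children are complaining that parents do not understand them. How does the generation gap form?


How to do it?

- manually
- computer-assisted
- fully automatic


Computer-assisted annotation

- Annotool


Fully automatic annotation: CLAWS

- Constituent Likelihood Automatic Word-tagging System
- developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster
- a POS tagger for English
- in existence since the early 1980s
- has several tagsets


Tagset variation

Category                       Example    CLAWS5
Adverb                         often      AV0
Adverb, negative               not        XX0
Adverb, comparative            faster     AV0
Adverb, superlative            fastest    AV0
Adverb, particle               up         AVP
Adverb, deictic                here       AV0
Adverb, intensifier            very       AV0
Adv, intensifier, postposed    enough     AV0
Adverb, question               when       AVQ
Adv, question, intensifier     how        AVQ


Fully automatic annotation: Go tagger

When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN our_PRP$ parents_NNS give_VBP us_PRP is_VBZ to_TO learn_VB how_WRB to_TO speak_VB and_CC how_WRB to_TO recognize_VB them_PRP ._.
It_PRP is_VBZ a_DT basic_JJ education_NN and_CC we_PRP start_VBP to_TO face_VB the_DT colorful_JJ world_NN ._.
The_DT education_NN is_VBZ very_RB important_JJ which_WDT influences_NNS children_NNS s_POS nature_NN ._.
According_VBG to_TO that_IN ,_, education_NN gives_VBZ the_DT first_JJ step_NN to_TO people_NNS and_CC influences_NNS them_PRP gradually_RB ._.


Part II: Extraction of passive verb constructions

1. The concept of passive verb constructions
2. Extracting passive verb constructions


The concept of passive verb constructions

Forms of the passive (LGSWE):

- long passive (with by)
- short passive (without by)

Corpus findings (LGSWE):

- Short passives (SP) are predominant in all syntactic positions in English.
- Be-passives differ sharply by register, with conversation and academic prose at opposite poles.
- Long passives (LP) are most common in news and academic prose.


Extracting passive verb constructions

Research questions:

1. How do Chinese students use passive constructions in written English, and how does their use differ from that of native speakers of English?
2. How do Chinese students' written and spoken English differ in their use of passive constructions?
3. Do the passive constructions in Chinese students' writing change as their L2 proficiency increases?

To answer question 1: extract the passive constructions from Chinese students' writing, extract those of native speakers of English, and compare the two.

To answer question 3: extract the passive constructions from the writing of Chinese students in years 1 to 4 and observe the developmental trend.


Exercise: use CONCORD to extract a single passive construction

Verb + past participle passive (V+PP), for example:

1) be forced (to do)
2) be supported (by)
3) be discussed

Pattern code: [code lost] *

What do the codes mean?

- [code lost] stands for the verb be
- VB* stands for be in any tense
- [code lost] stands for the past participle of any verb; for example, [code lost] marks the past participle been

Group 1: Chinese students' essays vs. native speakers' writing
Group 2: Chinese students' essays vs. Chinese students' speech

Extraction exercise:


(V+PP) results: Chinese students vs. Americans

RF = raw frequency; StF = standardized frequency per 10,000 words.

        Chinese students    Americans
RF      171                 864
StF     67.3                115.1

In writing, Chinese students and American students differ greatly in their use of the passive voice.


(V+PP) results: writing vs. speech (RF/StF)

Writing (Chinese students)    Speech (Chinese students)
171/67.3                      60/26.2

Chinese students use fewer passive constructions in speech than in writing; the distribution of passives across speech and writing is broadly reasonable.


(V+PP) results: by year of study (RF/StF)

Year 1     Year 2     Year 3     Year 4
49/14.2    42/10.5    49/14.3    31/9.7

The overall trend is a decline from year to year, though with some variation.


(V+PP) results: Chinese students vs. foreign students (RF/StF)

Chinese students    Foreign students
171/67.3            421/81.9

Foreign L2 students use more passives than Chinese students, but fewer than native speakers of English.


Further patterns

- [code lost] by, for example: be affected by
- [code lost] *, for example: be treated as

Exercise: extract only the passives with by.

Exercise: extract passive constructions in batch.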
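
The be + past participle search from Part II, and the per-10,000 standardization used in the result tables, can be sketched in Python. This is an illustration under stated assumptions, not the CONCORD procedure used in the slides. It assumes input in the word_TAG format shown for the Go tagger, with Penn-style tags (VBN for past participles, RB for adverbs). Because those tags do not mark be separately, be is matched by word form. The function names, the `max_adverbs` parameter and the second clause of the sample sentence are hypothetical.

```python
from typing import List, Tuple

# Penn-style tags give "be" the same tags as other verbs,
# so "be" is matched by word form here (an assumption of this sketch).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


def find_passives(pairs: List[Tuple[str, str]],
                  max_adverbs: int = 1) -> List[Tuple[str, bool]]:
    """Return (phrase, is_long) for each be + past participle sequence.

    is_long is True when the participle is directly followed by "by".
    """
    hits = []
    for i, (word, _tag) in enumerate(pairs):
        if word.lower() not in BE_FORMS:
            continue
        j = i + 1
        # Allow a few intervening adverbs, as in "is often discussed".
        while (j < len(pairs) and j - i - 1 < max_adverbs
               and pairs[j][1].startswith("RB")):
            j += 1
        if j < len(pairs) and pairs[j][1] == "VBN":
            phrase = " ".join(w for w, _ in pairs[i:j + 1])
            is_long = j + 1 < len(pairs) and pairs[j + 1][0].lower() == "by"
            hits.append((phrase, is_long))
    return hits


def count_words(pairs: List[Tuple[str, str]]) -> int:
    """Count tokens whose tag starts with a letter (skips punctuation)."""
    return sum(1 for _, tag in pairs if tag[:1].isalpha())


def per_10k(raw_count: int, total_words: int) -> float:
    """Standardized frequency: occurrences per 10,000 words."""
    return raw_count / total_words * 10_000


if __name__ == "__main__":
    # First clause taken from the tagged sample; the second is invented.
    sample = ("When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN "
              "is_VBZ often_RB discussed_VBN by_IN parents_NNS ._.")
    pairs = parse_tagged(sample)
    hits = find_passives(pairs)
    for phrase, is_long in hits:
        print(phrase, "(long passive)" if is_long else "(short passive)")
    print(round(per_10k(len(hits), count_words(pairs)), 1), "per 10,000 words")
```

A purely tag-based search like this also returns adjectival participles (for example, is interested), so the hits still need manual checking before they are counted.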