Corpus Annotation and the Extraction of Syntactic Structures

Part I   Corpus annotation
Part II  Extraction of syntactic structures

Part I: Corpus annotation
1. What is annotation?
2. How to do it?

Annotation of corpora
Annotation: The process of making explicit the linguistic categories that are implicit within a corpus text, for example by adding layers of information on the grammatical classes of words, on the classes of speech acts which have taken place in the course of the transcribed speech, or on the classes of errors learners made in writing (Edwards 1995: 20).

A. Part-of-speech tagging
B. Syntactic annotation
C. Semantic annotation
D. Discourse annotation
E. Pragmatic annotation

POS-Tagging
- also known as grammatical tagging
- divides words into categories, based on how they can be combined to form sentences
- the most commonly used form of corpus annotation

Sample text:
Nowadays, it is fashionable to speak of a generation gap. The parents complain that children are self-centered and do not show them proper respect and obedience, while children are complaining that parents do not understand them. How does the generation gap form?

How to do it?
- manually
- computer-assisted
- fully automatic

Computer-assisted annotation
Annotool

Fully automatic annotation
CLAWS: Constituent Likelihood Automatic Word-tagging System
- developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster
- POS-tagger for English
- has existed since the early 1980s
- has several tagsets

Tagset variation
Category                       Example    CLAWS5
Adverb                         often      AV0
Adverb, negative               not        XX0
Adverb, comparative            faster     AV0
Adverb, superlative            fastest    AV0
Adverb, particle               up         AVP
Adverb, deictic                here       AV0
Adverb, intensifier            very       AV0
Adv, intensifier, postposed    enough     AV0
Adverb, question               when       AVQ
Adv, question, intensifier     how        AVQ

Fully automatic annotation
Go tagger
When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN our_PRP$ parents_NNS give_VBP us_PRP is_VBZ to_TO learn_VB how_WRB to_TO speak_VB and_CC how_WRB to_TO recognize_VB them_PRP ._. It_PRP is_VBZ a_DT basic_JJ education_NN and_CC we_PRP start_VBP to_TO face_VB the_DT colorful_JJ world_NN ._. The_DT education_NN is_VBZ very_RB important_JJ which_WDT influences_NNS children_NNS s_POS nature_NN ._. According_VBG to_TO that_IN ,_, education_NN gives_VBZ the_DT first_JJ step_NN to_TO people_NNS and_CC influences_NNS them_PRP gradually_RB ._.

Part II: Extracting passive constructions of verbs
1. The concept of passive constructions of verbs
2. Extracting passive constructions of verbs

The concept of passive constructions of verbs
Forms of the passive construction (LGSWE):
- long passive (with by)
- short passive (without by)

Corpus findings (LGSWE):
- Short passives (SP) are predominant in all syntactic positions in English.
- Be-passives differ sharply by register, with conversation and academic prose at opposite poles.
- Long passives (LP) are most common in news and academic prose.

Extracting passive constructions of verbs
Research questions:
1. How do Chinese students use passive constructions in written English, and how does their use differ from that of native speakers of English?
2. How do Chinese students' written and spoken English differ in passive constructions?
3. Do the passive constructions in Chinese students' written English change as L2 proficiency increases?

Answering question 1: extract the passive constructions from Chinese students' written English, extract those of native speakers of English, and compare the two.
Answering question 3: extract the passive constructions from the written English of Chinese students in years 1-4 and observe the developmental trend.

Exercise: use CONCORD to extract a single passive construction
Verb + past participle passive (V+PP), for example:
1) be forced (to do)
2) be supported (by)
3) be discussed

Structure code: *
Meaning of the codes:
- ? stands for the verb be
- VB* stands for the verb be in any tense
- [...] stands for the past participle of any verb; for example, [...] marks the past participle been

Group 1: Chinese students' essays vs. native-speaker written English
Group 2: Chinese students' essays vs. Chinese students' spoken English
Exercise: extract the construction from each group.

(V+PP) results (rate per 10,000):
        Chinese students    Americans
RF      171                 864
StF     67.3                115.1
In written English, Chinese students and American students differ greatly in their use of the passive voice.

(V+PP) results (RF/StF):
Writing (Chinese students)    Speech (Chinese students)
171/67.3                      60/26.2
Chinese students use fewer passive constructions in speech than in writing; the distribution of passives across speech and writing is broadly reasonable.

(V+PP) results (RF/StF):
Year 1     Year 2     Year 3     Year 4
49/14.2    42/10.5    49/14.3    31/9.7
The overall trend is a decline from year to year, though with variation.

(V+PP) results (RF/StF):
Chinese students    Foreign students
171/67.3            421/81.9
Foreign L2 students use more passives than Chinese students, but fewer than native speakers of English.

[...] by, for example: be affected by
[...] *, for example: be treated as
Exercise: extract "passive constructions with by" on their own.
Exercise: batch-extract passive constructions:
* * * * * * * * *

Thank You
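The V+PP extraction exercise in Part II can also be approximated outside CONCORD. The following is a minimal sketch, assuming text in the word_TAG format of the Go tagger sample, where the tag VBN marks a past participle. The list of be forms, the allowance for one intervening adverb, and all function names are illustrative assumptions, not the search patterns used on the slides. The rate per 10,000 is computed here over all tokens, punctuation included, which is also an assumption.

```python
# Hypothetical helper names; a sketch, not the slides' CONCORD patterns.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so ',_,' becomes (',', ',').
        word, sep, tag = token.rpartition("_")
        if sep and word:
            pairs.append((word, tag))
    return pairs


def find_be_passives(pairs, max_gap=1):
    """Return 'be (+ adverb) + past participle' sequences as strings.

    max_gap is the number of adverbs (tags starting with RB) allowed
    between the be form and the participle; this is an assumed setting.
    """
    hits = []
    for i, (word, _tag) in enumerate(pairs):
        if word.lower() not in BE_FORMS:
            continue
        j = i + 1
        gap = 0
        # Skip over at most max_gap adverbs after the be form.
        while j < len(pairs) and gap < max_gap and pairs[j][1].startswith("RB"):
            j += 1
            gap += 1
        if j < len(pairs) and pairs[j][1] == "VBN":
            hits.append(" ".join(w for w, _ in pairs[i:j + 1]))
    return hits


def per_ten_thousand(count, total_tokens):
    """Normalise a raw count to a rate per 10,000 tokens."""
    return count / total_tokens * 10000


if __name__ == "__main__":
    sample = ("When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN "
              "is_VBZ very_RB important_JJ ._.")
    pairs = parse_tagged(sample)
    hits = find_be_passives(pairs)
    print(hits)
    print(per_ten_thousand(len(hits), len(pairs)))
```

A pattern this simple inherits the tagger's errors and will miss passives in which more than one word separates be from the participle, so the counts are best treated as a first pass to be checked against the concordance lines.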