LanguageTranslationPrinciples

上传人:c****d 文档编号:243009492 上传时间:2024-09-13 格式:PPT 页数:40 大小:97.50KB
返回 下载 相关 举报
LanguageTranslationPrinciples_第1页
第1页 / 共40页
LanguageTranslationPrinciples_第2页
第2页 / 共40页
LanguageTranslationPrinciples_第3页
第3页 / 共40页
点击查看更多>>
资源描述
Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,Language Translation Principles,Part 1: Language Specification,1,Attributes of a language,Syntax: rules describing use of language tokens,Semantics: logical meaning of combinations of tokens,In a programming language, “tokens” include identifiers, keywords, and punctuation,2,Linguistic correctness,A,syntactically correct,program is one in which the tokens are arranged so that the code can be successfully translated into a lower-level language,A,semantically correct,program is one that produces correct results,3,Language translation tools,Parser: scans source code, compares with established syntax rules,Code generator: replaces high level source code with semantically equivalent low level code,4,Techniques to describe syntax of a language,Grammars: specify how you combine atomic elements of language (characters) to form legal strings (words, sentences),Finite State Machines: specify syntax of a language through a series of interconnected diagrams,Regular Expressions: symbolic representation of patterns describing strings; applications include forming search queries as well as language specification,5,Elements of a Language,Alphabet: finite, non-empty set of characters,not precisely the same thing we mean when we speak of natural language alphabet,for example, the alphabet of C+ includes the upper- and lowercase letters of the English alphabet, the digits 0-9, and the following punctuation symbols:,(,),+,-,*,/,%,=, and = form =,& and & form &,1, 2, 3 and 4 form 1234,Concatenation always involves two operands either one can be a string or a single character,8,String characteristics,The number of characters in a string is the strings length,An empty string is a string with length 0; we denote the empty string with the symbol,The is the identity element for concatenation; if x is string, then:,x = x = x,9,Closure of an alphabet,The set of all possible strings that can formed by concatenating elements from an alphabet is the alphabets,closure,denoted,T*,for some alphabet T,The closure of an alphabet includes strings that are not valid tokens in the language; it is not a finite set,For example, if R is the real number alphabet, then R* includes:,-0.092 and 563.18 but also,.0.0.- and 2-4-2.9.-5.,10,Languages & Grammars,A language is a subset of the closure of an alphabet,A grammar specifies how to concatenate symbols from an alphabet to form legal strings in a language,11,Parts of a grammar,N: a nonterminal alphabet; each element of N represents a group of characters from:,T: a terminal alphabet,P: a set of rules for string production; uses nonterminals to describe language structure,S: the start symbol, an element of N,12,Terminal vs. non-terminal symbols,A non-terminal symbol is used to describe or represent a set of terminal symbols,For example, the following standard data types are terminal symols in C+ and Java: int, double, float, char,The non-terminal symbol could be used to represent any or all of these,13,Valid strings,S (the start symbol) is a single symbol, not a set,Given S and P (rules for production), you can decide whether a set of symbols is a valid string in the language,Conversely, starting from S, if you can generate a string of terminal symbols using P, you can create a valid string,14,Productions,A, w,a non-terminal,produces,a string of terminals &,non-terminals,15,Derivations,A grammar specifies a language through the derivation process:,begin with the start symbol,substitute for non-terminals using rules of production until you get a string of terminals,16,Example: a grammar for identifiers (a toy example),N = , , ,T = a, b, c, 1, 2, 3,P = the productions: (, means “produces”), a, b, c, 1, 2, 3,S = ,17,Example: deriving a12bc:, (rule 2), c (rule 6), means c (rule 2),derives in one bc (rule 5),step bc (rule 3), 2bc (rule 8), 2bc (rule 3), 12bc (rule 7), 12bc, a12bc,18,Closure of derivation,The symbol,* means “derives in 0 or more steps”,A language specified by a grammar consists of all strings derivable from the start symbol using the rules of production,provides operational test for membership in the language,if a string cant be derived using production rules, it isnt in the language,19,Example: attempting to derive 2a, , a,Since there is no combination in the production rules, we cant proceed any further,This means that 2a isnt a valid string in our language,20,A grammar for signed integers,N = I, F, M,I means integer,F means first symbol; optional sign,M means magnitude,T = +,-,d (d means digit 0-9),P = the productions:,I, FM,F +,F -,F,(means +/- is optional),M, dM,M d,S = I,21,Examples,Deriving 14:Deriving -7:,I, FMI FM,M, -M, dM, -d, dd, -7, 14,22,Recursive rules,Both of the previous examples (identifiers, integers) have rules in which a nonterminal is defined in terms of itself:, and,M dM,Such rules produce languages with infinite sets of legal sentences,23,Context-sensitive grammar,A grammar in which the production rules may contain more than one non-terminal on the left side,The opposite (all of the examples we have seen thus far), have production rules restricted to a single non-terminal on the left: these are known as context-free grammars,24,Example,N = A,B,C,T = a,b,c,P = the productions:,A, aABC,A abC,CB BC,bB bb,bC bcThis rule is context-sensitive: C can be,substituted with c only if C is immediately,preceded by b,cC cc,S = A,25,Context-sensitive grammar,N = A, B, C,T = a, b, c,P = the productions,A,-,aABC,A,-,abC,CB,-,BC,bB,-,bb,bC,-,bc,cC,-,cc,S = A,Example:,aaabbbcc is a valid string by:,A = aABC (1),= aaABCBC (1),= aaabCBCBC (2),= aaabBCCBC (3),= aaabBCBCC (3),= aaabBBCCC (3),= aaabbBCCC (4),= aaabbbCCC (4),= aaabbbcCC (5),Here, we substituted c for C;,this is allowable only if C has,b in front of it,= aaabbbccC (6),= aaabbbccc (6),26,Valid & invalid strings from previous example:,Valid:,abc,aabbcc,aaabbbccc,aaaabbbbcccc,Invalid:,aabc,cba,bbbccc,The grammar describes a language consisting of strings that start with a number of as, followed by an equal number of bs and cs; this language can be defined mathematically as:,L = a,n,b,n,c,n,| n 0,Note: a,n,means the concatenation of n as,27,A grammar for expressions,N = E, T, F where:,E: expression,T: term T = +, *, (, ), a,F: factor,P: the productions:,E - E + T,E - T,T - T * F,T - F,F - (E),F - a,S = E,28,Applying the grammar,You cant reach a valid conclusion if you dont have a valid string, but the opposite is not true,For example, suppose we want to parse the string (a * a) + a using the grammar we just saw,First attempt:,E= T (by rule 2),= F (by rule 4), and, were stuck, because F can only produce (E) or a; so we reach a dead end, even though the string is valid,29,Applying the grammar,Heres a parse that works for (a*a)+a:,E= E+T (rule 1),= T+T (rule 2),= F+T (rule 4),= (E)+T (rule 5),= (T)+T (rule 2),= (T*F)+T (rule 3),= (T*a)+T (rule 6),= (F*a)+F (rule 4 applied twice),= (a*a) + a (rule 6 applied twice),30,Deriving a valid string from a grammar,Arbitrarily pick a nonterminal on right side of current intermediate string & select rules for substitution until you get a string of terminals,Automatic translators have more difficult problem:,given string of terminals, determine if string is valid, then produce matching object code,only way to determine string validity is to derive it from the start string of the grammar this is called,parsing,31,The parsing problem,Automatic translators arent at liberty to pick rules randomly (as illustrated by the first attempt to translate the preceding expression),Parsing algorithm must search for the right sequence of substitutions to derive a proposed string,Translator must also be able to prove that no derivation exists if proposed string is not valid,32,Syntax tree,A parse routine can be represented as a tree,start symbol is the root,interior nodes are nonterminal symbols,leaf nodes are terminal symbols,children of an interior node are symbols from right side of production rule substituted for parent node in derivation,33,Syntax tree for (a*a)+a,34,Grammar for a programming language,A grammar for a subset of the C+ language is laid out on pages 340-341 of the textbook,A sampling (suitable for either C+ or Java) is given on the next couple of slides,35,Rules for declarations, - ;, - char | int | double,(remember, this is,subset,of actual language), - |, , , - |, |, - a|b|c| |z|A|B|Z, - 0|1|2|3|4|5|6|7|8|9,36,Rules for control structures, -,if () |,if () ,else , -,while () |,do while () ;,37,Rules for expressions, - ;, - ,| = , -, |, |, |, |, = ,etc.,38,Backus-Naur Form (BNF),BNF is the standardized form for specification of a programming language by its rules of production,In BNF, the - operator is written :=,ALGOL-60 first popularized the form,39,BNF described in terms of itself (from Wikipedia),:= | , := := , := | , := | | , := | , := | , := | “, := | ,40,
展开阅读全文
相关资源
正为您匹配相似的精品文档
相关搜索

最新文档


当前位置:首页 > 图纸专区 > 课件教案


copyright@ 2023-2025  zhuangpeitu.com 装配图网版权所有   联系电话:18123376007

备案号:ICP2024067431-1 川公网安备51140202000466号


本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。装配图网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知装配图网,我们立即给予删除!