Floating-Point Arithmetic Unit (PPT Presentation Courseware)


Slide 1: Floating-Point Arithmetic Unit

Slide 2: Floating-Point Arithmetic
- Floating-Point Numbers
- IEEE 754 Floating-Point Standard
- Floating-Point Addition and Subtraction
- Floating-Point Multiplication

Slide 3: Floating-Point Format Inside the Computer
Floating-point number: $X = M_s\,E_s\,E_{m-1} \dots E_1\,E_0\,M_{-1}\,M_{-2} \dots M_{-n}$

| Format | Sign bits | Exponent bits | Mantissa bits | Total bits |
|---|---|---|---|---|
| Short floating point | 1 | 8 | 23 | 32 |
| Long floating point | 1 | 11 | 52 | 64 |
| Temporary floating point | 1 | 15 | 64 | 80 |

IEEE standard: the exponent uses excess (biased) code with base 2; the mantissa uses sign-magnitude. $X = M_X \times 2^{E_X}$.
The number of exponent bits determines the representable range; the number of mantissa bits determines the precision.

Slide 4: Floating-Point Format Inside the Computer
Floating-point numbers are a subset of the real numbers, each formed as a pure fraction multiplied by a power. Inside the computer the pure-fraction part is called the mantissa. For a nonzero floating-point number, the absolute value of the mantissa must be $\ge 1/2$; a number that meets this requirement is said to be in normalized form. Converting a mantissa that does not meet the requirement into one that does is called normalization, and it is carried out by shifting the mantissa and adjusting the exponent.

Slide 5: Floating-Point Format Inside the Computer
Under the IEEE standard the mantissa is stored in sign-magnitude form: the sign bit $M_s$ is 0 for positive and 1 for negative, and the most significant magnitude bit $M_{-1}$ of a nonzero mantissa must be 1 to satisfy the normalization requirement. Because that bit is always 1, it can be dropped by shifting the mantissa left one position before the number is stored to memory, so the same number of mantissa bits holds one more binary digit and precision improves. This scheme is called the hidden-bit technique. When such a number is fetched into the arithmetic unit for computation, the hidden bit must first be restored.

Slide 6: Floating Point

Slide 7: Floating-Point Format Inside the Computer
$X = M_s\,E_s\,E_{m-1} \dots E_1\,E_0\,M_{-1}\,M_{-2} \dots M_{-n}$
IEEE standard: the exponent uses excess code with base 2. $X = M_X \times 2^{E_X}$.
Under the IEEE standard the exponent is an integer in excess code and serves as the power of base 2. Because the base is always 2, it does not need to appear in the format; only the exponent value is stored.
Excess code is used only for integers and only in the exponent field of a floating-point number. Its definition resembles two's complement, and the difference lies in the sign bit: in excess code a sign bit of 0 means negative and 1 means positive, the opposite of two's complement. The name reflects the fact that the machine number is shifted along the number line. The value bits of the excess code are identical to those of the two's complement code.

Slide 8: Floating-Point Format: About Excess Code
Excess code represents integers and is used in the exponent field of a floating-point number. For one sign bit and $n$ value bits it is defined as
$[E]_{\text{excess}} = 2^n + E, \quad -2^n \le E < 2^n$
Representation range: the machine numbers run from 00000000 to 11111111, with negative values in the lower half, 0 in the middle, and positive values in the upper half.
For comparison, two's complement: $[X]_{\text{comp}} = X$ for $0 \le X < 2^n$, and $[X]_{\text{comp}} = 2^{n+1} + X$ for $-2^n \le X < 0$.

Slide 9: Floating-Point Format: About Excess Code
- Excess code supports only addition and subtraction of two numbers and increment/decrement by 1. In addition and subtraction, the computed sign bit must be inverted to obtain the correct sign of the result.
- Note: with double sign bits, 00 means negative and 01 means positive; 11 does not mean positive.
- An 8-bit exponent can represent -128 to +127. When the exponent is -128, its excess-code representation is 00000000 and the absolute value of the floating-point number is less than $2^{-128}$. By convention the value is taken to be zero, a nonzero mantissa is cleared to 0, and the value is called machine zero.
- The 8-bit excess code shifts the true value 128 positions to the right on the number line (true values -128 to +127).

Slide 10: Biased Exponent
- Value of exponent = val(E) = E - Bias (Bias is a constant)
- 8 bits for single precision
  - E can be in the range 0 to 255
  - E = 0 and E = 255 are reserved for special use
  - E = 1 to 254 are used for normalized floating-point numbers
  - Bias = 127 (half of 254), val(E) = E - 127
  - val(E=1) = -126, val(E=127) = 0, val(E=254) = 127

Slide 11: Example of Exponent

| Exponent (E) | Adjusted (E + 127) | Binary |
|---|---|---|
| +5 | 132 | 10000100 |
| 0 | 127 | 01111111 |
| -10 | 117 | 01110101 |
| +128 | 255 | 11111111 |
| -127 | 0 | 00000000 |
| -1 | 126 | 01111110 |

Slide 12: Example of Normalized Mantissa

| Binary Value | Normalized As | Exponent |
|---|---|---|
| 1101.101 | 1.101101 | 3 |
| 0.00101 | 1.01 | -3 |
| 1.0001 | 1.0001 | 0 |
| 10000011 | 1.0000011 | 7 |

Slide 13: Biased Exponent
Slide 14: Example of Floating Point
Slide 15: Largest Normalized Float
Slide 16: Smallest Normalized Float
Slide 17: Zero, Infinity, NaN
Slide 18: Denormalized Numbers
Slide 19: Zero & Infinity

Slide 20: NaN
- The value NaN (Not a Number) represents a value that is not a real number.
- NaN is a special value represented with the maximum E and F ≠ 0.
- It results from exceptional situations, such as 0/0 or sqrt(negative).
- An operation on a NaN results in NaN: Op(X, NaN) = NaN.
- QNaN denotes indeterminate operations.
- SNaN denotes invalid operations.

Slide 21 (b is the exponent bias)

| Sign | Exponent (e) | Fraction (f) | Value |
|---|---|---|---|
| 0 | 00…00 | 00…00 | +0 |
| 0 | 00…00 | 00…01 to 11…11 | Positive denormalized real: $0.f \times 2^{(-b+1)}$ |
| 0 | 00…01 to 11…10 | XX…XX | Positive normalized real: $1.f \times 2^{(e-b)}$ |
| 0 | 11…11 | 00…00 | +Infinity |
| 0 | 11…11 | 00…01 to 01…11 | SNaN |
| 0 | 11…11 | 10…00 to 11…11 | QNaN |

Slide 22

| Sign | Exponent (e) | Fraction (f) | Value |
|---|---|---|---|
| 1 | 00…00 | 00…00 | -0 |
| 1 | 00…00 | 00…01 to 11…11 | Negative denormalized real: $-0.f \times 2^{(-b+1)}$ |
| 1 | 00…01 to 11…10 | XX…XX | Negative normalized real: $-1.f \times 2^{(e-b)}$ |
| 1 | 11…11 | 00…00 | -Infinity |
| 1 | 11…11 | 00…01 to 01…11 | SNaN |
| 1 | 11…11 | 10…00 to 11…11 | QNaN |

Slide 23

| Operation | Result |
|---|---|
| n ÷ ±Infinity | 0 |
| ±Infinity × ±Infinity | ±Infinity |
| ±nonzero ÷ 0 | ±Infinity |
| Infinity + Infinity | Infinity |
| ±0 ÷ ±0 | NaN |
| Infinity - Infinity | NaN |
| ±Infinity ÷ ±Infinity | NaN |
| ±Infinity × 0 | NaN |

Slide 24: FP Add
Slide 25: FP Add
Slide 26: Floating Point Subtraction Example
Slide 27: Floating Point Subtraction Example
Slide 28: Extra Bits
Slide 29: Guard Bit
Slide 30: Extra Bit

Slide 31: Rounding Mode: Nearest
In this mode, an inexact result is rounded to the nearer of the two possible result values. If neither possibility is nearer, the even alternative is chosen. This form of rounding is also called round to even. A value is "even" when its least significant bit is 0.

| Value | Binary | Rounded | Action | Rounded Value |
|---|---|---|---|---|
| 2 3/32 | 10.00011₂ | 10.00₂ | less than 1/2: down | 2 |
| 2 7/8 | 10.11100₂ | 11.00₂ | exactly 1/2: up | 3 |
| 2 5/8 | 10.10100₂ | 10.10₂ | exactly 1/2: down | 2 1/2 |

Slide 33: Rounding Mode

Slide 34: Steps in Addition/Subtraction of Floating-Point Numbers
- Step 1: Calculate the difference d of the two exponents, $d = |E_1 - E_2|$
- Step 2: Shift the significand of the smaller number d base-β positions to the right
- Step 3: Add the aligned significands and set the exponent of the result to the exponent of the larger operand
- Step 4: Normalize the resultant significand and adjust the exponent if necessary
- Step 5: Round the resultant significand and adjust the exponent if necessary

Slide 35: Addition/Subtraction Structure

Slide 36: Addition/Subtraction
- $E_1 \ge E_2$: the exponent of the larger number is not decreased, because doing so would require a larger significand adder.
- Addition: the resultant significand M (the sum of the two aligned significands) is in the range $1/\beta \le M < 2$. If $M \ge 1$, a postnormalization step is required: the significand is shifted right to yield $M_3$ and the exponent is increased by one (an exponent overflow may occur).

Slide 37: Addition/Subtraction Normalization
- Subtraction: the resultant significand M is in the range $0 \le |M| < 1$. A postnormalization step, shifting the significand left and decreasing the exponent, is required if $M < 1/\beta$.

Slides 38-40
[…] > 1): only a pre-alignment shift may be needed.

Slide 41: CLOSE Case
- The exponent difference is predicted from the two least significant bits of the operands, which allows the subtraction of significands to start as soon as possible.
  - If 0: the subtract is executed with no alignment.
  - If ±1: the significand of the smaller operand is shifted once to the right (using a multiplexor) and then subtracted from the other significand.
- In parallel, the true exponent difference is calculated.
  - If > 1: the procedure is aborted and the FAR procedure is followed.
  - If ≤ 1: the CLOSE procedure continues.
- In parallel with the subtraction, the number of leading zeros is predicted to determine the number of shift positions in postnormalization.

Slide 42: CLOSE Case: Normalization and Rounding
- Next: normalization of the significand and the corresponding exponent adjustment.
- Last: rounding, by precomputing sum and sum+1 and selecting the one that is properly rounded; negation of the result may be necessary.
- The result of the subtraction is usually positive, so negation is not required.
- Only when the exponents are equal can the result of the significand subtraction be negative (in two's complement), requiring a negation step.
- The negation and rounding steps are mutually exclusive.

Slide 43: FAR Case
- First: the exponent difference is calculated.
- Next: the significand of the smaller operand is shifted right for alignment.
- The shifted-out bits are used to set the sticky bit.
- The smaller significand is subtracted from the larger; the result is either normalized […]
- Last step: rounding.

Slides 44-45: Leading Zeros Prediction Circuit
- Predict the position of the leading nonzero bit in the result of the subtract before the subtraction is completed.
- This allows the postnormalization shift to execute immediately after the subtraction.
- Examine the bits of the operands (of the subtract) serially, starting with the most significant bits, to determine the position of the first 1.
- This serial operation can be accelerated using a parallel scheme similar to carry-look-ahead.

Slide 46: Alternative Prediction of Leading 1
- Generate in parallel the intermediate bits $e_i$, where $e_i = 1$ if
  - $a_i = b_i$, and
  - $a_{i-1}$ and $b_{i-1}$ allow propagation of the expected carry (at least one of them is 1).
- The subtract is executed by forming the one's complement of the subtrahend and forcing a carry into the least significant position, so a carry is expected.

Slide 47
- $e_i = \overline{(a_i \oplus b_i)} \cdot (a_{i-1} + b_{i-1})$, so $e_i = 1$ if the carry is allowed to propagate to position i.
  - If the forced carry propagates to position i, the i-th bit of the correct result will also be 1.
  - If not, the correct result will have a 1 in position i-1 instead.
  - The position of the leading 1 is either the same as in $e_i$ or one to the right.
- Count the number of leading zeros in $e_i$ and provide the count to the barrel shifter for postnormalization; at most a one-bit correction shift (left) is needed.
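The excess-code definition in the slides (one sign bit plus n value bits, stored value 2^n + E) can be checked with a few lines of Python. This is only an illustrative sketch: the function names `to_excess` and `to_twos_complement` are made up here, and n = 7 is assumed to match the 8-bit exponent discussed above.

```python
def to_excess(e, n=7):
    """Excess code with one sign bit and n value bits: stored value is 2**n + e."""
    if not -(1 << n) <= e < (1 << n):
        raise ValueError("exponent out of range for this width")
    return (1 << n) + e


def to_twos_complement(e, n=7):
    """Two's complement of e in n + 1 bits, for comparison."""
    return e & ((1 << (n + 1)) - 1)


if __name__ == "__main__":
    # The two codes share the value bits and differ only in the sign bit.
    for e in (-128, -1, 0, 127):
        print(f"{e:5d}  excess={to_excess(e):08b}  twos={to_twos_complement(e):08b}")
```

With n = 7, the exponent -128 maps to 00000000 in excess code, which is the pattern the slides associate with machine zero.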
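To make the biased exponent, the hidden bit, and the special-value table concrete, the sketch below splits a single-precision value into its three fields and classifies it. It assumes the IEEE 754 single format described in the slides (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127); `decode_float32` is a hypothetical helper name, and the SNaN/QNaN split follows the table above (top fraction bit 0 or 1).

```python
import math
import struct

BIAS = 127  # single-precision exponent bias


def decode_float32(x):
    """Return (sign, e, f, kind, value) for x encoded as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    e = (bits >> 23) & 0xFF          # biased exponent field
    f = bits & 0x7FFFFF              # 23-bit fraction field
    if e == 0:
        # Zero or denormalized: no hidden bit, exponent fixed at 1 - bias.
        kind = "zero" if f == 0 else "denormalized"
        value = (f / 2**23) * 2.0 ** (1 - BIAS)
    elif e == 255:
        if f == 0:
            kind, value = "infinity", math.inf
        else:
            kind = "QNaN" if f >> 22 else "SNaN"
            value = math.nan
    else:
        # Normalized: restore the hidden leading 1.
        kind = "normalized"
        value = (1 + f / 2**23) * 2.0 ** (e - BIAS)
    return sign, e, f, kind, -value if sign else value


if __name__ == "__main__":
    for x in (1.0, -0.15625, 0.0, 2.0**-149, math.inf):
        print(x, decode_float32(x))
```

For example, -0.15625 is -1.01₂ × 2⁻³, so the stored exponent field is 124 and only the fraction bits 01 followed by zeros are kept.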
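The round-to-nearest-even rule can be reproduced exactly with rational arithmetic. The following is a minimal sketch, assuming results are rounded to a fixed number of binary fraction bits as in the rounding example (two bits after the binary point); `round_nearest_even` is an illustrative name, not something taken from the slides.

```python
from fractions import Fraction


def round_nearest_even(value, frac_bits):
    """Round value to frac_bits binary fraction bits, ties to even."""
    scaled = Fraction(value) * 2**frac_bits
    lower = scaled.numerator // scaled.denominator   # floor of the scaled value
    remainder = scaled - lower
    half = Fraction(1, 2)
    # Round up when past the midpoint, or at the midpoint if lower is odd.
    if remainder > half or (remainder == half and lower % 2 == 1):
        lower += 1
    return Fraction(lower, 2**frac_bits)


if __name__ == "__main__":
    for v in (Fraction(2) + Fraction(3, 32), Fraction(23, 8), Fraction(21, 8)):
        print(v, "->", round_nearest_even(v, 2))
```

The two tie cases go in opposite directions: 2 7/8 rounds up to 3 and 2 5/8 rounds down to 2 1/2, because in each case the neighbor whose last kept bit is 0 is chosen.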
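The five addition steps listed in the slides can be sketched on a toy format. Everything specific here is an assumption made for illustration: base 2, positive operands only, a 4-bit significand of the form 1.fff stored as an integer m (value m × 2^(e-3)), and three extra low-order bits (guard, round, sticky) during the add. It is not the datapath from the slides, only a small model of the same sequence.

```python
from fractions import Fraction

P = 4  # significand width in bits (hypothetical toy format)


def value(e, m):
    """Exact value of a toy operand: m * 2**(e - (P - 1))."""
    return Fraction(m) * Fraction(2) ** (e - (P - 1))


def fp_add(e1, m1, e2, m2):
    """Add two positive normalized operands; returns (exponent, significand)."""
    if e1 < e2:
        e1, m1, e2, m2 = e2, m2, e1, m1
    d = e1 - e2                                   # Step 1: exponent difference
    a = m1 << 3                                   # room for guard/round/sticky
    b = m2 << 3
    sticky = int((b & ((1 << d) - 1)) != 0)       # any shifted-out bit sets sticky
    b = (b >> d) | sticky                         # Step 2: align smaller operand
    s, e = a + b, e1                              # Step 3: add, keep larger exponent
    if s >> (P + 3):                              # Step 4: carry out, shift right
        s = (s >> 1) | (s & 1)
        e += 1
    q, low = s >> 3, s & 0b111                    # Step 5: round to nearest even
    if low > 4 or (low == 4 and q & 1):
        q += 1
        if q >> P:                                # rounding overflowed: renormalize
            q >>= 1
            e += 1
    return e, q


if __name__ == "__main__":
    e, m = fp_add(0, 12, -2, 8)                   # 1.5 + 0.25
    print(e, m, value(e, m))
```

Keeping a sticky bit means the rounding step can tell an exact tie from a value slightly above the midpoint even after many bits have been shifted out during alignment.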
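The leading-1 prediction bits e_i can be tried out by brute force. The sketch below rests on one reading of the slides, stated here as an assumption: a and b are the two adder inputs, that is, the minuend and the one's complement of the subtrahend, and position -1 is treated as 1 because of the forced carry-in. `predict_bits` and `check_all` are hypothetical names, and the check covers only the case where the minuend is larger than the subtrahend.

```python
N = 6  # operand width in bits for the exhaustive check


def predict_bits(a, b_comp, n=N):
    """e_i = 1 when a_i == b_i and at least one of a_(i-1), b_(i-1) is 1."""
    e = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b_comp >> i) & 1
        if i == 0:
            below = 1                  # forced carry into the lowest position
        else:
            below = ((a >> (i - 1)) | (b_comp >> (i - 1))) & 1
        if a_i == b_i and below:
            e |= 1 << i
    return e


def check_all(n=N):
    """Leading 1 of a - b is at the leading 1 of e, or one position to the right."""
    mask = (1 << n) - 1
    for a in range(1 << n):
        for b in range(a):             # only a > b, so the result is positive
            lead_e = predict_bits(a, ~b & mask, n).bit_length() - 1
            lead_d = (a - b).bit_length() - 1
            if lead_d not in (lead_e, lead_e - 1):
                return False
    return True


if __name__ == "__main__":
    print(check_all())
```

Under these assumptions the exhaustive check agrees with the claim on the slide that at most a one-bit correction shift is needed after shifting by the predicted amount.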
