
Multi-Core Parallelism for Low-Power Design
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
2/8/06

Power Consumption of VLSI Chips
Why is it a concern?

SIA Roadmap for Processors (1999)

Year                      1999   2002   2005   2008   2011   2014
Feature size (nm)          180    130    100     70     50     35
Logic transistors/cm^2    6.2M    18M    39M    84M   180M   390M
Clock (GHz)               1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm^2)           340    430    520    620    750    900
Power supply (V)           1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)        90    130    160    170    175    183

Source: ISSCC, Feb. 2001, Keynote
“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second - about the same number of calculations that the world's fastest supercomputer can perform now.
“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”
Patrick P. Gelsinger, Senior Vice President, General Manager, Digital Enterprise Group, INTEL CORP.

VLSI Chip Power Density
[Figure: power density (W/cm^2, log scale from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium and P6, with reference levels marked Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface. Source: Intel]

Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \rightarrow 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Share of each term, in order: 75%, 5%, 20%
[Figure: CMOS gate with supply V_DD driving load capacitance C_L]

Low-Power Datapath Architecture
- Lower supply voltage
  - This slows down circuit speed
  - Use parallel computing to gain the speed back
- Works well when threshold voltage is also lowered.
- About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

A Reference Datapath
[Figure: Input, Register, Combinational logic, Register, Output; both registers clocked by CK]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: the input feeds N registers, each clocked at f/N and followed by one copy of the combinational logic (Copy 1, Copy 2, ..., Copy N); an N to 1 multiplexer selects the output, which is registered at f; a multiphase clock generator and mux control is driven by CK]
- A copy processes every Nth input and operates at reduced voltage
- Supply voltage: $V_N \le V_1 = V_{ref}$
- N = degree of parallelism

Control Signals, N = 4
[Figure: timing diagram of CK and Phase 1, Phase 2, Phase 3, Phase 4]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$P_N / P_1 = [1 + \delta (N - 1)] V_N^2 / V_{ref}^2$

Voltage vs. Speed
Delay of a gate: $T \approx C_L V_{ref} / I = C_L V_{ref} / [k (W/L) (V_{ref} - V_t)^2]$
where $I$ is saturation current, $k$ is a technology parameter, $W/L$ is the width to length ratio of the transistor, and $V_t$ is threshold voltage.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2µ CMOS, with points marked at V_ref = 5V (N = 1), V_2 = 2.9V (N = 2) and V_3 (N = 3), and V_t on the voltage axis]
Voltage reduction slows down as we get closer to $V_t$.

Increasing Multiprocessing
[Figure: P_N/P_1 (0.0 to 1.0) versus N (1 to 12) for 1.2µ CMOS with V_ref = 5V; curves for V_t = 0V (extreme case), V_t = 0.4V and V_t = 0.8V]

Extreme Cases: V_t = 0
Delay: $T \propto 1/V_{ref}$
For N processing elements, delay = $NT$, so $V_N = V_{ref}/N$
$P_N / P_1 = [1 + \delta (N - 1)] / N^2 \rightarrow 1/N$
For negligible overhead, $\delta \rightarrow 0$: $P_N / P_1 \approx 1/N^2$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.

Example: Multiplier Core
Specification:
- 200MHz clock
- 15W dissipation at 5V
- Low voltage operation, $V_{DD} \ge 1.5$ volts
- Relative clock rate = $(V_{DD} - 0.5)^2 / 20.25$
Problem:
- Integrate multiplier core on a SOC
- Power budget for multiplier: 5W

A Multicore Design
[Figure: Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5, each with an input register and clocked at 40MHz; a 5 to 1 mux and an output register at 200MHz; a multiphase clock generator and mux control driven by the 200MHz CK]
Core clock frequency = 200/N MHz; N should divide 200.

How Many Cores?
For N cores:
- Clock frequency = 200/N MHz
- Supply voltage: $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
- Assuming 10% overhead per core, power dissipation = $15 [1 + 0.1 (N - 1)] (V_{DDN}/5)^2$ watts

Design Tradeoffs

Number of cores N   Clock (MHz)   Core supply V_DDN (V)   Total power (W)
1                   200           5.00                    15.0
2                   100           3.68                    8.94
4                   50            2.75                    5.90
5                   40            2.51                    5.29
8                   25            2.10                    4.50

Power Reduction in Processors
Just about everything is used.
- Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
- Architecture:
  - Instruction set
  - Hardware organization
- Software methods

Parallel Architecture
[Figure: a single processor clocked at f, compared with two processors each clocked at f/2 whose outputs are combined at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $C V^2 f$
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396 C V^2 f$

Pipeline Architecture
[Figure: a single processor with an output register clocked at f, compared with the processor split into two stages separated by registers, also clocked at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $C V^2 f$
Pipelined processor: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432 C V^2 f$

Approximate Trend
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

Multicore Processors
[Figure: performance versus year (2000, 2004, 2008) for multicore and single core processors; performance based on SPECint2000 and SPECfp2000 benchmarks. Source: Computer, May 2005, p. 12]

Multicore Processors
- D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
- A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
- S. K. Moore, “Winner: Multimedia Monster: Cell's Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Cell - Cell Broadband Engine Architecture
[Photo, L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. Source: IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

Cell's Nine-Processor Chip
[Figure: die photo. Source: IEEE Spectrum, January 2006]
Eight identical processors
f = 5.6GHz (max)
44.8 Gflops?

Amdahl's Law
[Figure: normalized execution time from 0 to 1, split into a part S and a part P = 1 - S]
Speedup = $1 / [S + (1 - S)/N]$
where N = number of parallel processors
Example: S = 0.6, N = 10, Speedup = 1.56
         S = 0.6, N = ∞, Speedup = 1.67
Gene Amdahl, “Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities,” AFIPS Conference Proceedings, (30), pp. 483-485, 1967.

Question
Can we find a multi-processing law
- for power reduction, or
- for performance per watt?
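The numbers in the multiplier-core example (the "How Many Cores?" formulas and the "Design Tradeoffs" table) and the two Amdahl's Law examples can be checked with a short script. The following is only a sketch under the slides' stated assumptions: 15W at 5V and 200MHz, a 0.5V threshold term in the clock-rate formula, and 10% overhead per added core. The function and constant names are hypothetical and chosen for illustration; they do not come from the slides.

```python
import math

# Constants taken from the multiplier-core example in the slides.
F_REF_MHZ = 200.0   # single-core clock
P_REF_W = 15.0      # single-core power at V_REF
V_REF = 5.0         # single-core supply voltage
V_T = 0.5           # threshold term in the relative clock-rate formula
OVERHEAD = 0.1      # assumed 10% overhead per additional core


def core_voltage(n):
    """Supply voltage at which one core can run at F_REF_MHZ / n.

    Solves (V - V_T)^2 / (V_REF - V_T)^2 = 1/n for V,
    i.e. V = 0.5 + sqrt(20.25 / n) with the slide's numbers.
    """
    return V_T + math.sqrt((V_REF - V_T) ** 2 / n)


def total_power(n):
    """Total power of n cores: 15 * [1 + 0.1 (n - 1)] * (V_DDN / 5)^2 watts."""
    overhead_factor = 1.0 + OVERHEAD * (n - 1)
    return P_REF_W * overhead_factor * (core_voltage(n) / V_REF) ** 2


def amdahl_speedup(s, n):
    """Amdahl's Law: speedup = 1 / (S + (1 - S) / N)."""
    return 1.0 / (s + (1.0 - s) / n)


def tradeoff_rows(core_counts=(1, 2, 4, 5, 8)):
    """Rows of (cores, clock in MHz, core supply in V, total power in W)."""
    return [(n, F_REF_MHZ / n, core_voltage(n), total_power(n))
            for n in core_counts]


if __name__ == "__main__":
    print("cores  clock(MHz)  V_DDN(V)  power(W)")
    for n, clock, volts, watts in tradeoff_rows():
        print(f"{n:5d}  {clock:10.1f}  {volts:8.2f}  {watts:8.2f}")
    print("Amdahl, S = 0.6, N = 10:", round(amdahl_speedup(0.6, 10), 2))
    print("Amdahl, S = 0.6, N = inf:", round(amdahl_speedup(0.6, math.inf), 2))
```

Because the script keeps the supply voltage unrounded, its totals can differ from the table by a few hundredths of a watt; the table's entries appear to be computed from voltages rounded to two decimals. Either way, N = 8 is the only listed configuration that falls under 5W.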


