SSD6 Review PPT: Lecture 7 (Part 1)

Lecture 7 (1/2): Memory Operation & Performance

"640K ought to be enough for anybody." (Bill Gates, 1981)

The contents in SSD6 cover:
5.1 Memory Systems
5.2 Caches
5.3 Virtual Memory (VM)
Exercise 5: Cache Lab

Why a Memory Hierarchy?

So far we have relied on a simple model of a computer system: a CPU that executes instructions and a memory system that holds instructions and data for the CPU. In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. Unwary programmers who assume a flat, uniform memory risk significant performance slowdowns in their programs, while wise programmers who understand the hierarchical nature of memory can produce efficient programs with fast average memory access times.

Memory Technology

There are many ways to store a bit of information. Current technologies use:
- semiconductors (memory proper)
- magnetic platters (hard disks)
- reflective puckered surfaces (CDs)
Computers designed in England in the 1940s used mercury delay lines to store bits.

The main memory technologies are:
- Static Random Access Memory (SRAM)
- Dynamic Random Access Memory (DRAM)
- Magnetic disks
- Magnetic tapes
- Optical disks

(Slide table: characteristics of DRAM and SRAM memories.)

Characteristics of disk memories: disks are workhorse storage devices that hold enormous amounts of data, on the order of tens to hundreds of gigabytes, as opposed to the hundreds or thousands of megabytes in a RAM-based memory. However, it takes on the order of milliseconds to read information from a disk, a hundred thousand times longer than from DRAM and a million times longer than from SRAM.

(Slide figure: trade-offs among the memory technologies.)

This scheme of taking advantage of several memory technologies to get both speed and size is called a memory hierarchy.

Locality of Reference

Ideal situation: the addresses of memory to be accessed by the CPU during execution would be known ahead of time, so data could be prefetched.
Actual situation: it is impossible to know exactly which addresses will be accessed by the CPU, and the compiler cannot by itself decide which data to move from slow to fast memory. We have to guess!

The guess is that the immediate future will be similar to the immediate past, and that memory addresses that have been accessed recently are likely to be accessed again. This is the principle of locality of reference.

Spatial and temporal locality:
- References to a single address occur close together in time (temporal locality).
- References to addresses that are near each other occur close together in time (spatial locality).

Is there any theory behind this? No; it is just what the typical program does. The local variables of your procedures are all accessed when the procedures execute, and variables tend to be accessed multiple times within their useful lifetime. Locality of reference holds in a statistical sense, not in an "always" sense. It is a property of programs, not of computers: a program's locality is a property of the program's behavior, not of the machine it runs on.

Locality in a Code Fragment

(Slide figures: a function with good locality, summing an array with N = 8; another function with good locality, referencing a[2][3]; a function with poor spatial locality. The code does not survive in this extract; a sketch in the same spirit appears at the end of this section.)

    sum = 0;
    for (i = 0; i < MAX; i++)
        sum += array[i];

In this loop, the references to sum and i exhibit temporal locality (the same locations are reused on every iteration), while the references to array[i] exhibit spatial locality (consecutive elements are accessed in order).

For the assembly fragment on the slide:
- The fetch of FRAG + 4 in line 5 of the fragment predicts the fetch in line 10: temporal locality.
- The fetch of FRAG predicts the fetches of all addresses between FRAG and FRAG + 6: spatial locality.
- The references to STACKPTR + X predict a reference to STACKPTR + X + 4: spatial locality.
- The references to both STACKPTR + X and STACKPTR + X + 4 predict further repeated references to each: temporal locality.
- The reference to ARRAY predicts a later reference to ARRAY + 1: spatial locality.
The principle is statistically true!
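The code for the good- and poor-locality functions referred to above does not survive in this extract. The following is a minimal sketch in the same spirit, not the slides' original code; the function names, the int element type, and the 4 x 8 dimensions are assumptions made for illustration. Summing a 2-D array row by row follows the array's memory layout and has good spatial locality; summing it column by column strides through memory and has poor spatial locality.

    #include <stdio.h>

    #define M 4
    #define N 8

    /* Good spatial locality: visits a[i][j] in the order C lays the
     * array out in memory (row-major), so the references are stride-1. */
    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    /* Poor spatial locality: the inner loop jumps N elements at a time,
     * so successive references are far apart in memory. */
    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

    int main(void)
    {
        int a[M][N];
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = i * N + j;

        /* Both calls return the same value; only the reference order,
         * and hence the locality, differs. */
        printf("%d %d\n", sum_array_rows(a), sum_array_cols(a));
        return 0;
    }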
Memory Hierarchies

A memory hierarchy uses memory technologies of varying size and speed so that the computer achieves both acceptable speed and acceptable size at an affordable cost. The hit ratio is the percentage of accesses for which the prediction is successful; typical hit ratios in today's computers are almost always above 0.90. (Some numerical evidence is given in the course document.)

A numerical example: an Intel Pentium III 600 MHz processor can execute hundreds of millions of instructions per second. Since each of those instructions must be fetched from memory, the processor generates at least 200 million memory references per second, so a reference cannot take, on average, more than about 1/200,000,000 of a second, or 5 nanoseconds. DRAM, however, has an access time of 60 nanoseconds or more.

What if we had no memory hierarchy and every access went to DRAM (that is, if the hit ratio were 0.0)? With a 60-nanosecond access time we could complete 1 / 0.00000006 memory references in one second, or fewer than 17 million, which corresponds to roughly 9 million instructions per second, slower than a 50 MHz processor. A 600 MHz processor would operate at the speed of a 50 MHz one if the cache hit ratio were 0.0!

The memory bottleneck is increasingly the obstacle to high computer performance. For example, with a hit ratio of 0.50 and a fast level that is 15 times faster than the 60 ns DRAM, the time to satisfy N references is

    0.50 * N * 0.00000006 + 0.50 * N * 0.00000006 / 15

that is, an average of about 32 nanoseconds per reference.
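The effect of the hit ratio on the average access time can be tabulated with a few lines of C. This is a sketch of the arithmetic only; the 60 ns DRAM time and the 15-times-faster cache come from the slide, while the particular hit ratios tried are chosen purely for illustration.

    #include <stdio.h>

    int main(void)
    {
        const double dram_ns  = 60.0;            /* DRAM access time         */
        const double cache_ns = dram_ns / 15.0;  /* cache is 15 times faster */
        const double ratios[] = { 0.0, 0.50, 0.90, 0.99 };
        const int n = sizeof ratios / sizeof ratios[0];

        for (int i = 0; i < n; i++) {
            double h = ratios[i];
            /* hits are served at cache speed, misses at DRAM speed */
            double avg = h * cache_ns + (1.0 - h) * dram_ns;
            printf("hit ratio %.2f -> average access time %5.1f ns\n", h, avg);
        }
        return 0;
    }

At a hit ratio of 0.50 this reproduces the 32 ns figure above; at 0.90 and beyond the average access time approaches the speed of the cache itself.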
Typical computer designs have at least four levels in the hierarchy. Registers are closest to the CPU, are smallest in size, and are also the fastest memory in the system. Furthest from the CPU is the disk, where large inactive parts of programs sleep for long periods of time. In between sit the caches and main memory. The policy decisions for these levels are made as the program executes; the computer has special cache hardware that moves data from memory to the caches and vice versa.

(Slide figure: design of the hierarchy.)

Design of the hierarchy: strategies
- Fetch: When should data be moved up the hierarchy? How much data should be moved in each transaction?
- Placement: Where in the smaller memory should the fetched data be stored? How can it be found when requested by the CPU?
- Replacement: When newly fetched data finds its location full, which previously resident data should be evicted?
- Update: When the cached data is changed, should the corresponding data in a lower level of memory be updated immediately or later?

Caches

Caches are small, fast storage devices that act as staging areas for the data objects stored in larger, slower devices. Each level in the hierarchy caches data objects from the next lower level. On a miss, the data requested by the CPU is not in the cache and must be fetched immediately. Cache lines are the chunks in which all transfers to and from the cache are done; they are typically on the order of 16 or 32 bytes long.

Caches are used everywhere in modern systems: in CPU chips, in operating systems, in distributed file systems, and on the World-Wide Web. They are built from and managed by various combinations of hardware and software.

Special terms: cache hits and cache misses. Misses come in three kinds:
- Cold misses (also called compulsory misses)
- Conflict misses
- Capacity misses

Cache Memories

(Slide figures: typical bus structure for the L1 and L2 caches; generic cache memory organization.)

Fundamental parameters of a cache:

    Parameter   Description
    S           Number of sets
    E           Number of lines per set
    B           Block size (bytes)
    m           Number of physical address bits

(Slide table: a worked example of the derived address-bit parameters for three cache configurations; the numeric columns do not survive in this extract.)

Direct-Mapped Caches

(Slide figures: a direct-mapped cache; set selection in a direct-mapped cache; word selection in a direct-mapped cache.)

An example with (S, E, B, m) = (4, 1, 2, 4):
- Block 0 consists of addresses 0 and 1, block 1 consists of addresses 2 and 3, and so on.
- Blocks 0 and 4 both map to set 0, blocks 1 and 5 both map to set 1, and so on.
- Blocks that map to the same cache set are uniquely identified by the tag: block 0 has a tag bit of 0, while block 4 has a tag bit of 1.

An access sequence for this cache (a sketch that replays it in C follows this list):
1. Initial status: the cache is empty.
2. Read the word at address 0: cache miss.
3. Read the word at address 1: cache hit.
4. Read the word at address 13: cache miss.
5. Read the word at address 8: cache miss.
6. Read the word at address 0 again: cache miss.
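The hit/miss sequence above can be reproduced with a few lines of C. This is a minimal sketch, not code from the slides; the struct and function names are assumptions made for illustration. Running it prints miss, hit, miss, miss, miss, matching steps 2 through 6.

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_SETS   4   /* S = 4 sets            */
    #define BLOCK_SIZE 2   /* B = 2 bytes per block */

    /* E = 1 line per set, so one line per set is enough. */
    struct line {
        bool     valid;
        unsigned tag;
    };

    static struct line cache[NUM_SETS];

    /* Returns true on a hit; on a miss, installs the block. */
    static bool access_cache(unsigned addr)
    {
        unsigned block = addr / BLOCK_SIZE;   /* drop the offset bit  */
        unsigned set   = block % NUM_SETS;    /* set-index bits       */
        unsigned tag   = block / NUM_SETS;    /* remaining (tag) bits */

        if (cache[set].valid && cache[set].tag == tag)
            return true;                      /* hit                  */

        cache[set].valid = true;              /* miss: fetch block    */
        cache[set].tag   = tag;
        return false;
    }

    int main(void)
    {
        const unsigned addrs[] = { 0, 1, 13, 8, 0 };
        for (int i = 0; i < 5; i++)
            printf("read address %2u: %s\n", addrs[i],
                   access_cache(addrs[i]) ? "hit" : "miss");
        return 0;
    }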
Conflict Misses in Direct-Mapped Caches

Consider this code fragment:

    float dotprod(float x[8], float y[8])
    {
        float sum = 0.0;
        int i;

        for (i = 0; i < 8; i++)
            sum += x[i] * y[i];
        return sum;
    }

If the blocks holding x and y map to the same cache sets, the references to x[i] and y[i] evict each other on every iteration: thrashing! How can this be solved? Define x[12] rather than x[8], so that the extra padding shifts y onto different cache sets.

Conclusion: even though the program has good spatial locality, and we have room in the cache to hold the blocks for both x[i] and y[i], each reference results in a conflict miss because the blocks map to the same cache set.

(Slide figure: conflict misses in a direct-mapped cache.)

Different caches compared:

    Classification           Cost        Hit Rate   Miss Rate
    Direct mapped            $5          88%        12%
    Fully associative        $500,000    99.8%      0.2%
    S-way set associative    $20-$50     98-99%     1-2%
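Returning to the dotprod example above, the following is a minimal sketch of the padding fix. The 12-element declaration comes from the slide; the driver in main() is assumed scaffolding added only for illustration.

    #include <stdio.h>

    /* Padded version: x is declared with 12 elements even though only the
     * first 8 are used, so the blocks of x and y no longer map to the same
     * cache sets and the conflict misses on x[i]/y[i] disappear. */
    float dotprod(float x[12], float y[8])
    {
        float sum = 0.0;
        int i;

        for (i = 0; i < 8; i++)
            sum += x[i] * y[i];
        return sum;
    }

    int main(void)
    {
        float x[12] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* last 4 are padding */
        float y[8]  = { 1, 1, 1, 1, 1, 1, 1, 1 };

        printf("dotprod = %f\n", dotprod(x, y));   /* prints 36.000000 */
        return 0;
    }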