Computer Organization and Architecture (English Edition): Answers to Exercises


Solutions Manual

COMPUTER ORGANIZATION AND ARCHITECTURE
Designing for Performance
Seventh Edition

William Stallings

Copyright 2005: William Stallings

© 2005 by William Stallings. All rights reserved. No part of this document may be reproduced, in any form or by any means, or posted on the Internet, without permission in writing from the author.

Notice

This manual contains solutions to all of the review questions and homework problems in Computer Organization and Architecture, Seventh Edition. If you spot an error in a solution or in the wording of a problem, I would greatly appreciate it if you would forward the information via email to ws. An errata sheet for this manual, if needed, is available at WilliamS

W.S.

TABLE OF CONTENTS

Chapter 2: Computer Evolution and Performance
Chapter 3: Computer Function and Interconnection
Chapter 4: Cache Memory
Chapter 5: Internal Memory
Chapter 6: External Memory
Chapter 7: Input/Output
Chapter 8: Operating System Support
Chapter 9: Computer Arithmetic
Chapter 10: Instruction Sets: Characteristics and Functions
Chapter 11: Instruction Sets: Addressing Modes and Formats
Chapter 12: Processor Structure and Function
Chapter 13: Reduced Instruction Set Computers (RISCs)
Chapter 14: Instruction-Level Parallelism and Superscalar Processors
Chapter 15: The IA-64 Architecture
Chapter 16: Control Unit Operation
Chapter 17: Microprogrammed Control
Chapter 18: Parallel Processing
Appendix A: Number Systems
Appendix B: Digital Logic

Chapter 2
Computer Evolution and Performance

Answers to Questions

2.1 In a stored program computer, programs are represented in a form suitable for storing in memory alongside the data.
The computer gets its instructions by reading them from memory, and a program can be set or altered by setting the values of a portion of memory.

2.2 A main memory, which stores both data and instructions; an arithmetic and logic unit (ALU) capable of operating on binary data; a control unit, which interprets the instructions in memory and causes them to be executed; and input and output (I/O) equipment operated by the control unit.

2.3 Gates, memory cells, and interconnections among gates and memory cells.

2.4 Moore observed that the number of transistors that could be put on a single chip was doubling every year and correctly predicted that this pace would continue into the near future.

2.5
Similar or identical instruction set: In many cases, the same set of machine instructions is supported on all members of the family. Thus, a program that executes on one machine will also execute on any other.
Similar or identical operating system: The same basic operating system is available for all family members.
Increasing speed: The rate of instruction execution increases in going from lower to higher family members.
Increasing number of I/O ports: The number of I/O ports increases in going from lower to higher family members.
Increasing memory size: The memory size increases in going from lower to higher family members.
Increasing cost: The cost increases in going from lower to higher family members.

2.6 In a microprocessor, all of the components of the CPU are on a single chip.

Answers to Problems

2.1 This program is developed in [HAYE98]. The vectors A, B, and C are each stored in 1,000 contiguous locations in memory, beginning at locations 1001, 2001, and 3001, respectively. The program begins with the left half of location 3. A counting variable N is set to 999 and decremented after each step until it reaches 1.
Thus, the vectors are processed from high location to low location.

Location  Instruction         Comments
0         999                 Constant (count N)
1         1                   Constant
2         1000                Constant
3L        LOAD M(2000)        Transfer A(I) to AC
3R        ADD M(3000)         Compute A(I) + B(I)
4L        STOR M(4000)        Transfer sum to C(I)
4R        LOAD M(0)           Load count N
5L        SUB M(1)            Decrement N by 1
5R        JUMP+ M(6, 20:39)   Test N and branch to 6R if nonnegative
6L        JUMP M(6, 0:19)     Halt
6R        STOR M(0)           Update N
7L        ADD M(1)            Increment AC by 1
7R        ADD M(2)
8L        STOR M(3, 8:19)     Modify address in 3L
8R        ADD M(2)
9L        STOR M(3, 28:39)    Modify address in 3R
9R        ADD M(2)
10L       STOR M(4, 8:19)     Modify address in 4L
10R       JUMP M(3, 0:19)     Branch to 3L

2.2
a.
Opcode     Operand
00000001   000000000010

b. First, the CPU must access memory to fetch the instruction. The instruction contains the address of the data we want to load. During the execute phase, the CPU accesses memory again to load the data value located at that address, for a total of two trips to memory.

2.3 To read a value from memory, the CPU puts the address of the value it wants into the MAR. The CPU then asserts the Read control line to memory and places the address on the address bus. Memory places the contents of the addressed memory location on the data bus. This data is then transferred to the MBR. To write a value to memory, the CPU puts the address of the value it wants to write into the MAR. The CPU also places the data it wants to write into the MBR. The CPU then asserts the Write control line to memory and places the address on the address bus and the data on the data bus. Memory transfers the data on the data bus into the corresponding memory location.

2.4
Address  Contents
08A      LOAD M(0FA); STOR M(0FB)
08B      LOAD M(0FA); JUMP +M(08D)
08C      LOAD -M(0FA); STOR M(0FB)
08D

This program will store the absolute value of the content at memory location 0FA into memory location 0FB.

2.5 All data paths to/from MBR are 40 bits. All data paths to/from MAR are 12 bits. Paths to/from AC are 40 bits. Paths to/from MQ are 40 bits.

2.6 The purpose is to increase performance.
When an address is presented to a memory module, there is some time delay before the read or write operation can be performed. While this is happening, an address can be presented to the other module. For a series of requests for successive words, the maximum rate is doubled.

2.7 The discrepancy can be explained by noting that factors other than clock speed make a big difference in overall system speed. In particular, memory systems and advances in I/O processing contribute to the performance ratio. A system is only as fast as its slowest link. In recent years, the bottlenecks have been the performance of memory modules and bus speed.

2.8 As noted in the answer to Problem 2.7, even though the Intel machine may have a faster clock speed (2.4 GHz vs. 1.2 GHz), that does not necessarily mean the system will perform faster. Different systems are not comparable on clock speed alone. Other factors such as the system components (memory, buses, architecture) and the instruction sets must also be taken into account. A more accurate measure is to run both systems on a benchmark. Benchmark programs exist for certain tasks, such as running office applications, performing floating point operations, graphics operations, and so on. The systems can be compared to each other on how long they take to complete these tasks. According to Apple Computer, the G4 is comparable to or better than a higher-clock-speed Pentium on many benchmarks.

2.9 This representation is wasteful because to represent a single decimal digit from 0 through 9 we need to have ten tubes. If we could have an arbitrary number of these tubes ON at the same time, then those same tubes could be treated as binary bits. With ten bits, we can represent 2^10 patterns, or 1024 patterns.
For integers, these patterns could be used to represent the numbers from 0 through 1023.

2.10
                               Ic   p    m    k    τ
Instruction set architecture   X    X
Compiler technology            X    X    X
Processor implementation            X              X
Cache and memory hierarchy                    X    X

Source: [HWAN93]

2.11 MIPS rate = f/(CPI × 10^6)

2.12
a. We can express the MIPS rate as: (MIPS rate) × 10^6 = Ic/T, so that Ic = T × (MIPS rate) × 10^6. The ratio of the instruction count of the RS/6000 to that of the VAX is (x × 18)/(12x × 1) = 1.5.
b. For the VAX, CPI = (5 MHz)/(1 MIPS) = 5. For the RS/6000, CPI = 25/18 = 1.39.

2.13 CPI = 1.55; MIPS rate = 25.8; Execution time = 3.87 ms. Source: [HWAN93]

2.14
a. Ultimately, the user is concerned with the execution time of a system, not its execution rate. If we take the arithmetic mean of the MIPS rates of various benchmark programs, we get a result that is proportional to the sum of the inverses of execution times. But this is not inversely proportional to the sum of execution times. In other words, the arithmetic mean of the MIPS rate does not cleanly relate to execution time. On the other hand, the harmonic mean MIPS rate is the inverse of the average execution time.
b.
             Arithmetic mean   Harmonic mean   Rank
Computer A   25.3 MIPS         0.25 MIPS       2
Computer B   2.8 MIPS          0.21 MIPS       3
Computer C   3.25 MIPS         2.1 MIPS        1

Chapter 3
Computer Function and Interconnection

Answers to Questions

3.1
Processor-memory: Data may be transferred from processor to memory or from memory to processor.
Processor-I/O: Data may be transferred to or from a peripheral device by transferring between the processor and an I/O module.
Data processing: The processor may perform some arithmetic or logic operation on data.
Control: An instruction may specify that the sequence of execution be altered.

3.2
Instruction address calculation (iac): Determine the address of the next instruction to be executed.
Instruction fetch (if): Read instruction from its memory location into the processor.
Instruction operation decoding (iod): Analyze instruction to determine type of operation to be performed and operand(s) to be used.
Operand address calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
Operand fetch (of): Fetch the operand from memory or read it in from I/O.
Data operation (do): Perform the operation indicated in the instruction.
Operand store (os): Write the result into memory or out to I/O.

3.3 (1) Disable all interrupts while an interrupt is being processed. (2) Define priorities for interrupts and allow an interrupt of higher priority to cause a lower-priority interrupt handler to be interrupted.

3.4
Memory to processor: The processor reads an instruction or a unit of data from memory.
Processor to memory: The processor writes a unit of data to memory.
I/O to processor: The processor reads data from an I/O device via an I/O module.
Processor to I/O: The processor sends data to the I/O device.
I/O to or from memory: For these two cases, an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access (DMA).

3.5 With multiple buses, there are fewer devices per bus. This (1) reduces propagation delay, because each bus can be shorter, and (2) reduces bottleneck effects.

3.6
System pins: Include the clock and reset pins.
Address and data pins: Include 32 lines that are time multiplexed for addresses and data.
Interface control pins: Control the timing of transactions and provide coordination among initiators and targets.
Arbitration pins: Unlike the other PCI signal lines, these are not shared lines. Rather, each PCI master has its own pair of arbitration lines that connect it directly to the PCI bus arbiter.
Error reporting pins: Used to report parity and other errors.
Interrupt pins: These are provided for PCI devices that must generate requests for service.
Cache support pins: These pins are needed to support a memory on PCI that can be cached in the processor or another device.
64-bit bus extension pins: Include 32 lines that are time multiplexed for addresses and data and that are combined with the mandatory address/data lines to form a 64-bit address/data bus.
JTAG/boundary scan pins: These signal lines support testing procedures defined in IEEE Standard 1149.1.

Answers to Problems

3.1 Memory (contents in hex): 300: 3005; 301: 5940; 302: 7006
Step 1: 3005 → IR; Step 2: 3 → AC
Step 3: 5940 → IR; Step 4: 3 + 2 = 5 → AC
Step 5: 7006 → IR; Step 6: AC → Device 6

3.2
1. a. The PC contains 300, the address of the first instruction. This value is loaded into the MAR.
   b. The value in location 300 (which is the instruction with the value 1940 in hexadecimal) is loaded into the MBR, and the PC is incremented. These two steps can be done in parallel.
   c. The value in the MBR is loaded into the IR.
2. a. The address portion of the IR (940) is loaded into the MAR.
   b. The value in location 940 is loaded into the MBR.
   c. The value in the MBR is loaded into the AC.
3. a. The value in the PC (301) is loaded into the MAR.
   b. The value in location 301 (which is the instruction with the value 5941) is loaded into the MBR, and the PC is incremented.
   c. The value in the MBR is loaded into the IR.
4. a. The address portion of the IR (941) is loaded into the MAR.
   b. The value in location 941 is loaded into the MBR.
   c. The old value of the AC and the value in the MBR are added and the result is stored in the AC.
5. a. The value in the PC (302) is loaded into the MAR.
   b. The value in location 302 (which is the instruction with the value 2941) is loaded into the MBR, and the PC is incremented.
   c. The value in the MBR is loaded into the IR.
6. a. The address portion of the IR (941) is loaded into the MAR.
   b. The value in the AC is loaded into the MBR.
   c. The value in the MBR is stored in location 941.

3.3
a. 2^24 = 16 MBytes
b. (1) If the local address bus is 32 bits, the whole address can be transferred at once and decoded in memory.
However, because the data bus is only 16 bits, it will require 2 cycles to fetch a 32-bit instruction or operand.
(2) The 16 bits of the address placed on the address bus cannot access the whole memory. Thus a more complex memory interface control is needed to latch the first part of the address and then the second part (because the microprocessor will send the address in two steps). For a 32-bit address, one may assume the first half will decode to access a row in memory, while the second half is sent later to access a column in memory. In addition to the two-step address operation, the microprocessor will need 2 cycles to fetch the 32-bit instruction/operand.
c. The program counter must be at least 24 bits. Typically, a 32-bit microprocessor will have a 32-bit external address bus and a 32-bit program counter, unless on-chip segment registers are used that may work with a smaller program counter. If the instruction register is to contain the whole instruction, it will have to be 32 bits long; if it will contain only the op code (called the op code register) then it will have to be 8 bits long.

3.4 In cases (a) and (b), the microprocessor will be able to access 2^16 = 64K bytes; the only difference is that with an 8-bit memory each access will transfer a byte, while with a 16-bit memory an access may transfer a byte or a 16-bit word. For case (c), separate input and output instructions are needed, whose execution will generate separate I/O signals (different from the memory signals generated with the execution of memory-type instructions); at a minimum, one additional output pin will be required to carry this new signal.
For case (d), it can support 2^8 = 256 input and 2^8 = 256 output byte ports and the same number of input and output 16-bit ports; in either case, the distinction between an input and an output port is defined by the different signal that the executed input or output instruction generates.

3.5 Clock cycle = 125 ns; bus cycle = 4 × 125 ns = 500 ns.
2 bytes are transferred every 500 ns; thus the transfer rate = 4 MBytes/sec.
Doubling the frequency may mean adopting a new chip manufacturing technology (assuming each instruction will have the same number of clock cycles); doubling the external data bus means wider (maybe newer) on-chip data bus drivers/latches and modifications to the bus control logic. In the first case, the speed of the memory chips will also need to double (roughly) so as not to slow down the microprocessor; in the second case, the word length of the memory will have to double to be able to send/receive 32-bit quantities.

3.6
a. Input from the Teletype is stored in INPR. The INPR will only accept data from the Teletype when FGI = 0. When data arrives, it is stored in INPR, and FGI is set to 1. The CPU periodically checks FGI. If FGI = 1, the CPU transfers the contents of INPR to the AC and sets FGI to 0.
When the CPU has data to send to the Teletype, it checks FGO. If FGO = 0, the CPU must wait. If FGO = 1, the CPU transfers the contents of the AC to OUTR and sets FGO to 0. The Teletype sets FGO to 1 after the word is printed.
b. The process described in (a) is very wasteful. The CPU, which is much faster than the Teletype, must repeatedly check FGI and FGO. If interrupts are used, the Teletype can issue an interrupt to the CPU whenever it is ready to accept or send data. The IEN register can be set by the CPU (under programmer control).

3.7
a. During a single bus cycle, the 8-bit microprocessor transfers one byte while the 16-bit microprocessor transfers two bytes.
The 16-bit microprocessor has twice the data transfer rate.
b. Suppose we do 100 transfers of operands and instructions, of which 50 are one byte long and 50 are two bytes long. The 8-bit microprocessor takes 50 + (2 × 50) = 150 bus cycles for the transfer. The 16-bit microprocessor requires 50 + 50 = 100 bus cycles. Thus, the data transfer rates differ by a factor of 1.5. Source: [PROT88]

3.8 The whole point of the clock is to define event times on the bus; therefore, we wish for a bus arbitration operation to be made each clock cycle. This requires that the priority signal propagate the length of the daisy chain (Figure 3.26) in one clock period. Thus, the maximum number of masters is determined by dividing the clock period by the amount of time it takes the priority signal to pass through one bus master.

3.9 The lowest-priority device is assigned priority 16. This device must defer to all the others. However, it may transmit in any slot not reserved by the other SBI devices.

3.10 At the beginning of any slot, if none of the TR lines is asserted, only the priority 16 device may transmit. This gives it the lowest average wait time under most circumstances. Only when there is heavy demand on the bus, which means that most of the time there is at least one pending request, will the priority 16 device not have the lowest average wait time.

3.11
a. With a clocking frequency of 10 MHz, the clock period is 10^-7 s = 100 ns. The length of the memory read cycle is 300 ns.
b. The Read signal begins to fall at 75 ns from the beginning of the third clock cycle (middle of the second half of T3). Thus, memory must place the data on the bus no later than 55 ns from the beginning of T3. Source: [PROT88]

3.12
a. The clock period is 125 ns. Therefore, two clock cycles need to be inserted.
b. From Figure 3.19, the Read signal begins to rise early in T2. To insert two clock cycles, the Ready line can be put low at the beginning of T2 and kept low for 250 ns.
Source: [PROT88]

3.13
a. A 5 MHz clock corresponds to a clock period of 200 ns. Therefore, the Write signal has a duration of 150 ns.
b. The data remain valid for 150 + 20 = 170 ns.
c. One wait state. Source: [PROT88]

3.14
a. Without the wait states, the instruction takes 16 bus clock cycles. The instruction requires four memory accesses, resulting in 8 wait states. The instruction, with wait states, takes 24 clock cycles, for an increase of 50%.
b. In this case, the instruction takes 26 bus cycles without wait states and 34 bus cycles with wait states, for an increase of about 31%. Source: [PROT88]

3.15
a. The clock period is 1

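The timing arithmetic used in Problems 3.11 through 3.14 (clock period from clock frequency, and the cycle count once wait states are added) can be checked with a short sketch. The function names are illustrative only and are not part of the manual. The value of two wait states per memory access is an assumption inferred from "four memory accesses, resulting in 8 wait states", and reusing it for part (b) of Problem 3.14 is likewise an assumption.

```python
from typing import Tuple


def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def wait_state_overhead(base_cycles: int, memory_accesses: int,
                        waits_per_access: int) -> Tuple[int, float]:
    """Return (total cycles, fractional increase) after adding wait states."""
    extra = memory_accesses * waits_per_access
    return base_cycles + extra, extra / base_cycles


if __name__ == "__main__":
    print(clock_period_ns(10))            # Problem 3.11a: 100.0 ns
    print(clock_period_ns(5))             # Problem 3.13a: 200.0 ns
    # Assumed: 4 memory accesses, 2 wait states each (8 wait states in total).
    print(wait_state_overhead(16, 4, 2))  # Problem 3.14a: (24, 0.5)
    total, increase = wait_state_overhead(26, 4, 2)
    print(total, round(increase, 2))      # Problem 3.14b: 34 cycles, 8/26 extra
```

Keeping the fractional increase as extra cycles divided by base cycles makes it easy to see why the same 8 wait states cost proportionally less on the longer instruction.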