Distributed Systems: English Exercise Answers

SOLUTIONS TO CHAPTER 1 PROBLEMS

1. Q: What is the role of middleware in a distributed system?
A: To enhance the distribution transparency that is missing in network operating systems. In other words, middleware aims at improving the single-system view that a distributed system should have.

2. Q: Explain what is meant by (distribution) transparency, and give examples of different types of transparency.
A: Distribution transparency is the phenomenon by which distribution aspects in a system are hidden from users and applications. Examples include access transparency, location transparency, migration transparency, relocation transparency, replication transparency, concurrency transparency, failure transparency, and persistence transparency.

3. Q: Why is it sometimes so hard to hide the occurrence and recovery from failures in a distributed system?
A: It is generally impossible to detect whether a server is actually down or simply slow in responding. Consequently, a system may have to report that a service is not available although, in fact, the server is just slow.

4. Q: Why is it not always a good idea to aim at implementing the highest degree of transparency possible?
A: Aiming at the highest degree of transparency may lead to a considerable loss of performance that users are not willing to accept.

5. Q: What is an open distributed system and what benefits does openness provide?
A: An open distributed system offers services according to clearly defined rules. An open system is capable of easily interoperating with other open systems, but also allows applications to be easily ported between different implementations of the same system.

6. Q: Describe precisely what is meant by a scalable system.
A: A system is scalable with respect to either its number of components, geographical size, or number and size of administrative domains, if it can grow in one or more of these dimensions without an unacceptable loss of performance.

7. Q: Scalability can be achieved by applying different techniques. What are these techniques?
A: Scaling can be achieved through distribution, replication, and caching.

8. Q: What is the difference between a multiprocessor and a multicomputer?
A: In a multiprocessor, the CPUs have access to a shared main memory. There is no shared memory in multicomputer systems; the CPUs can communicate only through message passing.

9. Q: A multicomputer with 256 CPUs is organized as a 16 × 16 grid. What is the worst-case delay (in hops) that a message might have to take?
A: Assuming that routing is optimal, the longest optimal route is from one corner of the grid to the opposite corner. The length of this route is 30 hops. If the end processors in a single row or column are connected to each other, the length becomes 15.

10. Q: Now consider a 256-CPU hypercube. What is the worst-case delay here, again in hops?
A: With a 256-CPU hypercube, each node has a binary address, from 00000000 to 11111111. A hop from one machine to another always involves changing a single bit in the address. Thus from 00000000 to 00000001 is one hop. From there to 00000011 is another hop. In all, eight hops are needed. (A small sketch after question 11 below checks both hop counts.)

11. Q: What is the difference between a distributed operating system and a network operating system?
A: A distributed operating system manages multiprocessors and homogeneous multicomputers. A network operating system connects different independent computers that each have their own operating system, so that users can easily use the services available on each computer.
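The hop counts in questions 9 and 10 can be verified directly. The following is a minimal sketch, not part of the original answers; the class and method names are chosen purely for illustration, and Java is used because the Chapter 2 answers already use it.

public class WorstCaseHops {

    // 16 x 16 mesh: the longest shortest path runs from one corner to the
    // opposite corner, i.e. (rows - 1) + (cols - 1) hops.
    static int meshWorstCase(int rows, int cols) {
        return (rows - 1) + (cols - 1);
    }

    // 256-node hypercube: every hop flips exactly one address bit, so the
    // worst case equals the Hamming distance between address 0 and address 255.
    static int hypercubeWorstCase(int nodes) {
        return Integer.bitCount(nodes - 1);
    }

    public static void main(String[] args) {
        System.out.println(meshWorstCase(16, 16));     // prints 30
        System.out.println(hypercubeWorstCase(256));   // prints 8
    }
}

For the grid, each of the two dimensions contributes at most 15 hops; for the hypercube, the worst case is the number of address bits, 8.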
12. Q: Explain how microkernels can be used to organize an operating system in a client-server fashion.
A: A microkernel can separate client applications from operating system services by enforcing each request to pass through the kernel. As a consequence, operating system services can be implemented by (perhaps different) user-level servers that run as ordinary processes. If the microkernel has networking capabilities, there is also no principal objection to placing those servers on remote machines (which run the same microkernel).

13. Q: Explain the principal operation of a page-based distributed shared memory system.
A: Page-based DSM makes use of the virtual memory capabilities of an operating system. Whenever an application addresses a memory location that is currently not mapped into the current physical memory, a page fault occurs, giving the operating system control. The operating system can then locate the referred page, transfer its content over the network, and map it to physical memory. At that point, the application can continue.

14. Q: What is the reason for developing distributed shared memory systems? What do you see as the main problem hindering efficient implementations?
A: The main reason is that writing parallel and distributed programs based on message-passing primitives is much harder than being able to use shared memory for communication. Efficiency of DSM systems is hindered by the fact that, no matter what you do, page transfers across the network need to take place. If pages are shared by different processors, it is quite easy to get into a state similar to thrashing in virtual memory systems. In the end, DSM systems can never be faster than message-passing solutions, and will generally be slower due to the overhead incurred by keeping track of where pages are.

15. Q: Explain what false sharing is in distributed shared memory systems. What possible solutions do you see?
A: False sharing happens when data belonging to two different and independent processes (possibly on different machines) are mapped onto the same logical page. The effect is that the page is swapped between the two processes, leading to an implicit and unnecessary dependency. Solutions include making pages smaller or prohibiting independent processes from sharing a page.

16. Q: An experimental file server is up 3/4 of the time and down 1/4 of the time, due to bugs. How many times does this file server have to be replicated to give an availability of at least 99%?
A: With k being the number of servers, we require (1/4)^k <= 0.01, expressing that the worst situation, when all servers are down, should happen at most 1/100 of the time. Since (1/4)^3 is about 0.016 but (1/4)^4 is about 0.004, this gives us k = 4. (A short check appears after question 18 below.)

17. Q: What is a three-tiered client-server architecture?
A: A three-tiered client-server architecture consists of three logical layers, where each layer is, in principle, implemented on a separate machine. The highest layer consists of a client user interface, the middle layer contains the actual application, and the lowest layer implements the data that are being used.

18. Q: What is the difference between a vertical distribution and a horizontal distribution?
A: Vertical distribution refers to the distribution of the different layers of a multitiered architecture across multiple machines. In principle, each layer is implemented on a different machine. Horizontal distribution deals with the distribution of a single layer across multiple machines, such as distributing a single database.
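For question 16, the smallest k can also be found by brute force. The snippet below is only a quick sanity check, not part of the original answer; the names are illustrative.

public class ReplicaCount {
    public static void main(String[] args) {
        double downFraction = 0.25;   // each server is down 1/4 of the time
        int k = 1;
        // keep adding replicas until the chance that all are down at once is at most 1%
        while (Math.pow(downFraction, k) > 0.01) {
            k++;
        }
        System.out.println(k);        // prints 4
    }
}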
19. Q: Consider a chain of processes P1, P2, ..., Pn implementing a multitiered client-server architecture. Process Pi is a client of process Pi+1, and Pi will return a reply to Pi-1 only after receiving a reply from Pi+1. What are the main problems with this organization when taking a look at the request-reply performance at process P1?
A: Performance can be expected to be bad for large n. The problem is that each communication between two successive layers is, in principle, between two different machines. Consequently, the performance between P1 and P2 may also be determined by the n - 2 request-reply interactions between the other layers. Another problem is that if one machine in the chain performs badly or is even temporarily unreachable, then this will immediately degrade the performance at the highest level.

SOLUTIONS TO CHAPTER 2 PROBLEMS

1. Q: In many layered protocols, each layer has its own header. Surely it would be more efficient to have a single header at the front of each message with all the control in it than all these separate headers. Why is this not done?
A: Each layer must be independent of the other ones. The data passed from layer k + 1 down to layer k contains both header and data, but layer k cannot tell which is which. Having a single big header that all the layers could read and write would destroy this transparency and make changes in the protocol of one layer visible to other layers. This is undesirable.

2. Q: Why are transport-level communication services often inappropriate for building distributed applications?
A: They hardly offer distribution transparency, meaning that application developers are required to pay significant attention to implementing communication, often leading to proprietary solutions. The effect is that distributed applications built, for example, directly on top of sockets are difficult to port and to interoperate with other applications.

3. Q: A reliable multicast service allows a sender to reliably pass messages to a collection of receivers. Does such a service belong to a middleware layer, or should it be part of a lower-level layer?
A: In principle, a reliable multicast service could easily be part of the transport layer, or even the network layer. As an example, the unreliable IP multicasting service is implemented in the network layer. However, because such services are currently not readily available, they are generally implemented using transport-level services, which automatically places them in the middleware. However, when taking scalability into account, it turns out that reliability can be guaranteed only if application requirements are considered. This is a strong argument for implementing such services at higher, less general layers.

4. Q: Consider a procedure incr with two integer parameters. The procedure adds one to each parameter. Now suppose that it is called with the same variable twice, for example, as incr(i, i). If i is initially 0, what value will it have afterward if call-by-reference is used? How about if copy/restore is used?
A: If call by reference is used, a pointer to i is passed to incr. It will be incremented two times, so the final result will be two. However, with copy/restore, i will be passed by value twice, each value initially 0. Both will be incremented, so both will now be 1. Now both will be copied back, with the second copy overwriting the first one. The final value will be 1, not 2.
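Question 4 can be made concrete in Java. Java has no true call-by-reference for primitives, so a one-element array is used below to get the same aliasing effect, while copy/restore is simulated explicitly. This is only an illustrative sketch, not part of the original answer.

public class IncrDemo {

    // Both parameters alias the same cell when called as incrByReference(v, v).
    static void incrByReference(int[] a, int[] b) {
        a[0] = a[0] + 1;
        b[0] = b[0] + 1;
    }

    // Copy/restore: copy the value in, increment the copies, copy both back;
    // the second copy-back overwrites the first one.
    static int copyRestoreIncr(int i) {
        int copy1 = i, copy2 = i;
        copy1++;
        copy2++;
        int result = copy1;       // first copy back
        result = copy2;           // second copy back overwrites it
        return result;
    }

    public static void main(String[] args) {
        int[] v = {0};
        incrByReference(v, v);
        System.out.println(v[0]);                // prints 2
        System.out.println(copyRestoreIncr(0));  // prints 1
    }
}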
5. Q: C has a construction called a union, in which a field of a record (called a struct in C) can hold any one of several alternatives. At run time, there is no sure-fire way to tell which one is in there. Does this feature of C have any implications for remote procedure call? Explain your answer.
A: If the runtime system cannot tell what type of value is in the field, it cannot marshal it correctly. Thus unions cannot be tolerated in an RPC system unless there is a tag field that unambiguously tells what the variant field holds. The tag field must not be under user control.

6. Q: One way to handle parameter conversion in RPC systems is to have each machine send parameters in its native representation, with the other one doing the translation, if need be. The native system could be indicated by a code in the first byte. However, since locating the first byte in the first word is precisely the problem, can this actually work?
A: First of all, when one computer sends byte 0, it always arrives in byte 0. Thus the destination computer can simply access byte 0 (using a byte instruction) and the code will be in it. It does not matter whether this is the low-order byte or the high-order byte. An alternative scheme is to put the code in all the bytes of the first word. Then no matter which byte is examined, the code will be there.

7. Q: Assume a client calls an asynchronous RPC to a server, and subsequently waits until the server returns a result using another asynchronous RPC. Is this approach the same as letting the client execute a normal RPC? What if we replace the asynchronous RPCs with asynchronous RPCs?
A: No, this is not the same. An asynchronous RPC returns an acknowledgement to the caller, meaning that after the first call by the client, an additional message is sent across the network. Likewise, the server is acknowledged that its response has been delivered to the client. Two asynchronous RPCs may be the same, provided reliable communication is guaranteed. This is generally not the case.

8. Q: Instead of letting a server register itself with a daemon as is done in DCE, we could also choose to always assign it the same endpoint. That endpoint can then be used in references to objects in the server's address space. What is the main drawback of this scheme?
A: The main drawback is that it becomes much harder to dynamically allocate objects to servers. In addition, many endpoints need to be fixed, instead of just one (i.e., the one for the daemon). For machines possibly having a large number of servers, static assignment of endpoints is not a good idea.

9. Q: Give an example implementation of an object reference that allows a client to bind to a transient remote object.
A: Using Java, we can express such an implementation as the following class:

public class Object_reference {
    InetAddress server_address;   // network address of the object's server
    int server_endpoint;          // endpoint to which the server is listening
    int object_identifier;        // identifier for this object
    URL client_code;              // (remote) file containing the client-side stub
    byte[] init_data;             // possible additional initialization data
}

The object reference should at least contain the transport-level address of the server where the object resides. We also need an object identifier, as the server may contain several objects. In our implementation, we use a URL to refer to a (remote) file containing all the necessary client-side code. A generic array of bytes is used to contain further initialization data for that code. An alternative implementation would have been to directly put the client code into the reference instead of a URL. This approach is followed, for example, in Java RMI, where proxies are passed as references.
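A minimal, hypothetical sketch of how a client-side runtime might use such a reference to bind to the object is shown below: open a transport-level connection to the listed server and tell it which object is wanted. This is not part of the original answer; loading the stub from the client_code URL is omitted, and all names other than the standard Java classes are made up for illustration.

import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;

public class Binder {
    // Connect to the server named in the reference and identify which of its
    // objects we want to talk to; the client-side stub then uses this socket.
    static Socket bind(InetAddress serverAddress, int serverEndpoint,
                       int objectIdentifier) throws IOException {
        Socket s = new Socket(serverAddress, serverEndpoint);
        new DataOutputStream(s.getOutputStream()).writeInt(objectIdentifier);
        return s;
    }
}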
10. Q: Java and other languages support exceptions, which are raised when an error occurs. How would you implement exceptions in RPCs and RMIs?
A: Because exceptions are initially raised at the server side, the server stub can do nothing else but catch the exception and marshal it as a special error response back to the client. The client stub, on the other hand, will have to unmarshal the message and raise the same exception if it wants to keep access to the server transparent. Consequently, exceptions now also need to be described in an interface definition language.

11. Q: Would it be useful to also make a distinction between static and dynamic RPCs?
A: Yes, for the same reason it is useful with remote object invocations: it simply introduces more flexibility. The drawback, however, is that much of the distribution transparency is lost, for which RPCs were introduced in the first place.

12. Q: Some implementations of distributed-object middleware systems are entirely based on dynamic method invocations. Even static invocations are compiled to dynamic ones. What is the benefit of this approach?
A: Realizing that an implementation of dynamic invocations can handle all invocations, static ones become just a special case. The advantage is that only a single mechanism needs to be implemented. A possible disadvantage is that performance is not always as optimal as it could have been had we analyzed the static invocation.

13. Q: Describe how connectionless communication between a client and a server proceeds when using sockets.
A: Both the client and the server create a socket, but only the server binds the socket to a local endpoint. The server can then subsequently do a blocking read call in which it waits for incoming data from any client. Likewise, after creating the socket, the client simply does a blocking call to write data to the server. There is no need to close a connection.

14. Q: Explain the difference between the primitives MPI_bsend and MPI_isend in MPI.
A: The primitive MPI_bsend uses buffered communication, by which the caller passes an entire buffer containing the messages to be sent to the local MPI runtime system. When the call completes, the messages have either been transferred or copied to a local buffer. In contrast, with MPI_isend, the caller passes only a pointer to the message to the local MPI runtime system, after which it immediately continues. The caller is responsible for not overwriting the message that is pointed to until it has been copied or transferred.

15. Q: Suppose you could make use of only transient asynchronous communication primitives, including only an asynchronous receive primitive. How would you implement primitives for transient synchronous communication?
A: Consider a synchronous send primitive. A simple implementation is to send a message to the server using asynchronous communication, and subsequently let the caller continuously poll for an incoming acknowledgement or response from the server. If we assume that the local operating system stores incoming messages into a local buffer, then an alternative implementation is to block the caller until it receives a signal from the operating system that a message has arrived, after which the caller does an asynchronous receive.
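The polling variant for question 15 can be sketched as follows. The AsyncLayer interface merely stands in for the assumed transient asynchronous primitives; it is not a real library, and the whole fragment is illustrative only.

public class SyncOverAsync {

    interface AsyncLayer {
        void asyncSend(byte[] msg);   // returns immediately
        byte[] asyncReceive();        // returns null if no message has arrived yet
    }

    // Synchronous send: fire the message asynchronously, then poll until the
    // acknowledgement (or response) from the server shows up.
    static byte[] syncSend(AsyncLayer layer, byte[] msg) throws InterruptedException {
        layer.asyncSend(msg);
        byte[] reply;
        while ((reply = layer.asyncReceive()) == null) {
            Thread.sleep(1);          // wait briefly between polls
        }
        return reply;
    }
}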
16. Q: Now suppose you could make use of only transient synchronous communication primitives. How would you implement primitives for transient asynchronous communication?
A: This situation is actually simpler. An asynchronous send is implemented by having a caller append its message to a buffer that is shared with a process that handles the actual message transfer. Each time a client appends a message to the buffer, it wakes up the send process, which subsequently removes the message from the buffer and sends it to its destination using a blocking call to the original send primitive. The receiver is implemented similarly, by offering a buffer that can be checked for incoming messages by an application. (A sketch of the sending side follows these answers.)

17. Q: Does it make sense to implement persistent asynchronous communication by means of RPCs?
A: Yes, but only on a hop-to-hop basis, in which a process managing a queue passes a message to the next queue manager by means of an RPC. Effectively, the service offered by a queue manager to another is the storage of a message. The calling queue manager is offered a proxy implementation of the interface to the remote queue, possibly receiving a status indicating the success or failure of each operation. In this way, even queue managers see only queues and no further communication.

18. Q: In the text we stated that in order to automatically start a process to fetch messages from an input queue, a daemon is often used that monitors the input queue. Give an alternative implementation that does not make use of a daemon.
A: A simple scheme is to let a process on the receiver side check for any incoming messages each time that process puts a message in its own queue.

19. Q: Routing tables in IBM MQSeries, and in many other message-queuing systems, are configured manually. Describe a simple way to do this automatically.
A: The simplest implementation is to have a centralized component in which the topology of the queuing network is maintained. That component simply calculates all best routes between pairs of queue managers using a known routing algorithm, and subsequently generates routing tables for each queue manager. These tables can be downloaded by each manager separately. This approach works in queuing networks where there are only relatively few, but possibly widely dispersed, queue managers. A more sophisticated approach is to decentralize the routing algorithm, by having each queue manager discover the network topology and compute its own routing tables.
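For question 16, the sending side could look like the following sketch. The SyncSend interface stands in for the assumed blocking (synchronous) send primitive; none of this comes from the original answer or from a real library, and the names are illustrative.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncOverSync {

    interface SyncSend {
        void blockingSend(byte[] msg);   // the original synchronous primitive
    }

    private final BlockingQueue<byte[]> buffer = new LinkedBlockingQueue<>();

    AsyncOverSync(SyncSend sync) {
        // This helper thread plays the role of the separate send process: it
        // wakes up when a message is appended and forwards it with the blocking call.
        Thread sender = new Thread(() -> {
            try {
                while (true) sync.blockingSend(buffer.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.setDaemon(true);
        sender.start();
    }

    // The asynchronous send merely appends the message to the shared buffer and returns at once.
    void asyncSend(byte[] msg) {
        buffer.add(msg);
    }
}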