P2PStorageSystems

上传人:li****i 文档编号:243072071 上传时间:2024-09-15 格式:PPT 页数:40 大小:842KB
返回 下载 相关 举报
P2PStorageSystems_第1页
第1页 / 共40页
P2PStorageSystems_第2页
第2页 / 共40页
P2PStorageSystems_第3页
第3页 / 共40页
点击查看更多>>
资源描述
按一下以編輯母片標題樣式,按一下以編輯母片,第二層,第三層,第四層,第五層,*,Replica Control for Peer-to-Peer Storage Systems,P2P,Peer-to-peer,(P2P) has emerged as an important paradigm model for sharing resources at the edges of the Internet.,The most widely exploited resource is storage, as typified in P2P music file sharing,Napster,Gnutella,Following the great success of P2P file sharing, a natural next step is to develop wide-area, P2P storage systems to aggregate the storage across the Internet.,Replica Control Protocol,Replication,to maintain multiple copies of some critical data to increase the availability,Replica Control Protocol,to guarantee a consistent view of the replicated data,Replica Control Methods,Optimistic,Proceed optimistically with computation on the available subgroup and join later with consistency,Approaches,Log, Version vector, etc.,Pessimistic,Restrict computations with worst-case assumptions,Approaches,Primary site, Voting, etc.,5,Write-ahead Log,Files are actually modified in place, but before any file block is changed, a record is written to a log telling which node is making the change, which file block is being changed, and what the old and new values are,Only after the log has been written successfully is the change made to the file,Write-ahead log can be used for undo (rollback) and redo,Version Vector,Version vector for file f,N element vector, where N is the number of nodes in which f is stores,The i,th,element represents the number of updates done by node I,A vector V dominated V,if,Every element in V = corresponding element in V,Conflicts if neither dominates,Version Vector,Consistency resolution,If V dominates V, inconsistent; can be resolved by copying V to V,If V and V,conflict, inconsistency is detected,Version vector can detect only update conflicts; cannot detect read-write conflicts,Primary Site Approach,Data replicated on at least k+1 nodes (for k-resilient),One node acts as the primary site (PS),Any read request is served by the PS,Any write request is copied to all other back-up sites,Any write request to back-up sites are forwarded to the PS,PS Failure Handling,If back-up fails, no interruption in service,If PS fails, there are two possibilities,If the network not segmented,Choose a back-up node as the primary,If checkpointing has been active, need to restart only from the previous checkpoint,If segmented,Only the partition with PS can progress,Other partitions stops updates on data,Necessary to distinguish between site failures and network partitions,Voting Approach,V votes are distributed to n replicas with,Vw+Vr V,Vw+Vw V,Obtain Vr or more votes to read,Obtain Vw or more votes to write,Quorum system is more general than voting,Quorum Systems,Trees,Grid-based (array-based),Torus,Hierarchical,Multi-column,and so on,12,/40,Quorum-Based Schemes (1/2),n replicas with version numbers,Read operation,Read-lock,and access a read quorum,Obtaining a largest-version-number replica,Write operation,Write-lock,and access a write quorum,Updating all replicas with,the new version number,the largest+ 1,13,/40,Quorum-Based Schemes (2/2),One-copy equivalence,is guaranteed If we restrict,Write-write and write-read lock exclusion,Intersection Property,A non-empty intersection in any pair of,A read quorum and a write quorum,Two write quorums,The set of replicas must behave as if there is only a single copy. This is the strictest consistency criterion.,Witnesses,Witness,- small entity that maintains enough information to identify the replicas that contain the most recent version of the data,- the information could also be a,timestamp,containing the time of the latest update,- the information could also be a,version number, which is an integer incremented each time the data are updated,Classification of P2P Storage Sys.,Unstructured,“Replication Strategies for Highly Available Peer-to-peer Storage”,“Replication Strategies in Unstructured Peer-to-peer Networks”,Structured,CFS,PAST,LAR,Ivy,Oasis,Om,Eliot,Sigma (for mutual exclusion primitive),Read only,Read/Write (Mutable),Ivy,Stores a set of,logs,with the aid of distributed hash tables.,Ivy keeps, for each participant, a log storing all its updates, and maintains data consistency optimistically by performing conflict resolutions among all logs. (Maintain data consistency in a best-effort manner),The logs should be kept indefinitely and a participant must scan all the logs related to a file to look up the up-to-date file data. Thus, Ivy is only suitable for small groups of participants.,Ivy uses DHT,DHT provides,Simple API,Put(key, value) and get(key),value,Availability (Replication),Robustness (Integrity checking),Distributed hash table,Distributed application,get (key),data,put(key, data),Lookup service,lookup(key),node IP address,(Ivy),(DHash),(Chord),Solution: Log Based,Update: Each participant maintains a log of changes to the file system,Lookup: Each participant scans all logs,Eliot,Eliot relies a reliable, fault-tolerant, immutable P2P storage substrate,Charles,to store data blocks, and uses an,auxiliary metadata service (MS),for storing mutable metadata.,It supports,NFS-like consistency,semantics; however, the traffic between MS and the client is high for such semantics.,It also supports,AFS open-close consistency,semantics; however, this semantics may cause the problem of lost updates.,The MS service is provided by a conventional replicated database, which may be not fit for dynamic P2P environments.,Oasis,Oasis is based on,Giffords weighted voting,quorum concept and allows dynamic quorum membership.,It spreads versioned metadata along with data replicas over the P2P network.,To complete an operation on a data object, a client must,first find a metadata,related to the object and figure out the total number of votes, required votes for read/write operations, replica list, and so on, to form a quorum accordingly.,One drawback of Oasis is that if a node happens to use a,stale metadata, the data,consistency may be violated,.,Om,Om is based on the concepts of,automatic replica regeneration,and replica,membership reconfiguration,.,The consistency is maintained by two quorum systems: a,read-one-write-all,quorum system for accessing replicas, and a,witness-modeled quorum system,for reconfiguration.,Om allows replica regeneration from single replica. However, a write in Om is always first forwarded to the primary copy, which,serializing,all writes and uses a,two-phase procedure,to propagate the write to all secondary replicas.,The drawbacks of Om are (1) the primary replica may become a bottleneck (2) the overhead incurred by the two-phase procedure may be too high (3) the reconfiguration by witness model has the probability of violating consistency.,Om and Pastry,80,90,98,99,100,101,103,104,120,PAST example,Object key = 100,Om and Pastry,80,90,98,99,100,101,103,104,120,PAST example,Object key = 100,Replication,Om and Pastry,80,90,98,99,100,101,103,104,120,PAST example,Object key = 100,Replication,Replica crash,Om and Pastry,80,90,98,99,100,101,103,104,120,PAST example,Object key = 100,Replication,Replica crash,Regeneration,Om: Normal Case Operation,80,90,98,99,100,101,103,104,120,primary,write,read,Read-one / write-all approach,Writes serialized via primary,Om: System Architecture Overview,Witness Model,The witness model utilizes the following,limited view divergence property,Intuitively, the property says that,two replicas are unlikely to have a completely different view,regarding the reachability of a set of randomly-placed witnesses.,Witness Model,To utilize the limited view divergence property, all replicas,logically organize the witnesses into an m*t matrix,The number of rows, m, determines the probability of intersection,The number of columns, t, protects against the failure of individual witnesses, so that each row has at least one functioning witness with high probability,Witness Model,Sigma System Model,Sigma System Model,Replicas are always available, but their internal states may be randomly reset.,failure-recovery,model,The number of clients is unpredictable.,Clients are not malicious and fail stop,.,Clients and replicas communicate via,messages, which could be replicated, lost, but,never forged,.,Sigma,The Sigma protocol intelligently collect states from all replicas to achieve,mutual exclusion,.,The basic idea of the Sigma protocol is as follows. A node,u,wishing to be the,winner,of the mutual exclusion sends a timestamped request for each of the totally,n,(,n,=3,k,+1) replicas and waits for replies. On receiving a request from,u, a node,v,should put,u,s request in a local queue by the timestamp order, takes the node as the winner whose request is in the front of the queue, and,reply the winner ID to,u,.,Sigma,When the number of replies received by,u,exceeds,m,(,m,=2,k,+1),u,acts according to the following conditions:,(1),if more than,m,replies take,v,as the winner, then,u,is the winner.,(2),if more than,m,replies take,w,(,w,u,) as the winner, then,w,is the winner and,u,just keeps waiting.,(3) if no node is regarded as the winner by more than,m,replies, then,u,sends YIELD message to cancel its request temporarily and then re-inserts its request again with random backoff,.,In this manner, one node can eventually be elected as the winner even when communication delay variance is large.,A drawback of the Sigma protocol is that a node needs to send requests to all replicas and gets advantaged replies from a large portion (,2/3) of nodes to be the winner of the mutual exclusion, which will incur large overhead. Moreover, the overhead will even be larger under an environment of high contention.,References,Ivy:A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, “Ivy: A Read/write Peer-to-peer File System,” in Proc. of the Symposium on Operating Systems Design and Implementation (OSDI), 2002.,Eliot:C. Stein, M. Tucker, and M. Seltzer, “Building a Reliable Mutable File System on Peer-to-peer Storage,” in Proc. of 21st IEEE Symposium on Reliable Distributed Systems (WRP2PDS), 2002.,Oasis:M. Rodrig, and A. Lamarca, “Decentralized Weighted Voting for P2P Data Management,” in,Proc. of the 3rd ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 8592,2003.,OM:H. Yu. and A. Vahdat, “Consistent and Automatic Replica Regeneration,” in,Proc. of First Symposium on Networked Systems Design and Implementation (NSDI 04),2004.,Sigma:,S. Lin, Q. Lian, M. Chen, and Z. Zhang, “A practical distributed mutual exclusion protocol in dynamic peer-to- peer systems,” in,Proc. of 3rd International Workshop on Peer-to-Peer Systems (IPTPS04),2004.,Q&A,
展开阅读全文
相关资源
正为您匹配相似的精品文档
相关搜索

最新文档


当前位置:首页 > 图纸专区 > 小学资料


copyright@ 2023-2025  zhuangpeitu.com 装配图网版权所有   联系电话:18123376007

备案号:ICP2024067431-1 川公网安备51140202000466号


本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。装配图网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知装配图网,我们立即给予删除!