John Dunagan, Juhan Lee (MSN), Alec Wolman
WIP

- Improve the ability of large internet portals to gain insight into failures
- Non-goals:
  - masking failures
  - using machine learning to infer abnormal behavior

- Messenger, msn, Hotmail, Search, many other "properties"
- Large (100 million users)
- Sources of complexity:
  - multiple data centers
  - large number of machines
  - complex internal network topology
  - diversity of applications and software infrastructure

- Detecting, managing, and diagnosing failures
- Review MSN's current approaches
- Describe our solution at a high level

- Monitor system availability with heartbeats
- Monitor application availability and quality of service using synthetic requests
- Customer complaints
  - Telephone, email
Problems:
- These approaches provide limited coverage: it is harder to catch failures that don't affect every request
- Data on detected failures often lacks the detail needed to suggest a remedy:
  - which front end is flaky?
  - which app component caused the end-user failure?

Definition:
- Ability to prioritize failures
- Detect component service degradation
- Characterize app stability
- Capacity planning
- When server "x" fails, what is the impact of this failure?
- Better use of ops and engineering resources
Current approach: no systematic attempt to provide this functionality

Detecting and Managing Failures
Step 1: Instrument applications to track user requests across the "service chain"
- Each request is tagged with a unique id
- Service chain is composed on the fly with help of app instrumentation
- For each request:
  - Collect per-hop performance information
  - Collect per-request failure status
- Centralized data collection

We can handle:
- Machine failures
- Network connectivity problems
Most:
- Misconfiguration
- Application bugs
But not all:
- Application errors where the app itself doesn't detect that there is a problem

- Assigning responsibility to a specific hw or sw component
- Insight into internals of a component
- Cross-component interactions
- Current approach: instrument applications
  - App-specific log messages
- Problems
  - High request rates => log rollover
  - Perceived overhead => detailed logging enabled during testing, disabled in production

- FUSE (OSDI 2004): lightweight agreement on only one thing: whether or not a failure has occurred
- Lack of a positive ack => failure

- Step 2: Implement "conditional logging" to significantly reduce the overhead of collecting detailed logs across different machines in the service chain
- Step 1 provides the ability to identify a request across all participants in the service chain; FUSE provides agreement on failure status across that chain
- While fate is undecided: detailed log messages are stored in main memory
  - Common-case overhead of logging is vastly reduced
- Once the fate of the service chain is decided, we discard app logs for successful requests and save logs for failures
  - Quantity of data generated is manageable when most requests are successful

Benefits:
- FUSE allows monitoring of real transactions.
  - All transactions, or a sampled subset to control overhead.
- When a request fails, FUSE provides an audit trail
  - How far did it get?
  - How long did each step take?
  - Any additional application-specific context.
- FUSE can be deployed incrementally.
[Figure labels: Client, Server1, Server2, Server3, X]

- Overload policy: need to handle bursts of failures without inducing more failures
- How much effort does it take to make apps FUSE-enabled?
- Are the right components FUSE-enabled?
- Identifying and filtering false positives
- Tracking request flow is non-trivial with network load balancers

- We've implemented FUSE for MSN, integrated with the ASP.NET rendering engine
- Testing in progress
- Roll-out at end of summer

Example current code on Front End:
    ReceiveRequestFromClient()
        SendRequestToBackEnd();
Example code on Front End using FUSE:
    ReceiveRequestFromClient(..., FUSEinfo f)  // default value of f = null
        if (f != null) JoinFUSEGroup(f);
        SendRequestToBackEnd(..., f);
Current implementation is in C#, and consists of 2400 LOC
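
The conditional logging idea from Step 2 can be illustrated with a minimal single-process sketch. This is not the FUSE implementation (which the slides say is in C#); it is written in Python for brevity, and the `ConditionalLogger` class, its method names, and the `req-1`/`req-2` ids are hypothetical. The `resolve` call stands in for the point at which FUSE has agreed on the request's fate across the service chain.

```python
from collections import defaultdict


class ConditionalLogger:
    """Hypothetical sketch: hold detailed logs in memory until a request's fate is known."""

    def __init__(self):
        self._pending = defaultdict(list)  # request id -> buffered log messages
        self.saved = {}  # request id -> messages kept because the request failed

    def log(self, request_id, message):
        # While the fate is undecided, messages stay in main memory only.
        self._pending[request_id].append(message)

    def resolve(self, request_id, failed):
        # Called once the fate is decided (assumed to be signalled by FUSE).
        messages = self._pending.pop(request_id, [])
        if failed:
            self.saved[request_id] = messages  # keep the audit trail
        # Logs of successful requests are simply dropped.

    def pending_count(self):
        # Number of requests whose fate is still undecided.
        return len(self._pending)


if __name__ == "__main__":
    logger = ConditionalLogger()
    logger.log("req-1", "front end: request received")
    logger.log("req-2", "front end: request received")
    logger.log("req-2", "back end: timeout")
    logger.resolve("req-1", failed=False)  # success: buffered logs discarded
    logger.resolve("req-2", failed=True)   # failure: buffered logs saved
    print(logger.saved)
```

Because only failed requests ever leave the in-memory buffer, the volume of saved data scales with the failure rate rather than the request rate, which is the property the slides rely on.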