Serengeti: Virtualize Your Big Data Applications
蔺永华, VMware, Inc.
2009 VMware, Inc. All rights reserved.

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
[Diagram labels: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data-Parallel Batch Processing; Real-Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
- Difficult to deploy: takes several people several days, or even months
- Difficult to tune cluster performance
Low Efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources
- Difficult to share resources safely between different workloads
Single Point of Failure
- NameNode and JobTracker are single points of failure
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? Get Your Hadoop Cluster in Minutes
- 1/1000 of the human effort
- Minimal Hadoop operations knowledge required
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
- Manual process, costing days: server preparation, OS installation, network configuration, Hadoop installation and configuration
- Automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? Consolidate Sprawling Clusters
- Cluster sprawl: single-purpose clusters for various business applications lead to cluster sprawl
- Cluster consolidation: clusters share servers with strong isolation
- Simplify: single hardware infrastructure, unified operations
- Optimize: shared resources = higher utilization; elastic resources = faster on-demand access
- 30% CAPEX down
[Diagram labels: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop, Virtualization Platform]

Why Virtualize Hadoop? Use All Your Resources to Solve the Priority Problem
- 50%+ of resources sit idle while a high-priority job is burning up its cluster
- Utilize all resources from the pool on demand
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA): Protection Against Unplanned Downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram labels: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; Server]

vSphere Fault Tolerance (FT) Provides Continuous Protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
- Zero downtime for NameNode, JobTracker, and other components in Hadoop clusters
[Diagram labels: App/OS virtual machines on two VMware ESX hosts, protected by HA and FT]

Serengeti: Easy and Rapid Deployment and Management
- Open source project launched in June 2012
- Version 0.8 was released in April; 0.9 is planned for June
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 Minutes to a Hadoop Cluster with Serengeti

Common Questions About Virtualization
- Local disk: Can local disks be used in a virtualized environment?
- Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
- Data stability: In a virtual environment, how can data be distributed across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
- Performance: How does Hadoop perform in a virtual environment?

Can I Use Local Disk Easily?
Serengeti Extends the Virtual Storage Architecture to Include Local Disk
- Shared storage (SAN or NAS): easy to provision, automated cluster rebalancing
- Hybrid storage: SAN for boot images and other workloads; local disk for Hadoop and HDFS
[Diagram labels: six hosts, each running Hadoop VMs alongside other VMs]

How to Scale In / Scale Out Flexibly? How to Schedule Resources Flexibly Between Clusters and Different Applications?
Evolution of Hadoop on VMs: Data/Compute Separation
Hadoop in VM (current Hadoop: combined storage/compute)
- VM lifecycle determined by the DataNode
- Limited elasticity
Separate Storage
- Separate compute from data
- Remove the elasticity constraint imposed by the DataNode
- Elastic compute
- Raise utilization
Separate Compute Clusters
- Separate virtual compute
- Compute cluster per tenant
- Stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In
[Diagram labels: NameNode; JobTracker; slave-node hosts, each shown with one D box and four C boxes]

Serengeti Ballooning Enhancement for Java Applications
[Diagram labels: JVM, Guest OS, Host]

How to Keep Data Stable? How to Access Data Locally if the DataNode and Compute Node Are in Different VMs?
Distributed and Data/Compute Associated VM Placement
- DataNode and TaskTracker combined cluster
- Data/compute separated cluster
- Compute-only clusters (compute-only cluster 1, compute-only cluster 2) on top of an HDFS cluster
[Diagram labels: master and worker hosts; DataNode, TaskTracker, JobTracker, and NameNode VMs placed on hosts across Rack 1 and Rack 2]

Hadoop Topology Changes for Virtualization
Hadoop Topology Awareness: Serengeti HVE
[Diagram labels: topology trees rooted at /, with nodes D1-D2, R1-R4, N1-N8, and H1-H12]

Hadoop Virtualization Extensions for Topology (HVE)
Components: Hadoop Common, HDFS, MapReduce
Extensions
- Network Topology Extension
- Task Scheduling Policy Extension
- Balancer Policy Extension
- Replica Choosing Policy Extension
- Replica Placement Policy Extension
- Replica Removal Policy Extension
JIRA issues
- HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472
- HDFS-3495, HDFS-3498
- MAPREDUCE-4309, MAPREDUCE-4310

Is There Significant Performance Degradation in a Virtualized Environment? Is There Any Performance Data?
Virtualized Hadoop Performance
[Chart: native versus virtual platforms, 32 hosts, 16 disks/host]

Serengeti Architecture Diagram
[Diagram labels: UI Client (Flex UI); CLI Client (Spring Shell); REST API; Serengeti Web Service; Spring Batch steps (VM placement calculation, VM provision, software management, VHM, update MetaDB); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift service; progress report; Ironfan; Chef server (REST API); Cookbook; RabbitMQ; VM Runtime Manager; Package repository; vCenter; hosts on the Virtualization Platform running Hadoop nodes with Chef Client and HA kit]

Customizing Your Hadoop/HBase Cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    distro: apache
    groups:
      name: master
      roles: hadoop_namenode, hadoop_jobtracker
      storage:
        type: SHARED
        sizeGB: 20
      instance_type: MEDIUM
      instance_num: 1
      ha: true

      name: worker
      roles: hadoop_datanode, hadoop_tasktracker
      instance_type: SMALL
      instance_num: 5
      ha: false

One Command to Scale Out Your Cluster with Serengeti

    cluster resize --name <cluster name> --nodeGroup worker --instanceNum <number>

Configure/Reconfigure Hadoop with Ease Using Serengeti
- Modify the Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, and log4j.properties
- Apply the new Hadoop configuration using the edited spec file

    configuration:
      hadoop:
        core-site.xml:
          // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/core-default.html
        hdfs-site.xml:
          // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html
        mapred-site.xml:
          // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html
          io.sort.mb: 300
        hadoop-env.sh:
          // HADOOP_HEAPSIZE:
          // HADOOP_NAMENODE_OPTS:
          // HADOOP_DATANODE_OPTS:

    cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json

Freedom of Choice and Open Source
- Flexibility to choose from major distributions

    cluster create --name myHadoop --distro apache

- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community
[Diagram labels: Community Projects, Distributions]

HDFS2 with NameNode Federation and HA
Deploy a CDH4 Hadoop cluster
- NameNode Federation
- NameNode HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
- CDH4 configurations
- Scale out
- Elasticity
- JobTracker HA/FT
[Diagram labels: NameNode Group 1 and NameNode Group 2, each with an active and a standby NameNode; ZooKeeper group (three ZK nodes) coordinating each group; quorum-based metadata store; DataNodes sending block reports]

Proactive Monitoring and Tuning with VCOPs
- Proactive monitoring through VCOPs
- Gain comprehensive visibility
- Eliminate manual processes with intelligent automation
- Proactively manage operations

Summary: VMware Brings Agility, Efficiency, and Elasticity to Big Data
Agility
- Deploy, configure, and monitor Hadoop clusters on the fly
- Dynamically reconfigure Hadoop to meet changing business demands
Efficiency
- Consolidate Hadoop to achieve higher utilization
- Pool resources to allow for increased performance and priority job processing
Elasticity
- Enable full elasticity through separation of data and compute
- Scale Hadoop in/out under resource constraints

Serengeti Resources
- Download and try Serengeti: projectserengeti.org
- VMware Hadoop site: vmware.com/hadoop
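
The cluster customization slides describe a spec with a master group, a worker group, and a "configuration" section. A minimal Python sketch of how such a spec file could be generated is shown here. The field names (`distro`, `groups`, `instance_type`, `instance_num`, `ha`) are copied from the slide text; the exact JSON layout, and the helpers `build_cluster_spec` and `total_nodes`, are assumptions made for illustration and may not match the schema that Serengeti actually accepts.

```python
import json


def build_cluster_spec(worker_count=5, io_sort_mb=300):
    """Return a spec dict mirroring the fields shown on the slides.

    The nesting is an assumption; only the field names and values
    come from the slide text.
    """
    master = {
        "name": "master",
        "roles": ["hadoop_namenode", "hadoop_jobtracker"],
        "storage": {"type": "SHARED", "sizeGB": 20},
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": True,
    }
    worker = {
        "name": "worker",
        "roles": ["hadoop_datanode", "hadoop_tasktracker"],
        "instance_type": "SMALL",
        "instance_num": worker_count,
        "ha": False,
    }
    return {
        "distro": "apache",
        "groups": [master, worker],
        # Hadoop overrides, keyed by the config file they belong to.
        "configuration": {
            "hadoop": {"mapred-site.xml": {"io.sort.mb": io_sort_mb}}
        },
    }


def total_nodes(spec):
    """Sum the instance counts over all node groups."""
    return sum(group["instance_num"] for group in spec["groups"])


if __name__ == "__main__":
    # Print the spec as JSON; redirect to a file to save it.
    print(json.dumps(build_cluster_spec(), indent=2))
```

Saving the output to a file such as /home/serengeti/myHadoop.json would give a spec file of the kind passed to the `cluster config` command on the slides, provided the real schema matches these assumed field names.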