8-1 Data Warehousing and Data Mining

Uploaded by: 633****35 | Document ID: 243983116 | Uploaded: 2024-10-01 | Format: PPTX | Pages: 38 | Size: 1.52 MB
Database System Concepts, 6th Edition (Silberschatz, Korth and Sudarshan)

Chapter 20: Data Analysis
- Decision Support Systems
- Data Warehousing
- Data Mining
- Classification
- Association Rules
- Clustering

Decision Support Systems
- Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.
- Examples of business decisions:
  - What items to stock?
  - What insurance premium to charge?
  - To whom to send advertisements?
- Examples of data used for making decisions:
  - Retail sales transaction details
  - Customer profiles (income, age, gender, etc.)

Decision-Support Systems: Overview
- Data analysis tasks are simplified by specialized tools and SQL extensions.
  - Example tasks:
    - For each product category and each region, what were the total sales in the last quarter, and how do they compare with the same quarter last year?
    - As above, for each product category and each customer category.
- Statistical analysis packages (e.g., S+) can be interfaced with databases.
  - Statistical analysis is a large field, but not covered here.
- Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases.
- A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
  - Important for large businesses that generate data from multiple divisions, possibly at multiple sites.
  - Data may also be purchased externally.

Data Warehousing
- Data sources often store only current data, not historical data.
- Corporate decision making requires a unified view of all organizational data, including historical data.
- A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site.
  - Greatly simplifies querying, permits study of historical trends.
  - Shifts decision support query load away from transaction processing systems.

Data Warehousing

Design Issues
- When and how to gather data
  - Source driven architecture: data sources transmit new information to the warehouse, either continuously or periodically (e.g., at night).
  - Destination driven architecture: the warehouse periodically requests new information from data sources.
  - Keeping the warehouse exactly synchronized with data sources (e.g., using two-phase commit) is too expensive.
    - Usually OK to have slightly out-of-date data at the warehouse.
    - Data/updates are periodically downloaded from online transaction processing (OLTP) systems.
- What schema to use
  - Schema integration

More Warehouse Design Issues
- Data cleansing
  - E.g., correct mistakes in addresses (misspellings, zip code errors).
  - Merge address lists from different sources and purge duplicates.
- How to propagate updates
  - The warehouse schema may be a (materialized) view of the schema from data sources.
- What data to summarize
  - Raw data may be too large to store on-line.
  - Aggregate values (totals/subtotals) often suffice.
  - Queries on raw data can often be transformed by the query optimizer to use aggregate values.

Warehouse Schemas
- Dimension values are usually encoded using small integers and mapped to full values via dimension tables.
- The resultant schema is called a star schema.
- More complicated schema structures:
  - Snowflake schema: multiple levels of dimension tables
  - Constellation: multiple fact tables

Data Warehouse Schema

Data Mining
- Data mining is the process of semi-automatically analyzing large databases to find useful patterns.
- Prediction based on past history
  - Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history.
  - Predict if a pattern of phone calling card usage is likely to be fraudulent.
- Some examples of prediction mechanisms:
  - Classification
    - Given a new item whose class is unknown, predict to which class it belongs.
  - Regression formulae
    - Given a set of mappings for an unknown function, predict the function result for a new parameter value.

Data Mining (Cont.)
- Descriptive Patterns
  - Associations
    - Find books that are often bought by "similar" customers. If a new such customer buys one such book, suggest the others too.
    - Associations may be used as a first step in detecting causation.
      - E.g., association between exposure to chemical X and cancer.
  - Clusters
    - E.g., typhoid cases were clustered in an area surrounding a contaminated well.
    - Detection of clusters remains important in detecting epidemics.

Classification Rules
- Classification rules help assign new objects to classes.
  - E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?
- Classification rules for the above example could use a variety of data, such as educational level, salary, age, etc.
  - ∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
  - ∀ person P, P.degree = bachelors and (P.income ≥ 25,000 and P.income ≤ 75,000) ⇒ P.credit = good
- Rules are not necessarily exact: there may be some misclassifications.
- Classification rules can be shown compactly as a decision tree.

Decision Tree

Construction of Decision Trees
- Training set: a data sample in which the classification is already known.
- Greedy top d
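The two credit rules under Classification Rules can be read as a small rule-based classifier. The sketch below is illustrative only: the `Person` type, the `classify_credit` function, and the `"unknown"` fallback are assumptions (the slides give no default class), and the thresholds assume the rules compare income as "greater than 75,000" and "between 25,000 and 75,000 inclusive".

```python
from dataclasses import dataclass


@dataclass
class Person:
    degree: str    # e.g. "masters" or "bachelors"
    income: float  # annual income, same units as the slide thresholds


def classify_credit(p: Person) -> str:
    """Apply the two example rules in order; first matching rule wins."""
    # Rule 1: masters degree and income above 75,000 => excellent
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # Rule 2: bachelors degree and income in [25,000, 75,000] => good
    if p.degree == "bachelors" and 25_000 <= p.income <= 75_000:
        return "good"
    # Assumption: the slides do not say what happens when no rule fires.
    return "unknown"


if __name__ == "__main__":
    print(classify_credit(Person("masters", 90_000)))
    print(classify_credit(Person("bachelors", 40_000)))
```

Because the rules are not necessarily exact, a real classifier would be evaluated against a training set with known classes rather than trusted as written.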
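The star schema described under Warehouse Schemas, together with the example task of comparing total sales per product category and region against the same quarter of the previous year, can be sketched with Python's built-in `sqlite3` module. All table names, column names, and sample rows here (`sales`, `item_info`, `store`, the quarter labels) are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables map small integer keys to full values.
        CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, region TEXT);
        -- Fact table stores only integer keys plus the measure.
        CREATE TABLE sales (
            item_id INTEGER REFERENCES item_info(item_id),
            store_id INTEGER REFERENCES store(store_id),
            quarter TEXT,
            amount REAL
        );
        INSERT INTO item_info VALUES (1, 'clothing'), (2, 'electronics');
        INSERT INTO store VALUES (1, 'east'), (2, 'west');
        INSERT INTO sales VALUES
            (1, 1, '2023-Q4', 100.0),
            (1, 1, '2024-Q4', 150.0),
            (1, 1, '2024-Q4', 50.0),
            (2, 2, '2024-Q4', 300.0);
        """
    )
    return conn


def sales_comparison(conn, current_quarter, prior_quarter):
    """Total sales per (category, region) for two quarters, side by side."""
    query = """
        SELECT i.category, s.region,
               SUM(CASE WHEN f.quarter = ? THEN f.amount ELSE 0.0 END),
               SUM(CASE WHEN f.quarter = ? THEN f.amount ELSE 0.0 END)
        FROM sales f
        JOIN item_info i ON f.item_id = i.item_id
        JOIN store s ON f.store_id = s.store_id
        GROUP BY i.category, s.region
        ORDER BY i.category, s.region
    """
    return conn.execute(query, (current_quarter, prior_quarter)).fetchall()


if __name__ == "__main__":
    for row in sales_comparison(build_demo_warehouse(), "2024-Q4", "2023-Q4"):
        print(row)
```

A warehouse that stores precomputed totals per quarter could answer the same question without scanning the raw fact rows, which is the point of summarizing data with aggregate values.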
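The data cleansing step of merging address lists from different sources and purging duplicates can be sketched as follows. This is a minimal illustration, not a production cleansing routine: the `normalize` and `merge_addresses` helpers are hypothetical, and the normalization (lowercasing, dropping punctuation, collapsing whitespace) is an assumed notion of "duplicate" that would not catch misspellings or zip code errors.

```python
import re


def normalize(address: str) -> str:
    """Build a comparison key: lowercase, no punctuation, single spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(cleaned.split())


def merge_addresses(*sources):
    """Merge several address lists, keeping the first spelling of each address."""
    seen = set()
    merged = []
    for addresses in sources:
        for addr in addresses:
            key = normalize(addr)
            if key not in seen:  # purge duplicates that differ only in formatting
                seen.add(key)
                merged.append(addr)
    return merged


if __name__ == "__main__":
    crm = ["12 Main St., Springfield"]
    billing = ["12 main st springfield", "5 Oak Ave"]
    print(merge_addresses(crm, billing))
```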