Chapter 13: Case Study

Uploaded by: 痛***   Document ID: 252366020   Uploaded: 2024-11-15   Format: PPT   Pages: 25   Size: 960.50 KB
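Four data tables

Table name                    Original records   Fields
Book                          364299             22
Department                    323                9
Reader                        37318              31
History (borrowing history)   1132648            3

Contents
- Introduction
- Original data source
- Data preprocessing
- Data warehouse design
- Data mining results
- Summary

Data preprocessing, stage 1 (1)
- The raw data are imported into SQL Server through data transformation.
- The Reader and Department tables are merged into a new Reader table.
- In the Reader table:
  - A college field is added to build the concept hierarchy department -> college.
  - Readers are classified by student ID, and a new grade field records the user's year in one of seven categories: freshman, sophomore, junior, senior, postgraduate, candidate for PhD, teacher.
  - Only the fields rno (reader id), dept_code, dept_name, college_name, and grade are kept; all other fields are deleted.

Data preprocessing, stage 1 (2)
- In the Book table:
  - Following the Chinese and Western book classification schemes, the attributes sub_class and class are derived from the call number to build the concept hierarchy of book categories.
  - The language field is reclassified: the three most frequent languages (Chinese, English, Japanese) are kept, and all remaining languages are grouped as other.
  - A publish_interval field is created that groups publication years into five-year intervals; it serves as the concept hierarchy for publication period.
  - Only the fields marc_id (book id), title, author, publisher, publish_year, language, subject, marc_class, class, sub_class, and publish_interval are kept.

Data preprocessing, stage 1 (3)
- Using the Chinese and Western book classification lookup tables, books are divided into 4 major classes, and each major class is further subdivided into sub-items.

Data preprocessing, stage 1 (4)
- In the History table:
  - The borrow date is split into three fields (borrow year, borrow month, borrow day) to serve as the time concept hierarchy.
  - An amount field is added to represent the number of books borrowed, normally 1; it is the measure of the fact table.
  - Only the fields marc_id, rno, borrow_year, borrow_month, borrow_date, and amount are kept, with amount as the measure.

Data preprocessing, stage 2 (1)
- In the Reader table, the analysis targets students in regular degree programs. Removed: users from outside the university, administrative staff, in-service programs, students who changed departments, fifth-year and sixth-year undergraduates, records with a blank department, and records whose status cannot be identified.
- In the Book table, records with an incomplete call number are removed, as are periodicals, on-campus theses, non-circulating items (such as current magazine issues), and audiovisual materials (such as CD, LD, tape).
- In the History table, the following are removed: records with an incomplete call number, records whose rno (user id) cannot be found in the cleaned Reader table, and records whose call number cannot be found in the cleaned Book table.

Data preprocessing, stage 2 (2)
Comparison of table contents before and after cleaning

Table name   Original records   Records after cleaning   Original fields   Fields after cleaning
Book         364299             75214                    22                11
Reader       37318              8587                     31                5
History      1132648            612075                   3                 6

Data warehouse design (1)
- Fact table: History is the fact table, with amount as the measure.
- Dimensions: three dimensions, Reader, Book, and Time.

Data warehouse design (2)
- Two concept hierarchies are found in the Reader dimension table:
  - Grade: rno -> grade
  - College and department: rno -> dept_name -> college_name
- Three concept hierarchies are found in the Book dimension table:
  - Language: title -> language
  - Subject classification: title -> sub_class -> class
  - Publication year: publish_year -> publish_interval
- One concept hierarchy is found in the Time dimension:
  - Borrow date: borrow_date -> borrow_month -> borrow_year

Data warehouse design (3)
[Figure: data warehouse with a star schema]

Decision tree data mining analysis

Predicting the category of books borrowed by PhD students of the College of Humanities and Social Sciences:

Book category           Percentage
Art                     13.56%
General                 1.69%
Geography/History       1.69%
Language/Literature     1.69%
Natural Sciences        59.32%
Philosophy/Psychology   6.78%
Religion                1.69%
Social Sciences         11.86%
Missing                 1.69%

Predicting the status of readers who borrow English-language books on psychology and philosophy:

Reader status       Percentage
Candidate for PhD   11.08%
Freshman            7.06%
Junior              16.67%
Postgraduate        34.58%
Senior              15.59%
Sophomore           14.25%
Teacher             0.75%
Missing             0.03%

Predicting the language of borrowed books in language and literature:

Language   Percentage
Chinese    66.58%
English    33.32%
Japan      0.09%
Others     0.01%
Missing    0.00%

Clustering data mining analysis (1)

Cluster 1: share of borrowing by time

Total      3064.04   100.00%
2001/Dec   242.44    7.91%
2001/Apr   236.85    7.73%
2002/Apr   225.34    7.35%
2002/May   223.59    7.30%
2002/Mar   220.51    7.20%
2001/Nov   218.38    7.13%
2001/Oct   198.85    6.49%
2002/Jan   196.18    6.40%
2001/Mar   190.56    6.22%
2001/May   188.30    6.15%
2002/Feb   173.92    5.68%
2001/Feb   169.10    5.52%
2001/Jun   146.56    4.78%
2001/Sep   145.13    4.74%
2001/Jan   118.29    3.86%
2001/Aug   90.89     2.97%
2001/Jul   79.15     2.58%
Missing    0         0.00%

Cluster 2: share of borrowing by time

Total      2951.87   100.00%
2002/May   318.21    10.78%
2001/Dec   225.62    7.64%
2001/Jan   221.28    7.50%
2002/Apr   203.85    6.91%
2001/Nov   197.87    6.70%
2001/Feb   190.02    6.44%
2002/Jan   184.67    6.26%
2002/Mar   180.3     6.11%
2001/May   179.82    6.09%
2001/Oct   179.09    6.07%
2001/Mar   174.69    5.92%
2001/Apr   154.05    5.22%
2001/Jun   150.35    5.09%
2002/Feb   141.99    4.81%
2001/Sep   127.51    4.32%
2001/Jul   69.08     2.34%
2001/Aug   53.46     1.81%
Missing    0         0.00%

Clustering data mining analysis (2)

Cluster 1: share of borrowing by grade and reader category

Total               3064.04   100.00%
Postgraduate        1347.75   43.99%
Candidate for PhD   427.37    13.95%
Sophomore           375.96    12.27%
Junior              318.11    10.38%
Freshman            300.59    9.81%
Senior              287.55    9.38%
Teacher             6.72      0.22%
Missing             0         0.00%

Cluster 2: share of borrowing by grade and reader category

Total               2951.87   100.00%
Postgraduate        1322.56   44.80%
Freshman            489.12    16.57%
Candidate for PhD   427.77    14.49%
Sophomore           339.09    11.49%
Junior              217.24    7.36%
Senior              139.52    4.73%
Teacher             16.57     0.56%
Missing             0         0.00%

Clustering data mining analysis (3)

Cluster 1: share of borrowing by book category

Total                   3064.04   100.00%
Natural Sciences        1611.51   52.59%
Language/Literature     411.55    13.43%
Social Sciences         352.23    11.50%
General                 240.92    7.86%
Art                     178.23    5.82%
Geography/History       142.67    4.66%
Philosophy/Psychology   108.16    3.53%
Religion                18.76     0.61%
Missing                 0         0.00%

Cluster 2: share of borrowing by book category

Total                   2951.87   100.00%
Natural Sciences        1864.76   63.17%
Language/Literature     355.02    12.03%
Social Sciences         275.09    9.32%
General                 155.55    5.27%
Geography/History       119.21    4.04%
Art                     104.28    3.53%
Philosophy/Psychology   61.62     2.09%
Religion                16.34     0.55%
Missing                 0         0.00%

Association rule data mining analysis (1)

No.   Student category                                                                           Implies book category   Support (%)   Confidence (%)
1     College=Electrical Engineering and Computer Science AND People=Candidate for PhD.          Natural Science         12.2          90.0
2     College=Electrical Engineering and Computer Science AND People=Postgraduate                Natural Science         23.8          96.2
3     College=Engineering AND People=Postgraduate                                                Natural Science         12.7          96.2
4     College=Science AND People=Postgraduate                                                    Natural Science         10.8          85.7
5     College=Electrical Engineering and Computer Science                                        Natural Science         40.5          92.7
6     College=Engineering                                                                        Natural Science         22.5          93.3
7     College=Science                                                                            Natural Science         21.2          78.9
8     People=Candidate for PhD.                                                                  Natural Science         26.2          91.0
9     People=Po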
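The support and confidence columns of the association rule table can be illustrated with a small calculation. The sketch below assumes the standard definitions (support: share of all records that match both sides of the rule; confidence: share of records matching the left side that also match the right side). The slides do not state how the mining tool computed its figures, so this is an assumption. The records, the field names college, people, and book_class, and the helper rule_metrics are hypothetical and are not taken from the original data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def rule_metrics(records: List[Record],
                 antecedent: Record,
                 consequent: Record) -> Tuple[float, float]:
    """Return (support %, confidence %) for the rule antecedent -> consequent.

    Assumes the textbook definitions; the original tool may differ.
    """
    def matches(record: Record, condition: Record) -> bool:
        # A record matches when every field in the condition has the same value.
        return all(record.get(key) == value for key, value in condition.items())

    n_total = len(records)
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(1 for r in records
                 if matches(r, antecedent) and matches(r, consequent))

    support = 100.0 * n_both / n_total if n_total else 0.0
    confidence = 100.0 * n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy borrowing records (not the library data from the slides).
records = [
    {"college": "Engineering", "people": "Postgraduate", "book_class": "Natural Sciences"},
    {"college": "Engineering", "people": "Postgraduate", "book_class": "Natural Sciences"},
    {"college": "Engineering", "people": "Postgraduate", "book_class": "Art"},
    {"college": "Science", "people": "Senior", "book_class": "Natural Sciences"},
    {"college": "Science", "people": "Senior", "book_class": "Social Sciences"},
]

support, confidence = rule_metrics(
    records,
    antecedent={"college": "Engineering", "people": "Postgraduate"},
    consequent={"book_class": "Natural Sciences"},
)
print(f"support={support:.1f}%  confidence={confidence:.1f}%")
```

In this toy data, two of the five records match both sides of the rule and three match the left side, so the rule has lower confidence than support would suggest on its own; the two measures answer different questions and are usually read together.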
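Two of the stage 1 preprocessing derivations described earlier, the five-year publish_interval on Book and the split of the borrow date on History, could be sketched as follows. The slides do not say where the five-year boundaries fall or how the interval is labelled, so aligning intervals to multiples of five and using a "start-end" label are assumptions, as are the helper names publish_interval and split_borrow_date.

```python
from datetime import date
from typing import Dict


def publish_interval(year: int, width: int = 5) -> str:
    """Map a publication year to a fixed-width interval label.

    Assumption: intervals start at multiples of `width` (e.g. 1995-1999).
    """
    start = (year // width) * width
    return f"{start}-{start + width - 1}"


def split_borrow_date(borrowed_on: date) -> Dict[str, int]:
    """Split a borrow date into the three Time-hierarchy fields.

    The amount measure is set to 1, matching the slides' note that a
    history record normally represents one borrowed book.
    """
    return {
        "borrow_year": borrowed_on.year,
        "borrow_month": borrowed_on.month,
        "borrow_date": borrowed_on.day,
        "amount": 1,
    }


# Hypothetical example values.
interval_label = publish_interval(1998)
history_row = split_borrow_date(date(2001, 12, 3))
print(interval_label, history_row)
```

Keeping year, month, and day as separate integer fields is what lets the Time dimension roll up from borrow_date to borrow_month to borrow_year without reparsing a date string.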