COP5992 DATA MINING TERM PROJECT: RANDOM SUBSPACE METHOD + CO-TRAINING

Format: PPT · 13 pages · 72 KB
COP5992 DATA MINING TERM PROJECT
RANDOM SUBSPACE METHOD + CO-TRAINING
by Selim Kalayci

RANDOM SUBSPACE METHOD (RSM)
- Proposed by Ho, "The Random Subspace Method for Constructing Decision Forests", 1998.
- Another combining technique for weak classifiers, like Bagging and Boosting.

RSM ALGORITHM
1. Repeat for b = 1, 2, ..., B:
   (a) Select an r-dimensional random subspace from the original p-dimensional feature space X.
   (b) Construct a classifier C_b(x) in this subspace.
2. Combine the classifiers C_b(x), b = 1, 2, ..., B, by simple majority voting into a final decision rule.

MOTIVATION FOR RSM
- Redundancy in the data feature space:
  - a completely redundant feature set, or
  - redundancy spread over many features.
- Weak classifiers that have critical training sample sizes.

RSM PERFORMANCE ISSUES
RSM performance depends on:
- the training sample size,
- the choice of base classifier,
- the choice of combining rule (simple majority vs. weighted),
- the degree of redundancy in the dataset,
- the number of features chosen.

DECISION FORESTS (by Ho)
- A combination of trees instead of a single tree.
- Assumption: the dataset has some redundant features.
- Works efficiently with any decision tree algorithm and data splitting method.
- Ideally, look for the best individual trees with the lowest tree similarity.

UNLABELED DATA
- Small number of labeled documents.
- Large pool of unlabeled documents.
- How can unlabeled documents be classified accurately?

EXPECTATION-MAXIMIZATION (E-M)

CO-TRAINING
- Blum and Mitchell, "Combining Labeled and Unlabeled Data with Co-Training", 1998.
- Requirements:
  - two sufficiently strong feature sets,
  - conditionally independent given the class label.

CO-TRAINING: APPLICATION TO A SINGLE FEATURE SET
Algorithm:
- Obtain a small set L of labeled examples.
- Obtain a large set U of unlabeled examples.
- Obtain two sets F1 and F2 of features that are sufficiently redundant.
- While U is not empty do:
  - Learn classifier C1 from L based on F1.
  - Learn classifier C2 from L based on F2.
  - For each classifier Ci do:
    - Ci labels examples from U based on Fi.
    - Ci chooses the most confidently predicted examples E from U.
    - E is removed from U and added (with their given labels) to L.
- End loop.

THINGS TO DO
- How can we measure redundancy and use it efficiently?
- Can we improve co-training?
- How can we apply RSM efficiently to:
  - supervised learning,
  - semi-supervised learning,
  - unsupervised learning?

QUESTIONS?
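The RSM algorithm slide above can be sketched in code. This is a minimal illustration, not Ho's implementation: the slides do not fix a base classifier, so a nearest-centroid learner is assumed here, and the function names (`train_rsm`, `predict_rsm`) are hypothetical.

```python
import numpy as np

def train_rsm(X, y, B=25, r=2, rng=None):
    """Random Subspace Method sketch: train B base classifiers, each on an
    r-dimensional random subspace of the p original features (step 1)."""
    rng = np.random.default_rng(rng)
    classes = np.unique(y)
    ensemble = []
    for _ in range(B):
        # step 1(a): pick r of the p features at random, without replacement
        feats = rng.choice(X.shape[1], size=r, replace=False)
        # step 1(b): build a (nearest-centroid) classifier in that subspace
        centroids = np.array([X[y == c][:, feats].mean(axis=0) for c in classes])
        ensemble.append((feats, centroids))
    return classes, ensemble

def predict_rsm(model, X):
    """Step 2: combine the B classifiers by simple majority voting."""
    classes, ensemble = model
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for feats, centroids in ensemble:
        # squared distance of each sample to each class centroid
        d = ((X[:, feats, None] - centroids.T[None]) ** 2).sum(axis=1)
        votes[np.arange(X.shape[0]), d.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]
```

Because each base classifier sees only r of the p features, the ensemble exploits exactly the redundancy the motivation slide describes: no single feature subset has to carry all the class information.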
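The co-training loop on the algorithm slide can likewise be sketched. The slides leave the base learner and the confidence measure unspecified, so this sketch assumes a nearest-centroid classifier whose confidence is the margin between the two closest centroids; `co_train` and `k` (examples moved per view per round) are hypothetical names.

```python
import numpy as np

def centroid_fit(X, y, classes):
    # assumed base learner: one centroid per class
    return np.array([X[y == c].mean(axis=0) for c in classes])

def centroid_predict_conf(centroids, X):
    # label = nearest centroid; confidence = gap between two nearest distances
    d = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    order = np.sort(d, axis=1)
    return d.argmin(axis=1), order[:, 1] - order[:, 0]

def co_train(XL, yL, XU, F1, F2, k=2, rounds=10):
    """Co-training sketch: F1 and F2 are index arrays giving the two
    (assumed sufficiently redundant) feature views."""
    XL, yL, XU = XL.copy(), yL.copy(), XU.copy()
    classes = np.unique(yL)
    for _ in range(rounds):               # "while U is not empty"
        if len(XU) == 0:
            break
        for F in (F1, F2):                # classifiers C1 and C2
            # learn Ci from L based on Fi, then label U based on Fi
            centroids = centroid_fit(XL[:, F], yL, classes)
            labels, conf = centroid_predict_conf(centroids, XU[:, F])
            pick = np.argsort(conf)[-k:]  # most confidently predicted examples E
            # E is removed from U and added, with its predicted labels, to L
            XL = np.vstack([XL, XU[pick]])
            yL = np.concatenate([yL, classes[labels[pick]]])
            XU = np.delete(XU, pick, axis=0)
            if len(XU) == 0:
                break
    return XL, yL
```

Each view labels the examples it is most certain about and hands them to the other view as extra training data, which is why the two feature sets must be strong and (conditionally) independent: a shared blind spot would let confident mistakes reinforce each other.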