Hierarchical Classification of Documents with Error Control

Slide 1: Hierarchical Classification of Documents with Error Control
Chun-Hung Cheng, Jian Tang, Ada Wai-chee Fu, Irwin King

Slide 2: Overview
- Abstract
- Problem Description
- Document Classification Model
- Error Control Schemes
  - Recovery-oriented scheme
  - Error-masking scheme
- Experiments
- Conclusion

Slide 3: Abstract
- Traditional document classification (flat classification) involves only a single classifier
- The single classifier takes care of everything
- Slow, with high overhead

Slide 4: Abstract
- Hierarchical document classification
- Class hierarchy
- Use one classifier at each internal node

Slide 5: Abstract
- Advantage: better performance
- Disadvantage: wrong result if the document is misclassified at any node

Slide 6: Abstract
- Introduce error control mechanisms
- Approach 1 (recovery-oriented): detect and correct misclassification
- Approach 2 (error masking): mask errors by using multiple versions of classifiers

Slide 7: Problem Description
[Diagram: the Class Taxonomy, the Training Documents, and the Class-doc Relation (class | doc_id) feed the Training System, which outputs Statistics and Feature Terms]

Slide 8: Problem Description
[Diagram: the Statistics and Feature Terms feed the Classification System, which assigns each Incoming Document to a Target Class]

Slide 9: Problem Description
- Objective: achieve higher accuracy and fast performance
- Our proposed algorithms provide a good trade-off between accuracy and performance

Slide 10: Document Classification Model
- Formally, we use a model from Chakrabarti et al. (1997)
- Based on a naive Bayesian network
- For simplicity, we study a single-node classifier
[Diagram: a node c with children c_1, c_2, ..., c_n]

Slide 11: Document Classification Model
- z_{i,d}: number of occurrences of term i in the incoming document d
- P(j, c): probability that a word in class c is j (estimated using the training data)
- Probability that an incoming document d belongs to c:
  Pr[c | d] = ( Pr[c] · ∏_j P(j, c)^{z_{j,d}} ) / ( Σ_{c'} Pr[c'] · ∏_j P(j, c')^{z_{j,d}} )

Slide 12: Feature Selection
- The previous formula involves all the terms
- Feature selection reduces cost by using only the terms with good discriminating power
- Use the training sets to identify the feature terms

Slide 13: Fisher's Index
- Fisher's Index indicates the discriminating power of a term
- Good discriminating power: large interclass distance, small intraclass distance (a code sketch follows this section)
[Diagram: distributions of w(t) under classes c_1 and c_2, contrasting the interclass distance with the intraclass distance]

Slide 14: Document Classification Model
- Consider only feature terms in the classification function p(c_i | c, d)
- Pick the largest probability among all c_i
- Use one classifier at each internal node
[Diagram: a node c with children c_1, c_2, ..., c_n]
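Fisher's Index (sketch)

The slides give only the intuition behind Fisher's Index, so the following is a minimal sketch of one common formulation: for each term, the sum of squared differences between class means, divided by the summed within-class spread. The dense NumPy layout and the top-k selector are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def fisher_index(X, y):
    """Fisher's Index per term: interclass distance over intraclass distance.

    X: (n_docs, n_terms) term-frequency matrix
    y: (n_docs,) class label for each document
    Returns one score per term; larger means better discriminating power.
    """
    classes = np.unique(y)
    # Mean frequency of each term within each class
    mu = np.vstack([X[y == c].mean(axis=0) for c in classes])
    # Interclass: squared differences of class means, summed over class pairs
    inter = np.zeros(X.shape[1])
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            inter += (mu[i] - mu[j]) ** 2
    # Intraclass: mean squared deviation from the class mean, summed over classes
    intra = np.zeros(X.shape[1])
    for k, c in enumerate(classes):
        intra += ((X[y == c] - mu[k]) ** 2).mean(axis=0)
    return inter / (intra + 1e-12)      # epsilon guards against zero variance

def select_features(X, y, k):
    """Keep the k terms with the highest Fisher's Index."""
    return np.argsort(fisher_index(X, y))[::-1][:k]
```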
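Per-node classification and top-down routing (sketch)

Slides 10 to 14 describe a naive Bayes decision at each internal node and a greedy descent through the hierarchy. A minimal runnable sketch follows; scores are kept in log space to compute Pr[c] · ∏_j P(j, c)^{z_{j,d}} safely, feature selection shows up only as terms missing from the per-node tables, and the Node layout (name, children, log_prior, log_p) is a hypothetical structure, not the paper's.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)   # child Nodes; empty at leaves
    log_prior: dict = field(default_factory=dict)  # child name -> log Pr[child]
    log_p: dict = field(default_factory=dict)      # (child name, term) -> log P(term, child)

def classify_node(node, doc_terms):
    """Pick the child c_i maximizing log Pr[c_i] + sum_j z_{j,d} * log P(j, c_i).
    Only this node's feature terms appear in log_p; other terms are skipped."""
    best, best_score = None, -math.inf
    for child in node.children:
        score = node.log_prior[child.name]
        for term, z in doc_terms.items():          # z is z_{j,d}, the term count in d
            lp = node.log_p.get((child.name, term))
            if lp is not None:
                score += z * lp
        if score > best_score:
            best, best_score = child, score
    return best

def classify(root, doc_terms):
    """Hierarchical classification: one classifier at each internal node."""
    node = root
    while node.children:
        node = classify_node(node, doc_terms)
    return node.name
```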
Slide 15: Recovery-Oriented Scheme
- Database systems: after a failure, the DBMS restarts from a consistent state
- Document classification: when an error is detected, restart from a correct class, the High Confidence Ancestor (HCA)

Slide 16: Recovery-Oriented Scheme
- In practice, rollback is slow, so we identify wrong paths and avoid them
- To identify wrong paths, define a closeness indicator (CI): we are on a wrong path when the CI falls below a threshold (see the backup sketch at the end)

Slide 17: Recovery-Oriented Scheme
[Diagram: a wrong path, with the HCA defined at distance 2 from the current node]

Slide 18: Recovery-Oriented Scheme
[Diagram: a wrong path with the HCA marked, again at distance 2 from the current node]

Slide 19: Error Masking Scheme
- Software fault tolerance: run multiple versions of the software and take a majority vote
- Document classification: run classifiers of different designs and take a majority vote (see the backup sketch at the end)

Slide 20: O-Classifier
- The traditional classifier
[Diagram: an O-classifier over the class hierarchy]

Slide 21: N-Classifier
- Skips some intermediate levels
[Diagram: an N-classifier skipping intermediate levels of the hierarchy]

Slide 22: Error Masking Scheme
- Run three classifiers in parallel:
  - an O-classifier
  - an N-classifier
  - an O-classifier using a new feature length
- This selection minimizes the time wasted waiting for the slowest classifier

Slide 23: Experiments
- Data sets: US patents
  - Preclassified, with rich text content, and highly hierarchical
- 3 sets collected:
  - 3 levels / large number of docs
  - 4 levels / large number of docs
  - 7 levels / small number of docs

Slide 24: Experiments
- Algorithms compared: simple hierarchical, TAPER, flat, recovery-oriented, error masking
- Generally, flat is the slowest and the most accurate, while simple hierarchical is the fastest and the least accurate

Slide 25: Accuracy (3 levels/large) [chart]

Slide 26: Accuracy (4 levels/large) [chart]

Slide 27: Accuracy (7 levels/small) [chart]

Slide 28: Performance (3 levels/large) [chart]

Slide 29: Performance (4 levels/large) [chart]

Slide 30: Performance (7 levels/small) [chart]

Slide 31: Conclusion
- Real-life applications involve large taxonomies, where flat classification is too slow
- Our algorithm is faster than flat classification with as few as 4 levels, and the performance gain widens as the number of levels increases
- A good trade-off between accuracy and performance for most applications

Slide 32: Thank You
The End
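Backup: Recovery-Oriented Scheme (sketch)

The slides do not spell out the closeness indicator, so this sketch assumes the CI is the winning child's normalized posterior at the current node and fixes the High Confidence Ancestor at a constant distance up the path; both choices, and the Node layout reused from the routing sketch, are illustrative rather than the paper's algorithm.

```python
import math

def child_posteriors(node, doc_terms):
    """Normalized naive Bayes posteriors over a node's children
    (same scoring as the routing sketch after the model slides)."""
    logs = {}
    for child in node.children:
        s = node.log_prior[child.name]
        for term, z in doc_terms.items():
            lp = node.log_p.get((child.name, term))
            if lp is not None:
                s += z * lp
        logs[child] = s
    m = max(logs.values())
    exps = {c: math.exp(s - m) for c, s in logs.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}

def classify_with_recovery(root, doc_terms, ci_threshold=0.6, hca_distance=2):
    """Top-down routing with a closeness-indicator check (illustrative).
    When the CI falls below the threshold, treat the branch as a wrong path,
    remember it, and restart from the HCA, hca_distance levels up."""
    path, blocked = [root], set()    # blocked: names of suspected wrong branches
    while path[-1].children:
        probs = {c: p for c, p in child_posteriors(path[-1], doc_terms).items()
                 if c.name not in blocked}
        if not probs:                # every child blocked: accept the best anyway
            probs = child_posteriors(path[-1], doc_terms)
            path.append(max(probs, key=probs.get))
            continue
        best = max(probs, key=probs.get)
        if probs[best] < ci_threshold and len(path) > hca_distance:
            blocked.add(best.name)    # avoid this wrong path on the retry
            del path[-hca_distance:]  # roll back to the High Confidence Ancestor
            continue
        path.append(best)
    return path[-1].name
```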
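Backup: Error Masking Scheme (sketch)

Slide 22 runs three classifiers of different designs in parallel and votes. The sketch below does exactly that; the thread pool and the fall-back tie-break are assumed details, not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def classify_with_masking(classifiers, doc_terms):
    """Mask single misclassifications by majority vote over classifiers of
    different designs (e.g. an O-classifier, an N-classifier, and an
    O-classifier with a different feature length), run in parallel."""
    with ThreadPoolExecutor(max_workers=len(classifiers)) as pool:
        votes = list(pool.map(lambda clf: clf(doc_terms), classifiers))
    winner, count = Counter(votes).most_common(1)[0]
    # No strict majority (all voters disagree): fall back to the first,
    # traditional classifier. This tie-break is an assumption.
    return winner if count > len(votes) // 2 else votes[0]

# Hypothetical usage: each voter wraps a full top-down classifier
# classifiers = [lambda d: classify(o_root, d),
#                lambda d: classify(n_root, d),
#                lambda d: classify(o_root_new_features, d)]
```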