Liu Duzhou Lectures on the Shanghan Lun, Transcribed Recordings (Title45)

Uploaded by: r****d  Document ID: 243875192  Uploaded: 2024-10-01  Format: PPT  Pages: 48  Size: 156KB
CS 361A (Advanced Data Structures and Algorithms)
Lecture 20 (Dec 7, 2005)
Data Mining: Association Rules
Rajeev Motwani
(partially based on notes by Jeff Ullman)

Association Rules Overview
- Market Baskets & Association Rules
- Frequent item-sets
- A-priori algorithm
- Hash-based improvements
- One- or two-pass approximations
- High-correlation mining

Association Rules: Two Traditions
- DM is science of approximating joint distributions
  - Representation of process generating data
  - Predict P[E] for interesting events E
- DM is technology for fast counting
  - Can compute certain summaries quickly
  - Let's try to use them
- Association Rules
  - Captures interesting pieces of joint distribution
  - Exploits fast counting technology

Market-Basket Model
- Large Sets
  - Items A = {A_1, A_2, ..., A_m}, e.g., products sold in supermarket
  - Baskets B = {B_1, B_2, ..., B_n}: small subsets of items in A, e.g., items bought by customer in one transaction
- Support: sup(X) = number of baskets with itemset X
- Frequent Itemset Problem
  - Given: support threshold s
  - Frequent itemsets: itemsets X with sup(X) >= s
  - Find: all frequent itemsets

Example
- Items A = {milk, coke, pepsi, beer, juice}
- Baskets
  B_1 = {m, c, b}      B_2 = {m, p, j}
  B_3 = {m, b}         B_4 = {c, j}
  B_5 = {m, p, b}      B_6 = {m, c, b, j}
  B_7 = {c, b, j}      B_8 = {b, c}
- Support threshold s = 3
- Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}

Application 1 (Retail Stores)
- Real market baskets
  - chain stores keep TBs of customer purchase info
  - Value?
    - how typical customers navigate stores
    - positioning tempting items
    - suggests "tie-in tricks", e.g., hamburger sale while raising ketchup price
- High support needed, or no $s

Application 2 (Information Retrieval)
- Scenario 1
  - baskets = documents
  - items = words in documents
  - frequent word-groups = linked concepts
- Scenario 2
  - items = sentences
  - baskets = documents containing sentences
  - frequent sentence-groups = possible plagiarism

Application 3 (Web Search)
- Scenario 1
  - baskets = web pages
  - items = outgoing links
  - pages with similar references → about same topic
- Scenario 2
  - baskets = web pages
  - items = incoming links
  - pages with similar in-links → mirrors, or same topic

Scale of Problem
- WalMart
  - sells m = 100,000 items
  - tracks n = 1,000,000,000 baskets
- Web
  - several billion pages
  - one new "word" per page
- Assumptions
  - m small enough for small amount of memory per item
  - m too large for memory per pair or k-set of items
  - n too large for memory per basket
  - Very sparse data: rare for item to be in basket

Association Rules
- If-then rules about basket contents
  - {A_1, A_2, ..., A_k} → A_j
  - if basket has X = {A_1, ..., A_k}, then likely to have A_j
- Confidence: probability of A_j given A_1, ..., A_k
- Support (of rule): sup(X ∪ {A_j})

Example
  B_1 = {m, c, b}      B_2 = {m, p, j}
  B_3 = {m, b}         B_4 = {c, j}
  B_5 = {m, p, b}      B_6 = {m, c, b, j}
  B_7 = {c, b, j}      B_8 = {b, c}
- Association Rule: {m, b} → c
  - Support = 2
  - Confidence = 2/4 = 50%

Finding Association Rules
- Goal: find all association rules such that
  - support >= s
  - confidence >= c
- Reduction to Frequent Itemsets Problem
  - Find all frequent itemsets X
  - Given X = {A_1, ..., A_k}, generate all rules X - A_j → A_j
  - Confidence = sup(X) / sup(X - A_j)
  - Support = sup(X)
- Observe: X - A_j also frequent → support known

Computation Model
- Data Storage
  - Flat files, rather than database system
  - Stored on disk, basket-by-basket
- Cost Measure: number of passes
  - Count disk I/O only
  - Given data size, avoid random seeks and do linear scans
- Main-Memory Bottleneck
  - Algorithms maintain count-tables in memory
  - Limitation on number of counters
  - Disk-swapping count-tables is disaster

Finding Frequent Pairs
- Frequent 2-Sets
  - hard case already
  - focus for now, later extend to k-sets
- Naive Algorithm
  - Counters: all m(m-1)/2 item pairs
  - Single pass: scanning all baskets
  - Basket of size b increments b(b-1)/2 counters
- Failure? if memory < m(m-1)/2, even for m = 100,000

Monotonicity Property
- Underlies all known algorithms
- Monotonicity Property
  - Given itemsets X ⊆ Y
  - Then sup(X) >= sup(Y)
- Contrapositive (for 2-sets): if item X_i is not frequent, then no pair {X_i, X_j} is frequent

A-Priori Algorithm
- A-Priori: 2-pass approach in limited memory
- Pass 1
  - m counters (candidate items in A)
  - Linear scan of baskets b
  - Increment counters for each item in b
- Mark as frequent the f items of count at least s
- Pass 2
  - f(f-1)/2 counters (candidate pairs of frequent items)
  - Linear scan of baskets b
  - Increment counters for each pair of frequent items in b
- Failure: if memory [...]

[...] s, bit=1)
- Pass 2
  - Counter only for F qualified pairs (X_i, X_j):
    - both are frequent
    - pair hashes to frequent bucket (bit=1)
  - Linear scan of baskets b
  - Increment counters for candidate qualified pairs of items in b

Multistage PCY Algorithm
- Problem: False positives from hashing
- New Idea: Multiple rounds of hashing
  - After Pass 1, get list of qualified pairs
  - In Pass 2, hash only qualified pairs
  - Fewer pairs hash to buckets → fewer false positives (buckets with count >= s, yet no pair of count >= s)
  - In Pass 3, less likely to qualify infrequent pairs
- Repetition: reduce memory, but more passes
- Failure: memory [...] 2

- Monotonicity: itemset X is frequent only if X - {X_j} is frequent for all X_j
- Idea
  - Stage k: finds all frequent k-sets
  - Stage 1: gets all frequent item
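The two A-priori passes and the confidence formula above can be tried directly on the lecture's eight example baskets. The following is a minimal in-memory sketch, not the lecture's own code: the function names `apriori_pairs`, `support`, and `confidence` are made up for illustration, and the baskets are held in a Python list, whereas the slides assume they are streamed from disk.

```python
from collections import Counter
from itertools import combinations

# The eight example baskets from the slides (m=milk, c=coke, p=pepsi, b=beer, j=juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def apriori_pairs(baskets, s):
    """Two-pass A-priori for frequent pairs (hypothetical helper name)."""
    # Pass 1: one counter per item, one linear scan of the baskets.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {item for item, n in item_counts.items() if n >= s}

    # Pass 2: count only pairs whose two items are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(basket & frequent_items)  # sorted so each pair has one canonical order
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= s}
    return frequent_items, frequent_pairs


def support(itemset, baskets):
    """sup(X): number of baskets that contain every item of X."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(lhs, rhs_item, baskets):
    """Confidence of the rule lhs -> rhs_item: sup(lhs + rhs_item) / sup(lhs)."""
    return support(set(lhs) | {rhs_item}, baskets) / support(lhs, baskets)


frequent_items, frequent_pairs = apriori_pairs(baskets, s=3)
rule_confidence = confidence({"m", "b"}, "c", baskets)
```

With s = 3 this gives the frequent items m, c, b, j and the frequent pairs {b, m}, {b, c}, {c, j}, matching the example slide, and the rule {m, b} → c comes out at confidence 2/4 = 0.5.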
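The hash-based Pass 2 above counts a pair only if both items are frequent and the pair hashes to a frequent bucket (bit=1). A rough sketch of that filter follows. The Pass 1 shown here (hashing every pair into a bucket counter while counting items) follows the usual description of the PCY scheme and is an assumption, since that part of the slide text did not survive. The names `bucket_of` and `pcy_pairs`, the bucket count of 11, and the use of `zlib.crc32` as the hash function are illustrative choices only.

```python
import zlib
from collections import Counter
from itertools import combinations

# Same example baskets as in the slides.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def bucket_of(pair, num_buckets):
    """Deterministic hash of an ordered pair into a bucket index (illustrative choice)."""
    return zlib.crc32("|".join(pair).encode("utf-8")) % num_buckets


def pcy_pairs(baskets, s, num_buckets=11):
    """Hash-filtered frequent-pair counting (hypothetical helper name)."""
    # Pass 1 (assumed form): count items, and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(basket)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    frequent_items = {item for item, n in item_counts.items() if n >= s}
    bitmap = [n >= s for n in bucket_counts]  # bit=1 marks a frequent bucket

    # Pass 2: keep a counter only for qualified pairs, i.e. both items
    # frequent and the pair hashing to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(basket & frequent_items)
        for pair in combinations(kept, 2):
            if bitmap[bucket_of(pair, num_buckets)]:
                pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= s}


pcy_result = pcy_pairs(baskets, s=3)
```

The bitmap can only produce false positives, never false negatives: a truly frequent pair contributes at least s to its own bucket, so its bit is always 1. The final counts in Pass 2 are exact, so the result does not depend on the number of buckets; fewer buckets only means more unneeded counters.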