05 - Pattern Recognition - Pattern Selection (Premium)

Uploaded by: 沈*** Document ID: 250111076 Upload date: 2024-11-01 Format: PPT Pages: 47 Size: 767 KB
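Pattern Recognition
Chapter 5: FEATURE SELECTION

FEATURE SELECTION

The goals:
- Select the "optimum" number $l$ of features
- Select the "best" $l$ features

Large $l$ has a three-fold disadvantage:
- High computational demands
- Low generalization performance
- Poor error estimates

Given $N$:
- $l$ must be large enough to learn
  - what makes classes different
  - what makes patterns in the same class similar
- $l$ must be small enough not to learn what makes patterns of the same class different
- In practice, [...] has been reported to be a sensible choice for a number of cases

Once $l$ has been decided, choose the $l$ most informative features.
Best: large between-class distance, small within-class variance.

The basic philosophy
- Discard individual features with poor information content
- The remaining information-rich features are examined jointly as vectors

Feature Selection Based on Statistical Hypothesis Testing

The goal: for each individual feature, find whether the values it takes differ significantly between classes. That is, answer:
- $H_1$: The values differ significantly (the feature separates the classes)
- $H_0$: The values do not differ significantly (the feature does not separate the classes)
If they do not differ significantly, reject the feature from subsequent stages.

Hypothesis Testing Basics

The steps:
- $N$ measurements $x_i$, $i = 1, 2, \ldots, N$, are known.
- Define a function of them, the test statistic $q = f(x_1, x_2, \ldots, x_N)$, so that its pdf $p_q(q; \theta)$ is easily parameterized in terms of $\theta$.
- Let $D$ be an interval where $q$ has a high probability to lie under $H_0$, i.e., under $p_q(q \mid \theta_0)$.
- Let $\bar{D}$ be the complement of $D$. $D$ is the acceptance interval and $\bar{D}$ is the critical interval.
- If $q$, resulting from $x_1, x_2, \ldots, x_N$, lies in $D$ we accept $H_0$; otherwise we reject it.

Probability of an error: $p_q(q \in \bar{D} \mid H_0) = \rho$. The value $\rho$ is preselected and is known as the significance level. Under $H_0$, $q$ falls in $D$ with probability $1 - \rho$.

Application: The Known Variance Case

Let $x$ be a random variable and let the experimental samples $x_i$, $i = 1, 2, \ldots, N$, be mutually independent. Also let $E[x] = \mu$ and $E[(x - \mu)^2] = \sigma^2$.

Compute the sample mean
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
This is also a random variable, with mean value
$E[\bar{x}] = \frac{1}{N} \sum_{i=1}^{N} E[x_i] = \mu$
That is, it is an unbiased estimator.

The variance is $\sigma_{\bar{x}}^2 = E[(\bar{x} - \mu)^2]$. Due to independence,
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}$
That is, it is asymptotically efficient: the variance tends to zero as $N \to \infty$.

Hypothesis test:
$H_1$: $E[x] \neq \hat{\mu}$
$H_0$: $E[x] = \hat{\mu}$

Test statistic: define the variable
$q = \frac{\bar{x} - \hat{\mu}}{\sigma / \sqrt{N}}$

By the central limit theorem, under $H_0$ the sample mean $\bar{x}$ is approximately $N(\hat{\mu}, \sigma^2 / N)$. Thus, under $H_0$, $q$ is $N(0, 1)$.

The decision steps:
- Compute $q$ from $x_i$, $i = 1, 2, \ldots, N$
- Choose the significance level $\rho$
- Compute from $N(0, 1)$ tables the acceptance interval $D = [-x_\rho, x_\rho]$, which contains $q$ with probability $1 - \rho$
- Accept $H_0$ if $q \in D$; otherwise reject it

An example: A random variable $x$ has variance $\sigma^2 = (0.23)^2$. $N = 16$ measurements are obtained, giving $\bar{x} = 1.35$ (the midpoint of the interval derived below). The significance level is $\rho = 0.05$. Test the hypothesis
$H_0$: $\mu = \hat{\mu}$
$H_1$: $\mu \neq \hat{\mu}$

Since $\sigma^2$ is known, $q = \frac{\bar{x} - \hat{\mu}}{\sigma / \sqrt{N}}$ is $N(0, 1)$. From tables, we obtain the values $x_\rho$ with acceptance intervals $[-x_\rho, x_\rho]$ for the normal $N(0, 1)$:

1-ρ    0.8    0.85   0.9    0.95   0.98   0.99   0.998  0.999
x_ρ    1.28   1.44   1.64   1.96   2.32   2.57   3.09   3.29

Thus
$P\left\{ -1.96 < \frac{1.35 - \hat{\mu}}{0.23 / 4} < 1.96 \right\} = 0.95$
or, equivalently, $1.237 < \hat{\mu} < 1.463$.

Since $\hat{\mu}$ lies within the above acceptance interval, we accept $H_0$, i.e., $\mu = \hat{\mu}$.
The interval [1.237, 1.463] is also known as the confidence interval at the $1 - \rho = 0.95$ level.
We say that: there is no evidence at the 5% level that the mean value is not equal to $\hat{\mu}$.

The Unknown Variance Case

Estimate the variance:
$\hat{\sigma}^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
The estimate is unbiased, i.e., $E[\hat{\sigma}^2] = \sigma^2$.

Define the test statistic
$q = \frac{\bar{x} - \mu}{\hat{\sigma} / \sqrt{N}}$
This is no longer Gaussian. If $x$ is Gaussian, then $q$ follows a t-distribution with $N - 1$ degrees of freedom.

An example: [...]

Table of acceptance intervals for the t-distribution (entries are the bound $x_\rho$ of $D = [-x_\rho, x_\rho]$):

Degrees of freedom   1-ρ = 0.9   0.95   0.975   0.99
12                   1.78        2.18   2.56    3.05
13                   1.77        2.16   2.53    3.01
14                   1.76        2.15   2.51    2.98
15                   1.75        2.13   2.49    2.95
16                   1.75        2.12   2.47    2.92
17                   1.74        2.11   2.46    2.90
18                   1.73        2.10   2.44    2.88

Application in Feature Selection

The goal here is to test against zero the difference $\mu_1 - \mu_2$ of the respective means in $\omega_1$, $\omega_2$ of a single feature.
- Let $x_i$, $i = 1, \ldots, N$, be the values of a feature in $\omega_1$
- Let $y_i$, $i = 1, \ldots, N$, be the values of the same feature in $\omega_2$
- Assume that in both classes $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unknown or not)

The test becomes
$H_1$: $\Delta\mu = \mu_1 - \mu_2 \neq 0$
$H_0$: $\Delta\mu = \mu_1 - \mu_2 = 0$

Define $z = x - y$. Obviously $E[z] = \mu_1 - \mu_2$.
Define the average
$\bar{z} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i) = \bar{x} - \bar{y}$

Known Variance Case: define
$q = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sigma \sqrt{2 / N}}$
This is $N(0, 1)$, and one follows the procedure as before.

Unknown Variance Case: define the test statistic
$q = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{s_z \sqrt{2 / N}}$
$s_z^2 = \frac{1}{2N - 2} \left( \sum_{i=1}^{N} (x_i - \bar{x})^2 + \sum_{i=1}^{N} (y_i - \bar{y})^2 \right)$
$q$ follows a t-distribution with $2N - 2$ degrees of freedom. Then apply the appropriate tables as before.

Example: The values of a feature in two classes are:
$\omega_1$: 3.5, 3.7, 3.9, 4.1, 3.4, 3.5, 4.1, 3.8, 3.6, 3.7
$\omega_2$: 3.2, 3.6, 3.1, 3.4, 3.0, 3.4, 2.8, 3.1, 3.3, 3.6
Test if the mean values in the two classes differ significantly, at the significance level $\rho = 0.05$.

We have
$\omega_1$: $\bar{x} = 3.73$, $\hat{\sigma}_1^2 = 0.0601$
$\omega_2$: $\bar{y} = 3.25$, $\hat{\sigma}_2^2 = 0.0672$
For $N = 10$,
$s_z^2 = \frac{1}{2} (\hat{\sigma}_1^2 + \hat{\sigma}_2^2) = 0.0637$
$q = \frac{\bar{x} - \bar{y} - 0}{s_z \sqrt{2 / 10}} = 4.25$
From the table of the t-distribution with $2N - 2 = 18$ degrees of freedom and $\rho = 0.05$, we obtain $D = [-2.10, 2.10]$. Since $q = 4.25$ is outside $D$, $H_1$ is accepted and the feature is selected.

Class Separability Measures (p. 113)

So far we have considered only individual features, which cannot account for correlations between features. For example, two features may each be rich in information, yet because they are correlated there is no need to keep both. To study possible correlations, the features must be examined jointly, as elements of a vector.

To this end:
- Discard features that are poor in information, by means of a statistical test.
- Choose the maximum number of features to be used. This is dictated by the specific problem (e.g., the number $N$ of available training patterns and the type of the classifier to be adopted).
- Combine remaining features to sear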
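
The two worked examples in the text (the known-variance interval and the two-class test with unknown variance) can be checked numerically. The following is a minimal sketch, not the slides' own code. The function names are hypothetical, the t-distribution bounds are copied from the 1-ρ = 0.95 column of the table in the text, and the sample mean 1.35 used in the known-variance check is an assumption inferred as the midpoint of the quoted interval [1.237, 1.463].

```python
import math
from statistics import NormalDist, mean, variance

# Bounds of the acceptance interval D = [-b, b] for the t-distribution at
# 1 - rho = 0.95, copied from the table in the text (degrees of freedom -> b).
T_BOUND_95 = {12: 2.18, 13: 2.16, 14: 2.15, 15: 2.13, 16: 2.12, 17: 2.11, 18: 2.10}


def mean_interval_known_variance(x_bar, sigma, n, rho=0.05):
    """Hypothesized means accepted at level rho when sigma is known (hypothetical helper)."""
    x_rho = NormalDist().inv_cdf(1.0 - rho / 2.0)  # two-sided N(0, 1) bound
    half_width = x_rho * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def two_class_t_statistic(x, y):
    """Test statistic q for H0: mu1 - mu2 = 0, equal unknown variances, equal N."""
    n = len(x)
    if len(y) != n:
        raise ValueError("the text assumes the same number of samples per class")
    # statistics.variance is the unbiased (N - 1) estimate, so this equals
    # s_z^2 = (sum of squared deviations in both classes) / (2N - 2).
    s_z2 = ((n - 1) * variance(x) + (n - 1) * variance(y)) / (2 * n - 2)
    return (mean(x) - mean(y)) / math.sqrt(s_z2 * 2.0 / n)


# Feature values from the example in the text.
omega1 = [3.5, 3.7, 3.9, 4.1, 3.4, 3.5, 4.1, 3.8, 3.6, 3.7]
omega2 = [3.2, 3.6, 3.1, 3.4, 3.0, 3.4, 2.8, 3.1, 3.3, 3.6]

q = two_class_t_statistic(omega1, omega2)
bound = T_BOUND_95[2 * len(omega1) - 2]  # 2N - 2 = 18 degrees of freedom
feature_selected = abs(q) > bound        # q outside D means H1 is accepted

# Known-variance example: sigma = 0.23, N = 16; x_bar = 1.35 is an assumption
# inferred from the midpoint of the interval quoted in the text.
low, high = mean_interval_known_variance(1.35, 0.23, 16)

print(f"q = {q:.2f}, D = [-{bound}, {bound}], selected = {feature_selected}")
print(f"accepted means: [{low:.3f}, {high:.3f}]")
```

Run on the example data, the statistic agrees with the value q = 4.25 quoted in the text, which lies outside D = [-2.10, 2.10]. The table lookup keeps the sketch free of third-party dependencies; a statistics library with a t-distribution quantile function could replace it for other degrees of freedom or significance levels.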