Non-Parametric Estimation (Lecture 1)

Dr. Ding Yuxin, Pattern Classification, Chapter 4, Part 2

Non-Parametric Estimation
(Pattern Recognition course)

Summary
- Probability density estimation
- Parzen-window estimation
- k_N-nearest-neighbor estimation
- The nearest-neighbor rule / k-nearest-neighbor rule

Approaches to pattern classification

Approach 1: estimate the class-conditional probability density function p(x|ω_i). The posterior probability p(ω_i|x) can then be computed from p(x|ω_i) and the prior P(ω_i) by Bayes' formula, and the decision is made by maximum posterior probability (a small sketch of this computation follows the list at the end of this section):

$$ p(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{\sum_j p(x \mid \omega_j)\, P(\omega_j)} $$

Approach 1 can be carried out in two ways:
- Method 1a: parametric estimation of the densities, based on a parametric description of p(x|ω_i).
- Method 1b: non-parametric estimation of the densities, based on a non-parametric description of p(x|ω_i).

Approach 2: estimate the posterior probability p(ω_i|x) directly, without first estimating p(x|ω_i).

Approach 3: compute the discriminant function directly, without estimating either p(x|ω_i) or p(ω_i|x).

Why non-parametric methods? Commonly used parametric densities are unimodal (they have a single local maximum), whereas many practical problems involve multimodal densities. Non-parametric procedures can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.

Non-parametric procedures fall into two families:
1. Estimate the density functions p(x|ω_i) directly from sample patterns, as in Parzen windows. If these estimates are satisfactory, they can be substituted for the true densities when designing the classifier.
2. Estimate the posterior probabilities p(ω_i|x) directly from the learning samples and use them as the classification criterion, as in the k-nearest-neighbor rule, which bypasses density estimation and goes directly to the decision function.
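To make Approach 1 concrete, here is a minimal sketch that computes posteriors from class-conditional densities and priors via Bayes' formula and decides by maximum posterior. It is an illustration, not part of the original slides: the two 1-D Gaussian class-conditionals and the equal priors are placeholder assumptions standing in for whatever parametric or non-parametric density estimates are available.

```python
import numpy as np

def posteriors(x, class_densities, priors):
    # Bayes' formula: p(w_i|x) = p(x|w_i) P(w_i) / sum_j p(x|w_j) P(w_j)
    likelihoods = np.array([p(x) for p in class_densities])
    joint = likelihoods * np.array(priors)
    return joint / joint.sum()

def gaussian_pdf(mean, std):
    # Placeholder class-conditional density; in this lecture it would be
    # replaced by a non-parametric estimate (Parzen windows, kNN, ...).
    return lambda x: np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

p_x_given_w = [gaussian_pdf(0.0, 1.0), gaussian_pdf(3.0, 1.0)]  # assumed densities
post = posteriors(1.0, p_x_given_w, priors=[0.5, 0.5])
print(post, "-> decide class", int(np.argmax(post)))  # maximum a posteriori decision
```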
Non-parametric density estimation: instance-based techniques
- the histogram
- kernel-based methods (Parzen windows)
- nearest neighbors

The histogram

The simplest form of non-parametric density estimation is the histogram:
- Divide the sample space into a number of bins and approximate the density at the center of each bin by the fraction of the training points that fall into that bin.
- The histogram requires two "parameters" to be defined: the bin width and the starting position of the first bin.
- Approximating a density this way is hard in a high-dimensional feature space (see "Increasing the dimension" below).

Histograms as density estimates

Consider a bounded D-dimensional attribute space containing N training points, divided into bins. What is the probability that a new point falls into a particular bin? With k of the N points in a bin of volume V (taking the whole space to have volume 1 and assuming the density is uniform within each bin), the probability is estimated by k/N and the density by

$$ \hat p(x) = \frac{k/N}{V} $$

(a code sketch of this estimator appears after the advantages and disadvantages below).

Histogram classification

To classify a new point x, choose the most common class in the bin containing x. To justify this, learn a separate histogram for each class, which gives the class-conditional probability density. With K_c(bin_x) the number of instances of class c in the bin containing x and K_c the total number of instances of class c,

$$ \hat p(x \mid c) = \frac{K_c(\mathrm{bin}_x)/K_c}{V}, \qquad \hat P(c) = \frac{K_c}{N} \ \text{(class prior)}, $$

and the classifier follows by Bayes' rule.

Histogram bin size
- Small bins: can fit fine details of the distribution; the estimator is jagged, with abrupt changes at bin boundaries; it is subject to noise, since each bin contains only a small amount of data; it might overfit.
- Large bins: cannot model details of the distribution, and distant points are grouped together; the estimator is smooth; it is less subject to noise, since each bin contains many points; it might underfit.

Two extreme cases:
- Tiny bins, each containing at most one point: a very jagged estimator, completely unable to generalize.
- A single bin: all points have the same density, so the estimate does not model any pattern in the data.

The bin size is therefore a "smoothing parameter".

Increasing the dimension

If the attribute space has D dimensions and each dimension is divided into d bins, the number of bins is d^D. What goes wrong? Almost all bins will be empty, which is useless for generalization: because data become sparse in high-dimensional spaces, histograms are very hard to apply there.

Advantages and disadvantages of histograms
- Advantages: very simple; very fast classification; very fast learning; easy to understand and depict graphically; easy online updates (just increment a bin count); no prior assumptions about the form of the model.
- Disadvantages: curse of dimensionality; the bin size must be chosen; the density estimate is discontinuous; the bin size is fixed over the input space, so some bins may contain many points (underfit) while others contain few and are subject to noise.
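The following is a minimal sketch of the 1-D histogram estimator described above. The Gaussian sample data, the bin width of 0.5, and the starting position of -4.0 are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

def histogram_density(data, bin_width, start):
    # Histogram density estimate: p(x) ~ (fraction of points in bin) / bin volume.
    # The two "parameters" from the slides: bin width and starting position.
    data = np.asarray(data, dtype=float)
    edges = np.arange(start, data.max() + bin_width, bin_width)
    counts, edges = np.histogram(data, bins=edges)
    density = counts / (len(data) * bin_width)   # (k/N) divided by the volume V
    return density, edges

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=200)
density, edges = histogram_density(sample, bin_width=0.5, start=-4.0)
# density[i] approximates p(x) at the centre of bin [edges[i], edges[i+1])
print(density.max(), density.sum() * 0.5)  # peak near 0.4; integrates to 1
```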
Probability density estimation (PDE): the core idea

The most fundamental techniques rely on the fact that the probability P that a vector x drawn from p(x) falls in a region R is

$$ P = \int_R p(x')\, dx'. $$

P is a smoothed, or averaged, version of the density function p(x), so we can estimate this smoothed value of p by estimating the probability P.

Suppose that N samples x_1, ..., x_N are drawn independently and identically distributed (i.i.d.) according to the probability law p(x). P is then the probability that a single sample falls in R, and the probability that exactly k of the N samples fall in R is given by the binomial law

$$ P_k = \binom{N}{k} P^k (1-P)^{N-k}. $$

From the properties of the binomial distribution, the mean and variance of the ratio k/N are

$$ E\!\left[\frac{k}{N}\right] = P, \qquad \operatorname{Var}\!\left[\frac{k}{N}\right] = \frac{P(1-P)}{N}. $$

Therefore, as N → ∞ the distribution of k/N becomes sharper (the variance shrinks), so the fraction of points that fall within R gives a very good estimate of the probability P:

$$ P \simeq \frac{k}{N}. $$

If we assume that p(x) is continuous and that the region R is so small that p(x) does not vary appreciably within it, then

$$ P = \int_R p(x')\, dx' \simeq p(x)\, V, $$

where V is the volume enclosed by R. Combining the two results gives the estimate (for V sufficiently small)

$$ \hat p(x) = \frac{k/N}{V} $$

(a numerical sketch of this relation appears at the end of this subsection).

Two opposing considerations:
(1) With V fixed, the estimate becomes more accurate as the number of samples N increases (k grows with N). But as N → ∞ the estimate converges only to the space-averaged value P/V, not to p(x) itself, so this alone is not a good estimate.
(2) In practice the total number of samples is fixed. To improve the accuracy of the estimate we could let V approach zero; but then R will eventually contain no samples, giving the useless estimate p(x) = 0, and if one or more samples happen to coincide at x, the estimate diverges. Shrinking V therefore makes the estimate fluctuate strongly and become noisy, so the choice of V needs care.

In other words:
- V too large: the estimated density is too smooth and loses accuracy.
- V too small: very many samples are needed, and the estimate is unstable.

This means that, in practice, we must find a compromise for V: large enough to include enough samples within R, yet small enough to support the assumption that p(x) is constant within R. The volume V would need to approach 0 for us to obtain p(x) rather than just an averaged version of it; since in practice V cannot be made arbitrarily small (the number of samples N is always limited), we must accept a certain amount of averaging in the estimate of p(x).

Convergence with growing sample size

To estimate the density at x, form a sequence of regions R_1, R_2, ..., R_N, ... containing x: the first region is used with one sample, the second with two, and so on. Let V_N be the volume of R_N, k_N the number of samples falling in R_N, and p_N(x) the N-th estimate of p(x):

$$ p_N(x) = \frac{k_N/N}{V_N}. $$

The choice of V is the key issue in non-parametric density estimation. If p_N(x) is to converge to p(x), three conditions appear to be required:

$$ \lim_{N\to\infty} V_N = 0, \qquad \lim_{N\to\infty} k_N = \infty, \qquad \lim_{N\to\infty} \frac{k_N}{N} = 0. $$

There are two different ways of obtaining sequences of regions that satisfy these conditions:
- Fix V and determine k from the data. This leads to kernel density estimation (KDE), e.g., the Parzen-window estimation method.
- Fix k and determine V from the data. This gives rise to the k-nearest-neighbor (kNN) approach.
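Before the two region-sequence strategies are developed, here is a small numerical sketch of the basic relation p(x) ≈ (k/N)/V with a fixed region around x. The standard normal sample, the sample size, and the half-width of 0.25 are illustrative assumptions, not values from the slides.

```python
import numpy as np

def region_count_estimate(x, samples, half_width):
    # General estimator p(x) ~ (k/N)/V for the interval R = [x-h, x+h]
    # (1-D case, so the "volume" V is simply the interval length 2h).
    samples = np.asarray(samples, dtype=float)
    k = np.sum(np.abs(samples - x) <= half_width)   # samples falling in R
    N = len(samples)
    V = 2.0 * half_width
    return (k / N) / V

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=5000)
# The true standard normal density at 0 is 1/sqrt(2*pi) ~ 0.3989
print(region_count_estimate(0.0, samples, half_width=0.25))
```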
The two strategies in more detail:

(a) Shrink an initial region, e.g.

$$ V_N = \frac{V_1}{\sqrt{N}}, $$

and show that p_N(x) → p(x). An initial volume V_1 is given first, and the volume of the region sequence then shrinks continuously with growing N according to some function such as the one above. This is the Parzen-window estimation method.

(b) Specify k_N as some function of N that grows with N, such as k_N = √N, and grow the volume V_N until it encloses k_N neighbors of x. This is the k_N-nearest-neighbor estimation method.

It can be shown that both kNN and KDE converge to the true probability density as N → ∞, provided that V shrinks appropriately with N in the one case and that k grows appropriately with N in the other. The difference between the Parzen-window and k_N-nearest-neighbor methods lies only in their strategy for selecting V.

Kernel-based methods

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. KDE is a fundamental data-smoothing problem, based on a finite data sample. In some fields, such as signal processing and econometrics, it is also termed the Parzen-Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt.

Idea: let each point create its own bin, and estimate the density with a kernel function H(x) centered on each sample. Note that "kernel" here has a different meaning than in SVMs.

Example: the hyper-cube kernel. Use a hyper-cube of side length h centered at x, with membership decided by the max norm (the maximum distance in any dimension); the count k is then the number of points inside the hypercube centered at x.

Parzen windows: problem formulation

Assume that the region R that encloses the k samples is a hypercube with sides of length h centered at x. Its volume is then V = h^D, where D is the number of dimensions: for D = 1 the window is a line segment, for D = 2 a square, for D = 3 a cube, and for D > 3 a hypercube.

The window (kernel) function

To find the number of samples that fall within this region, define the window function (also called a kernel or potential function)

$$ \varphi(u) = \begin{cases} 1 & |u_j| \le 1/2, \ j = 1, \dots, D, \\ 0 & \text{otherwise.} \end{cases} $$

This kernel, which corresponds to a unit hypercube centered at the origin, is known as a Parzen window or the naive estimator. The quantity φ((x - x_i)/h) is then 1 if x_i is inside a hypercube of side h centered at x, and 0 otherwise. The total number of points inside the hypercube is

$$ k = \sum_{i=1}^{N} \varphi\!\left(\frac{x - x_i}{h}\right), $$

and substituting into the basic relation gives the non-parametric Parzen-window estimate of p(x):

$$ p_N(x) = \frac{1}{N h^D} \sum_{i=1}^{N} \varphi\!\left(\frac{x - x_i}{h}\right). $$

Notice how this kernel estimate resembles the histogram, with the exception that the bin locations are determined by the data.

Exercise

Given the training points 4, 5, 5, 6, 12, 14, 15, 15, 16, 17, use Parzen windows with h = 4 to estimate the density p(x) at x = 3, 10, 15.

Solution. Drawing the dataset on a line first gives an idea of the data: the points form two clusters, one around 5 and one around 15. Counting boundary points as inside the window:
- p(x = 3): the window covers [1, 5] and captures the points 4, 5, 5, so p(3) = 3/(10 × 4) = 0.075.
- p(x = 10): the window covers [8, 12] and captures only the point 12, so p(10) = 1/40 = 0.025.
- p(x = 15): the window covers [13, 17] and captures 14, 15, 15, 16, 17, so p(15) = 5/40 = 0.125.
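The exercise can be checked with a few lines of code. This is a minimal 1-D sketch of the naive (hypercube) Parzen estimator defined above; the only assumption beyond the slides is that points exactly on the window boundary (|u| = 1/2) count as inside, the convention used in the worked solution.

```python
import numpy as np

def parzen_hypercube(x, samples, h):
    # 1-D Parzen window with the unit-hypercube (naive) kernel:
    #   phi(u) = 1 if |u| <= 1/2 else 0,   p(x) = (1/(N*h)) * sum phi((x - xi)/h)
    # Boundary points are counted as inside (an assumed convention).
    samples = np.asarray(samples, dtype=float)
    u = (x - samples) / h
    k = np.sum(np.abs(u) <= 0.5)
    return k / (len(samples) * h)

data = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]   # training points from the exercise
for x in (3, 10, 15):
    print(x, parzen_hypercube(x, data, h=4))
# x=3 : k=3 (4, 5, 5)           -> 3/(10*4) = 0.075
# x=10: k=1 (12)                -> 1/40     = 0.025
# x=15: k=5 (14, 15, 15, 16, 17)-> 5/40     = 0.125
```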
Drawbacks of the hyper-cube kernel

The hyper-cube has several drawbacks:
- It yields density estimates that have discontinuities.
- It weights all points x_i equally, regardless of their distance to the estimation point x: each sample contributes 1 if it lies within the window (when the max-norm distance to x is at most h/2) and 0 otherwise. We would rather have each sample's contribution to the estimate depend on its distance to x, taking graded values such as 0, 0.1, 0.2, ..., rising to 1 as x_i approaches x.

Soft kernels

With a soft kernel, points are partially "in the bin", depending on their distance from its center: all training instances participate in the density computation, each weighted by its distance from the point. For these reasons, the hyper-cube window is commonly replaced by a smooth kernel function φ(u). Usually, though not always, φ(u) is a radially symmetric and unimodal pdf, such as the Gaussian.

Gaussian kernel

$$ \varphi(u) = \frac{1}{(2\pi)^{D/2}} \exp\!\left(-\frac{\|u\|^2}{2}\right), $$

which in one dimension reduces to φ(u) = exp(-u²/2)/√(2π). For a soft kernel, the volume is the integral of the kernel function over the space; for the Gaussian kernel this integral is 1, so the density estimate keeps the same form:

$$ p_N(x) = \frac{1}{N h^D} \sum_{i=1}^{N} \varphi\!\left(\frac{x - x_i}{h}\right). $$

Interpretation

Just as the hypercube Parzen estimate can be seen as a sum of boxes centered at the data, the smooth kernel estimate is a sum of "bumps". The kernel function determines the shape of the bumps; the parameter h, also called the smoothing parameter or bandwidth, determines their width.

The purpose of the estimation is to take the sample set K = {x_1, x_2, ..., x_N} and estimate the probability density p(x) at any point of the sample space. The basic method: a kernel function expresses one sample's contribution to the density being estimated, and the linear combination of all the samples' contributions is taken as the estimate of p(x) at a given point.

Conditions on the kernel

For the resulting density p_N(x) to be proper, two conditions are imposed on the kernel function:

$$ \varphi(u) \ge 0, \qquad \int \varphi(u)\, du = 1. $$

Kernel-based classification

Define a class-specific kernel sum H_c(x) that counts (hypercube kernel) or weights (Gaussian kernel) only the points with class c:

$$ H_c(x) = \sum_{i:\, y_i = c} \varphi\!\left(\frac{x - x_i}{h}\right). $$

Given a point x, choose the class that maximizes H_c(x). In Gaussian kernel classification, for example, a test point is labeled "+" when the weighted sum of the "+" points exceeds the weighted sum of the "-" points.

Smoothing parameter h

The parameter h is a smoothing parameter: for the hypercube it is the size of the bin, and for the Gaussian it is the variance of the Gaussian bump. As with the number of histogram bins, h too large may underfit and h too small may overfit.

Effect of h (window size)

The problem of choosing h is crucial in density estimation:
- If h is too large, p_N(x) is a flat, low-resolution, over-smoothed estimate of p(x) that masks the structure of the data (averaging error).
- If h is too small, p_N(x) is an unstable, spiky estimate of p(x) that is very hard to interpret (noise error).
To keep these errors from becoming serious, h should be chosen carefully.
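Below is a minimal sketch of the Gaussian-kernel Parzen estimate, reusing the training points from the earlier exercise to show the effect of the bandwidth. The three h values are illustrative choices, not from the slides.

```python
import numpy as np

def gaussian_kde(x, samples, h):
    # Smooth Parzen estimate: p(x) = (1/(N*h)) * sum phi((x - xi)/h) with the
    # Gaussian kernel phi(u) = exp(-u^2/2)/sqrt(2*pi); the estimate is a sum
    # of "bumps" of width h centred at the samples.
    samples = np.asarray(samples, dtype=float)
    u = (x - samples) / h
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return bumps.sum() / (len(samples) * h)

data = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
for h in (0.5, 2.0, 8.0):     # the bandwidth h is the smoothing parameter
    print(h, [round(gaussian_kde(x, data, h), 4) for x in (3, 10, 15)])
# small h -> spiky estimate (noise error); large h -> over-smoothed (averaging error)
```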
Example 1

For a two-class (ω_1, ω_2) recognition problem, six samples of class ω_1 are drawn at random:

X = (x_1, x_2, ..., x_6) = (3.2, 3.6, 3, 6, 2.5, 1.1),

all lying on the x-axis between 0 and 6. Estimate p(x|ω_1), i.e., p_N(x). Selecting the Gaussian kernel, with x one-dimensional, the estimate is graphically the sum of six normal curves centered at 3.2, 3.6, 3, 6, 2.5, and 1.1; p_N(x) is the sum of these curves, and its shape depends on the window size h.

Example 2: estimating a single normal density with the window method

Suppose the density p(x) to be estimated is a normal distribution with mean 0 and variance 1. Draw 1, 16, or 256 samples x_i at random from X and estimate p_N(x) with the window method. Take the window function itself to be normal, with mean 0 and variance 1, and window length

$$ h_N = \frac{h_1}{\sqrt{N}}, $$

where N is the number of samples and h_1 is a selectable, adjustable parameter. The experiment (with results shown for N = 1, 16, 256, and the limit N → ∞) shows the estimate converging to the true density as N grows. A second experiment estimates, in the same way, the piecewise-uniform density p(x) = 1 for -2.5 < x < -2, p(x) = 0.25 for 0 < x < 2, and 0 otherwise.

Probabilistic Neural Networks (PNN): an implementation of Parzen windows

Suppose there are n patterns (samples) in a d-dimensional attribute space and c classes. The network has:
- d input units x_1, ..., x_d;
- n pattern units p_1, ..., p_n, one per training sample, connected to the inputs by d·n modifiable (trained) weights W_11, ..., W_dn;
- c category units ω_1, ..., ω_c, connected to the pattern units by n connections a_11, ..., a_nc.

Normalization

First, a word on normalization: patterns are normalized (or scaled) to have unit length, Σ_i x_i² = 1, which is done by replacing each feature value by

$$ x_i \leftarrow \frac{x_i}{\left(\sum_{j=1}^{d} x_j^2\right)^{1/2}}. $$

Normalization example: to normalize x = (3, 4)^t, compute (9 + 16)^(1/2) = 5; the normalized vector is (3/5, 4/5)^t = (0.6, 0.8)^t.

Training the network: Algorithm
Nor
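The slide text breaks off at this point, so the following is only a sketch. The normalization step reproduces the (3, 4) → (0.6, 0.8) example above; the pattern-unit activation exp((w·x - 1)/σ²) and the toy training set follow the standard PNN formulation and are assumptions here, not content recovered from these slides.

```python
import numpy as np

def normalize(x):
    # Scale a pattern to unit length: x_i -> x_i / sqrt(sum_j x_j^2)
    x = np.asarray(x, dtype=float)
    return x / np.sqrt(np.sum(x**2))

print(normalize([3, 4]))   # -> [0.6 0.8], as in the slide example

def pnn_classify(x, train_patterns, train_labels, sigma, n_classes):
    # Minimal PNN sketch: one pattern unit per (normalized) training sample.
    # The Gaussian activation exp((w.x - 1)/sigma^2) is the standard PNN
    # choice for unit-length vectors (an assumption; the slides are cut off).
    x = normalize(x)
    scores = np.zeros(n_classes)
    for w, c in zip(train_patterns, train_labels):
        net = np.dot(normalize(w), x)                # inner product of unit vectors
        scores[c] += np.exp((net - 1.0) / sigma**2)  # Parzen-window contribution
    return int(np.argmax(scores))                    # category unit with max activation

X = [[3, 4], [4, 3], [-3, 4], [-4, 3]]
y = [0, 0, 1, 1]
print(pnn_classify([1, 2], X, y, sigma=0.5, n_classes=2))  # -> class 0
```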