Pattern Recognition, Lecture 6 (Course Slides)

Principles of Pattern Recognition
Institute for Pattern Recognition and Artificial Intelligence, Image Analysis and Intelligent Systems Laboratory, Huazhong University of Science and Technology
Cao Zhiguo

4.1 Introduction

The sea bass/salmon example. The state of nature is a random variable with a prior. If the catch of salmon and sea bass is equiprobable, $P(\omega_1) = P(\omega_2)$ (uniform priors), and $P(\omega_1) + P(\omega_2) = 1$ (exclusivity and exhaustivity).

Decision rule with only the prior information: decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.

Use of the class-conditional information: $p(x|\omega_1)$ and $p(x|\omega_2)$ describe the difference in lightness between the populations of sea bass and salmon.

Posterior, likelihood, evidence:
$$P(\omega_j|x) = \frac{p(x|\omega_j)\, P(\omega_j)}{p(x)},$$
where, in the case of two categories, $p(x) = \sum_{j=1}^{2} p(x|\omega_j) P(\omega_j)$. In words: posterior = (likelihood x prior) / evidence.

Decision given the posterior probabilities: for an observation $x$, decide $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$; otherwise decide $\omega_2$.

Whenever we observe a particular $x$, the probability of error is $P(\mathrm{error}|x) = P(\omega_1|x)$ if we decide $\omega_2$, and $P(\mathrm{error}|x) = P(\omega_2|x)$ if we decide $\omega_1$.

4.2 Continuous Features

Generalizations of the preceding ideas:
- use of more than one feature;
- use of more than two states of nature;
- allowing actions, not only deciding on the state of nature;
- introducing a loss function, which is more general than the probability of error.

In the multi-category case, for an observation $x$, decide that the true state of nature is $\omega_i$ under any of the following equivalent conditions:
(1) $P(\omega_i|x) > P(\omega_j|x)$ for all $j \neq i$;
(2) $p(x|\omega_i) P(\omega_i) > p(x|\omega_j) P(\omega_j)$ for all $j \neq i$;
(3) $l(x) = p(x|\omega_i)/p(x|\omega_j) > P(\omega_j)/P(\omega_i)$ for all $j \neq i$;
(4) $\ln p(x|\omega_i) + \ln P(\omega_i) > \ln p(x|\omega_j) + \ln P(\omega_j)$ for all $j \neq i$.

An example (slide figure).

LOSS FUNCTION

Let $\{\omega_1, \ldots, \omega_c\}$ be the set of $c$ categories and $\{\alpha_1, \ldots, \alpha_a\}$ the set of $a$ possible actions. Let $\lambda(\alpha_i|\omega_j)$ be the loss incurred for taking action $\alpha_i$ when the state of nature is $\omega_j$. The posterior $P(\omega_j|x)$ can be computed from the Bayes formula, and the expected loss from taking action $\alpha_i$ is
$$R(\alpha_i|x) = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j)\, P(\omega_j|x), \qquad i = 1, \ldots, a.$$

BAYES RISK

An expected loss is called a risk; $R(\alpha_i|x)$ is called the conditional risk. A general decision rule is a function $\alpha(x)$ that tells us which action to take for every possible observation. The overall risk is
$$R = \int R(\alpha(x)|x)\, p(x)\, dx.$$
If we choose $\alpha(x)$ so that $R(\alpha(x)|x)$ is as small as possible for every $x$, the overall risk is minimized: compute the conditional risk for every action and select the action that minimizes $R(\alpha_i|x)$. The resulting minimum overall risk is denoted $R^*$ and is referred to as the Bayes risk; it is the best performance that can be achieved.

BAYES RISK - TWO-CATEGORY CLASSIFICATION

Let $\alpha_1$ correspond to $\omega_1$, $\alpha_2$ to $\omega_2$, and $\lambda_{ij} = \lambda(\alpha_i|\omega_j)$. The conditional risks are
$$R(\alpha_1|x) = \lambda_{11} P(\omega_1|x) + \lambda_{12} P(\omega_2|x), \qquad R(\alpha_2|x) = \lambda_{21} P(\omega_1|x) + \lambda_{22} P(\omega_2|x).$$
Our decision rule is: choose $\omega_1$ if $(\lambda_{21} - \lambda_{11}) P(\omega_1|x) > (\lambda_{12} - \lambda_{22}) P(\omega_2|x)$; otherwise decide $\omega_2$. If the loss incurred for making an error is greater than that incurred for being correct, the factors $(\lambda_{21} - \lambda_{11})$ and $(\lambda_{12} - \lambda_{22})$ are positive, and the ratio of these factors simply scales the posteriors.

BAYES RISK - LIKELIHOOD

By employing the Bayes formula, we can replace the posteriors by the priors and the conditional densities: choose $\omega_1$ if
$$(\lambda_{21} - \lambda_{11})\, p(x|\omega_1) P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(x|\omega_2) P(\omega_2);$$
otherwise decide $\omega_2$. If $\lambda_{21} - \lambda_{11}$ is positive, our rule becomes: choose $\omega_1$ if
$$\frac{p(x|\omega_1)}{p(x|\omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}.$$
If the loss factors are identical and the prior probabilities are equal, this reduces to a standard likelihood ratio test: choose $\omega_1$ if $p(x|\omega_1)/p(x|\omega_2) > 1$.

BAYES RISK - MINIMUM ERROR RATE

Consider a symmetrical, or zero-one, loss function: $\lambda(\alpha_i|\omega_j) = 0$ if $i = j$ and $1$ if $i \neq j$. The conditional risk is then
$$R(\alpha_i|x) = \sum_{j \neq i} P(\omega_j|x) = 1 - P(\omega_i|x),$$
i.e. the average probability of error. To minimize the error, maximize $P(\omega_i|x)$; this is also known as maximum a posteriori (MAP) decoding.

BAYES RISK - LIKELIHOOD RATIO

Minimum-error-rate classification: choose $\omega_i$ if $P(\omega_i|x) > P(\omega_j|x)$ for all $j \neq i$.
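To make the two-category rule concrete, here is a minimal sketch (not from the slides) of the MAP decision for the sea bass/salmon lightness example. The one-dimensional Gaussian likelihoods, their parameters, and the priors are illustrative assumptions; scipy supplies the densities.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D lightness densities; means, scales, and priors are
# illustrative, not taken from the slides.
priors = {"salmon": 0.5, "sea_bass": 0.5}          # P(w1) = P(w2)
likelihoods = {
    "salmon":   norm(loc=3.0, scale=1.0).pdf,      # p(x | w1)
    "sea_bass": norm(loc=5.0, scale=1.2).pdf,      # p(x | w2)
}

def posteriors(x):
    """P(wj | x) = p(x | wj) P(wj) / p(x)  (Bayes formula)."""
    joint = {w: likelihoods[w](x) * priors[w] for w in priors}
    evidence = sum(joint.values())                 # p(x), the normalizer
    return {w: j / evidence for w, j in joint.items()}

def decide(x):
    """Minimum-error-rate (MAP) rule: pick the largest posterior."""
    post = posteriors(x)
    label = max(post, key=post.get)
    p_error = 1.0 - post[label]                    # P(error | x)
    return label, p_error

print(decide(4.2))   # -> ('sea_bass', ~0.42): x lies near the boundary
```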
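The general minimum-risk rule can likewise be sketched numerically. In this illustration (the loss matrix and posterior values are made-up, not from the slides), the conditional risk $R(\alpha_i|x)$ is a matrix-vector product, and with a zero-one loss the rule collapses to MAP, as the assertion at the end checks.

```python
import numpy as np

# Loss matrix: rows are actions a1, a2; columns are states w1, w2.
# Values are illustrative assumptions.
L = np.array([[0.0, 2.0],     # lambda(a1|w1), lambda(a1|w2)
              [1.0, 0.0]])    # lambda(a2|w1), lambda(a2|w2)

def bayes_action(post):
    """Return the action minimizing R(ai|x) = sum_j L[i,j] P(wj|x)."""
    post = np.asarray(post)
    risks = L @ post          # conditional risk of each action
    return int(np.argmin(risks)), risks

post = [0.4, 0.6]             # P(w1|x), P(w2|x) for some observed x
action, risks = bayes_action(post)
print(action, risks)          # action 1: R = [1.2, 0.4]

# With zero-one loss, argmin risk == argmax posterior (MAP):
L01 = np.ones((2, 2)) - np.eye(2)
assert np.argmin(L01 @ np.asarray(post)) == np.argmax(post)
```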
Example (slide figure).

REJECTION OPTION

In a $c$-class problem, taking $a = c + 1$ actions lets the extra action $\alpha_{c+1}$ denote rejection, i.e. declining to assign the observation to any class (slide figure).

MINIMAX CRITERION

Design the classifier to minimize the worst overall risk (avoid catastrophic failures). Factor the overall risk into contributions from each decision region:
$$R = \int_{\mathcal{R}_1} [\lambda_{11} P(\omega_1) p(x|\omega_1) + \lambda_{12} P(\omega_2) p(x|\omega_2)]\, dx + \int_{\mathcal{R}_2} [\lambda_{21} P(\omega_1) p(x|\omega_1) + \lambda_{22} P(\omega_2) p(x|\omega_2)]\, dx.$$
Using the simplified notation $I_{ij} = \int_{\mathcal{R}_i} p(x|\omega_j)\, dx$, we can rewrite the risk as
$$R = P(\omega_1)(\lambda_{11} I_{11} + \lambda_{21} I_{21}) + P(\omega_2)(\lambda_{12} I_{12} + \lambda_{22} I_{22}).$$
Note that $I_{11} = 1 - I_{21}$ and $I_{22} = 1 - I_{12}$; we make this substitution because we want the risk in terms of error probabilities and priors. Multiplying out, using $P(\omega_2) = 1 - P(\omega_1)$, and rearranging gives the risk as a linear function of the prior,
$$R = a + b\, P(\omega_1), \qquad a = \lambda_{22} + (\lambda_{12} - \lambda_{22}) I_{12}, \qquad b = (\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) I_{21} - (\lambda_{12} - \lambda_{22}) I_{12}.$$

Once the class-conditional densities are known, the loss function is chosen, and the decision regions are optimized for one particular prior $P(\omega_1)$, the coefficients $a$ and $b$ are constants. If $P(\omega_1)$ then changes while the decision regions are not adjusted accordingly, $R$ is a linear function of $P(\omega_1)$.

The solid curve: for each value of $P(\omega_1)$ in $(0,1)$, the minimum-risk criterion determines the corresponding optimal decision regions, and the resulting minimum average risk $R$ is computed; this traces out the curve of $R$ versus $P(\omega_1)$. The dashed line: if the regions are designed for the prior marked by the black dot on the left, the minimum average risk is $R_a$; when $P(\omega_1)$ changes but the regions stay fixed, $R$ moves along the dashed line and exceeds the curve. This is the maximum possible risk. We want this maximum possible risk to be smallest as $P(\omega_1)$ varies, so we set $b = 0$: the risk line is then parallel to the horizontal axis and touches the curve at its maximum.

Conclusion: when $P(\omega_1)$ is not known precisely or may drift, to make the maximum possible risk smallest the classifier should be designed for the prior at which the minimum risk $R$ attains its maximum. Relative to the optimal design for any other prior, this $R$ is larger; but as $P(\omega_1)$ varies over $(0,1)$, the corresponding maximum risk is the smallest. With a 0-1 loss function, the average risk equals the error rate of the minimum-error-rate decision rule.

Procedure:
(1) Using the minimum-risk criterion, find the optimal decision regions for each value of $P(\omega_1)$ in $(0,1)$.
(2) Compute the minimum average risk for each set of optimal regions, giving the curve of $R$ versus $P(\omega_1)$.
(3) Find the prior that maximizes $R$ and use it to construct the minimax decision rule.
(4) If ...

NEYMAN-PEARSON CRITERION

In brief: bound the probability of one type of error and minimize the other; the solution is again a likelihood ratio test, with the threshold set by the constraint rather than by the priors and losses (slide figures).

DECISION SURFACES

Define a set of discriminant functions $g_i(x)$, $i = 1, \ldots, c$, and a decision rule: choose $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$. For a Bayes classifier, let $g_i(x) = -R(\alpha_i|x)$, because the maximum discriminant function then corresponds to the minimum conditional risk. For the minimum-error-rate case, let $g_i(x) = P(\omega_i|x)$, so that the maximum discriminant function corresponds to the maximum posterior probability. The choice of discriminant functions is not unique: we may multiply them all by the same positive constant, add the same constant to all of them, or replace $g_i(x)$ with any monotonically increasing function $f(g_i(x))$.

DECISION SURFACES - NETWORK REPRESENTATION

A classifier can be visualized as a connected graph with arcs and weights (slide figure).

DECISION SURFACES - LOG PROBABILITIES

Some monotonically increasing functions can simplify calculations considerably:
(1) $g_i(x) = P(\omega_i|x)$;
(2) $g_i(x) = p(x|\omega_i) P(\omega_i)$;
(3) $g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i)$.
What are some of the reasons (3) is particularly useful? Computational complexity (e.g., Gaussian densities), numerical accuracy (probabilities tend to zero), decomposition (likelihood and prior are separated and can be weighted differently), and normalization (e.g., likelihoods are channel dependent).

DECISION SURFACES - TWO-CATEGORY CASE

A classifier that places a pattern in one of two classes is often referred to as a dichotomizer. We can reshape the decision rule around a single discriminant function $g(x) = g_1(x) - g_2(x)$: decide $\omega_1$ if $g(x) > 0$. Using log posterior probabilities,
$$g(x) = \ln \frac{p(x|\omega_1)}{p(x|\omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}.$$
A dichotomizer can thus be viewed as a machine that computes a single discriminant function and classifies $x$ according to its sign (e.g., support vector machines).

DECISION SURFACES - NORMAL DISTRIBUTIONS

Recall the definition of a (multivariate) normal distribution:
$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right],$$
with mean $\mu = E[x]$ and covariance $\Sigma = E[(x - \mu)(x - \mu)^T]$.

GAUSSIAN CLASSIFIERS - DISCRIMINANT FUNCTIONS

Recall our discriminant function for minimum-error-rate classification, $g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i)$. For a multivariate normal distribution,
$$g_i(x) = -\tfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i).$$

Consider the case $\Sigma_i = \sigma^2 I$ (statistical independence, equal variance, class-independent variance). Since the terms $-\tfrac{d}{2}\ln 2\pi$ and $-\tfrac{1}{2}\ln|\Sigma_i|$ are constant w.r.t. the maximization, the discriminant function can be reduced to
$$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i).$$
Expanding, $\|x - \mu_i\|^2 = x^T x - 2\mu_i^T x + \mu_i^T \mu_i$; the term $x^T x$ is constant w.r.t. $i$, and $\mu_i^T \mu_i$ is a constant that can be precomputed. We can therefore use an equivalent linear discriminant function,
$$g_i(x) = w_i^T x + w_{i0}, \qquad w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i),$$
and decide the state of nature by maximizing $g_i(x)$.
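A minimal sketch of the $\Sigma_i = \sigma^2 I$ linear discriminant just derived; the class means, shared variance, and priors below are illustrative assumptions, not values from the slides. With equal priors this reduces to nearest-mean classification.

```python
import numpy as np

# Linear discriminant for Sigma_i = sigma^2 I:
#   g_i(x) = w_i^T x + w_i0,  w_i = mu_i / sigma^2,
#   w_i0 = -mu_i^T mu_i / (2 sigma^2) + ln P(w_i)
mus = np.array([[0.0, 0.0],
                [3.0, 3.0]])          # illustrative class means mu_1, mu_2
sigma2 = 1.0                          # shared variance sigma^2 (assumed)
priors = np.array([0.5, 0.5])         # equal priors (assumed)

W = mus / sigma2                                         # weight vectors
w0 = -np.sum(mus * mus, axis=1) / (2 * sigma2) + np.log(priors)

def classify(x):
    """Decide the class with the largest linear discriminant g_i(x)."""
    g = W @ x + w0
    return int(np.argmax(g))

print(classify(np.array([1.0, 1.0])))   # 0: closer to mu_1
print(classify(np.array([2.0, 2.0])))   # 1: closer to mu_2
```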
Case $\Sigma_i = \Sigma$ (equal covariances): dropping terms constant w.r.t. $i$,
$$g_i(x) = -\tfrac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i),$$
which again yields an equivalent linear discriminant, $g_i(x) = w_i^T x + w_{i0}$ with $w_i = \Sigma^{-1} \mu_i$ and $w_{i0} = -\tfrac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)$. Decide the state of nature by maximizing $g_i(x)$.

GAUSSIAN CLASSIFIERS - GENERAL CASE

For arbitrary $\Sigma_i$ the discriminant is quadratic,
$$g_i(x) = x^T W_i x + w_i^T x + w_{i0},$$
where $W_i = -\tfrac{1}{2} \Sigma_i^{-1}$, $w_i = \Sigma_i^{-1} \mu_i$, and $w_{i0} = -\tfrac{1}{2} \mu_i^T \Sigma_i^{-1} \mu_i - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$. The decision surfaces are defined by the equations $g_i(x) = g_j(x)$ and are in general hyperquadrics.

GAUSSIAN CLASSIFIERS - IDENTITY COVARIANCE

Case $\Sigma_i = \sigma^2 I$: the boundary $g_i(x) = g_j(x)$ can be rewritten as a hyperplane $w^T(x - x_0) = 0$ with
$$w = \mu_i - \mu_j, \qquad x_0 = \tfrac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\mu_i - \mu_j).$$

GAUSSIAN CLASSIFIERS - EQUAL COVARIANCES

Case $\Sigma_i = \Sigma$: the boundary is again a hyperplane, $w^T(x - x_0) = 0$, now with
$$w = \Sigma^{-1}(\mu_i - \mu_j), \qquad x_0 = \tfrac{1}{2}(\mu_i + \mu_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\, (\mu_i - \mu_j).$$

ERROR BOUNDS

The Bayes decision rule guarantees the lowest average error rate, and a closed-form solution exists for two-class Gaussian distributions, but the full calculation is difficult in high-dimensional spaces. Bounds provide a way to gain insight into a problem and engineer better solutions.

We need the following inequality:
$$\min[a, b] \le a^{\beta} b^{1-\beta}, \qquad a, b \ge 0,\ 0 \le \beta \le 1.$$
Assume $a \ge b$ without loss of generality, so $\min[a, b] = b$. Also, $a^{\beta} b^{1-\beta} = (a/b)^{\beta}\, b$ and $(a/b)^{\beta} \ge 1$. Therefore $b \le (a/b)^{\beta}\, b$, which implies $\min[a, b] \le a^{\beta} b^{1-\beta}$.

Recall
$$P(\mathrm{error}) = \int \min[P(\omega_1) p(x|\omega_1),\ P(\omega_2) p(x|\omega_2)]\, dx \le P^{\beta}(\omega_1) P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1)\, p^{1-\beta}(x|\omega_2)\, dx.$$
Note that this integral is over the entire feature space, not the decision regions (which makes it simpler). If the conditional probabilities are normal, this expression can be simplified.

ERROR BOUNDS - CHERNOFF BOUND FOR NORMAL DENSITIES

If the conditional probabilities are normal, our bound can be evaluated analytically:
$$\int p^{\beta}(x|\omega_1)\, p^{1-\beta}(x|\omega_2)\, dx = e^{-k(\beta)},$$
where
$$k(\beta) = \frac{\beta(1-\beta)}{2} (\mu_2 - \mu_1)^T [\beta \Sigma_1 + (1-\beta)\Sigma_2]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|\beta \Sigma_1 + (1-\beta)\Sigma_2|}{|\Sigma_1|^{\beta} |\Sigma_2|^{1-\beta}}.$$
Procedure: find the value of $\beta$ that minimizes $e^{-k(\beta)}$, and then compute $P(\mathrm{error})$ using the bound. Benefit: a one-dimensional optimization over $\beta$.

ERROR BOUNDS - BHATTACHARYYA BOUND

The Chernoff bound is loose for extreme values. The Bhattacharyya bound is obtained by setting $\beta = 1/2$:
$$P(\mathrm{error}) \le \sqrt{P(\omega_1) P(\omega_2)}\, e^{-k(1/2)},$$
where
$$k(1/2) = \frac{1}{8} (\mu_2 - \mu_1)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|(\Sigma_1 + \Sigma_2)/2|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}.$$

DISCRETE FEATURES - INTEGRALS BECOME SUMS

For problems where the features are discrete, the Bayes formula involves probabilities rather than densities:
$$P(\omega_j|x) = \frac{P(x|\omega_j)\, P(\omega_j)}{P(x)}, \qquad P(x) = \sum_{j} P(x|\omega_j) P(\omega_j),$$
and the Bayes decision rule remains the same. The maximum entropy distribution is the uniform distribution, $P(x = x_i) = 1/N$.

DISCRETE FEATURES - INDEPENDENT BINARY FEATURES

Consider independent binary features $x = (x_1, \ldots, x_d)^T$, $x_i \in \{0, 1\}$, with $p_i = P(x_i = 1|\omega_1)$ and $q_i = P(x_i = 1|\omega_2)$. Assuming conditional independence,
$$P(x|\omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}, \qquad P(x|\omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}.$$
The likelihood ratio is
$$\frac{P(x|\omega_1)}{P(x|\omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i},$$
and the discriminant function is linear:
$$g(x) = \sum_{i=1}^{d} w_i x_i + w_0, \qquad w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}.$$

Bayesian Belief Networks

This topic was presented through a sequence of slide figures under this title. See the textbook, p. 47, Example 4.
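Since the belief-network slides are figure-only, here is a small illustrative sketch (not the textbook's Example 4): exact inference by enumeration in a hypothetical two-node network $A \to B$, with made-up conditional probability tables. It computes the posterior of the parent given evidence on the child by forming the joint and normalizing, which is the same Bayes-formula pattern used throughout this lecture.

```python
# Hypothetical two-node belief network A -> B; all probabilities are
# illustrative assumptions, not values from the slides or textbook.
P_A = {True: 0.3, False: 0.7}                      # prior P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},      # P(B | A=True)
               False: {True: 0.2, False: 0.8}}     # P(B | A=False)

def posterior_A_given_B(b):
    """P(A | B=b) by enumerating the joint P(A, B) = P(A) P(B | A)."""
    joint = {a: P_A[a] * P_B_given_A[a][b] for a in (True, False)}
    z = sum(joint.values())                        # P(B=b), the evidence
    return {a: p / z for a, p in joint.items()}

print(posterior_A_given_B(True))   # {True: ~0.66, False: ~0.34}
```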