Contents

1. Diagnosing nonlinearity in X
2. Diagnosing independence of the residuals
3. Remedying non-normality of the residuals
4. Remedies for collinearity


Overview: purpose, diagnosis, remedy

Nonlinearity in X
    Diagnosis: partial residual plots
    Remedy:    partial residual plots

Independence of the residuals
    Diagnosis: Durbin-Watson statistic (DW statistic)
               A DW value close to 2 indicates no correlation among the residuals.
               A DW value close to 0 indicates positive correlation.
               A DW value close to 4 indicates negative correlation.
    Remedy:    time series analysis

Non-normality of the residuals
    Diagnosis: normality test on the residuals
    Remedy:    Box-Cox transformation of y

Heteroscedasticity
    Diagnosis: check whether the residual plot from the regression scatters randomly around residual = 0
    Remedy:    Box-Cox transformation of y

Collinearity among the X variables
    Diagnosis: VIF < 5: no multicollinearity
               5 <= VIF < 10: serious multicollinearity
    Remedy:    1. drop some variables
               2. ridge regression
               3. principal component regression
               4. partial least squares regression


Nonlinearity diagnostics

The correct model contains an unknown function g of the predictor, while the model actually fitted uses only a linear term in it. Let e be the residuals of the fitted model and b1 the estimated coefficient of x1. The quantity

    e+ = e + b1*x1

is called the partial residual. The scatter plot of e+ against x1 can reveal the functional form of g.

example1:

True model (from the simulation code):   y = x1 + 10*x1^2 + x2 + error
Fitted model:                            y = b0 + b1*x1 + b2*x2 + e

Steps:
1. Fit the model and obtain the residuals e.
2. To check whether x1 enters nonlinearly, compute e+ = e + b1*x1.
3. Plot e+ against x1.
4. Inspect the scatter plot and judge the likely form of g.

    /* generate random variables */
    data simulation;
      seed = 12345;
      do i = 1 to 100;
        x1 = RANNOR(seed);
        x2 = RANNOR(seed);
        g = x1 + 10*x1*x1;
        res = RAND('NORMAL', 0, 0.25);
        y = g + x2 + res;
        output;
      end;
    run;

    /* regress y on x1 and x2 */
    proc reg data=simulation;
      model y = x1 x2;
      plot y*x1;
      /* r. is the residual of the regression, p. is the predicted value of y */
      /* the trailing dot is required */
      plot r.*p.;
      /* create a dataset called out that contains the residuals of the regression */
      output out=out r=residual;
    run;
    quit;

[Figure: scatter plot of y against x1]

[Figure: scatter plot of the residuals against the predicted values of y]

The residual plot above shows that the residuals do not satisfy our assumptions.

    /* calculate e+, the partial residuals */
    data simulation1;
      set out;
      /* -4.67119 is the estimate of beta1, the coefficient of x1 */
      residua1_e = residual - 4.67119*x1;
    run;

    /* plot e+ against x1; here we try to reveal the form of g */
    proc gplot data=simulation1;
      plot residua1_e*x1;
    run;

[Figure: scatter plot of e+ against x1]

From the plot above we guess that g is a quadratic function of x1, so we next regress y on x1^2 and x2.

    data new;
      set simulation;
      x3 = x1**2;
    run;

The data step above, which creates the quadratic term, is required because PROC REG does not allow x1*x1 in the MODEL statement. For example, the following is wrong:

    proc reg data=simulation;
      model y = x2 x1*x1;
    run;

An alternative is PROC GLM, where x1*x1 can be written directly in the MODEL statement. For example, the following is right:

    proc glm data=simulation;
      model y = x2 x1*x1;
    run;

The results of the two procedures differ slightly.

    proc reg data=new;
      model y = x3 x2 / spec dw dwprob;
      plot y*x3;
      plot r.*p.;
    run;
    quit;
    /* option SPEC: tests homogeneity of the residual variance */
    /* option DW: computes a Durbin-Watson statistic */
    /* option DWPROB: computes a Durbin-Watson statistic and its p-value */

Results (options spec, dw, dwprob):
- DW is near 2, so the residuals are independent and show no serial correlation.
- Pr > ChiSq is 0.0284 < 0.05, so we conclude that the residual variance does not satisfy the homogeneity assumption.
- Note: Pr < DW is greater than 0.05, so the residuals show no positive autocorrelation; Pr > DW is the p-value for testing negative autocorrelation.

[Figure: scatter plot of y against x1^2, that is, x3]

[Figure: scatter plot of the residuals from the second regression against the predicted values of y]

Regressing y on x1, x1^2 (x3), and x2:

    proc reg data=new;
      model y = x3 x2 x1 / spec dw dwprob;
      plot y*x3;
      plot r.*p.;
    run;
    quit;

    Model                   R-square
    MODEL Y1=X1 X2;         0.1260
    MODEL Y1=X3 X2;         0.9950
    MODEL Y1=X1 X3 X2;      0.9996

Limitation of this method: it does not work well when the x variables are highly correlated with one another.


BOX-COX transformation

example2: box_cox transformation

    /* when the residuals fail the test for normality or for homogeneity
       of variance, we can apply the Box-Cox transformation to the response */
    title 'Basic Box-Cox Example';
    data x;
      do x = 1 to 8 by 0.025;
        y = exp(x + normal(7));
        output;
      end;
    run;

    proc reg data=x;
      model y = x;
      plot r.*p.;
    run;

[Figure: residual plot for the regression of y on x]

    /* SS2: displays regression results */
    /* DETAILS: displays model specification details */
    proc transreg data=x ss2 details;
      title2 'Defaults';
      model boxcox(y) = identity(x);
    run;
    quit;

The output indicates that y should be transformed to log(y).

Regression of the transformed response log(y) on x:

    data x;
      do x = 1 to 8 by 0.025;
        y = exp(x + normal(7));
        y2 = log(y);
        output;
      end;
    run;

    proc reg data=x;
      model y = x;
      plot r.*p.;
      model y2 = x;
      plot r.*p.;
    run;
    quit;

    Model         R-square
    Y=X           0.2233
    LOG(Y)=X      0.7906

The residual plots show that, after the transformation, the residuals scatter randomly around 0, which satisfies our assumptions. Note, however, that a transformation must also make sense in the context of the actual problem.


Remedy for collinearity: ridge regression

In the ridge estimator b(k) = (X'X + kI)^(-1) X'y, k is a positive number.

    proc reg data=elemapi2 outest=b ridge=0 to 0.3 by .002;
      model api00 = acs_k3 avg_ed grad_sch col_grad some_col / vif tol;
      plot / ridgeplot;
    run;
    quit;

[Figure: ridge trace plot]

From the plot above, the ridge traces of all x variables have stabilized at k = 0.1, so we choose k = 0.1.

    proc print data=b;
      where 0.102 > _RIDGE_ >= 0.1;
    run;


Homework

For the data from the previous lab assignment:
1. Diagnose nonlinearity; if nonlinearity is present, improve the model.
2. Diagnose independence of the residuals.
3. Diagnose normality of the residuals (proc univariate).
4. Diagnose collinearity; if collinearity is present, obtain the ridge estimates.

Due date: 2012.12.25
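
Item 3 of the homework names proc univariate, which none of the examples demonstrates. The following is a minimal sketch of one way to run that check, not a prescribed solution. It assumes a simulated dataset similar to example1; the names demo, fit_out, and resid are illustrative and do not come from the slides.

```sas
/* Sketch only: demo, fit_out and resid are hypothetical names */
data demo;
  call streaminit(12345);          /* fix the random number stream */
  do i = 1 to 100;
    x1 = rand('NORMAL');
    x2 = rand('NORMAL');
    x3 = x1*x1;                    /* quadratic term, built in the data step */
    y  = x1 + 10*x3 + x2 + rand('NORMAL', 0, 0.25);
    output;
  end;
run;

/* fit the model, request VIF and DW, and save the residuals */
proc reg data=demo;
  model y = x1 x3 x2 / vif dw;
  output out=fit_out r=resid;
run;
quit;

/* NORMAL requests tests for normality of the saved residuals */
proc univariate data=fit_out normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

If the tests reject normality, or the Q-Q plot bends away from the reference line, a Box-Cox transformation of y as in example2 is the remedy the slides suggest.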