I. What is a dummy variable?

Classification of variables
  - Nominal / categorical variable
  - Ordinal variable
  - Interval / quantitative variable
Question: how can nominal and ordinal variables be used in econometric analysis?

A nominal variable that takes only two values can be coded as 0 and 1. Such a variable is called a dummy variable or binary variable. The group coded 0 is called the reference group or benchmark group.
Note: pay attention to the choice of variable name.

A nominal variable that takes more than two values can be represented by several dummy variables.
  - east, central, west: three dummy variables for different regions
  - sx, jy, qt: three dummy variables for the status of undergraduates after graduation

Ordinal and interval variables can also be represented by dummy variables.
  - Academic grades (ordinal variable)
  - Annual income (interval variable, converted into dummy variables by dividing income into brackets)

II. Dummy variables among the independent variables

  - A single dummy variable among the independent variables
  - Multiple dummy variables among the independent variables
  - Interaction terms

A single dummy variable among the independent variables

The only independent variable is a dummy variable
If the only independent variable is a dummy variable, the regression uses it to split the sample into two groups and compares the mean of the dependent variable between them.
Example 7_1: wage differences

The independent variables include interval variables and one dummy variable
Here the coefficient on the dummy variable measures the difference between the two groups, holding the other independent variables fixed.
Example 7_2: textbook p. 217, Example 7.2

Dummy variables can be used for policy analysis. The group with the dummy equal to 0 is called the control group, and the group with the dummy equal to 1 is called the experimental group or treatment group.
Example 7_3: textbook p. 218, Example 7.3

Multiple dummy variables among the independent variables

There are three cases:
  - Each dummy variable represents a different classification
  - Several dummy variables represent the same classification, and the classification is nominal
  - Several dummy variables represent the same classification, and the classification is ordinal

Each dummy variable represents a different classification
Example 7_4: effect of gender and marital status on wages (textbook p. 220, Example 7.6)

Introduce two dummy variables, one for gender and one for marital status. The results show that gender has a significant effect on wages but marital status does not. This model may be misspecified, because it assumes that the effect of marital status is the same for men and women. Further analysis should allow the effect of marital status to differ by gender.

Definition of the dummy variables: sm, mm, sf and mf indicate single men, married men, single women and married women.
If there are n categories, only n-1 dummy variables can enter the regression; including all n causes perfect collinearity. The omitted category is the reference group. Results with single men as the reference group:

lwage     Coef.      Std. Err.   t          P>|t|     [95% Conf. Interval]
mm         0.2127    0.0554       3.8400    0.0000     0.1039     0.3214
sf        -0.1104    0.0557      -1.9800    0.0480    -0.2199    -0.0008
mf        -0.1983    0.0578      -3.4300    0.0010    -0.3119    -0.0846

Sample regression equations for individuals of each gender and marital status

A different reference group can be chosen depending on the research question. Results with married women as the reference group:

lwage     Coef.      Std. Err.   t          P>|t|     [95% Conf. Interval]
sm         0.1983    0.0578       3.4300    0.0010     0.0846     0.3119
mm         0.4109    0.0458       8.9800    0.0000     0.3210     0.5009
sf         0.0879    0.0523       1.6800    0.0940    -0.0149     0.1908

Several dummy variables represent the same classification, and the classification is nominal
Example 7_5: regional wage differences
  - northcen=1: north central region
  - west=1: western region
  - south=1: southern region
  - other_region=1: other regions
As before, if there are n categories, only n-1 dummy variables can enter the regression.

Other regions as the reference group:

lwage          Coef.      Std. Err.   t        P>|t|
northcen       -0.0783    0.0563      -1.39    0.1650
south          -0.1048    0.0527      -1.99    0.0470
west            0.0218    0.0624       0.35    0.7270
educ            0.0890    0.0075      11.86    0.0000
exper           0.0418    0.0052       8.00    0.0000
expersq        -0.0007    0.0001      -6.25    0.0000
_cons           0.1918    0.1125       1.70    0.0890

South as the reference group:

lwage          Coef.      Std. Err.   t        P>|t|
northcen        0.0265    0.0512       0.52    0.6040
west            0.1266    0.0574       2.21    0.0280
other_region    0.1048    0.0527       1.99    0.0470
educ            0.0890    0.0075      11.86    0.0000
exper           0.0418    0.0052       8.00    0.0000
expersq        -0.0007    0.0001      -6.25    0.0000
_cons           0.0870    0.1072       0.81    0.4170

Several dummy variables represent the same classification, and the classification is ordinal
Example 7_6: effect of law school ranking on starting salary (textbook p. 224, Example 7.8)
Six dummy variables are defined for the law school ranking:
  - Rank in the top 10: top10=1
  - Rank 11-25: r11_25=1
  - Rank 26-40: r26_40=1
  - Rank 41-60: r41_60=1
  - Rank 61-100: r61_100=1
  - Rank below 100: bottom=1

If there are n rank categories, the number of dummy variables in the regression must be smaller than n. Results with schools ranked below 100 as the reference group:

lsalary    Coef.      Std. Err.   t        P>|t|
top10      0.6996     0.0535      13.08    0.0000
r11_25     0.5935     0.0394      15.05    0.0000
r26_40     0.3751     0.0341      11.01    0.0000
r41_60     0.2628     0.0280       9.40    0.0000
r61_100    0.1316     0.0210       6.25    0.0000
LSAT       0.0057     0.0031       1.86    0.0660
GPA        0.0137     0.0742       0.19    0.8540
llibvol    0.0364     0.0260       1.40    0.1650
lcost      0.0008     0.0251       0.03    0.9730
_cons      9.1653     0.4114      22.28    0.0000

Results with schools ranked 26-60 as the reference group (see also textbook p. 223, Example 7.7):

lsalary    Coef.      Std. Err.   t         P>|t|
top10       0.3733    0.0437        8.55    0.0000
r11_25      0.2766    0.0323        8.56    0.0000
r61_100    -0.1732    0.0240       -7.22    0.0000
bottom     -0.2994    0.0269      -11.14    0.0000
LSAT        0.0049    0.0032        1.53    0.1290
GPA         0.0596    0.0759        0.78    0.4340
llibvol     0.0436    0.0270        1.62    0.1090
lcost       0.0103    0.0260        0.39    0.6940
_cons       9.3214    0.4402       21.18    0.0000

Interaction terms

Interaction terms between dummy variables
Example 7_7: effect of gender and marital status on wages

                 female   married   female_married
Single male      0        0         0
Married male     0        1         0
Single female    1        0         0
Married female   1        1         1

Single men are the reference group.
Example 7_7: regression equations for individuals of each gender and marital status
See also textbook p. 226, Example 7.9.

Interaction terms between a dummy variable and an interval variable
Consider the model:
This model assumes that the wage equations for men and women have different intercepts, but that the slope coefficient on years of education is the same for both.
[Figure: lwage against educ, with separate lines for male and female]

Consider the model:
This model assumes that the wage equations for men and women have different intercepts, and that the slope coefficient on years of education also differs between men and women.
[Figure: lwage against educ, by gender]
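Three points from these slides can be checked with a minimal Python sketch that uses only the standard library: a regression on a single dummy reproduces the difference in group means, an interaction term is the product of two dummies, and a classification with n categories is encoded with n-1 dummies. The data values and the helper names `simple_ols` and `make_dummies` are hypothetical illustrations, not taken from the textbook examples.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def make_dummies(values, reference):
    """Encode a categorical list as n-1 dummy columns, omitting `reference`."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical toy data (assumed for illustration only).
female = [0, 0, 0, 1, 1, 1]
married = [0, 1, 1, 0, 1, 0]
lwage = [1.8, 2.1, 2.0, 1.6, 1.5, 1.7]

# 1) Single dummy regressor: the intercept is the mean of the reference
#    group (female = 0) and the slope is the difference in group means.
intercept, slope = simple_ols(female, lwage)
male_mean = mean(w for w, f in zip(lwage, female) if f == 0)
female_mean = mean(w for w, f in zip(lwage, female) if f == 1)

# 2) Interaction between two dummies: equals 1 only for married women.
female_married = [f * m for f, m in zip(female, married)]

# 3) Four regions, three dummies: "other" is the omitted reference group.
region = ["south", "west", "northcen", "other", "south", "west"]
region_dummies = make_dummies(region, reference="other")

print(intercept, slope)
print(male_mean, female_mean - male_mean)
print(female_married)
print(sorted(region_dummies))
```

Changing the `reference` argument changes which category is omitted; as in the tables above, this shifts the reported coefficients without changing the fitted differences between groups.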