Introduction to Multiple Regression
Chapter 13
Copyright 2016, 2013, 2010 Pearson Education, Inc.

Objectives
In this chapter, you learn:
- How to develop a multiple regression model
- How to interpret the regression coefficients
- How to determine which independent variables to include in the regression model
- How to use categorical independent variables in a regression model

The Multiple Regression Model
Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables ($X_i$).
Multiple regression model with k independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.

Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.
In this chapter we will use Excel and Minitab to obtain the regression slope coefficients and other regression summary measures.

Multiple Regression Equation (continued)
Two-variable model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
[Figure: fitted regression plane over the $X_1$ and $X_2$ axes, with the slope for variable $X_1$ and the slope for variable $X_2$ labeled]

Example: 2 Independent Variables
- A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
- Dependent variable: Pie sales (units per week)
- Independent variables: Price (in $) and Advertising (in $100s)
- Data are collected for 15 weeks

Pie Sales Example
Multiple regression equation: Sales = $b_0$ + $b_1$(Price) + $b_2$(Advertising)

Week  Pie Sales  Price ($)  Advertising ($100s)
   1        350       5.50                  3.3
   2        460       7.50                  3.3
   3        350       8.00                  3.0
   4        430       8.00                  4.5
   5        350       6.80                  3.0
   6        380       7.50                  4.0
   7        430       4.50                  3.0
   8        470       6.40                  3.7
   9        450       7.00                  3.5
  10        490       5.00                  4.0
  11        340       7.20                  3.5
  12        300       7.90                  3.2
  13        440       5.90                  4.0
  14        450       5.00                  3.5
  15        300       7.00                  2.7

Excel Multiple Regression Output

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
            df  SS         MS         F        Significance F
Regression   2  29460.027  14730.013  6.53861  0.01201
Residual    12  27033.306   2252.776
Total       14  56493.333

             Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept       306.52619       114.25389   2.68285  0.01993   57.58835  555.46404
Price           -24.97509        10.83213  -2.30565  0.03979  -48.57626   -1.37392
Advertising      74.13096        25.96732   2.85478  0.01449   17.55303  130.70888
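As a cross-check on the Excel coefficients, the least-squares estimates for a two-predictor model can be reproduced from the 15 weeks of data by solving the normal equations in deviation form. This is only a minimal sketch in plain Python, not how Excel or Minitab computes the fit; the helper names `mean`, `cross_dev`, and `fit_two_predictors` are illustrative assumptions.

```python
# Pie sales data from the slides (15 weeks).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(sales, price, advertising)
print(f"Sales = {b0:.3f} + ({b1:.3f}) Price + ({b2:.3f}) Advertising")
```

The closed-form 2x2 solve only works for exactly two predictors; with more independent variables a general linear-algebra routine would be the usual choice.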
Minitab Multiple Regression Output

The regression equation is
Sales = 307 - 25.0 Price + 74.1 Advertising

Predictor      Coef  SE Coef      T      P
Constant     306.50   114.30   2.68  0.020
Price        -24.98    10.83  -2.31  0.040
Advertising   74.13    25.97   2.85  0.014

S = 47.4634   R-Sq = 52.1%   R-Sq(adj) = 44.2%

Analysis of Variance
Source          DF     SS     MS     F      P
Regression       2  29460  14730  6.54  0.012
Residual Error  12  27033   2253
Total           14  56493

The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
- $b_1$ = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
- $b_2$ = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100s, so $350 means that $X_2$ = 3.5.

Predictions in Excel using PHStat
- PHStat | regression | multiple regression
- Check the "confidence and prediction interval estimates" box

Predictions in PHStat (continued)
[Screenshot: PHStat worksheet showing the input values, the predicted Y value, the confidence interval for the mean value of Y given these X values, and the prediction interval for an individual Y value given these X values]

Predictions in Minitab

Predicted Values for New Observations
New Obs    Fit  SE Fit          95% CI          95% PI
      1  428.6    17.2  (391.1, 466.1)  (318.6, 538.6)

Values of Predictors for New Observations
New Obs  Price  Advertising
      1   5.50         3.50

The input values are Price = 5.50 and Advertising = 3.50. The 95% CI is the confidence interval for the mean value of Y, given these X values; the 95% PI is the prediction interval for an individual Y value, given these X values.
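The prediction step is easy to get wrong because advertising is measured in $100s. A minimal sketch of the calculation, using the coefficients reported in the Excel output (the function name `predict_sales` is an illustrative assumption, not part of Excel, PHStat, or Minitab):

```python
# Coefficients as reported in the Excel regression output.
B0, B_PRICE, B_ADVERTISING = 306.52619, -24.97509, 74.13096


def predict_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales; advertising is converted from $ to $100s."""
    advertising_hundreds = advertising_dollars / 100.0
    return B0 + B_PRICE * price_dollars + B_ADVERTISING * advertising_hundreds


predicted = predict_sales(5.50, 350)
print(f"Predicted sales: {predicted:.2f} pies")
```

This gives only the point prediction; the confidence and prediction intervals shown by PHStat and Minitab need the standard error of the fit and are not computed here.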
The Coefficient of Multiple Determination, $r^2$
Reports the proportion of total variation in Y explained by all X variables taken together:

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

Multiple Coefficient of Determination in Excel
From the Excel regression output, R Square = 0.52148 (SSR = 29460.027, SST = 56493.333).
52.1% of the variation in pie sales is explained by the variation in price and advertising.

Multiple Coefficient of Determination in Minitab
From the Minitab output, R-Sq = 52.1%.
52.1% of the variation in pie sales is explained by the variation in price and advertising.

Adjusted $r^2$
- $r^2$ never decreases when a new X variable is added to the model
- This can be a disadvantage when comparing models
- What is the net effect of adding a new variable?
  - We lose a degree of freedom when a new X variable is added
  - Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted $r^2$ (continued)
Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$$r^2_{adj} = 1 - \left[ (1 - r^2) \frac{n - 1}{n - k - 1} \right]$$

(where n = sample size, k = number of independent variables)
- Penalizes excessive use of unimportant independent variables
- Smaller than $r^2$
- Useful in comparing among models
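Both measures can be recomputed from the ANOVA sums of squares as a check on the printed output. The sketch below assumes the standard definitions $r^2 = SSR/SST$ and $r^2_{adj} = 1 - (1 - r^2)(n - 1)/(n - k - 1)$; the function names are illustrative.

```python
SSR = 29460.027  # regression sum of squares from the ANOVA table
SST = 56493.333  # total sum of squares from the ANOVA table
N_OBS, K_VARS = 15, 2


def r_squared(ssr, sst):
    """Proportion of total variation in Y explained by the X variables."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """r-squared penalized for the number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(SSR, SST)
r2_adj = adjusted_r_squared(r2, N_OBS, K_VARS)
print(f"r^2 = {r2:.5f}, adjusted r^2 = {r2_adj:.5f}")
```

Because (n - 1)/(n - k - 1) is greater than 1 whenever k is at least 1, the adjusted value is smaller than $r^2$ for any model that does not fit perfectly.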
Adjusted $r^2$ in Excel
From the Excel regression output, Adjusted R Square = 0.44172.
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Adjusted $r^2$ in Minitab
From the Minitab output, R-Sq(adj) = 44.2%.
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

Is the Model Significant?
F Test for Overall Significance of the Model
- Shows if there is a linear relationship between all of the X variables considered together and Y
- Use F-test statistic
- Hypotheses:
  $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
  $H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)

F Test for Overall Significance
Test statistic:

$$F_{STAT} = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$

where $F_{STAT}$ has numerator d.f. = k and denominator d.f. = (n - k - 1)
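The F statistic can be recomputed from the ANOVA sums of squares. This sketch takes SSR and SSE from the pie sales output and assumes the usual definition $F_{STAT} = MSR/MSE$; it does not compute a p-value, which would need the F distribution (for example from a statistics library).

```python
SSR = 29460.027  # regression sum of squares
SSE = 27033.306  # residual (error) sum of squares
N_OBS, K_VARS = 15, 2


def f_statistic(ssr, sse, n, k):
    """Return (F, numerator d.f., denominator d.f.) for the overall F test."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse, k, n - k - 1


f_stat, df1, df2 = f_statistic(SSR, SSE, N_OBS, K_VARS)
print(f"F = {f_stat:.5f} with {df1} and {df2} degrees of freedom")
```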
F Test for Overall Significance in Excel
From the Excel ANOVA table: F = 6.53861 with 2 and 12 degrees of freedom. Significance F = 0.01201 is the p-value for the F test.

F Test for Overall Significance in Minitab
From the Minitab Analysis of Variance table: F = 6.54 with 2 and 12 degrees of freedom. P = 0.012 is the p-value for the F test.

F Test for Overall Significance (continued)
- $H_0$: $\beta_1 = \beta_2 = 0$
- $H_1$: $\beta_1$ and $\beta_2$ not both zero
- $\alpha$ = .05, $df_1$ = 2, $df_2$ = 12
- Critical value: $F_{0.05}$ = 3.885
- Test statistic: $F_{STAT} = MSR / MSE = 14730.0 / 2252.8 = 6.5386$
- Decision: Since the $F_{STAT}$ test statistic is in the rejection region (p-value < .05), reject $H_0$
- Conclusion: There is evidence that at least one independent variable affects Y
[Figure: F distribution with the "do not reject $H_0$" region below $F_{0.05}$ = 3.885 and the rejection region of area $\alpha$ = .05 above it]

Residuals in Multiple Regression
- Two-variable model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
- Residual = $e_i = (Y_i - \hat{Y}_i)$
- The best fit equation is found by minimizing the sum of squared errors, $\sum e^2$
[Figure: sample observation $Y_i$ above the fitted plane at ($x_{1i}$, $x_{2i}$), with the vertical distance to $\hat{Y}_i$ marked as the residual]

Multiple Regression Assumptions
Assumptions:
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent
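The residual definition $e_i = Y_i - \hat{Y}_i$ can be checked numerically against the output: the squared residuals should sum to the Residual SS, and the square root of SSE/(n - k - 1) should match the reported Standard Error. A minimal sketch using the slide data and the rounded Excel coefficients (variable names are illustrative):

```python
import math

# Pie sales data from the slides (15 weeks).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

# Coefficients as reported in the Excel output (rounded to 5 decimals).
B0, B1, B2 = 306.52619, -24.97509, 74.13096

fitted = [B0 + B1 * p + B2 * a for p, a in zip(price, advertising)]
residuals = [y - y_hat for y, y_hat in zip(sales, fitted)]

sse = sum(e ** 2 for e in residuals)     # sum of squared errors
n, k = len(sales), 2
std_error = math.sqrt(sse / (n - k - 1))  # standard error of the estimate

print(f"sum of residuals = {sum(residuals):.4f}")
print(f"SSE = {sse:.3f}, standard error = {std_error:.5f}")
```

The `residuals` list is what would be plotted against the fitted values, each X variable, or time when checking the assumptions.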
Errors (residuals) from the regression model: $e_i = (Y_i - \hat{Y}_i)$

Residual Plots Used in Multiple Regression
These residual plots are used in multiple regression:
- Residuals vs. $\hat{Y}_i$
- Residuals vs. $X_{1i}$
- Residuals vs. $X_{2i}$
- Residuals vs. time (if time series data)
Use the residual plots to check for violations of regression assumptions.

Are Individual Variables Significant?
- Use t tests of individual variable slopes
- Shows if there is a linear relationship between the variable $X_j$ and Y, holding constant the effects of other X variables
- Hypotheses:
  $H_0$: $\beta_j = 0$ (no linear relationship between $X_j$ and Y)
  $H_1$: $\beta_j \neq 0$ (linear relationship does exist between $X_j$ and Y)

Are Individual Variables Significant? (continued)
Test statistic:

$$t_{STAT} = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1)$$

Are Individual Variables Significant? Excel Output
From the Coefficients table of the Excel output:
- t Stat for Price is $t_{STAT}$ = -2.306, with p-value .0398
- t Stat for Advertising is $t_{STAT}$ = 2.855, with p-value .0145
Are Individual Variables Significant? Minitab Output
From the Predictor table of the Minitab output:
- t Stat for Price is $t_{STAT}$ = -2.31, with p-value .040
- t Stat for Advertising is $t_{STAT}$ = 2.85, with p-value .014

Inferences about the Slope: t Test Example
- $H_0$: $\beta_j = 0$
- $H_1$: $\beta_j \neq 0$
- d.f. = 15 - 2 - 1 = 12
- $\alpha$ = .05
- $t_{\alpha/2}$ = 2.1788
From the Excel output:
- For Price, $t_{STAT}$ = -2.306, with p-value .0398
- For Advertising, $t_{STAT}$ = 2.855, with p-value .0145
Decision: Reject $H_0$ for each variable. The test statistic for each variable falls in the rejection region (p-values < .05).
Conclusion: There is evidence that both Price and Advertising affect pie sales at $\alpha$ = .05.
[Figure: t distribution with rejection regions of area $\alpha/2$ = .025 in each tail, below $-t_{\alpha/2}$ = -2.1788 and above $t_{\alpha/2}$ = 2.1788, and the "do not reject $H_0$" region between them]
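The two-tail decision rule in this example can be written out directly: divide each coefficient by its standard error and compare the absolute value with $t_{\alpha/2}$. A minimal sketch, using the coefficients and standard errors from the Excel output and the critical value 2.1788 quoted on the slide (the names `t_stat` and `decisions` are illustrative):

```python
T_CRITICAL = 2.1788  # t value for alpha = .05 (two-tail), 12 d.f., from the slide

# (coefficient, standard error) pairs from the Excel output.
coefficients = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}


def t_stat(b, se):
    """Test statistic for H0: beta_j = 0."""
    return (b - 0) / se


decisions = {}
for name, (b, se) in coefficients.items():
    t = t_stat(b, se)
    # Reject H0 when the statistic falls in either tail of the rejection region.
    decisions[name] = "reject H0" if abs(t) > T_CRITICAL else "do not reject H0"
    print(f"{name}: t = {t:.5f} -> {decisions[name]}")
```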
Confidence Interval Estimate for the Slope
Confidence interval for the population slope $\beta_j$:

$$b_j \pm t_{\alpha/2} S_{b_j}$$

where t has (n - k - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.

             Coefficients  Standard Error
Intercept       306.52619       114.25389
Price           -24.97509        10.83213
Advertising      74.13096        25.96732

Example: Form a 95% confidence interval for the effect of changes in price ($X_1$) on pie sales:
-24.975 ± (2.1788)(10.832)
So the interval is (-48.576, -1.374).
(This interval does not contain zero, so price has a significant effect on sales.)

Confidence Interval Estimate for the Slope (continued)
Example: Excel output also reports these interval endpoints:

             Coefficients  Standard Error  Lower 95%  Upper 95%
Intercept       306.52619       114.25389   57.58835  555.46404
Price           -24.97509        10.83213  -48.57626   -1.37392
Advertising      74.13096        25.96732   17.55303  130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price, holding the effect of advertising constant.

Using Dummy Variables
- A dummy variable is a categorical independent variable with two levels:
  - yes or no, on or off, male or female
  - coded as 0 or 1
- Assumes the slopes associated with numerical independent variables do not change with the value for the categorical variable
- If more than two levels, the number of dummy variables needed is (number of levels - 1)

Dummy-Variable Example (with 2 Levels)
Let:
- Y = pie sales
- $X_1$ = price
- $X_2$ = holiday ($X_2$ = 1 if a holiday occurred during the week; $X_2$ = 0 if there was no holiday that week)

Dummy-Variable Example (with 2 Levels) (continued)
- Holiday ($X_2$ = 1): $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
- No Holiday ($X_2$ = 0): $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$
- Different intercept, same slope
- If $H_0$: $\beta_2 = 0$ is rejected, then "Holiday" has a significant effect on pie sales
[Figure: Y (sales) vs. $X_1$ (Price), showing two parallel lines with intercepts $b_0 + b_2$ (Holiday) and $b_0$ (No Holiday)]
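The "(number of levels - 1)" rule can be illustrated with a small coding helper: one level is chosen as the baseline and every other level gets its own 0/1 column. This is only a sketch; the function `dummy_code` and the category labels below are hypothetical and are not taken from the pie sales data.

```python
def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per level other than the baseline."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Two levels -> one dummy variable (hypothetical holiday indicator per week).
holiday_weeks = ["no", "yes", "no", "no", "yes"]
levels, rows = dummy_code(holiday_weeks, baseline="no")
print(levels, rows)

# Three levels -> two dummy variables (hypothetical store-size labels).
store_sizes = ["small", "large", "medium", "small"]
size_levels, size_rows = dummy_code(store_sizes, baseline="small")
print(size_levels, size_rows)
```

The baseline level is the one whose rows are all zeros, so each dummy coefficient is read as a shift in intercept relative to that baseline.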
Interpreting the Dummy Variable Coefficient (with 2 Levels)
Example:
- Sales: number of pies sold per week
- Price: pie price in $
- Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred
$b_2$ = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

Interaction Between Independent Variables
- Hypothesizes interaction between pairs of X variables
  - Response to one X variable may vary at different levels of another X variable
- Contains two-way cross product terms

Effect of Interaction
Given:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

- Without interaction term, effect of $X_1$ on Y is measured by $\beta_1$
- With interaction term, effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
- Effect changes as $X_2$ changes

Interaction Example
Suppose $X_2$ is a dummy variable and the estimated regression equation is $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1 X_2$
- $X_2$ = 1: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
- $X_2$ = 0: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
Slopes are different if the effect of $X_1$ on Y depends on $X_2$ value.
[Figure: the two lines plotted as Y (0 to 12) against $X_1$ (0 to 1.5)]

Chapter Summary
In this chapter we discussed:
- How to develop a multiple regression model
- How to interpret the regression coefficients
- How to determine which independent variables to include in the regression model
- How to use categorical independent variables in a regression model
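Returning to the interaction example from this chapter, the claim that the effect of $X_1$ equals $\beta_1 + \beta_3 X_2$ can be verified by evaluating the estimated equation $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1 X_2$ at both values of the dummy variable. A minimal sketch; the function names are illustrative.

```python
def y_hat(x1, x2):
    """Estimated equation from the interaction example."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Effect of a one-unit increase in X1, which depends on X2: b1 + b3 * X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # Change in the prediction when X1 goes from 0.5 to 1.5 (one unit).
    change = y_hat(1.5, x2) - y_hat(0.5, x2)
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope = {slope_in_x1(x2)}, "
          f"change per unit of X1 = {change}")
```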