Lecture 9: Simple Linear Regression (Chapter 9: Simple Linear Regression Analysis)

Uploaded by: e****s | Document ID: 241653139 | Upload date: 2024-07-13 | Format: PPT | Pages: 74 | Size: 597 KB
Chapter 12: Simple Linear Regression
Business Statistics: A First Course, Fifth Edition

Learning Objectives
In this chapter, you learn:
- How to use regression analysis to predict the value of a dependent variable based on an independent variable
- The meaning of the regression coefficients \(b_0\) and \(b_1\)
- How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
- To make inferences about the slope and correlation coefficient
- To estimate mean values and predict individual values

Correlation vs. Regression
- A scatter plot can be used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  - Correlation is only concerned with the strength of the relationship
  - No causal effect is implied with correlation
  - Scatter plots were first presented in Ch. 2
  - Correlation was first presented in Ch. 3

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to predict or explain
- Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X
- Relationship between X and Y is described by a linear function
- Changes in Y are assumed to be related to changes in X

Types of Relationships
[Figure: scatter plots of Y against X. Panels: linear relationships; curvilinear relationships.]

Types of Relationships (continued)
[Figure: scatter plots of Y against X. Panels: strong relationships; weak relationships.]

Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship.]

Simple Linear Regression Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
- \(Y_i\): dependent variable
- \(\beta_0\): population Y intercept
- \(\beta_1\): population slope coefficient
- \(X_i\): independent variable
- \(\varepsilon_i\): random error term
- Linear component: \(\beta_0 + \beta_1 X_i\); random error component: \(\varepsilon_i\)

Simple Linear Regression Model (continued)
[Figure: population regression line with intercept \(\beta_0\) and slope \(\beta_1\). At a given \(X_i\), the observed value of Y differs from the predicted value of Y by the random error \(\varepsilon_i\) for this \(X_i\) value.]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
\[ \hat{Y}_i = b_0 + b_1 X_i \]
- \(\hat{Y}_i\): estimated (or predicted) Y value for observation i
- \(b_0\): estimate of the regression intercept
- \(b_1\): estimate of the regression slope
- \(X_i\): value of X for observation i

The Least Squares Method
\(b_0\) and \(b_1\) are obtained by finding the values that minimize the sum of the squared differences between \(Y\) and \(\hat{Y}\):
\[ \min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \]

Finding the Least Squares Equation
- The coefficients \(b_0\) and \(b_1\), and other regression results in this chapter, will be found using Excel or Minitab
- Formulas are shown in the text for those who are interested

Interpretation of the Slope and the Intercept
- \(b_0\) is the estimated mean value of Y when the value of X is zero
- \(b_1\) is the estimated change in the mean value of Y as a result of a one-unit change in X

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Simple Linear Regression Example: Data

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Simple Linear Regression Example: Scatter Plot
[Figure: house price model, scatter plot of house price against square feet.]

Simple Linear Regression Example: Using Excel
[Figure: Excel screenshots.]

Simple Linear Regression Example: Excel Output

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Minitab Output

The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source           DF   SS      MS      F       P
Regression       1    18935   18935   11.08   0.010
Residual Error   8    13666   1708
Total            9    32600

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Graphical Representation
[Figure: house price model, scatter plot and prediction line. Slope = 0.10977, intercept = 98.248.]

Simple Linear Regression Example: Interpretation of b0
- \(b_0\) is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
- Because a house cannot have a square footage of 0, \(b_0\) has no practical application

Simple Linear Regression Example: Interpreting b1
- \(b_1\) estimates the change in the mean value of Y as a result of a one-unit increase in X
- Here, \(b_1 = 0.10977\) tells us that the mean value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional one square foot of size

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
\[ \text{house price} = 98.25 + 0.1098\,(2000) = 317.85 \]
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850

Simple Linear Regression Example: Making Predictions
- When using a regression model for prediction, only predict within the relevant range of data
- Do not try to extrapolate beyond the range of observed X's
[Figure: scatter plot and prediction line with the relevant range for interpolation marked.]

Measures of Variation
Total variation is made up of two parts:
\[ SST = SSR + SSE \]
(total sum of squares = regression sum of squares + error sum of squares)
\[ SST = \sum (Y_i - \bar{Y})^2, \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2, \qquad SSE = \sum (Y_i - \hat{Y}_i)^2 \]
where:
- \(\bar{Y}\) = mean value of the dependent variable
- \(Y_i\) = observed value of the dependent variable
- \(\hat{Y}_i\) = predicted value of Y for the given \(X_i\) value

Measures of Variation (continued)
- SST = total sum of squares (total variation)
  - Measures the variation of the \(Y_i\) values around their mean \(\bar{Y}\)
- SSR = regression sum of squares (explained variation)
  - Variation attributable to the relationship between X and Y
- SSE = error sum of squares (unexplained variation)
  - Variation in Y attributable to factors other than X

[Figure (Measures of Variation, continued): at a given \(X_i\), the distances behind \(SST = \sum (Y_i - \bar{Y})^2\), \(SSE = \sum (Y_i - \hat{Y}_i)^2\) and \(SSR = \sum (\hat{Y}_i - \bar{Y})^2\), drawn relative to \(\bar{Y}\) and the regression line.]
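The slides defer the least-squares formulas to the textbook and rely on Excel or Minitab. As a rough cross-check, the sketch below fits the house data with the standard closed-form solution (\(b_1 = SSXY/SSX\), \(b_0 = \bar{Y} - b_1\bar{X}\)) and then computes SST, SSR and SSE. The function names `least_squares` and `sums_of_squares` are illustrative choices, not anything from the slides. Note that the unrounded coefficients give a prediction of about 317.78 for 2000 square feet, which agrees with Minitab's fit of 317.8; the 317.85 on the slide comes from the rounded coefficients 98.25 and 0.1098.

```python
# House data from the example: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    return sst, ssr, sse


b0, b1 = least_squares(square_feet, price)
sst, ssr, sse = sums_of_squares(square_feet, price, b0, b1)

# 2000 sq ft lies inside the observed range (1100 to 2450), so this is
# interpolation rather than extrapolation.
predicted_2000 = b0 + b1 * 2000

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"Predicted price at 2000 sq ft: {predicted_2000:.2f} ($1000s)")
```

The printed coefficients and sums of squares can be compared directly with the Coefficients and ANOVA rows of the Excel output.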
Coefficient of Determination, r^2
- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
- The coefficient of determination is also called r-squared and is denoted as \(r^2\)
\[ r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}} \]
Note: \(0 \le r^2 \le 1\)

Examples of r^2 Values
- \(r^2 = 1\): perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X
- \(0 < r^2 < 1\): weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X
- \(r^2 = 0\): no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X)
[Figures: scatter plots of Y against X for each case.]

Simple Linear Regression Example: Coefficient of Determination, r^2 in Excel
From the Excel output, R Square = 0.58082, with SSR = 18934.9348 and SST = 32600.5000:
\[ r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 \]
58.08% of the variation in house prices is explained by variation in square feet

Simple Linear Regression Example: Coefficient of Determination, r^2 in Minitab
From the Minitab output, R-Sq = 58.1%, with regression SS = 18935 and total SS = 32600.
58.08% of the variation in house prices is explained by variation in square feet

Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by
\[ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}} \]
where SSE = error sum of squares and n = sample size

Simple Linear Regression Example: Standard Error of Estimate in Excel
From the Excel output, Standard Error = 41.33032, so \(S_{YX} = 41.33032\)

Simple Linear Regression Example: Standard Error of Estimate in Minitab
From the Minitab output, S = 41.3303

Comparing Standard Errors
- \(S_{YX}\) is a measure of the variation of observed Y values from the regression line
- The magnitude of \(S_{YX}\) should always be judged relative to the size of the Y values in the sample data
- For example, \(S_{YX}\) = $41.33K is moderately small relative to house prices in the $200K - $400K range
[Figure: two scatter plots of Y against X comparing the spread of points around the regression line.]

Assumptions of Regression
- Linearity
  - The relationship between X and Y is linear
- Independence of Errors
  - Error values are statistically independent
- Normality of Error
  - Error values are normally distributed for any given value of X
- Equal Variance (also called homoscedasticity)
  - The probability distribution of the errors has constant variance

Residual Analysis
- The residual for observation i, \(e_i\), is the difference between its observed and predicted value:
\[ e_i = Y_i - \hat{Y}_i \]
- Check the assumptions of regression by examining the residuals
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical Analysis of Residuals
  - Can plot residuals vs. X

Residual Analysis for Linearity
[Figure: Y vs. x and residuals vs. x for a relationship that is not linear and for one that is linear.]

Residual Analysis for Independence
[Figure: residuals vs. X for errors that are not independent and for errors that are independent.]

Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot, percent (0 to 100) against residual (-3 to 3).]

Residual Analysis for Equal Variance
[Figure: Y vs. x and residuals vs. x for non-constant variance and for constant variance.]

Simple Linear Regression Example: Excel Residual Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348

Does not appear to violate any regression assumptions

Inferences About the Slope
The standard error of the regression slope coefficient (\(b_1\)) is estimated by
\[ S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}} \]
where:
- \(S_{b_1}\) = estimate of the standard error of the slope
- \(S_{YX} = \sqrt{SSE/(n-2)}\) = standard error of the estimate

Inferences About the Slope: t Test
- t test for a population slope
  - Is there a linear relationship between X and Y?
- Null and alternative hypotheses
  - \(H_0: \beta_1 = 0\) (no linear relationship)
  - \(H_1: \beta_1 \ne 0\) (linear relationship does exist)
- Test statistic
\[ t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2 \]
where:
- \(b_1\) = regression slope coefficient
- \(\beta_1\) = hypothesized slope
- \(S_{b_1}\) = standard error of the slope

Inferences About the Slope: t Test Example
Using the house price data (Y = house price in $1000s, x = square feet), the estimated regression equation is:
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098. Is there a relationship between the square footage of the house and its sales price?

Inferences About the Slope: t Test Example
\(H_0: \beta_1 = 0\), \(H_1: \beta_1 \ne 0\)

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

From Minitab output:
Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010

Here \(b_1 = 0.10977\) and \(S_{b_1} = 0.03297\).

Inferences About the Slope: t Test Example
- Test statistic: \(t_{STAT} = 3.329\), d.f. = 10 - 2 = 8
- Critical values for \(\alpha/2 = .025\): \(\pm 2.3060\)
- Decision: Reject \(H_0\)
- There is sufficient evidence that square footage affects house price
[Figure: t distribution with rejection regions below -2.3060 and above 2.3060; \(t_{STAT} = 3.329\) falls in the upper rejection region.]

Inferences About the Slope: t Test Example
\(H_0: \beta_1 = 0\), \(H_1: \beta_1 \ne 0\)
From Excel output, the p-value for Square Feet is 0.01039.
- Decision: Reject \(H_0\), since p-value < \(\alpha\) (0.01039 < 0.05)
- There is sufficient evidence that square footage affects house price.
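The quantities read off the Excel and Minitab output above (\(r^2\), \(S_{YX}\), \(S_{b_1}\), \(t_{STAT}\)) can be recomputed from the raw data. The following is a minimal sketch under two assumptions: the helper name `fit_line` is illustrative, and the critical value 2.3060 is copied from the slides rather than computed, because the Python standard library has no t distribution.

```python
import math

# House data from the example: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1, ssx) from the closed-form least-squares solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    return y_bar - b1 * x_bar, b1, ssx


n = len(price)
b0, b1, ssx = fit_line(square_feet, price)
y_bar = sum(price) / n
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))
ssr = sst - sse

r_squared = ssr / sst                # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)         # standard error of the slope

# t test of H0: beta1 = 0 against H1: beta1 != 0 with n - 2 = 8 d.f.
t_stat = (b1 - 0) / s_b1
T_CRIT = 2.3060                      # from the slides: alpha = .05, d.f. = 8
reject_h0 = abs(t_stat) > T_CRIT

print(f"r^2 = {r_squared:.5f}, S_YX = {s_yx:.5f}, S_b1 = {s_b1:.5f}")
print(f"t_STAT = {t_stat:.3f}, reject H0: {reject_h0}")
```

Comparing \(|t_{STAT}|\) with the tabulated critical value mirrors the critical-value approach on the slides; the p-value approach would need a t distribution from a statistics package.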
From Minitab output:
Predictor     Coef      SE Coef   T      P
Constant      98.25     58.03     1.69   0.129
Square Feet   0.10977   0.03297   3.33   0.010

F Test for Significance
F test statistic:
\[ F_{STAT} = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2} \]
where \(F_{STAT}\) follows an F distribution with 1 numerator and (n - 2) denominator degrees of freedom

F-Test for Significance: Excel Output

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

\[ F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848 \]
With 1 and 8 degrees of freedom; Significance F (0.01039) is the p-value for the F test

F-Test for Significance: Minitab Output

Analysis of Variance
Source           DF   SS      MS      F       P
Regression       1    18935   18935   11.08   0.010
Residual Error   8    13666   1708
Total            9    32600

With 1 and 8 degrees of freedom; P (0.010) is the p-value for the F test

F Test for Significance (continued)
- \(H_0: \beta_1 = 0\), \(H_1: \beta_1 \ne 0\)
- \(\alpha = .05\), \(df_1 = 1\), \(df_2 = 8\)
- Critical value: \(F_{.05} = 5.32\)
- Test statistic: \(F_{STAT} = MSR/MSE = 11.08\)
- Decision: Reject \(H_0\) at \(\alpha = 0.05\)
- Conclusion: There is sufficient evidence that house size affects selling price
[Figure: F distribution with the rejection region (\(\alpha = .05\)) to the right of \(F_{.05} = 5.32\).]

Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope:
\[ b_1 \pm t_{\alpha/2}\, S_{b_1}, \qquad \text{d.f.} = n - 2 \]
Excel printout for house prices:

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Confidence Interval Estimate for the Slope (continued)
- Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size
- This 95% confidence interval does not include 0
- Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance

t Test for a Correlation Coefficient
- Hypotheses
  - \(H_0: \rho = 0\) (no correlation between X and Y)
  - \(H_1: \rho \ne 0\) (correlation exists)
- Test statistic (with n - 2 degrees of freedom)
\[ t_{STAT} = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
where \(r = +\sqrt{r^2}\) if \(b_1 > 0\) and \(r = -\sqrt{r^2}\) if \(b_1 < 0\)

t-test For A Correlation Coefficient (continued)
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
- \(H_0: \rho = 0\) (no correlation)
- \(H_1: \rho \ne 0\) (correlation exists)
- \(\alpha = .05\), df = 10 - 2 = 8
\[ t_{STAT} = \frac{0.76211 - 0}{\sqrt{\dfrac{1 - 0.58082}{10 - 2}}} = 3.329 \]

t-test For A Correlation Coefficient (continued)
- Critical values for \(\alpha/2 = .025\), d.f. = 10 - 2 = 8: \(\pm 2.3060\); \(t_{STAT} = 3.329\)
- Decision: Reject \(H_0\)
- Conclusion: There is evidence of a linear association at the 5% level of significance
[Figure: t distribution with rejection regions below -2.3060 and above 2.3060.]

Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around \(\hat{Y}\) to express uncertainty about the value of Y for a given \(X_i\)
[Figure: prediction line \(\hat{Y} = b_0 + b_1 X_i\) with the confidence interval for the mean of Y, given \(X_i\), and the wider prediction interval for an individual Y, given \(X_i\).]

Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular \(X_i\):
\[ \hat{Y}_i \pm t_{\alpha/2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}}, \qquad SSX = \sum_j (X_j - \bar{X})^2 \]
Size of interval varies according to distance away from the mean, \(\bar{X}\)

Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an individual value of Y given a particular \(X_i\):
\[ \hat{Y}_i \pm t_{\alpha/2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX}} \]
The extra term (the 1 under the square root) adds to the interval width to reflect the added uncertainty for an individual case

Estimation of Mean Values: Example
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
- Predicted price \(\hat{Y}_i\) = 317.85 ($1,000s)
- Confidence interval estimate for \(\mu_{Y|X=X_i}\): the confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900

Estimation of Individual Values: Example
Find the 95% prediction interval for an individual house with 2,000 square feet
- Predicted price \(\hat{Y}_i\) = 317.85 ($1,000s)
- Prediction interval estimate for \(Y_{X=X_i}\): the prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070

Finding Confidence and Prediction Intervals in Excel
- From Excel, use PHStat | regression | simple linear regression
- Check the "confidence and prediction interval for X=" box and enter the X-value and confidence level desired

Finding Confidence and Prediction Intervals in Excel (continued)
[Figure: PHStat output showing the input values, \(\hat{Y}\), the confidence interval estimate for \(\mu_{Y|X=X_i}\), and the prediction interval estimate for \(Y_{X=X_i}\).]

Finding Confidence and Prediction Intervals in Minitab

Predicted Values for New Observations
New Obs   Fit     SE Fit   95% CI           95% PI
1         317.8   16.1     (280.7, 354.9)   (215.5, 420.1)

Values of Predictors for New Observations
New Obs   Square Feet
1         2000
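The F test, the slope confidence interval, the correlation t statistic, and the two intervals at X = 2000 can all be reproduced from the raw data. The sketch below is a rough illustration, not the textbook's procedure: `fit_line` is an illustrative helper, and the critical values 2.3060 (t, 8 d.f.) and 5.32 (F, 1 and 8 d.f.) are copied from the slides because the Python standard library cannot compute them.

```python
import math

# House data from the example: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

T_CRIT = 2.3060   # from the slides: two-tail alpha = .05, d.f. = 8
F_CRIT = 5.32     # from the slides: alpha = .05, d.f. = (1, 8)


def fit_line(x, y):
    """Return (b0, b1, x_bar, ssx) from the closed-form least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    return y_bar - b1 * x_bar, b1, x_bar, ssx


n = len(price)
b0, b1, x_bar, ssx = fit_line(square_feet, price)
y_bar = sum(price) / n
sst = sum((y - y_bar) ** 2 for y in price)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
ssr = sst - sse

# F test: MSR = SSR / 1, MSE = SSE / (n - 2).
f_stat = (ssr / 1) / (sse / (n - 2))
reject_h0_f = f_stat > F_CRIT

# Confidence interval for the slope: b1 +/- t * S_b1.
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)
slope_ci = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)

# t statistic for the correlation coefficient (r takes the sign of b1).
r = math.copysign(math.sqrt(ssr / sst), b1)
t_corr = (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))

# Intervals at X = 2000 square feet.
x_new = 2000
y_hat = b0 + b1 * x_new
h = 1 / n + (x_new - x_bar) ** 2 / ssx
half_mean = T_CRIT * s_yx * math.sqrt(h)        # mean of Y given X
half_pred = T_CRIT * s_yx * math.sqrt(1 + h)    # individual Y given X
mean_ci = (y_hat - half_mean, y_hat + half_mean)
pred_int = (y_hat - half_pred, y_hat + half_pred)

print(f"F_STAT = {f_stat:.4f}, reject H0: {reject_h0_f}")
print(f"95% CI for slope: ({slope_ci[0]:.5f}, {slope_ci[1]:.5f})")
print(f"t_STAT for correlation: {t_corr:.3f}")
print(f"95% CI for mean Y at 2000: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"95% PI for individual Y at 2000: ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which is why both tests report the same p-value in the Excel and Minitab output.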
[Annotations on the Minitab output: Fit is \(\hat{Y}\); Square Feet = 2000 is the input value; 95% CI is the confidence interval estimate for \(\mu_{Y|X=X_i}\); 95% PI is the prediction interval estimate for \(Y_{X=X_i}\).]

Pitfalls of Regression Analysis
- Lacking an awareness of the assumptions underlying least-squares regression
- Not knowing how to evaluate the assumptions
- Not knowing the alternatives to least-squares regression if a particular assumption is violated
- Using a regression model without knowledge of the subject matter
- Extrapolating outside the relevant range

Strategies for Avoiding the Pitfalls of Regression
- Start with a scatter plot of X vs. Y to observe possible relationship
- Perform residual analysis to check the assumptions
  - Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
  - Use a histogram, stem-and-leaf display, boxplot, or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression (continued)
- If there is violation of any assumption, use alternative methods or models
- If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
- Avoid making predictions or forecasts outside the relevant range

Chapter Summary
- Introduced types of regression models
- Reviewed assumptions of regression and correlation
- Discussed determining the simple linear regression equation
- Described measures of variation
- Discussed residual analysis

Chapter Summary (continued)
- Described inference about the slope
- Discussed correlation: measuring the strength of the association
- Addressed estimation of mean values and prediction of individual values
- Discussed possible pitfalls in regression and recommended strategies to avoid them
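The residual checks recommended in the strategies above can be prepared numerically before any plotting. The sketch below computes the residuals for the house data, pairs them with X for a residuals-vs-X plot, and builds normal probability plot coordinates. Two assumptions are worth flagging: the plotting positions \((i - 0.5)/n\) are one common convention rather than the one Excel or Minitab necessarily uses, and the correlation between ordered residuals and normal quantiles is only an informal indicator of straightness, not a formal normality test.

```python
import math
from statistics import NormalDist

# House data from the example: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

# Closed-form least-squares fit.
n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, price)]

# Points for a residuals-vs-X plot, ordered by X.
residuals_vs_x = sorted(zip(square_feet, residuals))

# Normal probability plot coordinates: ordered residuals against standard
# normal quantiles at plotting positions (i - 0.5) / n (an assumed convention).
ordered = sorted(residuals)
quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Values near 1 suggest the points lie close to a straight line.
straightness = correlation(quantiles, ordered)

for x, e in residuals_vs_x:
    print(f"X = {x:5d}  residual = {e:9.4f}")
print(f"Normal plot straightness (informal): {straightness:.3f}")
```

The `(quantile, residual)` pairs and the `(X, residual)` pairs can be passed to any plotting tool to draw the two diagnostic charts.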