Maximum Likelihood Estimation (lecture slides)

Uploaded 2022-03-01. Format: PPT, 57 pages, 1.03 MB.
Applied Econometrics
William Greene
Department of Economics, Stern School of Business

19. Maximum Likelihood Estimation

Maximum Likelihood Estimation
This defines a class of estimators based on the particular distribution assumed to have generated the observed random variable. The main advantage of ML estimators is that, among all consistent asymptotically normal (CAN) estimators, MLEs have optimal asymptotic properties. The main disadvantage is that they are not necessarily robust to failures of the distributional assumptions; they are very dependent on the particular assumptions. The oft-cited disadvantage of their mediocre small-sample properties is probably overstated in view of the usual paucity of viable alternatives.

Setting Up the MLE
The distribution of the observed random variable is written as a function of the parameters to be estimated: P(yi | data, θ) = probability density | parameters. The likelihood function is constructed from the density. Construction: the joint probability density function of the observed sample of data, generally the product of the individual densities when the data are a random sample.

Regularity Conditions
What they are:
1. log f(.) has three continuous derivatives with respect to the parameters.
2. Conditions needed to obtain expectations of derivatives are met. (E.g., the range of the variable is not a function of the parameters.)
3. The third derivative has finite expectation.
What they mean:
- Moment conditions and convergence: we need to obtain expectations of derivatives.
- We need to be able to truncate Taylor series.
- We will use central limit theorems.

The MLE
The log-likelihood function: log-L(θ | data).
The likelihood equation(s): the first derivatives of log-L equal zero at the MLE,
  (1/n) Σi ∂log f(yi | θ)/∂θ, evaluated at the MLE, = 0.
(A sample statistic; the 1/n is irrelevant.) These are the "first order conditions" for maximization, a moment condition whose population counterpart is the fundamental result E[∂log-L/∂θ] = 0. How do we use this result?
An analogy principle.

Average Time Until Failure
Estimating the average time until failure, θ, of light bulbs. yi = observed life until failure.
  f(yi | θ) = (1/θ) exp(−yi/θ)
  L(θ) = Πi f(yi | θ) = θ^(−N) exp(−Σi yi/θ)
  log L(θ) = −N log θ − Σi yi/θ
Likelihood equation: ∂log L(θ)/∂θ = −N/θ + Σi yi/θ² = 0, which is solved by the sample mean, θ-hat = Σi yi/N.
Note that ∂log f(yi | θ)/∂θ = −1/θ + yi/θ². Since E[yi] = θ, E[∂log f/∂θ] = 0. (Regular.)

Properties of the Maximum Likelihood Estimator
We will sketch formal proofs of these results:
- The log-likelihood function, again.
- The likelihood equation and the information matrix.
- A linear Taylor series approximation to the first order conditions: g(θML) = 0 ≈ g(θ) + H(θ)(θML − θ). (Under regularity, the higher-order terms will vanish in large samples.) Our usual approach: the large-sample behavior of the left- and right-hand sides is the same.
- A proof of consistency (Property 1).
- The limiting variance of √n (θML − θ). We are using the central limit theorem here. This leads to asymptotic normality (Property 2); we will derive the asymptotic variance of the MLE.
- Efficiency (we have not developed the tools to prove this): the Cramer-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov).
- Estimating the variance of the maximum likelihood estimator.
- Invariance (a VERY handy result). Coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.

Testing Hypotheses: A Trinity of Tests
The likelihood ratio test: based on the proposition (Greene's) that restrictions always "make life worse." Is the reduction in the criterion (log-likelihood) large?
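The light-bulb calculation above can be checked numerically. A minimal sketch (the lifetimes below are illustrative, not data from the slides): the MLE is the sample mean, and the likelihood equation holds at it.

```python
# Sketch of the exponential (light-bulb) example: f(y|theta) = (1/theta)exp(-y/theta).
# logL(theta) = -N*log(theta) - sum(y)/theta; the likelihood equation
# -N/theta + sum(y)/theta^2 = 0 is solved by theta_hat = sum(y)/N = ybar.
y = [1.2, 0.7, 3.1, 2.0, 1.5]          # illustrative observed lifetimes
N = len(y)
theta_hat = sum(y) / N                  # solves the likelihood equation
# Check that the likelihood equation is (numerically) zero at theta_hat:
lik_eq = -N / theta_hat + sum(y) / theta_hat**2
print(theta_hat, lik_eq)
```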
Leads to the LR test.
The Lagrange multiplier test: underlying basis: reexamine the first order conditions. Form a test of whether the gradient is significantly "nonzero" at the restricted estimator.
The Wald test: the usual.

The Linear (Normal) Model
Definition of the likelihood function: the joint density of the observed data, written as a function of the parameters we wish to estimate.
Definition of the maximum likelihood estimator: that function of the observed data that maximizes the likelihood function, or its logarithm.
For the model yi = xi'β + εi, where εi ~ N[0, σ²], the maximum likelihood estimators of β and σ² are b = (X'X)⁻¹X'y and s² = e'e/n. That is, least squares is ML for the slopes, but the variance estimator makes no degrees-of-freedom correction, so the MLE is biased.

Normal Linear Model
The log-likelihood function = Σi log f(yi | θ), the sum of logs of densities. For the linear regression model with normally distributed disturbances,
  log-L = Σi [ −(1/2) log 2π − (1/2) log σ² − (yi − xi'β)²/(2σ²) ].

Likelihood Equations
The estimator is defined by the function of the data that equates ∂log-L/∂θ to 0. (Likelihood equation.) The derivative vector of the log-likelihood function is the score function. For the regression model,
  g = [ ∂log-L/∂β , ∂log-L/∂σ² ], where
  ∂log-L/∂β = Σi (1/σ²) xi (yi − xi'β)   (K×1)
  ∂log-L/∂σ² = Σi [ −1/(2σ²) + (yi − xi'β)²/(2σ⁴) ]   (1×1)
Equivalently, the first derivative vector of log-L is (1/σ²) X'(y − Xβ) and (1/(2σ²)) Σi [ (yi − xi'β)²/σ² − 1 ]. Note that we could compute these functions at any β and σ². If we compute them at b and e'e/n, the functions will be identically zero.

Moment Equations
Note that g = Σi gi is a random vector and that each term in the sum has expectation zero. It follows that E[(1/n) g] = 0. Our estimator is found by finding the θ that sets the sample mean of the gi to 0. That is, theoretically, E[gi(β, σ²)] = 0. We find the estimator as that function which produces (1/n) Σi gi(b, s²) = 0. Note the similarity to the way we would estimate any mean: if E[xi] = μ, then E[xi − μ] = 0.
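The claim above, that the score functions are identically zero at b and s² = e'e/n, can be verified directly. A minimal sketch for a one-regressor model with intercept (illustrative data, closed-form OLS):

```python
# Sketch: for the normal linear model, the score is zero at b = OLS and s2 = e'e/n.
x = [1.0, 2.0, 3.0, 4.0, 5.0]                       # illustrative regressor
y = [2.1, 3.9, 6.2, 8.1, 9.9]                       # illustrative dependent variable
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)            # OLS slope
b0 = ybar - b1 * xbar                               # OLS intercept
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]     # residuals
s2 = sum(ei ** 2 for ei in e) / n                   # MLE of sigma^2: no d.o.f. correction
# Score components from the likelihood equations, evaluated at (b, s2):
g_b0 = sum(e) / s2                                  # (1/s2) * sum 1 * e_i
g_b1 = sum(xi * ei for xi, ei in zip(x, e)) / s2    # (1/s2) * sum x_i * e_i
g_s2 = sum(-1 / (2 * s2) + ei ** 2 / (2 * s2 ** 2) for ei in e)
print(g_b0, g_b1, g_s2)                             # all numerically zero
```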
We estimate μ by finding the function of the data that produces (1/n) Σi (xi − m) = 0, which is, of course, the sample mean. There are two main components to the "regularity conditions" for maximum likelihood estimation. The first is that the first derivative has expected value 0. That moment equation motivates the MLE.

Information Matrix
The negative of the second derivatives matrix of the log-likelihood,
  −H = −Σi ∂²log f(yi | θ)/∂θ ∂θ',
is called the information matrix. It is usually a random matrix, also.

Hessian for the Linear Model
  H = [ ∂²log-L/∂β∂β'    ∂²log-L/∂β∂σ²
        ∂²log-L/∂σ²∂β'   ∂²log-L/∂(σ²)² ]
with
  ∂²log-L/∂β∂β' = −(1/σ²) Σi xi xi'
  ∂²log-L/∂β∂σ² = −(1/σ⁴) Σi xi (yi − xi'β)
  ∂²log-L/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σi (yi − xi'β)²
Note that the off-diagonal elements have expectation zero.

Estimated Information Matrix
Taking expected values of the parts of the matrix gives
  −E[H] = [ (1/σ²) Σi xi xi'    0
            0                   n/(2σ⁴) ]
(which should look familiar). The off-diagonal terms go to zero (one of the assumptions of the model). This can be computed at any vector β and scalar σ².

Deriving the Properties of the Maximum Likelihood Estimator
The MLE. Consistency, with proof. Asymptotic variance. Asymptotic distribution. Other results: the variance bound.

Invariance
The maximum likelihood estimator of a function of θ, say h(θ), is h(θMLE). This is not always true of other kinds of estimators. To get the variance of this function, we would use the delta method. E.g., the MLE of β/σ is b/√(e'e/n).

Computing the Asymptotic Variance
We want to estimate {−E[H]}⁻¹. Three ways:
(1) Just compute the negative of the actual second derivatives matrix and invert it.
(2) Insert the maximum likelihood estimates into the known expected values of the second derivatives matrix.
Sometimes (1) and (2) give the same answer (for example, in the linear regression model).
(3) Since −E[H] is the variance of the first derivatives, estimate this with the sample variance (i.e., mean square) of the first derivatives. This will almost always be different from (1) and (2). Since they are estimating the same thing, in large samples all three will give the same answer. Current practice in econometrics often favors (3). Stata rarely uses (3); others do.

Linear Regression Model
Example: different estimators of the variance of the MLE. Consider, again, the gasoline data. We use a simple equation: Gt = β1 + β2 Yt + β3 Pgt + εt.

Linear Model. BHHH Estimator. Newton's Method. Poisson Regression. Asymptotic Variance of the MLE. Estimators of the Asymptotic Covariance Matrix.

ROBUST ESTIMATION
Sandwich estimator: H⁻¹ (G'G) H⁻¹. Is this appropriate? Why do we do this?

Application: Doctor Visits
German individual health care data: N = 27,326.
Model for the number of visits to the doctor: Poisson regression (fit by maximum likelihood), with income, education, and gender as regressors.

(Figure: histogram of the variable DOCVIS, frequency of doctor visits from 0 to 59.)

Poisson Regression Iterations
poisson ; lhs = docvis ; rhs = one,female,hhninc,educ ; mar ; output=3$
Method = Newton; maximum iterations = 100
Convergence criteria: gtHg .1000D-05, chg.F .0000D+00, max|db| .0000D+00
Start values: .00000D+00 .00000D+00 .00000D+00 .00000D+00
1st derivs.: -.13214D+06 -.61899D+05 -.43338D+05 -.14596D+07
Parameters: .28002D+01 .72374D-01 -.65451D+00 -.47608D-01
Itr 2 F= -.1587D+06 gtHg= .2832D+03 chg.F= .1587D+06 max|db|= .1346D+01
1st derivs.: -.33055D+05 -.14401D+05 -.10804D+05 -.36592D+06
Parameters: .21404D+01 .16980D+00 -.60181D+00 -.48527D-01
Itr 3 F= -.1115D+06 gtHg= .9725D+02 chg.F= .4716D+05 max|db|= .6348D+00
1st derivs.:
-.42953D+04 -.15074D+04 -.13927D+04 -.47823D+05
Parameters: .17997D+01 .27758D+00 -.54519D+00 -.49513D-01
Itr 4 F= -.1063D+06 gtHg= .1545D+02 chg.F= .5162D+04 max|db|= .1437D+00
1st derivs.: -.11692D+03 -.22248D+02 -.37525D+02 -.13159D+04
Parameters: .17276D+01 .31746D+00 -.52565D+00 -.49852D-01
Itr 5 F= -.1062D+06 gtHg= .5006D+00 chg.F= .1218D+03 max|db|= .6542D-02
1st derivs.: -.12522D+00 -.54690D-02 -.40254D-01 -.14232D+01
Parameters: .17249D+01 .31954D+00 -.52476D+00 -.49867D-01
Itr 6 F= -.1062D+06 gtHg= .6215D-03 chg.F= .1254D+00 max|db|= .9678D-05
1st derivs.: -.19317D-06 -.94936D-09 -.62872D-07 -.22029D-05
Parameters: .17249D+01 .31954D+00 -.52476D+00 -.49867D-01
Itr 7 F= -.1062D+06 gtHg= .9957D-09 chg.F= .1941D-06 max|db|= .1602D-10 * Converged

Regression and Partial Effects
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant    1.72492985     .02000568        86.222     .0000
FEMALE       .31954440     .00696870        45.854     .0000      .47877479
HHNINC      -.52475878     .02197021       -23.885     .0000      .35208362
EDUC        -.04986696     .00172872       -28.846     .0000     11.3206310

Partial derivatives of the expected value with respect to the vector of characteristics. Effects are averaged over individuals. Observations used for means are All Obs.
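The Newton iterations logged above can be sketched in miniature. This is an intercept-only Poisson model (illustrative data, not the doctor-visits sample), where the score and Hessian are scalars and the MLE has the closed form b0 = log(ybar), so convergence is easy to verify:

```python
# Sketch of Newton's method for a Poisson MLE, intercept only: lambda = exp(b0).
# logL = sum_i [y_i*b0 - exp(b0)] + const; g = sum(y) - n*exp(b0); H = -n*exp(b0).
import math

y = [0, 2, 1, 3, 0, 1, 2, 4]           # illustrative counts
n = len(y)
b0 = 0.0                               # start value, as in the log above
for itr in range(100):
    lam = math.exp(b0)
    g = sum(y) - n * lam               # first derivative (score)
    H = -n * lam                       # second derivative
    step = -g / H                      # Newton update: b0 <- b0 - g/H
    b0 += step
    if abs(step) < 1e-10:              # convergence on max|db|
        break
print(b0, math.log(sum(y) / n))        # Newton converges to log(ybar)
```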
Conditional mean at sample point: 3.1835. Scale factor for marginal effects: 3.1835.
Variable   Partial Effect   Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant    5.49135704       .07890083        69.598     .0000
FEMALE      1.01727755       .02427607        41.905     .0000      .47877479
HHNINC     -1.67058263       .07312900       -22.844     .0000      .35208362
EDUC        -.15875271       .00579668       -27.387     .0000     11.3206310

Comparison of Standard Errors
Maximum likelihood (Hessian-based):
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant    1.72492985     .02000568        86.222     .0000
FEMALE       .31954440     .00696870        45.854     .0000      .47877479
HHNINC      -.52475878     .02197021       -23.885     .0000      .35208362
EDUC        -.04986696     .00172872       -28.846     .0000     11.3206310
BHHH:
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
Constant    1.72492985     .00677787       254.495     .0000
FEMALE       .31954440     .00217499       146.918     .0000
HHNINC      -.52475878     .00733328       -71.559     .0000
EDUC        -.04986696     .00062283       -80.065     .0000
Why are they so different? Model failure. This is a panel; there is autocorrelation.

NLS vs. MLE
Nonlinear least squares:
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]
C0          1.70205*       .06706974        25.377     .0000
C1           .31261*       .02228784        14.026     .0000
C2          -.57513*       .07336253        -7.840     .0000
C3          -.04586*       .00588216        -7.797     .0000
Maximum likelihood:
Variable   Coefficient    Standard Error   b/St.Er.   P[|Z|>z]   Mean of X
Constant    1.72492985     .02000568        86.222     .0000
FEMALE       .31954440     .00696870        45.854     .0000      .47877479
HHNINC      -.52475878     .02197021       -23.885     .0000      .35208362
EDUC        -.04986696     .00172872       -28.846     .0000     11.3206310

Testing Hypotheses
Wald tests, using the familiar distance measure.
Likelihood ratio tests:
  logL_U = log likelihood without restrictions
  logL_R = log likelihood with restrictions
  logL_U ≥ logL_R for any nested restrictions
  2(logL_U − logL_R) ~ chi-squared[J]
The Lagrange multiplier test.
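The LR statistic above is trivial to compute once the two log likelihoods are in hand. A sketch using the values reported for the doctor-visits model (unrestricted vs. constant-only, so J = 3 restrictions):

```python
# Sketch of the LR test: LR = 2*(logL_U - logL_R) ~ chi-squared[J].
logL_U = -106215.1                      # unrestricted log likelihood (reported)
logL_R = -108662.1                      # restricted, constant only (reported)
LR = 2.0 * (logL_U - logL_R)
crit_3df_05 = 7.815                     # chi-squared[3] 5% critical value
print(LR, LR > crit_3df_05)             # about 4894: reject the restrictions decisively
```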
The LM test is, in effect, a Wald test of the hypothesis that the score of the unrestricted log likelihood is zero when evaluated at the restricted estimator.

Testing the Model
Poisson regression, maximum likelihood estimates
  Dependent variable: DOCVIS
  Number of observations: 27326
  Iterations completed: 7
  Log likelihood function: -106215.1
  Number of parameters: 4
  Restricted log likelihood: -108662.1   (log likelihood with only a constant term)
  McFadden pseudo R-squared: .0225193
  Chi squared: 4893.983 = 2[logL − logL(0)]
  Degrees of freedom: 3
  Prob[ChiSqd > value] = .0000000
This is the likelihood ratio test that all three slopes are zero.

Wald Test
MATRIX ; List ; b1 = b(2:4) ; v11 = varb(2:4,2:4) ; b1'<v11>b1$
Matrix B1 (3 rows, 1 column):
   .31954
  -.52476
  -.04987
Matrix V11 (3 rows, 3 columns):
   .4856275D-04  -.4556076D-06   .2169925D-05
  -.4556076D-06   .00048         -.9160558D-05
   .2169925D-05  -.9160558D-05   .2988465D-05
Result (1 row, 1 column): 4682.38779.   (The LR statistic was 4893.983.)

Likelihood Ratio Test
poisson ; lhs = docvis ; rhs = one,female,hhninc,educ$
calc ; logL1 = logL $
poisson ; lhs = docvis ; rhs = one,hhninc,educ$
calc ; logL0 = logL $
calc ; logLpool = logL $
poisson ; for[female=0] ; lhs = docvis ; rhs = one,hhninc,educ$
calc ; logLM = logL $
poisson ; for[female=1] ; lhs = docvis ; rhs = one,hhninc,educ$
calc ; logLF = logL $
? Chi squared test for the coefficient on FEMALE. 1 Deg.Fr.
calc ; list ; logL1 ; logL0 ; LRM_F = 2*(logL1 - logL0) $
? Chi squared test for pooling.
3 Deg.Fr.
calc ; list ; logLM ; logLF ; logLpool ; LRGender = 2*(logLM+logLF - logLpool)$

LR Test Results
calc ; list ; logL1 ; logL0 ; LRM_F = 2*(logL1 - logL0) $
Listed calculator results:
  LOGL1 = -106215.144165
  LOGL0 = -107277.287979
  LRM_F = 2124.287628
calc ; list ; loglM ; loglF ; logLpool ; LRGender = 2*(logLM+logLF - logLpool)$
Listed calculator results:
  LOGLM = -51501.135803
  LOGLF = -54656.153984
  LOGLPOOL = -107277.287979
  LRGENDER = 2239.996383

LM Test
Hypothesis: the 3 slopes = 0. With all 3 slopes = 0, λ = ybar = exp(β1), so the MLE of β1 is log(ybar). The constrained MLEs of the other 3 coefficients are zero.

LM Statistic
calc ; beta1 = log(xbr(docvis)) $
matrix ; bmle0 = beta1/0/0/0 $
create ; lambda0 = exp(x'bmle0) ; res0 = docvis - lambda0 $
matrix ; list ; g0 = x'res0 ; h0 = x'[lambda0]x ; lm = g0'<h0>g0$
Matrix G0 (4 rows, 1 column):
   .2664385D-08
   7944.94441
  -1781.12219
  -.3062440D+05
Matrix H0 (4 rows, 4 columns):
   .8699300D+05  .4165006D+05  .3062881D+05  .9848157D+06
   .4165006D+05  .4165006D+05  .1434824D+05  .4530019D+06
   .3062881D+05  .1434824D+05  .1350638D+05  .3561238D+06
   .9848157D+06  .4530019D+06  .3561238D+06  .1161892D+08
Matrix LM (1 row, 1 column): 4715.41008.   (The Wald statistic was 4682.38779; the LR statistic was 4893.983.)

Applied Econometrics
William Greene
Department of Economics, Stern School of Business

20. Aspects of Maximum Likelihood Estimation

Invariance. Reparameterizing the Log Likelihood.

Estimating the Tobit Model
Log likelihood for the Tobit model, for estimation of β and σ:
  logL = Σi [ (1 − di) log Φ(−xi'β/σ) + di log( (1/σ) φ((yi − xi'β)/σ) ) ],
where di = 1 if yi > 0 and di = 0 if yi = 0. The derivatives are very complicated and the Hessian is nightmarish. Consider the Olsen transformation*: θ = 1/σ, γ = β/σ (one to one; σ = 1/θ, β = γ/θ). Then
  logL = Σi [ (1 − di) log(1 − Φ(xi'γ)) + di ( log θ − (1/2) log 2π − (1/2)(θ yi − xi'γ)² ) ].
* Olsen, "Note on the Uniqueness of the MLE in the Tobit Model," Econometrica, 1978.
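The Olsen reparameterization above changes the parameters, not the model: the two log likelihoods agree at corresponding parameter points. A minimal numerical check for a scalar (intercept-only) Tobit with illustrative censored data:

```python
# Sketch: logL(beta, sigma) for the Tobit model equals logL(gamma, theta) under the
# Olsen transformation theta = 1/sigma, gamma = beta/sigma, at matching points.
import math

def Phi(z):       # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_phi(z):   # log of the standard normal density
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

y = [0.0, 0.0, 1.3, 0.4, 2.2]            # zeros are censored observations (illustrative)
beta, sigma = 0.5, 1.2                   # an arbitrary trial parameter point

# Original parameterization (beta, sigma):
logL = sum(math.log(Phi(-beta / sigma)) if yi == 0.0
           else -math.log(sigma) + log_phi((yi - beta) / sigma) for yi in y)

# Olsen parameterization (gamma, theta):
theta, gamma = 1.0 / sigma, beta / sigma
logL_olsen = sum(math.log(1.0 - Phi(gamma)) if yi == 0.0
                 else math.log(theta) + log_phi(theta * yi - gamma) for yi in y)
print(logL, logL_olsen)                  # identical up to rounding
```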