Multinomial Logit

Uploaded by: xx****x | Document ID: 242869799 | Upload date: 2024-09-10 | Format: PPT | Pages: 48 | Size: 331.50 KB
Multinomial Logit
Sociology 229: Advanced Regression
Copyright 2010 by Evan Schofer
Do not copy or distribute without permission

Announcements
- Short assignment 1 handed out today
  - Due at start of class next week

Agenda
- Minor follow-up to last class:
  - Marginal change in logistic regression
- Models for “polytomous” outcomes
  - Ordered logistic regression
  - Multinomial logistic regression
  - Conditional logit: models for alternative-specific data

Marginal Change in Logit
- Issue: How best to capture effect size in non-linear models?
  - % change in odds for a 1-unit change in X
  - Change in actual probability for a 1-unit change in X
    - Either for hypothetical cases or an actual case
- Another option: marginal change
  - The actual slope of the curve at a specific point
  - Again, can be computed for real or hypothetical cases
  - Use “adjust” (Stata 9/10) or “margins” (Stata 11)
- Recall from calculus: derivatives are slopes. So, a marginal change is just a derivative.

Marginal vs Discrete Change in Logit
- Figure: Long and Freese 2006:169

Ordered Logit: Motivation
- Issue: Many categorical dependent variables are ordered
  - Ex: strongly disagree, disagree, agree, strongly agree
  - Ex: social class
- Linear regression is often used for ordered categorical outcomes
  - Ex: Strongly disagree=0, disagree=1, agree=2, etc.
  - This makes arbitrary, usually unjustifiable assumptions about the distance between categories
  - Why not: Strongly disagree=0, disagree=3, agree=3.5?
- If the numerical values assigned to categories do not accurately reflect the true distance, linear regression may not be appropriate

Ordered Logit: Motivation
- Strategies to deal with ordered categorical variables
  1. Use OLS regression anyway
     - Commonly done, but can give incorrect results
     - Possibly check robustness by varying the coding of the interval between outcomes
  2. Collapse variables to a dichotomy, use a binary model such as logit or probit
     - Combine “strongly disagree” & “disagree”, “strongly agree” & “agree”
     - Model “disagree” vs. “agree”
     - Works fine, but “throws away” useful information.

Ordered Logit: Motivation
- Strategies to deal with ordered categorical variables (contd):
  3. If you aren't confident about ordering, use multinomial logistic regression (discussed later)
  4. Ordered logit / ordered probit
  5. Stereotype logit
     - Not discussed.

Ordered Logit
- Ordered logit is often conceptualized as a latent variable model
  - Observed responses result from individuals falling within ranges on an underlying continuous measure
  - Example: There is some underlying variable “agreement”
  - If you fall below a certain (unobserved) threshold, you'll respond “strongly disagree”
- Whereas logit looks at P(Y=1), ologit looks at the probability of falling in particular ranges

Ordered Logit Example: Environment Spending
- Government spending on the environment
  - GSS question: Are we spending too little money, about the right amount, too much?
  - GSS variable “NATENVIR” from years 2000, 02, 04, 06
  - Recoded: 1 = too little, 2 = about right, 3 = too much

Ordered Logit Example
- Government spending on environment

    . ologit envspend educ incomea female age dblack class city suburb attendchurch

    Ordered logistic regression                       Number of obs   =       5169
                                                      LR chi2(9)      =     192.88
                                                      Prob > chi2     =     0.0000
    Log likelihood = -4191.1232                       Pseudo R2       =     0.0225

    ------------------------------------------------------------------------------
        envspend |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .0419784   .0108409     3.87   0.000     .0207307    .0632261
          income |   .0023984   .0057545     0.42   0.677    -.0088802     .013677
          female |   .2753095   .0591542     4.65   0.000     .1593693    .3912496
             age |   -.012762   .0017667    -7.22   0.000    -.0162247   -.0092994
          dblack |   .2898025   .0930178     3.12   0.002     .1074911     .472114
           class |  -.0719344   .0485173    -1.48   0.138    -.1670266    .0231578
            city |    .227895    .080983     2.81   0.005     .0691711    .3866188
          suburb |   .0752643   .0695921     1.08   0.279    -.0611337    .2116624
    attendchurch |   -.086372   .0109998    -7.85   0.000    -.1079312   -.0648128
    -------------+----------------------------------------------------------------
           /cut1 |  -2.872315   .1930206                     -3.250628   -2.494001
           /cut2 |  -.8156047   .1867621                     -1.181652   -.4495577
    ------------------------------------------------------------------------------

- Instead of a constant, ologit reports “cutpoints”, which can be used to compute probabilities of falling into a particular value of Y

Ordered Logit Example
- Ologit results can be shown as odds ratios

    . ologit envspend educ incomea female age dblack class city suburb attendchur, or

    Ordered logistic regression                       Number of obs   =       5169
                                                      LR chi2(9)      =     192.88
                                                      Prob > chi2     =     0.0000
    Log likelihood = -4191.1232                       Pseudo R2       =     0.0225

    ------------------------------------------------------------------------------
        envspend | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   1.042872   .0113056     3.87   0.000     1.020947    1.065268
         incomea |   1.002401   .0057683     0.42   0.677     .9911591    1.013771
          female |   1.316938   .0779025     4.65   0.000     1.172771    1.478828
             age |    .987319   .0017443    -7.22   0.000     .9839063    .9907437
          dblack |   1.336164    .124287     3.12   0.002     1.113481     1.60338
           class |    .930592   .0451498    -1.48   0.138     .8461771    1.023428
            city |   1.255953   .1017109     2.81   0.005      1.07162    1.471995
          suburb |   1.078169   .0750321     1.08   0.279     .9406974    1.235731
          attend |   .9172529   .0100896    -7.85   0.000     .8976894    .9372429
    -------------+----------------------------------------------------------------
           /cut1 |  -2.872315   .1930206                     -3.250628   -2.494001
           /cut2 |  -.8156047   .1867621                     -1.181652   -.4495577
    ------------------------------------------------------------------------------

- Women have 1.32 times the odds of falling in a higher category than men: a difference of (1.32 - 1) * 100 = 32%.

Proportional Odds Assumption
- The fact that you can calculate odds ratios highlights a key assumption of ordered logit:
  - “Proportional odds assumption”
  - Also known as the “parallel regression assumption”
    - Which also applies to ordered probit
- Model assumes that variable effects on the odds of lower vs. higher outcomes are consistent
  - Effect on odds of “too little” vs “about right” is the same as for “about right” vs “too much”
  - Controlling for all other vars in the model
- If this assumption doesn't seem reasonable, consider stereotype logit or multinomial logit.

Ologit Interpretation
- Like logit, interpretation is difficult because the effect of Xs on Y is nonlinear
  - Effects vary with values of all X variables
- Interpretation strategies are similar to logit:
  - You can produce predicted probabilities
    - For each category of Y: Y=1, Y=2, Y=3
    - For real or hypothetical cases
  - You can look at the effect of a change in X on predicted probabilities of Y
    - Given particular values of X variables
  - You can present marginal effects.

Ordered Logit vs. OLS
- Government spending on environment

    . reg envspend educ incomea female age dblack class city suburb attendchur

          Source |       SS       df       MS              Number of obs =    5169
    -------------+------------------------------           F(  9,  5159) =   21.27
           Model |  71.1243142     9  7.90270158           Prob > F      =  0.0000
        Residual |   1916.7124  5159  .371527894           R-squared     =  0.0358
    -------------+------------------------------           Adj R-squared =  0.0341
           Total |  1987.83672  5168  .384643328           Root MSE      =  .60953

    ------------------------------------------------------------------------------
        envspend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |    .012701   .0032069     3.96   0.000     .0064141    .0189878
          income |   .0006037   .0016821     0.36   0.720     -.002694    .0039013
          female |   .0900251   .0173081     5.20   0.000     .0560938    .1239563
             age |  -.0038736   .0005258    -7.37   0.000    -.0049044   -.0028428
          dblack |   .0726494   .0261632     2.78   0.006     .0213585    .1239403
           class |  -.0165553   .0142495    -1.16   0.245    -.0444904    .0113797
            city |   .0555329   .0229917     2.42   0.016     .0104594    .1006065
          suburb |    .031217   .0205407     1.52   0.129    -.0090515    .0714855
      attendchur |  -.0243782   .0032213    -7.57   0.000    -.0306934   -.0180631
           _cons |   2.618234   .0547459    47.83   0.000     2.510909     2.72556
    ------------------------------------------------------------------------------

- In this case, OLS produced similar results to ordered logit. But that doesn't always happen, and you won't know if you don't check.

Multinomial Logistic Regression
- What if you have a dependent variable with several non-ordinal outcomes?
  - Ex: Mullen, Goyette, Soares (2003): What kind of grad school?
    - None vs. MA vs. MBA vs. Prof'l school vs. PhD.
  - Ex: McVeigh & Smith (1999). Political action
    - Action can take different forms: institutionalized action (e.g., voting) or protest
    - Inactive vs. conventional pol action vs. protest
  - Other examples?

Multinomial Logistic Regression
- Multinomial logit strategy: Contrast outcomes with a common “reference point”
  - Similar to conducting a series of 2-outcome logit models comparing pairs of categories
  - The “reference category” is like the reference group when using dummy variables in regression
    - It serves as the contrast point for all analyses
- Example: Mullen et al. 2003: Analysis of 5 categories yields 4 tables of results:
  - No grad school vs. MA
  - No grad school vs. MBA
  - No grad school vs. Prof'l school
  - No grad school vs. PhD.

Multinomial Logistic Regression
- Imagine a dependent variable with M categories
- Ex: 2000 Presidential Election:
  - M = 3: voting for Bush, Gore, or Nader
- Probabilities of person “i” choosing each category “j” must add to 1.0:

    \[ \sum_{j=1}^{M} p_{ij} = 1 \]

Multinomial Logistic Regression
- Option #1: Conduct binomial logit models for all possible combinations of outcomes
  - Probability of Gore vs. Bush
  - Probability of Nader vs. Bush
  - Probability of Gore vs. Nader
- Note: This will produce results fairly similar to multinomial output
  - But: Sample varies across models
  - Also, multinomial imposes additional constraints
  - So, results will differ somewhat from multinomial logistic regression.

Multinomial Logistic Regression
- We can model the probability of each outcome as:

    \[ p_{ij} = \frac{\exp\left(\sum_{k} \beta_{jk} x_{ik}\right)}{\sum_{m=1}^{M} \exp\left(\sum_{k} \beta_{mk} x_{ik}\right)} \]

  - i = cases, j = categories, k = independent variables
- As written, the coefficients are not uniquely determined; this is solved by adding a constraint
  - Coefficients sum to zero

Multinomial Logistic Regression
- Option #2: Multinomial logistic regression
  - Choose one category as “reference”
    - Probability of Gore vs. Bush
    - Probability of Nader vs. Bush
    - Probability of Gore vs. Nader
  - Let's make Bush the reference category
  - Output will include two tables:
    - Factors affecting probability of voting for Gore vs. Bush
    - Factors affecting probability of Nader vs. Bush.

Multinomial Logistic Regression
- Choice of “reference” category drives interpretation of multinomial logit results
  - Similar to when you use dummy variables
  - Example: Variables affecting vote for Gore would change if reference was Bush or Nader!
    - What would matter in each case?
- 1. Choose the contrast(s) that make the most sense
  - Try out different possible contrasts
- 2. Be aware of the reference category when interpreting results
  - Otherwise, you can make BIG mistakes
  - Effects are always in reference to the contrast category.

MLogit Example: Family Vacation
- Mode of Travel. Reference category = Train

    . mlogit mode income familysize

    Multinomial logistic regression                   Number of obs   =        152
                                                      LR chi2(4)      =      42.63
                                                      Prob > chi2     =     0.0000
    Log likelihood = -138.68742                       Pseudo R2       =     0.1332

    ------------------------------------------------------------------------------
            mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Bus          |
          income |   .0311874   .0141811     2.20   0.028     .0033929    .0589818
      familysize |  -.6731862   .3312153    -2.03   0.042    -1.322356   -.0240161
           _cons |  -.5659882    .580605    -0.97   0.330    -1.703953    .5719767
    -------------+----------------------------------------------------------------
    Car          |
          income |    .057199   .0125151     4.57   0.000     .0326698    .0817282
      familysize |   .1978772   .1989113     0.99   0.320    -.1919817    .5877361
           _cons |  -2.272809   .5201972    -4.37   0.000    -3.292377   -1.253241
    ------------------------------------------------------------------------------
    (mode=Train is the base outcome)

- Large families are less likely to take the bus (vs. train)
- Note: It is hard to directly compare Car vs. Bus in this table

MLogit Example: Car vs. Bus vs. Train
- Mode of Travel. Reference category = Car

    . mlogit mode income familysize, base(3)

    Multinomial logistic regression                   Number of obs   =        152
                                                      LR chi2(4)      =      42.63
                                                      Prob > chi2     =     0.0000
    Log likelihood = -138.68742                       Pseudo R2       =     0.1332

    ------------------------------------------------------------------------------
            mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Train        |
          income |   -.057199   .0125151    -4.57   0.000    -.0817282   -.0326698
      familysize |  -.1978772   .1989113    -0.99   0.320    -.5877361    .1919817
           _cons |   2.272809   .5201972     4.37   0.000     1.253241    3.292377
    -------------+----------------------------------------------------------------
    Bus          |
          income |  -.0260117   .0139822    -1.86   0.063    -.0534164     .001393
      familysize |  -.8710634   .3275472    -2.66   0.008    -1.513044   -.2290827
           _cons |   1.706821   .6464476     2.64   0.008      .439807    2.973835
    ------------------------------------------------------------------------------
    (mode=Car is the base outcome)

- Here, the pattern is clearer: Wealthy & large families use cars

Stata Notes: mlogit
- Dependent variable: any categorical variable
  - Values don't need to be positive or sequential
  - Ex: Bus = 1, Train = 2, Car = 3
  - Or: Bus = 0, Train = 10, Car = 35
- Base category can be set with an option:

    mlogit mode income familysize, baseoutcome(3)

- Exponentiated coefficients are called “relative risk ratios”, rather than odds ratios

    mlogit mode income familysize, rrr

MLogit Example: Car vs. Bus vs. Train
- Exponentiated coefficients: relative risk ratios

    Multinomial logistic regression                   Number of obs   =        152
                                                      LR chi2(4)      =      42.63
                                                      Prob > chi2     =     0.0000
    Log likelihood = -138.68742                       Pseudo R2       =     0.1332

    ------------------------------------------------------------------------------
            mode |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Train        |
          income |   .9444061   .0118194    -4.57   0.000     .9215224    .9678581
      familysize |   .8204706   .1632009    -0.99   0.320     .5555836    1.211648
    -------------+----------------------------------------------------------------
    Bus          |
          income |   .9743237   .0136232    -1.86   0.063     .9479852    1.001394
      familysize |   .4185063   .1370806    -2.66   0.008     .2202385    .7952627
    ------------------------------------------------------------------------------
    (mode=Car is the base outcome)

- exp(-.057) = .94. Interpretation is just like odds ratios BUT the comparison is with the reference category.

Predicted Probabilities
- You can predict probabilities for each case
  - Each outcome has its own probability (they add up to 1)

    . predict predtrain predbus predcar if e(sample), pr
    . list predtrain predbus predcar

         +--------------------------------+
         | predtrain    predbus   predcar |
         |--------------------------------|
      1. |  .3581157   .3089684  .3329159 |
      2. |   .448882   .1690205  .3820975 |
      3. |  .3080929   .3106668  .3812403 |
      4. |  .0840841   .0562263  .8596895 |
      5. |  .2771111   .1665822  .5563067 |
      6. |  .5169058    .279341  .2037531 |
      7. |  .5986157   .2520666  .1493177 |
      8. |  .3080929   .3106668  .3812403 |
      9. |  .0934616   .1225238  .7840146 |
     10. |  .6262593   .1477046  .2260361 |

- Callout: this case has a high predicted probability of traveling by car
- Callout: these probabilities are pretty similar here

Classification of Cases
- Stata doesn't have a fancy command to compute classification tables for mlogit
- But, you can do it manually
  - Assign cases based on highest probability
- You can make a table of all classifications, or just of whether they were classified correctly

    . gen predcorrect = 0
    . replace predcorrect = 1 if pmode == mode
    (85 real changes made)
    . tab predcorrect

    predcorrect |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         67       44.08       44.08
              1 |         85       55.92      100.00
    ------------+-----------------------------------
          Total |        152      100.00

- First, I calculated the “predicted mode” and a dummy indicating whether the prediction was correct
- 56% of cases were classified correctly

Predicted Probability Across X Vars
- Like logit, you can show how probabilities change across independent variables
  - However, the “adjust” command doesn't work with mlogit
  - So, manually compute the mean of predicted probabilities
  - Note: Other variables will be left “as is” unless you set them manually before you use “predict”

    . mean predcar, over(familysize)

    --------------------------
            Over |       Mean
    -------------+------------
    predcar      |
               1 |   .2714656
               2 |   .4240544
               3 |   .6051399
               4 |   .6232910
               5 |   .8719671
               6 |   .8097709
    --------------------------

- Probability of using a car increases with family size
- Note: Values bounce around because other vars are not set to a common value.
- Note 2: Again, scatter plots aid in summarizing such results

Stata Notes: mlogit
- Like logit, you can't include variables that perfectly predict the outcome
  - Note: The Stata “logit” command gives a warning of this
  - The mlogit command doesn't give a warning, but the coefficient will have a z-value of zero, p-value = 1
  - Remove problematic variables if this occurs!

Hypothesis Tests
- Individual coefficients can be tested as usual
  - Wald test/z-values provided for each variable
- However, adding a new variable to the model actually yields more than one coefficient
  - If you have 4 categories, you'll get 3 coefficients
  - LR tests are especially useful because you can test for improved fit across the whole model

LR Tests in Multinomial Logit
- Example: Does “familysize” improve the model?
  - Recall: It wasn't always significant... maybe not!
- Run full model, save results

    mlogit mode income familysize
    estimates store fullmodel

- Run restricted model, save results

    mlogit mode income
    estimates store smallmodel

- Compare:

    lrtest fullmodel smallmodel

    Likelihood-ratio test                                 LR chi2(2)  =      9.55
    (Assumption: smallmodel nested in fullmodel)          Prob > chi2 =    0.0084

- Yes, model fit is significantly improved

Multinomial Logit Assumptions: IIA
- Multinomial logit is designed for outcomes that are not complexly interrelated
- Critical assumption: Independence of Irrelevant Alternatives (IIA)
  - Odds of one outcome versus another should be independent of other alternatives
- Problems often come up when dealing with individual cho
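
To make the reference-category and IIA points concrete, the following is a minimal Python sketch (not part of the original slides) that plugs the Train-base mlogit coefficients reported above into the standard multinomial logit probability formula. The case values (`income=50`, `familysize=2`) and all function and variable names are hypothetical, chosen only for illustration. It checks three things: predicted probabilities sum to 1, coefficients for a different base outcome are just differences of the reported ones, and the model's Bus-vs-Train odds are unchanged when Car is left out of the formula. The last point is a property built into the model's functional form, not an empirical test of whether IIA holds in the data.

```python
import math

# Coefficients copied from the slides' mlogit output (base outcome = Train).
# The base outcome's coefficients are fixed at zero.
COEFS = {
    "Train": {"income": 0.0, "familysize": 0.0, "_cons": 0.0},
    "Bus": {"income": 0.0311874, "familysize": -0.6731862, "_cons": -0.5659882},
    "Car": {"income": 0.057199, "familysize": 0.1978772, "_cons": -2.272809},
}


def linear_index(outcome, income, familysize):
    """Linear predictor x'b for one outcome."""
    b = COEFS[outcome]
    return b["_cons"] + b["income"] * income + b["familysize"] * familysize


def predicted_probs(income, familysize, outcomes=("Train", "Bus", "Car")):
    """Multinomial logit probabilities: exp(x'b_j) / sum over m of exp(x'b_m)."""
    expo = {m: math.exp(linear_index(m, income, familysize)) for m in outcomes}
    total = sum(expo.values())
    return {m: value / total for m, value in expo.items()}


# Hypothetical case (values are assumptions, not taken from the data set).
probs = predicted_probs(income=50, familysize=2)
prob_total = sum(probs.values())

# Re-basing: the Bus-vs-Car income coefficient is the difference of the two
# Train-base coefficients; compare with -.0260117 in the Car-base table.
bus_vs_car_income = COEFS["Bus"]["income"] - COEFS["Car"]["income"]

# Relative risk ratio for income, Train vs. Car; compare with .9444061.
rrr_train_vs_car_income = math.exp(-COEFS["Car"]["income"])

# IIA as built into the formula: the Bus/Train odds do not change when Car
# is dropped from the set of alternatives.
odds_all = probs["Bus"] / probs["Train"]
probs_no_car = predicted_probs(50, 2, outcomes=("Train", "Bus"))
odds_no_car = probs_no_car["Bus"] / probs_no_car["Train"]

print(probs)
print(bus_vs_car_income, rrr_train_vs_car_income)
print(odds_all, odds_no_car)
```

Because the odds of any two outcomes depend only on those two outcomes' coefficients, adding or removing a third alternative cannot shift them in this model, which is why closely related alternatives can be a problem.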