** I. BASICS ** Open "neighborhood" data *1 xtsum schid, i(schid) *2 ** Sort by level-2 unit identifier sort schid *3 browse schid attain *4 tab schid *5 *Download some Stata programs net from http://home.gwu.edu/~bartels/stata *6 *Generating clustercount and clustersize variables the long way by schid: gen clustersize=_N by schid: gen clustercount=_n browse schid attain clustersize clustercount tab clustersize if clustercount==1 *7 *Generating clustercount ("ccount") and clustersize ("csize") variables the short way (using the "ccount" program) ccount schid *8 *Generating a new schoolid variable ranging from 1 to 17 clusterid schid browse schid id_new attain ccount csize tab id_new **************************************************************************** **II. VARIANCE COMPONENTS MODEL - GENERATING NO POOLING CLUSTER MEANS AND PARTIALLY-POOLED CLUSTER MEANS *9 xtreg attain, i(schid) *NOTE: *_cons = pooled mean * theta = sigma_e ^2 * psi = sigma_u^2 *rho=intraclass correlation coefficient *10 * Generate unpooled cluster means of attain (DV) by schid: egen cmean=mean(attain) browse schid attain cmean *11 *Calculate shrinkage factor (R) gen R=(.32755315^2) / [(.32755315^2) + ((.96645848^2)/csize)] su R *12 *Generate ML resids *Generate yhat predict yhat, xb gen res=attain - yhat by schid: egen mlres=mean(res) *13 *Generate empirical Bayes residuals gen ebres=R*mlres *14 * Generate eb resids using Stata canned routine predict ebres_canned, u clist id_new ebres ebres_canned if ccount==1, noobs *15 *Generate and interpret pooling factor (omega) gen omega=1-R * We could also use the full forumula for this su omega if ccount==1 tab omega if ccount==1 scatter omega csize if ccount==1 *16 *Generate partially-pooled means, using eq. 1 gen ppoolmean1=omega*.0811608 + (1-omega)*cmean *17 * Generate partially-pooled means, using eq. 2 gen ppoolmean2=.0811608 + ebres clist id_new ppoolmean1 ppoolmean2 if ccount==1, noobs *18 * Generating canned partially pooled cluster mean using Stata predict command predict ppoolmean_canned, xbu clist id_new ppoolmean1 ppoolmean2 ppoolmean_canned if ccount==1, nooobs *19 *Graphs of means (completely pooled mean, partially pooled cluster means, and no pooling cluster means) *Graph with csize label scatter ppoolmean1 cmean id_new if ccount==1, msymbol(O S) yline(.0811608) mlabel(csize) *Graph with pooling factor label scatter ppoolmean1 cmean id_new if ccount==1, msymbol(O S) yline(.0811608) mlabel(omega) **Example 2: Use "hsb" data** *20 Basics xtsum schoolid, i(schoolid) ccount schoolid clusterid schoolid su csize if ccount==1 tab csize if ccount==1 *21 xtreg mathach, i(schoolid) *22 *Generate unpooled cluster means of mathach (DV) by schoolid: egen cmean=mean(mathach) *23 *Generate partially-pooled cluster means of mathach predict ppoolmean, xbu *24 *Generate and interpret pooling factor (omega) gen omega = (6.2563275^2) / [(6.2563275^2) + csize*(2.9608177^2)] * We could also use the full forumula for this su omega if ccount==1 tab omega if ccount==1 scatter omega csize if ccount==1 *25 *Graph scatter cmean ppoolmean id_new if ccount==1 & id_new<25, yline(12.63674) mlabel(omega) **************************************************************************** **III. MODELING MULTILEVEL DATA - FOUR MODELING APPROACHES ** Open the "hsb" data *26 *Describing and summarizing the data (distinguishing level-1 and level-2 variables) xtsum, i(schoolid) ccount schoolid clusterid schoolid *27 *OLS (complete pooling) reg mathach female ses minority sector disclim *28 *"Between" model the "long" way *Generate cluster means of the DV and all level-1 IVs sort schoolid by schoolid: egen mathach_m=mean(mathach) by schoolid: egen female_m=mean(female) by schoolid: egen ses_m=mean(ses) by schoolid: egen minority_m=mean(minority) reg mathach_m female_m ses_m minority_m sector disclim if ccount==1 *29 *"Between" model the "short" way xtreg mathach female ses minority sector disclim, i(schoolid) be *30 *Fixed effects (within, or "no pooling" approach) model the "long" way *Generate school dummies quietly tab schoolid, gen(sid) *Estimate OLS model with J-1 school dummies; exclude level-2 variables, which cannot be estimated using the FE approach reg mathach sid2-sid160 female ses minority *31 *Fixed effects model the "short" way xtreg mathach female ses minority sector disclim, i(schoolid) fe *32 *Random intercept (partial pooling) model; note: the "re" option at the end indicates "random effects" or random intercept model. *"re" is the default for xtreg, so you do not actually need to include that at the end. xtreg mathach female ses minority sector disclim, i(schoolid) re *Note: GLS is the default. Can also estimate this using MLE xtreg mathach female ses minority sector disclim, i(schoolid) re mle *33 *Interpretation of random intercept model xtreg mathach female ses minority sector disclim, i(schoolid) re *Intraclass correlation coefficient (rho) *Testing RI v. OLS using Lagrangian multiplier test xttest0 *Using MLE, you get a likelihood ratio test, which is analogous to the Lagrangian multiplier test *34 *Pooling factor (omega) gen omega2 = (5.9899567^2) / [(5.9899567^2) + csize*(1.3603734^2)] su omega2 if ccount==1 *35 *Alternative way of summarizing the degree of pooling *Generate empirical Bayes (partially-pooled) level-2 residuals predict ebres, u su ebres if ccount==1, det *Retrieve variance; compare it to model level-2 residual variance; plug into this formula display 1 - ( 1.380083 / (1.3603734^2)) *36 *Measures of R-squared at each level *Estimate model with no IVs xtreg mathach, i(schoolid) *Note: How does the degree of partial pooling change after conditioning on independent variables? *level-1 R-squared di [(6.2563275^2) - (5.9899567^2)] / (6.2563275^2) *level-2 R-squared di [(2.9608177^2) - (1.3603734^2)] / (2.9608177^2) ***************************************************************************** * IV. DEALING WITH CLUSTER CONFOUNDING *37 *Hausman test *Estimate FE model xtreg mathach female ses minority sector disclim, i(schoolid) fe est store fixed *Estimate RI model xtreg mathach female ses minority sector disclim, i(schoolid) est store random hausman fixed random *Accounting for cluster confounding *38 *Generate between-cluster operationalizations of level-1 variables the "long" way sort schoolid by schoolid: egen female_cmean=mean(female) by schoolid: egen ses_cmean=mean(ses) by schoolid: egen minority_cmean=mean(minority) *Generate within-cluster operationalizations of level-1 variables the "long" way gen female_within=female - female_cmean gen ses_within=ses - ses_cmean gen minority_within=minority - minority_cmean *39 **Generate within- and between-cluster operationalizations of level-1 variables the "short" way net from http://home.gwu.edu/~bartels/stata *download "clustergen" *"clustergen" assumes that the cluster id variable is called "id"; create new variable, "id," which equals "schoolid" gen id=schoolid clustergen female ses minority *40 *xtsum the new variables xtsum female_bw ses_bw minority_bw female_wi ses_wi minority_wi, i(schoolid) *41 *How Stata generates within and between s.d. in the "xtsum" command xtsum female ses minority su female_wi ses_wi minority_wi su female_bw ses_bw minority_bw if ccount==1 *42 *Proving that the within and between operationalizations are completely uncorrelated corr female_bw female_wi corr ses_bw ses_wi corr minority_bw minority_wi *43 *Estimating within- and between-cluster effects in the random intercept modeling framework * METHOD 1 xtreg mathach female_wi ses_wi minority_wi female_bw ses_bw minority_bw sector disclim, i(schoolid) mle *44 * METHOD 2 xtreg mathach female ses minority female_bw ses_bw minority_bw sector disclim, i(schoolid) mle ***************************************************************************** * V. LONGITUDINAL (TIME-SERIES CROSS-SECTIONAL) DATA *Use "Poe et al." data *45 *Describing the data su id year sort id year browse id year ccount id browse id year csize ccount su csize if ccount==1 tab csize if ccount==1 *xtset the data xtset id year *Which are time-varying and which are time-constant? xtsum ainew ailag polrt lpop mil2 brit *45.5 *Generating a lagged dependent variable *Note: there's already a lagged DV in the dataset called "ailag" by id: gen ainew_lag=ainew[_n-1] *46 *Completely pooled approach (OLS) with regular errors reg ainew ailag polrt lpop mil2 brit *47 *Completely pooled approach (OLS) with panel-corrected standard errors (PCSEs) xtpcse ainew ailag polrt lpop mil2 brit, pairwise *48 *Fixed effects with "regular" standard errors xtreg ainew ailag polrt lpop mil2 brit, i(id) fe *49 *Fixed effects with PCSEs quietly tab id, gen(id) *include J-1 dummies; exclude brit xtpcse ainew id2-id164 ailag polrt lpop mil2, pairwise *50 *Between model xtreg ainew ailag polrt lpop mil2 brit, i(id) be *51 * Estimating RE model xtreg ainew ailag polrt lpop mil2 brit, i(id) mle *52 *Generating within- and between-cluster operationalizations of time-varying variables clustergen ailag polrt lpop mil2 *53 *Estimating within- and between-cluster effects of time-varying variables *Method 1 xtreg ainew ailag_wi polrt_wi lpop_wi mil2_wi polrt_bw lpop_bw mil2_bw brit, i(id) mle *54 *Method 2 xtreg ainew ailag_wi polrt lpop mil2 polrt_bw lpop_bw mil2_bw brit, i(id) mle ***************************************************************************** * VI: RANDOM COEFFICIENT MODELS *Use "hsb" data ccount schoolid clusterid schoolid ** introducing the "xtmixed" command *55 *Random intercept model using "xtmixed" xtmixed mathach female ses minority sector disclim || schoolid:, mle *Report variance of error components instead of standard deviation xtmixed mathach female ses minority sector disclim || schoolid:, mle var *56 *Random coefficient model; specifying random coefficient for ses only xtmixed mathach female ses minority sector disclim || schoolid: ses, cov(unstruct) mle var *57 *RC v. RI model; is there statistically significant heterogeneity in the impact of ses across schools? (long way) *Long way; generate chi-square statistic, which is 2 times the difference of the log likelihoods di 2*(23160.801-23159.029) di chiprob(2, 3.544) *58 *RC v. RI model; is there statistically significant heterogeneity in the impact of ses across schools? (short way) xtmixed mathach female ses minority sector disclim || schoolid:, mle var est store ri xtmixed mathach female ses minority sector disclim || schoolid: ses, cov(unstruct) mle var est store rc lrtest rc ri *59 *Summarizing the degree of partial pooling *First, generate partially-pooled, empirical Bayes level-2 residuals for both random coeff and random intercept predict ebses ebint, reffects su ebses ebint if ccount==1, det *Retrieve variances *random intercept di 1 - (1.515212/2.128978) *random coefficient di 1 - (.060195/.2882566) *60 *RC Model with random coefficients specified for female, ses, and minority (all level-1 variables) xtmixed mathach female ses minority sector disclim || schoolid: female ses minority, cov(unstruct) mle var est store allvarsRC *61 *Testing whether there's significant heterogeneity in the impact of female across schools xtmixed mathach female ses minority sector disclim || schoolid: ses minority, cov(unstruct) mle var est store nofemaleRC lrtest allvarsRC nofemaleRC *62 *Testing whether there's significant heterogeneity in the impact of ses xtmixed mathach female ses minority sector disclim || schoolid: female minority, cov(unstruct) mle var est store nosesRC lrtest allvarsRC nosesRC *63 *Testing whether there's significant heterogeneity in the impact of ses xtmixed mathach female ses minority sector disclim || schoolid: female ses, cov(unstruct) mle var est store nominorityRC lrtest allvarsRC nominorityRC *64 *95% Confidence interval for random coefficients *Restore full RC results est restore allvarsRC *and this time, we want standard deviations of error components instead of variances *Female di -1.333062 + 1.96*.9474834 di -1.333062 - 1.96*.9474834 *ses di 2.063882 + 1.96*.4527255 di 2.063882 - 1.96*.4527255 *minority di -3.10186 + 1.96*1.176712 di -3.10186 - 1.96*1.176712 *65 *Empirical Bayes level-2 residuals for random coefficients and random intercept predict eb_fem eb_ses eb_min eb_int, reffects *Note the order; same order as variables are entered into model; intercept is always last *could also summarize the degree of partial pooling *66 *Generating partially-pooled predicted slopes for each school gen b_fem=-1.333062 + eb_fem gen b_ses=2.063882 + eb_ses gen b_min=-3.10186 + eb_min *67 *Graphing scatter b_fem id_new if ccount==1, mlabel(id_new) yline(-1.291226) scatter b_ses id_new if ccount==1, mlabel(id_new) yline(2.068435) scatter b_min id_new if ccount==1, mlabel(id_new) yline(-2.987567) *68 *Data exploration for causal heterogeneity; which level-2 variables are correlated with these cluster-specific effects? *Level-2 varaibles xtsum size- himinty, i(schoolid) corr b_fem size- himinty if ccount==1 corr b_ses size- himinty if ccount==1 corr b_min size- himinty if ccount==1 *Scatterplot scatter scatter b_min disclim if ccount==1 *Difference of means test for binary IVs ttest b_ses if ccount==1, by(sector) *69 *Generating cross-level interactions; how the impact of minority across schools varies as a function of disclim and sector gen minorityXdisclim=minority*disclim gen minorityXsector=minority*sector *70 *Estimate RC model with cross-level interactions xtmixed mathach female ses minority disclim sector minorityXdisclim minorityXsector || schoolid: female ses minority, cov(unstruct) mle var *71 *Graphical interpretations; graphing the impact of minority for low and high disciplinary climates *Need to generate predicted values of mathach as a function of minority (0 or 1) for high and low values of disclim, while holding the remaining variables at their mean values *First, create backup variables gen female_bkp=female gen ses_bkp=ses gen minority_bkp=minority gen disclim_bkp=disclim gen sector_bkp=sector gen minorityXdisclim_bkp=minorityXdisclim gen minorityXsector_bkp=minorityXsector su female ses su sector if ccount==1 replace female=.5281837 replace ses=.0001434 replace sector=.4375 replace minorityXsector=minority*.4375 su disclim if ccount==1, det *Use 10th percentile as "low" value and 90th pctile for "high" value *Low disclim (-1.2055) replace disclim=-1.2055 replace minorityXdisclim=minority*-1.2055 predict yhat_minority_lowdisc, xb label var yhat_minority_lowdisc "Low Disciplinary Climate" *High disclim (1.351) replace disclim=1.351 replace minorityXdisclim=minority*1.351 predict yhat_minority_highdisc, xb label var yhat_minority_lowdisc "High Disciplinary Climate" *Graph line yhat_minority_lowdisc yhat_minority_highdisc minority, sort xlab(0 1) ytitle(Predicted Math Achievement) xtitle("1=Minority, 0=otherwise") *Replace variables back to their original versions replace female=female_bkp replace ses=ses_bkp replace minority=minority_bkp replace disclim=disclim_bkp replace sector=sector_bkp replace minorityXdisclim=minorityXdisclim_bkp replace minorityXsector=minorityXsector_bkp *72 *RC Model incorporating cluster confounding issue clustergen ses *First, no cross-level interactions xtmixed mathach ses_wi ses_bw sector || schoolid: ses_wi, cov(unstruc) mle var est store rc *Estimate RI model to test for significant causal heterogeneity in the impact of ses_wi xtmixed mathach ses_wi ses_bw sector || schoolid:, mle var est store ri lrtest rc ri *73 *RC Model incorporating cluster confounding issue and with cross-level interactions gen ses_wiXsector=ses_wi*sector gen ses_wiXses_bw=ses_wi*ses_bw xtmixed mathach ses_wi ses_bw sector ses_wiXsector ses_wiXses_bw || schoolid: ses_wi, cov(unstruc) mle var *Graphs highlighting substantive interpretation of cross-level interactions *74 *Backup variables gen ses_bw_bkp=ses_bw gen sector_bkp=sector gen ses_wiXsector_bkp=ses_wiXsector gen ses_wiXses_bw_bkp=ses_wiXses_bw *75 *Graph of predicted mathach as a function of ses_wi for varying values of sector (1=Cath, 0=public), while holding ses_bw at its mean su ses_bw if ccount==1 replace ses_bw=-.0061888 replace ses_wiXses_bw=ses_wi*-.0061888 *Catholic Schools replace sector=1 replace ses_wiXsector=ses_wi*1 predict yhat_seswi_cath label var yhat_seswi_cath "Catholic" *Public Schools replace sector=0 replace ses_wiXsector=ses_wi*0 predict yhat_seswi_pub label var yhat_seswi_pub "Public" *Replace back to original versions replace ses_bw=ses_bw_bkp replace sector=sector_bkp replace ses_wiXsector=ses_wiXsector_bkp replace ses_wiXses_bw=ses_wiXses_bw_bkp line yhat_seswi_cath yhat_seswi_pub ses_wi, sort ytitle(Predicted Math Achievement) xtitle(Individual-Level SES) lcolor(black black) lpattern(solid dash) *76 *Graph of predicted mathach as a function of ses_wi for high and low values of ses_bw, while holding sector at its mean su sector if ccount==1 replace sector=.4375 replace ses_wiXsector=ses_wi*.4375 *Retrieve high (90th pctile) and low (10th pctile) of ses_bw su ses_bw if ccount==1, det *Low value of ses_bw (-.5898246) replace ses_bw= -.5898246 replace ses_wiXses_bw=ses_wi*-.5898246 predict yhat_seswi_lowsesbw label var yhat_seswi_lowsesbw "Low School-Level SES" *High value of ses_bw (.5225781) replace ses_bw= .5225781 replace ses_wiXses_bw=ses_wi*.5225781 predict yhat_seswi_highsesbw label var yhat_seswi_highsesbw "High School-Level SES" *Replace back to original versions replace ses_bw=ses_bw_bkp replace sector=sector_bkp replace ses_wiXsector=ses_wiXsector_bkp replace ses_wiXses_bw=ses_wiXses_bw_bkp line yhat_seswi_highsesbw yhat_seswi_lowsesbw ses_wi, sort ytitle(Predicted Math Achievement) xtitle(Individual-Level SES) lcolor(black black) lpattern(solid dash)