** I.  BASICS
** Open "neighborhood" data

*1
xtsum schid, i(schid)

*2
** Sort by level-2 unit identifier
sort schid

*3
browse schid attain

*4
tab schid

*5
*Download some Stata programs
net from http://home.gwu.edu/~bartels/stata

*6
*Generating clustercount and clustersize variables the long way
by schid: gen clustersize=_N
by schid: gen clustercount=_n
browse schid attain clustersize clustercount
tab clustersize if clustercount==1

*7
*Generating clustercount ("ccount") and clustersize ("csize") variables the short way (using the "ccount" program)
ccount schid

*8
*Generating a new schoolid variable ranging from 1 to 17
clusterid schid
browse schid id_new attain ccount csize
tab id_new

****************************************************************************

**II.  VARIANCE COMPONENTS MODEL - GENERATING NO POOLING CLUSTER MEANS AND PARTIALLY-POOLED CLUSTER MEANS

*9
xtreg attain, i(schid)
*NOTE:
*_cons = pooled mean
* theta = sigma_e ^2
* psi = sigma_u^2
*rho=intraclass correlation coefficient

*10
* Generate unpooled cluster means of attain (DV)
by schid: egen cmean=mean(attain)
browse schid attain cmean

*11
*Calculate shrinkage factor (R)
gen R=(.32755315^2) / [(.32755315^2) + ((.96645848^2)/csize)]
su R

*12
*Generate ML resids
*Generate yhat
predict yhat, xb
gen res=attain - yhat
by schid: egen mlres=mean(res)

*13
*Generate empirical Bayes residuals
gen ebres=R*mlres

*14
* Generate eb resids using Stata canned routine
predict ebres_canned, u
clist id_new ebres ebres_canned if ccount==1, noobs

*15
*Generate and interpret pooling factor (omega)
gen omega=1-R
* We could also use the full forumula for this
su omega if ccount==1
tab omega if ccount==1
scatter omega csize if ccount==1

*16
*Generate partially-pooled means, using eq. 1
gen ppoolmean1=omega*.0811608 + (1-omega)*cmean

*17
* Generate partially-pooled means, using eq. 2
gen ppoolmean2=.0811608 +  ebres
clist id_new ppoolmean1 ppoolmean2 if ccount==1, noobs

*18
* Generating canned partially pooled cluster mean using Stata predict command
predict ppoolmean_canned, xbu
clist id_new ppoolmean1 ppoolmean2 ppoolmean_canned if ccount==1, nooobs

*19
*Graphs of means (completely pooled mean, partially pooled cluster means, and no pooling cluster means)
*Graph with csize label
scatter ppoolmean1 cmean id_new if ccount==1, msymbol(O S) yline(.0811608) mlabel(csize)
*Graph with pooling factor label
scatter ppoolmean1 cmean id_new if ccount==1, msymbol(O S) yline(.0811608) mlabel(omega)

**Example 2: Use "hsb" data**

*20 Basics
xtsum schoolid, i(schoolid)
ccount schoolid
clusterid schoolid
su csize if ccount==1
tab csize if ccount==1

*21
xtreg mathach, i(schoolid)

*22
*Generate unpooled cluster means of mathach (DV)
by schoolid: egen cmean=mean(mathach)

*23
*Generate partially-pooled cluster means of mathach
predict ppoolmean, xbu

*24
*Generate and interpret pooling factor (omega)
gen omega = (6.2563275^2) / [(6.2563275^2) + csize*(2.9608177^2)]
* We could also use the full forumula for this
su omega if ccount==1
tab omega if ccount==1
scatter omega csize if ccount==1

*25
*Graph
scatter cmean ppoolmean id_new if ccount==1 & id_new<25, yline(12.63674) mlabel(omega)

****************************************************************************

**III.  MODELING MULTILEVEL DATA - FOUR MODELING APPROACHES

** Open the "hsb" data

*26
*Describing and summarizing the data (distinguishing level-1 and level-2 variables)
xtsum, i(schoolid)
ccount schoolid
clusterid schoolid

*27
*OLS (complete pooling)
reg mathach female ses minority sector disclim

*28
*"Between" model the "long" way
*Generate cluster means of the DV and all level-1 IVs
sort schoolid
by schoolid: egen mathach_m=mean(mathach)
by schoolid: egen female_m=mean(female)
by schoolid: egen ses_m=mean(ses)
by schoolid: egen minority_m=mean(minority)
reg mathach_m female_m ses_m minority_m sector disclim if ccount==1

*29
*"Between" model the "short" way
xtreg mathach female ses minority sector disclim, i(schoolid) be

*30
*Fixed effects (within, or "no pooling" approach) model the "long" way
*Generate school dummies
quietly tab schoolid, gen(sid)
*Estimate OLS model with J-1 school dummies; exclude level-2 variables, which cannot be estimated using the FE approach
reg mathach sid2-sid160 female ses minority

*31
*Fixed effects model the "short" way
xtreg mathach female ses minority sector disclim, i(schoolid) fe

*32
*Random intercept (partial pooling) model; note: the "re" option at the end indicates "random effects" or random intercept model.
*"re" is the default for xtreg, so you do not actually need to include that at the end.
xtreg mathach female ses minority sector disclim, i(schoolid) re
*Note: GLS is the default. Can also estimate this using MLE
xtreg mathach female ses minority sector disclim, i(schoolid) re mle

*33
*Interpretation of random intercept model
xtreg mathach female ses minority sector disclim, i(schoolid) re
*Intraclass correlation coefficient (rho)
*Testing RI v. OLS using Lagrangian multiplier test
xttest0
*Using MLE, you get a likelihood ratio test, which is analogous to the Lagrangian multiplier test

*34
*Pooling factor (omega)
gen omega2 = (5.9899567^2) / [(5.9899567^2) + csize*(1.3603734^2)]
su omega2 if ccount==1

*35
*Alternative way of summarizing the degree of pooling
*Generate empirical Bayes (partially-pooled) level-2 residuals
predict ebres, u
su ebres if ccount==1, det
*Retrieve variance; compare it to model level-2 residual variance; plug into this formula
display 1 - ( 1.380083 / (1.3603734^2))

*36
*Measures of R-squared at each level
*Estimate model with no IVs
xtreg mathach, i(schoolid)
*Note: How does the degree of partial pooling change after conditioning on independent variables?
*level-1 R-squared
di [(6.2563275^2) - (5.9899567^2)] / (6.2563275^2)
*level-2 R-squared
di [(2.9608177^2) - (1.3603734^2)] / (2.9608177^2)

*****************************************************************************

* IV.  DEALING WITH CLUSTER CONFOUNDING

*37
*Hausman test
*Estimate FE model
xtreg mathach female ses minority sector disclim, i(schoolid) fe
est store fixed
*Estimate RI model
xtreg mathach female ses minority sector disclim, i(schoolid)
est store random
hausman fixed random

*Accounting for cluster confounding

*38
*Generate between-cluster operationalizations of level-1 variables the "long" way
sort schoolid
by schoolid: egen female_cmean=mean(female)
by schoolid: egen ses_cmean=mean(ses)
by schoolid: egen minority_cmean=mean(minority)
*Generate within-cluster operationalizations of level-1 variables the "long" way
gen female_within=female - female_cmean
gen ses_within=ses - ses_cmean
gen minority_within=minority - minority_cmean

*39
**Generate within- and between-cluster operationalizations of level-1 variables the "short" way
net from http://home.gwu.edu/~bartels/stata
*download "clustergen"
*"clustergen" assumes that the cluster id variable is called "id"; create new variable, "id," which equals "schoolid"
gen id=schoolid
clustergen female ses minority

*40
*xtsum the new variables
xtsum female_bw ses_bw minority_bw female_wi ses_wi minority_wi, i(schoolid)

*41
*How Stata generates within and between s.d. in the "xtsum" command
xtsum female ses minority
su female_wi ses_wi minority_wi
su female_bw ses_bw minority_bw if ccount==1

*42
*Proving that the within and between operationalizations are completely uncorrelated
corr female_bw female_wi
corr ses_bw ses_wi
corr minority_bw minority_wi

*43
*Estimating within- and between-cluster effects in the random intercept modeling framework
* METHOD 1
xtreg mathach female_wi ses_wi minority_wi female_bw ses_bw minority_bw sector disclim, i(schoolid) mle

*44
* METHOD 2
xtreg mathach female ses minority female_bw ses_bw minority_bw sector disclim, i(schoolid) mle

*****************************************************************************

* V.  LONGITUDINAL (TIME-SERIES CROSS-SECTIONAL) DATA

*Use "Poe et al." data

*45
*Describing the data
su id year
sort id year
browse id year
ccount id
browse id year csize ccount
su csize if ccount==1
tab csize if ccount==1
*xtset the data
xtset id year
*Which are time-varying and which are time-constant?
xtsum ainew ailag polrt lpop mil2 brit

*45.5
*Generating a lagged dependent variable
*Note: there's already a lagged DV in the dataset called "ailag"
by id: gen ainew_lag=ainew[_n-1]

*46
*Completely pooled approach (OLS) with regular errors 
reg ainew ailag polrt lpop mil2 brit

*47
*Completely pooled approach (OLS) with panel-corrected standard errors (PCSEs)
xtpcse ainew ailag polrt lpop mil2 brit, pairwise

*48
*Fixed effects with "regular" standard errors
xtreg ainew ailag polrt lpop mil2 brit, i(id) fe

*49
*Fixed effects with PCSEs
quietly tab id, gen(id)
*include J-1 dummies; exclude brit
xtpcse ainew id2-id164 ailag polrt lpop mil2, pairwise

*50
*Between model
xtreg ainew ailag polrt lpop mil2 brit, i(id) be

*51
* Estimating RE model
xtreg ainew ailag polrt lpop mil2 brit, i(id) mle

*52
*Generating within- and between-cluster operationalizations of time-varying variables
clustergen ailag polrt lpop mil2

*53
*Estimating within- and between-cluster effects of time-varying variables
*Method 1
xtreg ainew ailag_wi polrt_wi lpop_wi mil2_wi polrt_bw lpop_bw mil2_bw brit, i(id) mle

*54
*Method 2
xtreg ainew ailag_wi polrt lpop mil2 polrt_bw lpop_bw mil2_bw brit, i(id) mle

*****************************************************************************

* VI:  RANDOM COEFFICIENT MODELS

*Use "hsb" data
ccount schoolid
clusterid schoolid

** introducing the "xtmixed" command

*55
*Random intercept model using "xtmixed"
xtmixed mathach female ses minority sector disclim || schoolid:, mle
*Report variance of error components instead of standard deviation
xtmixed mathach female ses minority sector disclim || schoolid:, mle var

*56
*Random coefficient model; specifying random coefficient for ses only
xtmixed mathach female ses minority sector disclim || schoolid: ses, cov(unstruct) mle var

*57
*RC v. RI model; is there statistically significant heterogeneity in the impact of ses across schools? (long way)
*Long way; generate chi-square statistic, which is 2 times the difference of the log likelihoods
di 2*(23160.801-23159.029)
di chiprob(2, 3.544)

*58
*RC v. RI model; is there statistically significant heterogeneity in the impact of ses across schools? (short way)
xtmixed mathach female ses minority sector disclim || schoolid:, mle var
est store ri
xtmixed mathach female ses minority sector disclim || schoolid: ses, cov(unstruct) mle var
est store rc
lrtest rc ri

*59
*Summarizing the degree of partial pooling
*First, generate partially-pooled, empirical Bayes level-2 residuals for both random coeff and random intercept
predict ebses ebint, reffects
su ebses ebint if ccount==1, det
*Retrieve variances
*random intercept
di 1 - (1.515212/2.128978)
*random coefficient
di 1 - (.060195/.2882566)

*60
*RC Model with random coefficients specified for female, ses, and minority (all level-1 variables)
xtmixed mathach female ses minority sector disclim || schoolid: female ses minority, cov(unstruct) mle var
est store allvarsRC

*61
*Testing whether there's significant heterogeneity in the impact of female across schools
xtmixed mathach female ses minority sector disclim || schoolid: ses minority, cov(unstruct) mle var
est store nofemaleRC
lrtest allvarsRC nofemaleRC

*62
*Testing whether there's significant heterogeneity in the impact of ses
xtmixed mathach female ses minority sector disclim || schoolid: female minority, cov(unstruct) mle var
est store nosesRC
lrtest allvarsRC nosesRC

*63
*Testing whether there's significant heterogeneity in the impact of ses
xtmixed mathach female ses minority sector disclim || schoolid: female ses, cov(unstruct) mle var
est store nominorityRC
lrtest allvarsRC nominorityRC

*64
*95% Confidence interval for random coefficients
*Restore full RC results
est restore allvarsRC
*and this time, we want standard deviations of error components instead of variances
*Female
di -1.333062 + 1.96*.9474834
di -1.333062 - 1.96*.9474834
*ses
di 2.063882 + 1.96*.4527255
di 2.063882 - 1.96*.4527255
*minority
di -3.10186 + 1.96*1.176712
di -3.10186 - 1.96*1.176712

*65
*Empirical Bayes level-2 residuals for random coefficients and random intercept
predict eb_fem eb_ses eb_min eb_int, reffects
*Note the order; same order as variables are entered into model; intercept is always last
*could also summarize the degree of partial pooling

*66
*Generating partially-pooled predicted slopes for each school
gen b_fem=-1.333062 + eb_fem
gen b_ses=2.063882 + eb_ses
gen b_min=-3.10186 + eb_min

*67
*Graphing
scatter b_fem id_new if ccount==1, mlabel(id_new) yline(-1.291226)
scatter b_ses id_new if ccount==1, mlabel(id_new) yline(2.068435)
scatter b_min id_new if ccount==1, mlabel(id_new) yline(-2.987567)

*68
*Data exploration for causal heterogeneity; which level-2 variables are correlated with these cluster-specific effects?
*Level-2 varaibles
xtsum size- himinty, i(schoolid)
corr b_fem size- himinty if ccount==1
corr b_ses size- himinty if ccount==1
corr b_min size- himinty if ccount==1
*Scatterplot
scatter scatter b_min disclim if ccount==1
*Difference of means test for binary IVs
ttest b_ses if ccount==1, by(sector)

*69
*Generating cross-level interactions; how the impact of minority across schools varies as a function of disclim and sector
gen minorityXdisclim=minority*disclim
gen minorityXsector=minority*sector

*70
*Estimate RC model with cross-level interactions
xtmixed mathach female ses minority disclim sector minorityXdisclim minorityXsector || schoolid: female ses minority, cov(unstruct) mle var

*71
*Graphical interpretations; graphing the impact of minority for low and high disciplinary climates
*Need to generate predicted values of mathach as a function of minority (0 or 1) for high and low values of disclim, while holding the remaining variables at their mean values
*First, create backup variables
gen female_bkp=female
gen ses_bkp=ses
gen minority_bkp=minority 
gen disclim_bkp=disclim 
gen sector_bkp=sector
gen minorityXdisclim_bkp=minorityXdisclim
gen minorityXsector_bkp=minorityXsector
su female ses
su sector if ccount==1
replace female=.5281837
replace ses=.0001434
replace sector=.4375
replace minorityXsector=minority*.4375
su disclim if ccount==1, det
*Use 10th percentile as "low" value and 90th pctile for "high" value
*Low disclim (-1.2055)
replace disclim=-1.2055
replace minorityXdisclim=minority*-1.2055
predict yhat_minority_lowdisc, xb
label var yhat_minority_lowdisc "Low Disciplinary Climate"
*High disclim (1.351)
replace disclim=1.351
replace minorityXdisclim=minority*1.351
predict yhat_minority_highdisc, xb
label var yhat_minority_lowdisc "High Disciplinary Climate"
*Graph
line yhat_minority_lowdisc yhat_minority_highdisc minority, sort xlab(0 1) ytitle(Predicted Math Achievement) xtitle("1=Minority, 0=otherwise")
*Replace variables back to their original versions
replace female=female_bkp
replace ses=ses_bkp
replace minority=minority_bkp 
replace disclim=disclim_bkp 
replace sector=sector_bkp
replace minorityXdisclim=minorityXdisclim_bkp
replace minorityXsector=minorityXsector_bkp

*72
*RC Model incorporating cluster confounding issue
clustergen ses
*First, no cross-level interactions
xtmixed mathach ses_wi ses_bw sector || schoolid: ses_wi, cov(unstruc) mle var
est store rc
*Estimate RI model to test for significant causal heterogeneity in the impact of ses_wi
xtmixed mathach ses_wi ses_bw sector || schoolid:, mle var
est store ri
lrtest rc ri

*73
*RC Model incorporating cluster confounding issue and with cross-level interactions
gen ses_wiXsector=ses_wi*sector
gen ses_wiXses_bw=ses_wi*ses_bw
xtmixed mathach ses_wi ses_bw sector ses_wiXsector ses_wiXses_bw || schoolid: ses_wi, cov(unstruc) mle var

*Graphs highlighting substantive interpretation of cross-level interactions
*74
*Backup variables
gen ses_bw_bkp=ses_bw
gen sector_bkp=sector
gen ses_wiXsector_bkp=ses_wiXsector
gen ses_wiXses_bw_bkp=ses_wiXses_bw

*75
*Graph of predicted mathach as a function of ses_wi for varying values of sector (1=Cath, 0=public), while holding ses_bw at its mean
su ses_bw if ccount==1
replace ses_bw=-.0061888
replace ses_wiXses_bw=ses_wi*-.0061888
*Catholic Schools
replace sector=1
replace ses_wiXsector=ses_wi*1
predict yhat_seswi_cath
label var yhat_seswi_cath "Catholic"
*Public Schools
replace sector=0
replace ses_wiXsector=ses_wi*0
predict yhat_seswi_pub
label var yhat_seswi_pub "Public"
*Replace back to original versions
replace ses_bw=ses_bw_bkp
replace sector=sector_bkp
replace ses_wiXsector=ses_wiXsector_bkp
replace ses_wiXses_bw=ses_wiXses_bw_bkp
line yhat_seswi_cath yhat_seswi_pub ses_wi, sort ytitle(Predicted Math Achievement) xtitle(Individual-Level SES) lcolor(black black) lpattern(solid dash)

*76
*Graph of predicted mathach as a function of ses_wi for high and low values of ses_bw, while holding sector at its mean
su sector if ccount==1
replace sector=.4375
replace ses_wiXsector=ses_wi*.4375
*Retrieve high (90th pctile) and low (10th pctile) of ses_bw
su ses_bw if ccount==1, det
*Low value of ses_bw (-.5898246)
replace ses_bw= -.5898246
replace ses_wiXses_bw=ses_wi*-.5898246
predict yhat_seswi_lowsesbw
label var yhat_seswi_lowsesbw "Low School-Level SES"
*High value of ses_bw (.5225781)
replace ses_bw= .5225781
replace ses_wiXses_bw=ses_wi*.5225781
predict yhat_seswi_highsesbw
label var yhat_seswi_highsesbw "High School-Level SES"
*Replace back to original versions
replace ses_bw=ses_bw_bkp
replace sector=sector_bkp
replace ses_wiXsector=ses_wiXsector_bkp
replace ses_wiXses_bw=ses_wiXses_bw_bkp
line yhat_seswi_highsesbw yhat_seswi_lowsesbw ses_wi, sort ytitle(Predicted Math Achievement) xtitle(Individual-Level SES) lcolor(black black) lpattern(solid dash)