- Last lecture: SEM II: Mediation.
- Today: Multilevel: part I.
Multilevel models
The aim is not to make you a ‘multilevel’ expert; that would take an entire separate course! But you should be able to apply the basics and to understand multilevel models when you come across them.
After today you should be able to complete the following sections for Assignment II:
Multilevel model. (Some bits will follow next week, such as plotting.)
Software options: MLwiN, SPSS (to a degree: it cannot (easily) fit some of the more complex models), SAS, Stata, HLM.
R – many options.
‘Grouped’ data are everywhere… .
Independence, although commonly assumed, is often not present (Kruskal, 1988).
Children in classes in schools.
Employees in departments in companies (in regions).
Individuals nested in neighbourhoods. Neighborhoods in districts/provinces. Countries nested in regions.
Individuals in couples/families.
Patients assigned to the same therapist - therapists nested in hospitals.
Stimuli in sets. Block designs.
Multilevel models go by many (near-)equivalent names:
Random effects model
Mixed model
Random coefficient model
Hierarchical model
Very simply put: depending on the level of analysis, you can get very different results.
Different levels of analysis.
A famous example: Robinson (1950) on literacy and migrants.
Graphically:
Good reasons to use multilevel models (from here):
Correct inferences: Traditional multiple regression techniques treat the units of analysis as independent observations. If there is a hierarchical structure: standard errors of regression coefficients will (likely) be underestimated, leading to an overstatement of statistical significance.
Substantive interest in group effects: In many situations a key research question concerns the extent of grouping in individual outcomes, and the identification of ‘outlying’ groups. For example, in evaluations of school performance we might be interested in the ‘value-added’ school effects on pupil attainment. To achieve this, we would examine the school-level residuals in a multilevel model that adjusts for prior attainment.
Estimating group effects simultaneously with the effects of group-level predictors: An alternative way to allow for group effects is to include dummy variables for groups in a traditional (ordinary least squares) regression model. Such a model is called an analysis of variance or fixed effects model. In many cases there will be predictors defined at the group level, e.g. type of school (mixed vs. single sex). In a fixed effects model, the effects of group-level predictors are confounded with the effects of the group dummies: it is not possible to separate out effects due to observed and unobserved group characteristics (see the sketch after this list). In a multilevel (random effects) model, the effects of both types of variable can be estimated.
Inference to a population of groups: In a multilevel model the groups in the sample are treated as a random sample from a population of groups. Using a fixed effects model, inferences cannot be made beyond the groups in the sample.
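To make the confounding point concrete, here is a minimal sketch using the Exam data introduced later in this lecture; the model formulas are illustrative only.

library(mlmRev) # contains the Exam data
library(lme4)
# Fixed effects approach: one dummy per school. With the school dummies in
# the model, the school-level predictor schavg is perfectly collinear with
# them, so lm() aliases it (its coefficient is NA): observed and unobserved
# school characteristics cannot be separated.
fe_model <- lm(normexam ~ standLRT + factor(school) + schavg, data = Exam)
tail(coef(fe_model), 1) # schavg: NA
# Random effects (multilevel) approach: school effects are modelled as
# random, so schavg can be estimated alongside them.
re_model <- lmer(normexam ~ standLRT + schavg + (1 | school), data = Exam)
fixef(re_model)["schavg"]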
If you get bored: I actually wrote on this: monkeys / countries.
Read this.
They allow for more accurate estimation of effects (for example, you could predict what would happen for a new school, class, or individual). However, these are easy to mistake for causal effects, i.e. to conclude that the school is causing the effect of X, Y, Z.
You’ll need to be aware of these. We will not (always) check them here… . It’s regression again… .
Do you remember the assumptions? Potential ways of checking them?
More here
You can read about it here and here.
It depends on which level you are interested in and whether you care about fixed vs. random effects. You can also consider Bayesian models, which have better properties for smaller samples (but come with their own challenges).
R packages:
lme4 – linear mixed models.
nlme – non-linear mixed models, but it can also fit linear ones.
MCMCglmm – Bayesian models.
brms – Bayesian Regression Models via Stan.
These are the main packages; it’s up to you to pick… . Note that the syntax differs between these packages!
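As an illustration of how the syntax differs (a sketch only, not run here: brm() needs a working Stan installation and takes much longer to fit), the intercept-only model fitted below in lme4 would look like this in brms:

# Sketch: the same intercept-only model in brms syntax.
# library(brms)
# null_brms <- brm(normexam ~ 1 + (1 | school), data = Exam)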
The dependent variable is the standardized result of a student on a specific exam (“normexam”).
In estimating the score on the exam, there are two levels: student and school.
On each level, one explanatory variable is present. At the individual level, we take into account the student’s standardized score on the London Reading Test (“standLRT”). At the school level, we take into account the average intake score (“schavg”).
More info: Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education, 19, 425-433. And here
setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture10")
library(mlmRev) # contains data
library(lme4)
data <- mlmRev::Exam
null_model <- lmer(normexam ~ 1 + (1 | school), data = Exam, REML = F)
sink("null_model.txt") summary(null_model)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: normexam ~ 1 + (1 | school)
##    Data: Exam
##
##      AIC      BIC   logLik deviance df.resid
##  11016.6  11035.6  -5505.3  11010.6     4056
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.9471 -0.6486  0.0117  0.6992  3.6578
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 0.1686   0.4107
##  Residual             0.8478   0.9207
## Number of obs: 4059, groups:  school, 65
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.01317    0.05363  -0.246
sink()
We came across Maximum Likelihood (ML) before. Restricted Maximum Likelihood (REML) might be new. REML has better properties for estimating effects in small samples but does not allow for model comparison of fixed effects.
When comparing fixed effects we would use ML. We can quantify the evidence for each model. Remember that we can also rescale and assign AIC weights.
We can also average across models (have a look at the MuMIn package, for example).
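As a sketch of what that rescaling involves (MuMIn’s Weights() automates it), using the null model above and the fixed-effect model fitted below:

# Akaike weights by hand: rescale AIC differences into model weights.
aics <- c(null = AIC(null_model), fixed = AIC(fixed_pred))
delta <- aics - min(aics) # AIC differences from the best model
round(exp(-0.5 * delta) / sum(exp(-0.5 * delta)), 3) # Akaike weights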
Same model as before but now in nlme.
require(nlme)
lme(fixed = normexam ~ 1, data = Exam, random = ~ 1 | school)
## Linear mixed-effects model fit by REML
##   Data: Exam
##   Log-restricted-likelihood: -5507.327
##   Fixed: normexam ~ 1
## (Intercept)
## -0.01325213
##
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.4142457 0.9207376
##
## Number of Observations: 4059
## Number of Groups: 65
Note that the effect of ‘standLRT’ does not vary between groups here; we are only looking at the individual level. We allow schools to vary in ‘normexam’ but not as a function of ‘standLRT’. So: random intercepts only, no random slopes.
fixed_pred <- lmer(normexam ~ standLRT + (1 | school), data = Exam, REML = F)
summary(fixed_pred)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: normexam ~ standLRT + (1 | school)
##    Data: Exam
##
##      AIC      BIC   logLik deviance df.resid
##   9365.2   9390.5  -4678.6   9357.2     4055
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.7162 -0.6304  0.0287  0.6844  3.2680
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 0.09213  0.3035
##  Residual             0.56573  0.7522
## Number of obs: 4059, groups:  school, 65
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 0.002391   0.040023    0.06
## standLRT    0.563371   0.012465   45.20
##
## Correlation of Fixed Effects:
##          (Intr)
## standLRT 0.008
You can see t values but not p values.
Read more about this.
Inference is best done in other ways… .
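One common option, shown here as a sketch: refit with the lmerTest package loaded (we use it again further below); its summary() adds Satterthwaite-approximated degrees of freedom and p values for the fixed effects.

# lmerTest::lmer() wraps lme4::lmer(); summary() then reports df and Pr(>|t|).
require(lmerTest)
fixed_pred_p <- lmer(normexam ~ standLRT + (1 | school), data = Exam, REML = F)
summary(fixed_pred_p)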
When comparing FIXED effects we rely on ML. (The \(\chi^2\) test below is known as the likelihood ratio test.)
anova(null_model, fixed_pred)
## Data: Exam
## Models:
## null_model: normexam ~ 1 + (1 | school)
## fixed_pred: normexam ~ standLRT + (1 | school)
##            Df     AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## null_model  3 11016.6 11035.6 -5505.3  11010.6
## fixed_pred  4  9365.2  9390.5 -4678.6   9357.2 1653.4      1  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A likelihood ratio test suggested that a model with a fixed effect for the London Reading test was a better fit, as compared to a null model (\(\chi^2\)= 1653.4, \(p\) <.0001).
With caution (!), we can use REML to compare RANDOM effect structures. Does it matter when you allow the effect of standLRT to vary between schools?
The second model has a random intercept and a predictor whose effect is allowed to vary between groups, i.e. a random slope. In other words, what happens if we allow the effect of the London Reading Test to vary between schools? In some schools the effect of that test could be larger and in others it could be smaller.
randomslope <- lmer(normexam ~ 1 + (standLRT | school), data = Exam)
null_reml <- lmer(normexam ~ 1 + (1 | school), data = Exam)
anova(null_reml, randomslope, refit = F) # Otherwise it uses ML.
## Data: Exam
## Models:
## null_reml: normexam ~ 1 + (1 | school)
## randomslope: normexam ~ 1 + (standLRT | school)
##             Df   AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## null_reml    3 11021 11039.6 -5507.3    11015
## randomslope  5  9490  9521.5 -4740.0     9480 1534.7      2  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(randomslope)
## Linear mixed model fit by REML ['lmerMod']
## Formula: normexam ~ 1 + (standLRT | school)
##    Data: Exam
##
## REML criterion at convergence: 9480
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8545 -0.6314  0.0424  0.6854  3.5188
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.2153   0.4640
##           standLRT    0.3180   0.5639   0.81
##  Residual             0.5531   0.7437
## Number of obs: 4059, groups:  school, 65
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.36825    0.03769   -9.77
We can also use the conditional AIC (cAIC). This similarly suggests that the random slope model is a better fit. (Have a look at the cAIC4 package; it can also do bootstraps if you need them.)
require(cAIC4)
cAIC(null_reml)$caic
## [1] 10909.74
cAIC(randomslope)$caic
## [1] 9233.549
plot(ranef(fixed_pred))
## $school
plot(ranef(randomslope))
## $school
This model has both a random slope for ‘standLRT’ at the school level and a fixed effect at the individual level. It allows the effect to vary as a function of school, but still allows us to examine it at the individual level as well.
standLRT <- lmer(normexam ~ standLRT + (standLRT | school), data = Exam, REML = F)
sink(file = 'standLRT.txt')
summary(standLRT)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: normexam ~ standLRT + (standLRT | school)
##    Data: Exam
##
##      AIC      BIC   logLik deviance df.resid
##   9328.9   9366.7  -4658.4   9316.9     4053
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8313 -0.6325  0.0340  0.6832  3.4561
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.09045  0.3007
##           standLRT    0.01454  0.1206   0.50
##  Residual             0.55366  0.7441
## Number of obs: 4059, groups:  school, 65
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.01150    0.03978  -0.289
## standLRT     0.55673    0.01994  27.925
##
## Correlation of Fixed Effects:
##          (Intr)
## standLRT 0.365
sink()
It is possible to enter variables at the group level as well. Here, we will add a predictor measured at the school level: the average intake score. The lmer function needs this variable to have the same length as the individual-level variables. In other words, every unit at the lowest level must carry the corresponding group-level value (here: the average intake-test score of the student’s school).
schoolaverage <- lmer(normexam ~ standLRT + schavg + (1 + standLRT | school),
                      data = Exam, REML = F)
sink(file = 'schoolaverage.txt')
summary(schoolaverage)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: normexam ~ standLRT + schavg + (1 + standLRT | school)
##    Data: Exam
##
##      AIC      BIC   logLik deviance df.resid
##   9324.4   9368.6  -4655.2   9310.4     4052
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8279 -0.6313  0.0337  0.6844  3.4370
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.07448  0.2729
##           standLRT    0.01489  0.1220   0.38
##  Residual             0.55362  0.7441
## Number of obs: 4059, groups:  school, 65
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.00124    0.03668  -0.034
## standLRT     0.55239    0.02018  27.376
## schavg       0.29476    0.10559   2.792
##
## Correlation of Fixed Effects:
##          (Intr) stnLRT
## standLRT  0.268
## schavg    0.089 -0.087
sink()
(Thomas needs to show code… ).
require(stargazer)
stargazer(null_model, standLRT, schoolaverage, type = "html",
          dep.var.labels = c("Scores"),
          covariate.labels = c("London reading test", "School average"),
          style = "demography", out = "multilevel models.html", header = F)
The resulting table (dependent variable: Scores):

| | Model 1 | Model 2 | Model 3 |
|:--|:--:|:--:|:--:|
| London reading test | | 0.557*** (0.020) | 0.552*** (0.020) |
| School average | | | 0.295** (0.106) |
| Constant | -0.013 (0.054) | -0.012 (0.040) | -0.001 (0.037) |
| N | 4,059 | 4,059 | 4,059 |
| Log likelihood | -5,505.324 | -4,658.435 | -4,655.215 |
| AIC | 11,016.650 | 9,328.871 | 9,324.429 |
| BIC | 11,035.580 | 9,366.723 | 9,368.590 |

\*p < .05; \*\*p < .01; \*\*\*p < .001
Load ‘ScotsSec’ from the mlmRev package (a starter snippet follows the exercises below).
Scores attained by 3435 Scottish secondary school students on a standardized test taken at age 16. Both the primary school and the secondary school that the student attended have been recorded.
Paterson, L. (1991). Socio economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education, 5, 97-121.
Test a null model with a random intercept for ‘second’ and ‘attain’ as the dependent variable.
Build a model with a fixed effect of ‘verbal’, a random intercept (second), and ‘attain’ as dependent. Compare it to the null model. Export the results.
Build a model with a random intercept + slope (verbal) at the school level (second) and ‘attain’ as dependent. Compare it to the null model.
Make a plot.
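To get you started, a minimal sketch for loading and inspecting the data:

# Load and inspect the ScotsSec data.
require(mlmRev)
data(ScotsSec)
str(ScotsSec) # contains attain, verbal, primary, second, among others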
require(lmerTest)
ranova(fixed_pred) # rand() is an alternative command but leads to issues.
## ANOVA-like table for random-effects: Single term deletions
##
## Model:
## normexam ~ standLRT + (1 | school)
##              npar  logLik    AIC    LRT Df Pr(>Chisq)
## <none>          4 -4678.6 9365.2
## (1 | school)    3 -4880.3 9766.5 403.27  1  < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check this. You can also use sjPlot.
Are the residuals normal and linear? We are looking for straight lines… .
require(DHARMa)
# 250 is the default.
simulationOutput <- simulateResiduals(fittedModel = fixed_pred, n = 250)
plotSimulatedResiduals(simulationOutput = simulationOutput)
We can also obtain a formal test.
testUniformity(simulationOutput = simulationOutput)
##
##  One-sample Kolmogorov-Smirnov test
##
## data:  simulationOutput$scaledResiduals
## D = 0.016165, p-value = 0.2394
## alternative hypothesis: two-sided
Check our residuals. No real funnel. Homogeneity OK.
plot(fixed_pred)
Centering: sometimes you are interested in cross-level interactions. In such a case, it is sensible to grand-mean center your variables first.
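A minimal sketch of grand-mean centering (standLRT is already standardized, so centering changes little here; the mechanics are what matter). The cross-level interaction model is illustrative only:

# Grand-mean center the individual-level predictor...
Exam$standLRT_c <- Exam$standLRT - mean(Exam$standLRT, na.rm = TRUE)
# ...then use it in a cross-level interaction with the school-level predictor.
crosslevel <- lmer(normexam ~ standLRT_c * schavg + (standLRT_c | school),
                   data = Exam, REML = F)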
Statistical power: depending on the effect you are looking for, the sample sizes you’ll need will vary, but multilevel models tend to be quite ‘data hungry’. You can simulate your power; see here.
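One way to do this, as a sketch (assuming the simr package, one such simulation tool; it is not part of the lecture code):

# Simulation-based power for the standLRT fixed effect in the model above.
# nsim is kept tiny here for speed; use a much larger number in practice.
require(simr)
powerSim(fixed_pred, test = fixed("standLRT"), nsim = 20)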
Effect size measures: these are generally not easy to obtain, but have a look at the intra-class correlation coefficient (ICC) here or here.
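As a sketch, the ICC can be computed directly from the variance components of the null model fitted earlier:

# ICC: share of the total variance in normexam that lies between schools.
vc <- as.data.frame(VarCorr(null_model))
vc$vcov[vc$grp == "school"] / sum(vc$vcov) # ~ 0.1686 / (0.1686 + 0.8478) = 0.17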
Nesting is not evaluated here, but often (as in your assignment) you will have nested data.
# This would nest classes in schools.
# lme4: (1 | school/class)
# nlme: random = ~ 1 | school/class
# Suppose that you have two memberships which are non-nested (crossed):
# lme4: (1 | school) + (1 | doctor)
# Harder to do in nlme - find out on your own, it involves lists :)
require(lmerTest)
plot(ranef(null_model))
## $school
plot(ranef(randomslope))
## $school
require(lmeresampler)
require(boot)
require(lme4)
require(mlmRev)
Exam <- mlmRev::Exam
set.seed(1981)
# Only a small number of resamples to save time; normally > 1000.
b <- bootstrap(model = schoolaverage, fn = fixef, type = "case",
               resample = c(TRUE, TRUE), B = 100)
boot_data <- as.data.frame(b$t)
require(skimr)
skim(boot_data)
## Skim summary statistics
##  n obs: 100
##  n variables: 3
##
## ── Variable type: numeric ──────────────────────────────────────────────
##     variable missing complete   n   mean    sd    p0    p25    p50    p75 p100     hist
##  (Intercept)       0      100 100 -0.012 0.034 -0.12 -0.031 -0.017 0.0089 0.12 ▁▁▃▇▅▂▁▁
##       schavg       0      100 100   0.31  0.12 -0.16   0.23   0.32   0.38 0.65 ▁▁▁▆▇▆▂▁
##     standLRT       0      100 100   0.55 0.019   0.5   0.54   0.55   0.57  0.6 ▁▂▇▇▇▇▁▁
quantile(na.omit(boot_data$schavg), prob = c(0.025, 0.975))
##       2.5%      97.5%
## 0.09118986 0.54308444
A case resampling bootstrap procedure with 100 resamples showed that the effect of school average was robust (95%CI [.091, .543]).
Note that sometimes the bootstraps will not converge (it is possible to draw the same case over and over, especially in small clusters).
Next week we will look closer at some common experimental designs and how they may be cast into multilevel models.
Depending on time, we will also cover what happens when you run into some common issues.
We will also make some (nicer) graphs.
Download the data from here
Build model 1: pitch ~ politeness + sex + (1|subject) + ε. \(\epsilon\) does not have to be modelled explicitly; those are the residuals. In the dataset, politeness is called ‘attitude’, pitch is called ‘frequency’, and sex is labelled ‘gender’.
Test this against a null model: pitch ~ (1|subject) + ε. Which one is the better fit?
Bootstrap the politeness effect from model 1.
Compare model 1 to this model: pitch ~ politeness + sex + (1|subject) + (1|item) + ε. Note that items are called ‘scenarios’ in the dataset.
Make a model with random slopes for politeness for both subjects and items, have it include a fixed effect as well.
Compare this model with random slopes to a model with just random intercepts (but with the fixed effect).
Bootstrap the fixed effect.
Also check the reading list! (There are many more sources than listed here.)
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. New York, NY: Cambridge University Press.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). London: Taylor & Francis.
Magnusson, K. (2015). Using R and lme/lmer to fit different two- and three-level longitudinal models http://rpsychologist.com/r-guide-longitudinal-lme-lmer
Nieuwenhuis, R. (2017). R-Sessions 16: Multilevel Model Specification (lme4) http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/
Snijders, T. A. B., & Berkhof, J. (2008). Diagnostic Checks for Multilevel Models. In: Handbook of Multilevel Analysis (pp. 141–175). New York, NY: Springer New York. http://doi.org/10.1007/978-0-387-73186-5_3
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage Publications Limited.