2018-12-04

PY0782: Advanced Quantitative Research Methods.

  • Last lecture: SEM II: Mediation.
  • Today: Multilevel: part I.

Goals (today / next week)

Multilevel models

Not: make you a ‘multilevel’ expert. That would take an entire separate course! But you should be able to apply the basics and to understand multilevel models when you come across them.

Assignment

After today you should be able to complete the following sections for Assignment II:

Multilevel model. (Some bits, such as plotting, will follow next week.)

Other packages.

MLwiN, SPSS (to a degree; it cannot easily fit some of the more complex models), SAS, Stata, HLM.

R – many options.

Why multilevel?

‘Group’ data are everywhere… .

Independence, although commonly assumed, is often not present (Kruskal, 1988).

Some examples of applications… .

Children in classes in schools.

Employees in departments in companies (in regions).

Individuals nested in neighbourhoods. Neighbourhoods in districts/provinces. Countries nested in regions.

Individuals in couples/families.

Patients assigned to the same therapist - therapists nested in hospitals.

Stimuli in sets. Block designs.

Naming frenzy… .

Multilevel model

Random effects model

Mixed model

Random coefficient model

Hierarchical model

Simpson’s paradox / Ecological Fallacy (atomistic fallacy).

Very simply put: depending on the level of analysis you can get very different results.

Different levels of analysis.

Famous example. Robinson (1950) on literacy and migrants.

Graphically:
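
A minimal simulation sketch of the paradox (made-up data, not Robinson’s): within each group the slope is negative, yet the pooled slope across groups is positive.

# Simpson's paradox in miniature (hypothetical data).
set.seed(123)
group_mean <- rep(c(1, 3, 5), each = 50)   # three groups with rising means
x <- group_mean + rnorm(150)
y <- 2 * group_mean - 1.5 * (x - group_mean) + rnorm(150)
coef(lm(y ~ x))["x"]                       # pooled slope: positive
coef(lm(y ~ x + factor(group_mean)))["x"]  # within-group slope: negative
plot(x, y, col = factor(group_mean))       # downward trend within each colour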

(Good) Reasons to opt for multilevel models.

Good reasons from here.

Correct inferences: traditional multiple regression techniques treat the units of analysis as independent observations. If the data have a hierarchical structure, the standard errors of the regression coefficients will (likely) be underestimated, leading to an overstatement of statistical significance.

Substantive interest in group effects: In many situations a key research question concerns the extent of grouping in individual outcomes, and the identification of ‘outlying’ groups. For example, in evaluations of school performance we might be interested in the ‘value-added’ school effects on pupil attainment. To achieve this, we can examine the school-level residuals in a multilevel model that adjusts for prior attainment.

Reasons to opt for multilevel models.

Estimating group effects simultaneously with the effects of group-level predictors: An alternative way to allow for group effects: include dummy variables for groups in a traditional (ordinary least squares) regression model. Such a model is called an analysis of variance or fixed effects model. In many cases there will be predictors defined at the group level, e.g. type of school (mixed vs. single sex). In a fixed effects model, the effects of group-level predictors are confounded with the effects of the group dummies: it is not possible to separate out effects due to observed and unobserved group characteristics. In a multilevel (random effects) model, the effects of both types of variable can be estimated.

Inference to a population of groups: In a multilevel model the groups in the sample are treated as a random sample from a population of groups. Using a fixed effects model, inferences cannot be made beyond the groups in the sample.

If you get bored: I actually wrote on this: monkeys / countries.

What multilevel models can and cannot do.

Read this.

They allow for more accurate estimation of effects (for example, you could predict what would happen for a new school, class, or individual). However, these estimates are easy to mistake for causal effects, i.e. concluding that the school causes the effect of X, Y, Z.
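
For instance, lme4 can generate a prediction for a school that was not in the data; a minimal sketch, using the fixed_pred model fitted later in this lecture and a hypothetical new pupil:

# Prediction for a pupil in an unobserved school: with allow.new.levels = TRUE,
# lme4 returns the population-level (fixed effects only) prediction.
new_pupil <- data.frame(standLRT = 0.5, school = "new_school")
predict(fixed_pred, newdata = new_pupil, allow.new.levels = TRUE)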

Assumptions.

You’ll need to be aware of these. We will not (always) check them here… . It’s regression again… .

Do you remember the assumptions? Potential ways of checking them?

More here

Sample sizes.

You can read about it here and here.

It depends on which level you are interested in and whether you care about fixed vs. random effects. You can consider Bayesian models, which have better properties for smaller samples (but come with their own challenges).

lme4, nlme, MCMCglmm, brms.

R packages.

lme4 – linear models.

nlme – non-linear models, but can also fit linear models.

MCMCglmm – Bayesian models.

brms – Bayesian Regression Models via Stan.

lme4 and nlme.

Main packages - it’s up to you to pick… . Note that the syntax differs between these packages!

Example

The dependent variable is the standardized result of a student on a specific exam (“normexam”).

In estimating the score on the exam, there are two levels: student and school.

At each level, one explanatory variable is present. At the individual level, we take into account the student’s standardized score on the London Reading Test (“standLRT”). At the school level, we take into account the average intake score (“schavg”).

More info: Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education, 19, 425-433. And here

Multilevel model. (lme4)

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture10")
library(mlmRev) # contains data
library(lme4)
Exam <- mlmRev::Exam # make the Exam data available under its own name
null_model <- lmer(normexam ~ 1 + (1 | school), data = Exam, REML = F)

Summary

sink("null_model.txt")
summary(null_model)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: normexam ~ 1 + (1 | school)
##    Data: Exam
## 
##      AIC      BIC   logLik deviance df.resid 
##  11016.6  11035.6  -5505.3  11010.6     4056 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9471 -0.6486  0.0117  0.6992  3.6578 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 0.1686   0.4107  
##  Residual             0.8478   0.9207  
## Number of obs: 4059, groups:  school, 65
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.01317    0.05363  -0.246
sink()

REML vs. ML

We came across Maximum Likelihood (ML) before. Restricted Maximum Likelihood (REML) might be new. REML has better properties for estimating effects in small samples but does not allow for model comparison for fixed effects.

When comparing fixed effects we would use ML. We can quantify the evidence for each model. Remember that we can also rescale AIC values and assign AIC weights (see the sketch below).
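
A minimal sketch of Akaike weights by hand, using the null model above and the fixed_pred model fitted later in this lecture:

# Rescale AICs to differences from the best model and convert to weights.
aics <- c(null = AIC(null_model), fixed = AIC(fixed_pred))
delta <- aics - min(aics)
exp(-0.5 * delta) / sum(exp(-0.5 * delta)) # relative evidence for each model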

We can also average across models (have a look at the MuMIn package, for example).

nlme.

Same model as before but now in nlme.

require(nlme)
lme(fixed = normexam ~ 1, data = Exam, random = ~ 1 | school)
## Linear mixed-effects model fit by REML
##   Data: Exam 
##   Log-restricted-likelihood: -5507.327
##   Fixed: normexam ~ 1 
## (Intercept) 
## -0.01325213 
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.4142457 0.9207376
## 
## Number of Observations: 4059
## Number of Groups: 65

Fixed predictor at individual level

Note that the effect of ‘standLRT’ does not vary between groups. We are only looking at the individual level. We allow schools to vary in ‘normexam’ but not as a function of ‘standLRT’. So: random intercepts, no random slopes.

Model random intercept + fixed effect.

fixed_pred<-lmer(normexam ~ standLRT + (1 | school), data=Exam, REML=F)
summary(fixed_pred)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: normexam ~ standLRT + (1 | school)
##    Data: Exam
## 
##      AIC      BIC   logLik deviance df.resid 
##   9365.2   9390.5  -4678.6   9357.2     4055 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7162 -0.6304  0.0287  0.6844  3.2680 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept) 0.09213  0.3035  
##  Residual             0.56573  0.7522  
## Number of obs: 4059, groups:  school, 65
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 0.002391   0.040023    0.06
## standLRT    0.563371   0.012465   45.20
## 
## Correlation of Fixed Effects:
##          (Intr)
## standLRT 0.008

Where are my p values?

You can see t values but not p values.

Read more about this.

Inference is best done in other ways… .
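
One such way is a confidence interval; a minimal sketch with lme4’s confint() (the profile method can be slow):

# Profile likelihood intervals for the fixed effects only.
confint(fixed_pred, parm = "beta_", method = "profile")
# Parametric bootstrap intervals are another option:
# confint(fixed_pred, method = "boot", nsim = 500)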

Model comparison.

When comparing FIXED effects we rely on ML. (The \(\chi^2\) statistic is known as the likelihood ratio test.)

anova(null_model, fixed_pred)
## Data: Exam
## Models:
## null_model: normexam ~ 1 + (1 | school)
## fixed_pred: normexam ~ standLRT + (1 | school)
##            Df     AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## null_model  3 11016.6 11035.6 -5505.3  11010.6                         
## fixed_pred  4  9365.2  9390.5 -4678.6   9357.2 1653.4      1  < 2.2e-16
##               
## null_model    
## fixed_pred ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sample write up.

A likelihood ratio test suggested that a model with a fixed effect for the London Reading test was a better fit, as compared to a null model (\(\chi^2\)= 1653.4, \(p\) <.0001).

Model comparison random effects.

With caution (!), we can use REML to compare RANDOM effect structures. Does it matter whether you allow the effect of standLRT to vary between schools?

The second model has a random intercept and, in addition, allows the predictor’s effect to vary between groups. In other words: what happens if we allow the effect of the London Reading Test to vary between schools? In some schools the effect of that test could be larger, in others smaller.

Model comparison II

randomslope <- lmer(normexam ~ 1 + (standLRT | school), data = Exam)
null_reml <- lmer(normexam ~ 1 + (1 | school), data = Exam)
anova(null_reml, randomslope, refit = F) # refit = F keeps REML; otherwise anova refits with ML.
## Data: Exam
## Models:
## null_reml: normexam ~ 1 + (1 | school)
## randomslope: normexam ~ 1 + (standLRT | school)
##             Df   AIC     BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## null_reml    3 11021 11039.6 -5507.3    11015                             
## randomslope  5  9490  9521.5 -4740.0     9480 1534.7      2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random Slope, summary

summary(randomslope)
## Linear mixed model fit by REML ['lmerMod']
## Formula: normexam ~ 1 + (standLRT | school)
##    Data: Exam
## 
## REML criterion at convergence: 9480
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8545 -0.6314  0.0424  0.6854  3.5188 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.2153   0.4640       
##           standLRT    0.3180   0.5639   0.81
##  Residual             0.5531   0.7437       
## Number of obs: 4059, groups:  school, 65
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.36825    0.03769   -9.77

Another recommendation (conditional AIC).

The conditional AIC similarly suggests that the random slope model is a better fit. (Have a look at the cAIC4 package; it can also do bootstraps if you need them.)

require(cAIC4)
cAIC(null_reml)$caic
## [1] 10909.74
cAIC(randomslope)$caic
## [1] 9233.549

plot random intercept

plot(ranef(fixed_pred))
## $school

plot random slope

plot(ranef(randomslope))
## $school

Model with random slope AND fixed effect.

This model has both a random slope at school level for ‘standLRT’ and a fixed effect at the individual level. It allows the effect to vary as a function of school while still letting us examine it at the individual level.

Model with random slope AND fixed effect.

standLRT<-lmer(normexam ~ standLRT + (standLRT | school), data=Exam, REML=F)
sink(file = 'standLRT.txt')
summary(standLRT)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: normexam ~ standLRT + (standLRT | school)
##    Data: Exam
## 
##      AIC      BIC   logLik deviance df.resid 
##   9328.9   9366.7  -4658.4   9316.9     4053 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8313 -0.6325  0.0340  0.6832  3.4561 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.09045  0.3007       
##           standLRT    0.01454  0.1206   0.50
##  Residual             0.55366  0.7441       
## Number of obs: 4059, groups:  school, 65
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.01150    0.03978  -0.289
## standLRT     0.55673    0.01994  27.925
## 
## Correlation of Fixed Effects:
##          (Intr)
## standLRT 0.365
sink()

Group variables.

It is possible to enter variables at the group level as well. Here, we will add a predictor containing the school’s average intake score. The lmer function needs this variable to be of the same length as the variables at the individual level. In other words: every unit at the lowest level must carry the group-level value (here: the average score on the intake test of that pupil’s school).
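
In the Exam data, “schavg” is already stored at the right length. If you had to construct such a variable yourself, a group mean can be spread over the individual rows with ave(); a sketch, where ‘schavg2’ is a hypothetical column name:

# Compute a school mean and repeat it for every pupil in that school.
Exam$schavg2 <- ave(Exam$standLRT, Exam$school)
head(Exam[, c("school", "standLRT", "schavg", "schavg2")])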

Model.

schoolaverage<-lmer(normexam ~ standLRT + schavg + (1 + standLRT | school), data=Exam, REML=F)
sink(file = 'schoolaverage.txt')
summary(schoolaverage)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: normexam ~ standLRT + schavg + (1 + standLRT | school)
##    Data: Exam
## 
##      AIC      BIC   logLik deviance df.resid 
##   9324.4   9368.6  -4655.2   9310.4     4052 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8279 -0.6313  0.0337  0.6844  3.4370 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 0.07448  0.2729       
##           standLRT    0.01489  0.1220   0.38
##  Residual             0.55362  0.7441       
## Number of obs: 4059, groups:  school, 65
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.00124    0.03668  -0.034
## standLRT     0.55239    0.02018  27.376
## schavg       0.29476    0.10559   2.792
## 
## Correlation of Fixed Effects:
##          (Intr) stnLRT
## standLRT  0.268       
## schavg    0.089 -0.087
sink()

Export some of the results.

(Thomas needs to show code… ).

require(stargazer)
stargazer(null_model, standLRT, schoolaverage, type = "html",
          dep.var.labels = c("Scores"),
          covariate.labels = c("London reading test", "School average"),
          style = "demography", out = "multilevel models.html", header = F)
## 
## <table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3">Scores</td></tr>
## <tr><td style="text-align:left"></td><td>Model 1</td><td>Model 2</td><td>Model 3</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">London reading test</td><td></td><td>0.557<sup>***</sup></td><td>0.552<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td></td><td>(0.020)</td><td>(0.020)</td></tr>
## <tr><td style="text-align:left">School average</td><td></td><td></td><td>0.295<sup>**</sup></td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td>(0.106)</td></tr>
## <tr><td style="text-align:left">Constant</td><td>-0.013</td><td>-0.012</td><td>-0.001</td></tr>
## <tr><td style="text-align:left"></td><td>(0.054)</td><td>(0.040)</td><td>(0.037)</td></tr>
## <tr><td style="text-align:left"><em>N</em></td><td>4,059</td><td>4,059</td><td>4,059</td></tr>
## <tr><td style="text-align:left">Log Likelihood</td><td>-5,505.324</td><td>-4,658.435</td><td>-4,655.215</td></tr>
## <tr><td style="text-align:left">AIC</td><td>11,016.650</td><td>9,328.871</td><td>9,324.429</td></tr>
## <tr><td style="text-align:left">BIC</td><td>11,035.580</td><td>9,366.723</td><td>9,368.590</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td colspan="4" style="text-align:left"><sup>*</sup>p < .05; <sup>**</sup>p < .01; <sup>***</sup>p < .001</td></tr>
## </table>

Try it yourself.

Load ‘ScotsSec’ from the mlmRev package.

Scores attained by 3435 Scottish secondary school students on a standardized test taken at age 16. Both the primary school and the secondary school that the student attended have been recorded.

Paterson, L. (1991). Socio economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education, 5, 97-121.
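
A possible starting point for loading and inspecting the data (a sketch):

require(mlmRev)
ScotsSec <- mlmRev::ScotsSec
str(ScotsSec) # inspect the variables, e.g. attain, verbal, and the school identifiers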

Try it yourself.

Test a null model (random intercept at ‘second’) and ‘attain’ as dependent.

Build a model with a fixed effect of ‘verbal’ and a random intercept (second) and ‘attain’ as dependent. Compare it to the null model. Export the results.

Build a model with a school level (second) a random intercept + slope (verbal) and ‘attain’ as dependent. Compare it to the null model.

Make a plot.

Significance check for random effect?

require(lmerTest)
ranova(fixed_pred) # rand() is an alternative command but can lead to issues.
## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## normexam ~ standLRT + (1 | school)
##              npar  logLik    AIC    LRT Df Pr(>Chisq)    
## <none>          4 -4678.6 9365.2                         
## (1 | school)    3 -4880.3 9766.5 403.27  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Some diagnostic checks.

Check this. You can also use sjPlot.

Are the residuals normal and linear? We are looking for straight lines… .

require(DHARMa)
# 250 is the default.
simulationOutput <- simulateResiduals(fittedModel = fixed_pred, n = 250)
plotSimulatedResiduals(simulationOutput = simulationOutput)

Formal test.

We can also obtain a formal test.

testUniformity(simulationOutput = simulationOutput)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.016165, p-value = 0.2394
## alternative hypothesis: two-sided

plot homogeneity check.

Check our residuals. No real funnel. Homogeneity OK.

plot(fixed_pred)

Some tips.

Centering: sometimes you are interested in cross-level interactions. In such a case, it is sensible to grand-mean center your variables first.
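
A minimal sketch, where ‘standLRT_c’ and ‘cross_level’ are hypothetical names:

# Grand-mean center the individual-level predictor, then fit a cross-level interaction.
Exam$standLRT_c <- Exam$standLRT - mean(Exam$standLRT, na.rm = TRUE)
cross_level <- lmer(normexam ~ standLRT_c * schavg + (standLRT_c | school), data = Exam, REML = F)
summary(cross_level)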

Statistical power: the sample sizes you’ll need vary with the effect you are looking for, but multilevel models tend to be quite ‘data hungry’. You can simulate your power based on the approach here.
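
One option is simulation with the simr package; a minimal sketch (assuming simr is installed; the tiny nsim is only to keep it fast, normally you would use far more):

require(simr)
# Simulated power for the fixed effect of standLRT in the fixed_pred model.
powerSim(fixed_pred, test = fixed("standLRT"), nsim = 10)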

Effect size measures: these are generally not easy to obtain, but have a look at the intra-class correlation coefficient here or here.
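
For example, the intra-class correlation can be computed directly from the variance components of the null model above (a sketch):

# ICC = between-school variance / total variance.
vc <- as.data.frame(VarCorr(null_model))
vc$vcov[1] / sum(vc$vcov) # 0.1686 / (0.1686 + 0.8478), roughly 0.17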

More levels!

Not evaluated here. But often (as in your assignment) you will have nested data.

# This would nest classes in schools.
# lme4:
(1 | school/class)
# nlme: classes in schools.
random = ~ 1 | school/class
# Suppose you have two memberships which are non-nested (crossed).
# lme4:
(1 | school) + (1 | doctor)
# Harder to do in nlme - find out on your own; it involves lists :)

Just to give you an idea.

require(lmerTest)
plot(ranef(null_model))
## $school

Random slope… .

plot(ranef(randomslope))
## $school

Bootstrap a fixed effect.

require(lmeresampler)
require(boot)
require(lme4)
require(mlmRev)
Exam<-mlmRev::Exam 
set.seed(1981)
b <- bootstrap(model = schoolaverage, fn = fixef, type = "case", resample = c(TRUE, TRUE), B = 100)
# B = 100 only to save time here; normally use >1000 resamples.
boot_data<-as.data.frame(b$t)

Results.

require(skimr)
skim(boot_data)
## Skim summary statistics
##  n obs: 100 
##  n variables: 3 
## 
## ── Variable type:numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##     variable missing complete   n   mean    sd    p0    p25    p50    p75
##  (Intercept)       0      100 100 -0.012 0.034 -0.12 -0.031 -0.017 0.0089
##       schavg       0      100 100  0.31  0.12  -0.16  0.23   0.32  0.38  
##     standLRT       0      100 100  0.55  0.019  0.5   0.54   0.55  0.57  
##  p100     hist
##  0.12 ▁▁▃▇▅▂▁▁
##  0.65 ▁▁▁▆▇▆▂▁
##  0.6  ▁▂▇▇▇▇▁▁

Results (cont’d).

quantile(na.omit(boot_data$schavg), prob = c(0.025, 0.975))
##       2.5%      97.5% 
## 0.09118986 0.54308444

Sample write up.

A case resampling bootstrap procedure with 100 resamples showed that the effect of school average was robust (95%CI [.091, .543]).

Note that sometimes the bootstraps will not converge (it is possible to draw the same case over and over, especially with small clusters).

Next week.

Next week we will look closer at some common experimental designs and how they may be cast into multilevel models.

Depending on time, we will also cover some common issues and how to handle them.

We will also make some (nicer) graphs.

Exercise

Download the data from here

Build a model: model 1: pitch ~ politeness + sex + (1|subject) + ε. \(\epsilon\) does not have to be modelled explicitly; those are the residuals. Politeness is called attitude in the dataset, pitch is called frequency, and sex is labelled gender.

Test this against a null model: pitch ~ (1|subject) + ε. Which one is the better fit?

Bootstrap the politeness effect from model 1.

Compare model 1 to this model: pitch ~ politeness + sex + (1|subject) + (1|item) + ε. Note that items are called scenarios.

Exercise (cont’d)

Make a model with random slopes for politeness for both subjects and items, have it include a fixed effect as well.

Compare this model with random slopes to a model with just random intercepts (but with the fixed effect).

Bootstrap the fixed effect.

References (and further reading.)

Also check the reading list! (many more than listed here).

  • Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. New York, NY: Cambridge University Press.

  • Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). London: Taylor & Francis.

  • Magnusson, K. (2015). Using R and lme/lmer to fit different two- and three-level longitudinal models http://rpsychologist.com/r-guide-longitudinal-lme-lmer

  • Nieuwenhuis, R. (2017). R-Sessions 16: Multilevel Model Specification (lme4) http://www.rensenieuwenhuis.nl/r-sessions-16-multilevel-model-specification-lme4/

  • Snijders, T. A. B., & Berkhof, J. (2008). Diagnostic Checks for Multilevel Models. In: Handbook of Multilevel Analysis (pp. 141–175). New York, NY: Springer New York. http://doi.org/10.1007/978-0-387-73186-5_3

  • Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage Publications Limited.