Exercise 10 instructions.

Download the data from http://www.bodowinter.com/tutorial/politeness_data.csv (the politeness data from Winter, 2013).

Build model 1: pitch ~ politeness + sex + (1|subject) + \(\epsilon\). The \(\epsilon\) does not have to be modeled explicitly; it represents the residuals. In the dataset, politeness is called attitude, pitch is called frequency, and sex is labelled gender.

Test this against a null model: pitch ~ (1|subject) + \(\epsilon\). Which one is the better fit?

Bootstrap the politeness effect from model 1.

Compare model 1 to this model: pitch ~ politeness + sex + (1|subject) + (1|item) + \(\epsilon\). Note that items are called scenarios in the dataset.

Make a model with random slopes for politeness for both subjects and items; have it include the fixed effects as well.

Compare this model with random slopes to a model with just random intercepts (but with the fixed effect).

Bootstrap the fixed effect.

Load the data.

politeness<-read.csv("http://www.bodowinter.com/tutorial/politeness_data.csv")
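
Before modelling, we can quickly confirm the variable names mentioned in the instructions (a minimal check):

# Inspect the structure: we expect subject, gender, scenario, attitude and frequency.
str(politeness)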

Compare the fixed-effects model to a null model.

require(lme4)
## Loading required package: lme4
## Loading required package: Matrix
# We don't explicitly fit with ML because anova() will refit the models;
# the message below tells us this.
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
null_model <- lmer(frequency ~ 1 + (1|subject), data = politeness)
anova(null_model, fixed_effect)
## refitting model(s) with ML (instead of REML)
## Data: politeness
## Models:
## null_model: frequency ~ 1 + (1 | subject)
## fixed_effect: frequency ~ attitude + gender + (1 | subject)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## null_model      3 833.25 840.51 -413.62   827.25                         
## fixed_effect    5 816.34 828.43 -403.17   806.34 20.912  2  2.877e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A likelihood ratio test showed that the model with fixed effects for politeness and gender was a better fit than the null model (\(\chi^2(2)\) = 20.91, p < .0001).
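
To see where the \(\chi^2\) statistic comes from, we can compute it by hand from the ML log-likelihoods; a minimal sketch (update() refits with ML, just as anova() does internally):

# Refit both models with ML, as anova() does.
ll_null <- logLik(update(null_model, REML = FALSE))
ll_fixed <- logLik(update(fixed_effect, REML = FALSE))
# The LRT statistic is twice the difference in log-likelihoods,
# with df equal to the difference in parameter counts (5 - 3 = 2).
as.numeric(2 * (ll_fixed - ll_null))  # ~20.91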

Diagnostics

You were not asked to do this, but we can check the diagnostics.

require(DHARMa)
## Loading required package: DHARMa
## This is DHARMa 0.4.3. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa') Note: Syntax of plotResiduals has changed in 0.3.0, see ?plotResiduals for details
# 250 is the default.
simulationOutput <- simulateResiduals(fittedModel = fixed_effect, n = 250)
# plotSimulatedResiduals() is deprecated; newer DHARMa versions simply use plot().
plot(simulationOutput)

This looks fairly OK; we do have a gap in ‘the middle’ of the second plot. To some degree that is to be expected, as we have categorical predictors.
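
If you want to look at that gap more closely, the scaled residuals can be plotted against a predictor directly; a sketch, assuming a DHARMa version with the form argument (with a factor on the x-axis, per-level boxplots are drawn):

# Scaled residuals grouped by the politeness condition.
plotResiduals(simulationOutput, form = politeness$attitude)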

Unsurprisingly, the K-S test allows us to maintain the assumption that the scaled residuals are uniformly distributed, as they should be under a correctly specified model.

testUniformity(simulationOutput = simulationOutput)

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.069398, p-value = 0.8189
## alternative hypothesis: two-sided

We can also plot what is going on. The blue line suggests some non-linearity, but it stays close to the reference line.

require(sjPlot)
plot_model(fixed_effect, type = "slope")

Plot of residuals.

require(sjPlot)
plot_model(fixed_effect, type = "resid")

Diagnostics. Note that the 4th plot only looks at fixed effects.

require(sjPlot)
plot_model(fixed_effect, type = "diag")
## (output: a list of four diagnostic plots; [[2]]$subject is the plot for the subject random effects)

Bootstrap the politeness effect.

The below does not work with “case”, as we would draw samples that have only one category in the predictors. So we switch to “parametric” (which is also faster, so we’ll up the number of resamples). Note that for some of you a couple of these models will throw convergence errors, but that’s to be expected with 10,000 bootstraps; we will treat these as noise. One would typically report how many models failed to converge (if it is a large number).

require(lmeresampler)
set.seed(1984)
# resample is a case-bootstrap argument left over from the original attempt;
# the parametric bootstrap does not use it.
b <- bootstrap(model = fixed_effect, .f = fixef, type = "parametric", resample = c(TRUE, TRUE), B = 10000)
boot_data <- as.data.frame(b$replicates)
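
As noted above, one would report the number of failed fits. A minimal sketch for counting them, assuming non-converged replicates end up as NA rows in b$replicates:

# Count bootstrap replicates with missing estimates (failed fits).
sum(!complete.cases(boot_data))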

A parametric bootstrap approach showed that the effect of politeness was robust (95% CI [-32.27, -7.14]).

require(skimr)
## Loading required package: skimr
skim(boot_data)
Data summary
  Name                     boot_data
  Number of rows           10000
  Number of columns        3
  Column type frequency:
    numeric                3
  Group variables          None

Variable type: numeric

skim_variable  n_missing  complete_rate     mean     sd       p0      p25      p50     p75    p100  hist
(Intercept)            0              1   256.97  15.15   201.55   246.68   257.00  266.99  321.00  ▁▅▇▂▁
attitudepol            0              1   -19.49   6.45   -45.06   -23.82   -19.44  -15.18    4.99  ▁▂▇▃▁
genderM                0              1  -108.64  20.93  -187.67  -122.92  -108.42  -94.18  -35.54  ▁▂▇▅▁
quantile(na.omit(boot_data$attitudepol), probs = c(0.025, 0.975))
##       2.5%      97.5% 
## -32.270876  -7.140121

Report the models.

require(stargazer)
star.1 <- stargazer(null_model, fixed_effect, type = "html",
                    dep.var.labels = c("Frequency"),
                    covariate.labels = c("Politeness", "Gender"),
                    style = "asr", align = TRUE,
                    title = "Amazing table", out = "amazing.html")
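
As an aside, since sjPlot is already loaded, tab_model() is an alternative way to write an HTML table for both models; a minimal sketch (the file name is arbitrary):

# Alternative HTML regression table via sjPlot.
tab_model(null_model, fixed_effect, file = "amazing_sjplot.html")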

Model with scenarios.

require(lme4)
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
two_rand_intercept <- lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data = politeness)
# refit = FALSE keeps the REML fits: these models differ only in their random
# effects, so they do not need to be refitted with ML.
anova(fixed_effect, two_rand_intercept, refit = FALSE)
## Data: politeness
## Models:
## fixed_effect: frequency ~ attitude + gender + (1 | subject)
## two_rand_intercept: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
##                    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## fixed_effect          5 796.74 808.84 -393.37   786.74                         
## two_rand_intercept    6 787.45 801.97 -387.73   775.45 11.289  1  0.0007796 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model with two random intercepts is the superior fit (\(\chi^2(1)\) = 11.29, p < .001).
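
To see why the scenario intercepts help, we can inspect the estimated variance components; a minimal check:

# A non-trivial scenario variance explains the improved fit.
VarCorr(two_rand_intercept)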

Random slopes.

We can now also compare all models if we’d like.

The model with two random slopes is not a superior fit to the two-random-intercept model (\(\chi^2(4)\) = 0.38, p = .98). Note also the boundary (singular) fit message below: the maximal random-effects structure is likely overparameterized for these data.

two_rand_slopes <- lmer(frequency ~ attitude + gender + (attitude|subject) + (attitude|scenario), data = politeness)
## boundary (singular) fit: see ?isSingular
# refit = FALSE is appropriate for the comparisons that differ only in random
# effects; strictly, the null vs fixed-effects comparison should use ML (as above).
anova(null_model, fixed_effect, two_rand_intercept, two_rand_slopes, refit = FALSE)
## Data: politeness
## Models:
## null_model: frequency ~ 1 + (1 | subject)
## fixed_effect: frequency ~ attitude + gender + (1 | subject)
## two_rand_intercept: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## two_rand_slopes: frequency ~ attitude + gender + (attitude | subject) + (attitude | scenario)
##                    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## null_model            3 825.00 832.25 -409.50   819.00                         
## fixed_effect          5 796.74 808.84 -393.37   786.74 32.252  2   9.92e-08 ***
## two_rand_intercept    6 787.45 801.97 -387.73   775.45 11.289  1  0.0007796 ***
## two_rand_slopes      10 795.08 819.27 -387.54   775.08  0.377  4  0.9843163    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
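
The boundary (singular) fit message can be inspected directly; a minimal check:

# TRUE indicates variance estimates on the boundary, e.g. a random-effect
# correlation of +/-1 or a variance of 0.
isSingular(two_rand_slopes)
VarCorr(two_rand_slopes)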

Bootstrap the fixed effect.

Again we opt for a parametric bootstrap. This will generate lots of errors (for some of you). One of the downsides of keeping the random-effects structure ‘maximal’ is that you will run into lots of convergence errors. Anyway, this is to be expected when bootstrapping.

require(lmeresampler)
set.seed(1981)
b <- bootstrap(model = two_rand_slopes, .f = fixef, type = "parametric", resample = c(TRUE, TRUE), B = 10000)
boot_data <- as.data.frame(b$replicates)
require(skimr)
skim(boot_data)
Data summary
  Name                     boot_data
  Number of rows           10000
  Number of columns        3
  Column type frequency:
    numeric                3
  Group variables          None

Variable type: numeric

skim_variable  n_missing  complete_rate     mean     sd       p0      p25      p50     p75    p100  hist
(Intercept)            0              1   258.09  16.42   196.07   247.10   258.10  269.06  319.25  ▁▃▇▃▁
attitudepol            0              1   -19.69   6.36   -45.87   -23.90   -19.67  -15.40    4.85  ▁▂▇▃▁
genderM                0              1  -111.13  22.24  -212.34  -126.16  -111.45  -96.05  -28.66  ▁▂▇▅▁
quantile(na.omit(boot_data$attitudepol), probs = c(0.025, 0.975))
##       2.5%      97.5% 
## -32.387858  -7.206157

A parametric bootstrap approach again showed that the effect of politeness was robust (95% CI [-32.39, -7.21]).
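
As a sanity check, these bootstrap means line up with the model’s own fixed-effect estimates:

# Point estimates from the fitted random-slopes model, for comparison with the
# bootstrap means and quantiles above.
fixef(two_rand_slopes)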

In conclusion, across a range of models we consistently find evidence for the fixed effect of politeness.

References

Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. http://arxiv.org/pdf/1308.5499.pdf

THE END!