Exercise 10 instructions.

Download the data from http://www.bodowinter.com/tutorial/politeness_data.csv (the read.csv() call below loads it straight from that URL).

Build a model: model 1: pitch ~ politeness + sex + (1|subject) + \(\epsilon\). \(\epsilon\) does not have to be modeled explicitly; those are the residuals. In the dataset, politeness is called attitude, pitch is called frequency, and sex is labelled gender.

Test this against a null model: pitch ~ (1|subject) + \(\epsilon\). Which one is the better fit?

Bootstrap the politeness effect from model 1.

Compare model 1 to this model: pitch ~ politeness + sex + (1|subject) + (1|item) + \(\epsilon\). Note that items are called scenario in the dataset.

Make a model with random slopes for politeness for both subjects and items, and have it include the fixed effects as well.

Compare this model with random slopes to a model with just random intercepts (but with the fixed effect).

Bootstrap the fixed effect.

Load the data.

politeness <- read.csv("http://www.bodowinter.com/tutorial/politeness_data.csv")
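
A quick look at the data confirms the column names mentioned in the instructions (attitude, gender, scenario, subject, frequency):

str(politeness)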

Compare fixed effect to a null model.

require(lme4)
## Loading required package: lme4
## Loading required package: Matrix
# We don't explicitly fit with ML because anova() will refit; the message below tells us this.
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
null_model <- lmer(frequency ~ 1 + (1|subject), data = politeness)
anova(null_model, fixed_effect)
## refitting model(s) with ML (instead of REML)

A likelihood ratio test demonstrated that the model with fixed effects for politeness and gender was a better fit than the null model (\(\chi^2\)(2) = 20.91, p < .0001).
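
If you want to pull the test statistic out of the anova table programmatically (a small sketch; the column names are those used by lme4's anova method for merMod objects):

lrt <- anova(null_model, fixed_effect)
lrt$Chisq # the likelihood ratio test statistic
lrt[["Pr(>Chisq)"]] # the corresponding p-values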

Diagnostics

You were not asked to do this, but we can check the diagnostics.

require(DHARMa)
## Loading required package: DHARMa
## This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
# 250 is the default.
simulationOutput <- simulateResiduals(fittedModel = fixed_effect, n = 250)
plot(simulationOutput)

This looks fairly OK; there is a gap in ‘the middle’ of the second plot. To some degree that is to be expected, as we have categorical predictors.

Unsurprisingly, the K-S test allows us to maintain the assumption that the scaled residuals are uniformly distributed, as DHARMa expects for a well-specified model.

testUniformity(simulationOutput = simulationOutput)

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.069398, p-value = 0.8189
## alternative hypothesis: two-sided
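
DHARMa offers further checks in the same spirit; for instance, a dispersion test (again not asked for, just an extra):

testDispersion(simulationOutput = simulationOutput)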

We can also plot what is going on. The blue line suggests some non-linearity, but it stays close to the reference line.

require(sjPlot)
plot_model(fixed_effect, type = "slope")

Plot of residuals.

require(sjPlot)
plot_model(fixed_effect, type = "resid")

Diagnostics. Note that the 4th plot only looks at fixed effects.

require(sjPlot)
plot_model(fixed_effect, type = "diag")
## A list of four diagnostic plots is returned: [[1]], [[2]]$subject, [[3]], [[4]]
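
Because plot_model(type = "diag") returns a list, you can also pull out a single panel, e.g. the fourth (fixed-effects) plot:

diag_plots <- plot_model(fixed_effect, type = "diag")
diag_plots[[4]] # the diagnostic plot for the fixed effects only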

Bootstrap the politeness effect.

The below previously did not work well with “case”, as we would draw samples with only one category in the predictors, so we switched to “parametric” (which is also faster, so we upped the numbers). However, in a more recent version of ‘lmeresampler’ the case bootstrap works (you might need to install the newest version via ‘remotes’).

Note that for some of you a couple of these models will have convergence errors, but that is to be expected with 10,000 bootstraps. We will treat these as noise. One would typically report how many failed to converge (if it is a large number). In my case 9,633 replicates were returned out of the 10,000 requested.

Note that this will take some time.

require(lmeresampler)
set.seed(1984)
b <- bootstrap(model = fixed_effect, .f = fixef, type = "case", resample = c(TRUE, TRUE), B = 10000) # resample both subjects and observations
boot_data <- as.data.frame(b$replicates)
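
To see how many replicates actually came back, count the rows of the replicates data frame:

nrow(boot_data) # 9,633 in my run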

A case bootstrap approach showed that the effect of politeness was robust (95% CI [-33.09, -4.14], based on 9,633 bootstrap samples).

require(skimr)
## Loading required package: skimr
skim(boot_data)
Data summary

| | |
|:---|:---|
| Name | boot_data |
| Number of rows | 9633 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| (Intercept) | 0 | 1 | 256.63 | 8.85 | 211.53 | 251.65 | 257.23 | 262.40 | 286.66 | ▁▁▇▇▁ |
| attitudepol | 0 | 1 | -19.36 | 7.29 | -46.85 | -24.31 | -19.64 | -14.79 | 14.07 | ▁▅▇▂▁ |
| genderM | 0 | 1 | -108.15 | 19.15 | -163.16 | -120.10 | -108.80 | -94.84 | -47.26 | ▁▅▇▃▁ |
quantile(na.omit(boot_data$attitudepol), prob = c(0.025, 0.975))
##       2.5%      97.5% 
## -33.088800  -4.144622
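
As an aside, ‘lmeresampler’ also provides a confint() method for its bootstrap objects, which should reproduce the percentile interval above (a small sketch; the type argument assumes a reasonably recent package version):

confint(b, type = "perc")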

An alternative is via ‘bootmlm’. You will need to install it first: remotes::install_github("marklhc/bootmlm"). This assumes you have the ‘remotes’ package already installed.

We use a helper function to get the fixed effects:

require(bootmlm)
## Loading required package: bootmlm
mySumm <- function(x) {
  c(getME(x, "beta")) # the fixed-effect estimates, equivalent to fixef(x)
}

Note that this overwrites the boot_data object from before! (Also note some warning messages!)

set.seed(1234)
boot_data <- bootstrap_mer(fixed_effect, mySumm, type = "case", nsim = 10000)
## 644 message(s) : boundary (singular) fit: see help('isSingular')

There are 320 missing replicates.
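
You can count them directly from the replicate matrix (boot_data$t holds the bootstrap estimates, with one row per simulation):

sum(!complete.cases(boot_data$t)) # rows with missing estimates: 320 here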

require(skimr)
skim(as.data.frame(boot_data$t)) 
Data summary

| | |
|:---|:---|
| Name | as.data.frame(boot_data$t… |
| Number of rows | 10000 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| V1 | 320 | 0.97 | 256.63 | 6.95 | 234.95 | 252.51 | 257.54 | 261.83 | 269.43 | ▁▁▇▇▅ |
| V2 | 320 | 0.97 | -19.39 | 3.68 | -27.36 | -22.01 | -19.49 | -17.06 | -4.79 | ▃▇▆▂▁ |
| V3 | 320 | 0.97 | -108.33 | 18.93 | -156.01 | -120.83 | -107.67 | -94.50 | -63.06 | ▂▅▇▆▂ |
boot_result <- as.data.frame(boot_data$t)
quantile(na.omit(boot_result$V2), prob = c(0.025, 0.975))
##      2.5%     97.5% 
## -25.94524 -11.99849

A sample write-up:

Using the ‘bootmlm’ package, we followed a case bootstrap approach with 10,000 samples, of which 9,680 converged. This showed that the effect of politeness was robust (95% CI [-25.95, -12.00]).

Report the models.

require(stargazer)
star.1 <- stargazer(null_model, fixed_effect, type = "html", dep.var.labels = c("Frequency"), covariate.labels = c("Politeness", "Gender"), style = "asr", align = T, title = "Amazing table", out = "amazing.html")

Model with scenarios.

require(lme4)
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
two_rand_intercept <- lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data = politeness)
anova(fixed_effect, two_rand_intercept, refit = F) # refit = F keeps the REML fits: fine here, as the models differ only in random effects

The model with two random intercepts is a superior fit.

Random slopes.

We can now also compare all the models if we'd like. Note that refit = F keeps the REML fits, which is appropriate when comparing random-effects structures; strictly speaking, models that differ in their fixed effects (such as the null model) should be compared after refitting with ML.

The model with two random slopes is not a superior fit compared to the model with two random intercepts.

two_rand_slopes <- lmer(frequency ~ attitude + gender + (attitude|subject) + (attitude|scenario), data = politeness)
## boundary (singular) fit: see help('isSingular')
anova(null_model, fixed_effect, two_rand_intercept, two_rand_slopes, refit = F)
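
The boundary (singular) fit message means at least one variance component was estimated at (or very near) zero. You can inspect this directly with lme4's helpers:

isSingular(two_rand_slopes) # TRUE for this fit
VarCorr(two_rand_slopes) # shows which variance or correlation has collapsed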

Bootstrap the fixed effect.

Again we opt for a case bootstrap. This will generate lots of errors (for some of you). One of the downsides of keeping it ‘maximal’ is that you will run into lots of convergence errors. Anyway, this is to be expected when bootstrapping.

require(lmeresampler)
set.seed(1987)
b <- bootstrap(model = two_rand_slopes, .f = fixef, type = "case", resample = c(TRUE, TRUE, TRUE), B = 10000) # one resampling flag per level of the data
boot_data <- as.data.frame(b$replicates)
require(skimr)
skim(boot_data)
Data summary

| | |
|:---|:---|
| Name | boot_data |
| Number of rows | 9677 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| (Intercept) | 0 | 1 | 259.77 | 25.56 | -156.27 | 251.21 | 259.97 | 268.73 | 727.10 | ▁▁▇▁▁ |
| attitudepol | 0 | 1 | -19.70 | 9.49 | -69.43 | -26.10 | -20.19 | -13.90 | 41.93 | ▁▃▇▁▁ |
| genderM | 0 | 1 | -113.73 | 46.01 | -824.63 | -136.32 | -113.96 | -91.66 | 509.88 | ▁▁▇▁▁ |
quantile(na.omit(boot_data$attitudepol), prob = c(0.025, 0.975))
##        2.5%       97.5% 
## -36.9675472  -0.2575359

A case bootstrap approach with 9,677 samples, out of the 10,000 requested, showed that the effect of politeness was robust (95% CI [-36.97, -0.26]).

In conclusion, across a range of models we consistently find evidence for the fixed effect of politeness.

References

Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. http://arxiv.org/pdf/1308.5499.pdf

THE END!