Download the data from here; this is the politeness data from Winter (2013).
Build a model (model 1): pitch ~ politeness + sex + (1|subject) + \(\epsilon\). The \(\epsilon\) term does not have to be modeled explicitly; it represents the residuals. In the dataset, politeness is called attitude, pitch is called frequency, and sex is labelled gender.
Test this against a null model: pitch ~ (1|subject) + \(\epsilon\). Which one is the better fit?
Bootstrap the politeness effect from model 1.
Compare model 1 to this model: pitch ~ politeness + sex + (1|subject) + (1|item) + \(\epsilon\). Note that items are called scenario in the dataset.
Make a model with random slopes for politeness for both subjects and items; have it include the fixed effects as well.
Compare this model with random slopes to a model with just random intercepts (but with the fixed effect).
Bootstrap the fixed effect.
## Loading required package: lme4
## Loading required package: Matrix
require(lme4)
# We don't explicitly fit with ML, because anova() will refit the models with ML; the warning tells us this.
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
null_model <- lmer(frequency ~ 1 + (1|subject), data = politeness)
anova(null_model, fixed_effect)
## refitting model(s) with ML (instead of REML)
A likelihood ratio test demonstrated that a model with fixed effects for politeness and gender was a better fit than the null model (\(\chi^2(2)\) = 20.91, p < .0001).
You were not asked to do this, but we can check the diagnostics.
## Loading required package: DHARMa
## This is DHARMa 0.4.7. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
require(DHARMa)
# Simulate scaled (quantile) residuals; n = 250 simulations is the default.
simulationOutput <- simulateResiduals(fittedModel = fixed_effect, n = 250)
plot(simulationOutput)
This looks fairly OK; we do have a gap in ‘the middle’ of the second plot. To some degree that is to be expected, as we have categorical predictors.
Unsurprisingly, the K-S test allows us to maintain the assumption that the scaled residuals are uniformly distributed.
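The test below was presumably produced with DHARMa’s testUniformity(), which runs a one-sample Kolmogorov-Smirnov test of the scaled residuals against a uniform distribution; a minimal sketch:

# K-S test of the scaled residuals against the uniform distribution
testUniformity(simulationOutput)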
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.069398, p-value = 0.8189
## alternative hypothesis: two-sided
We can also plot what is going on. The blue line suggests some non-linearity, but it stays close to the reference line.
Plot of residuals.
Diagnostics. Note that the 4th plot only looks at fixed effects.
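One way to obtain such a panel of diagnostics is sjPlot’s plot_model() with type = "diag"; a minimal sketch, assuming that is what was used here (the original call is not shown):

require(sjPlot)
# Returns a list of diagnostic plots for the lmer fit
plot_model(fixed_effect, type = "diag")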
Four diagnostic plots (the second is given per random-effect grouping, here subject).
The below previously did not work well with type = "case", as some bootstrap samples would contain only one category of a predictor. We therefore used to switch to type = "parametric" (which is also faster, so we could up the numbers). However, in a more recent version of ‘lmeresampler’ the case bootstrap works (you might need to install the newest version via ‘remotes’).
Note that for some of you a couple of these models will throw convergence errors, but that is to be expected with 10,000 bootstrap samples. We will treat these as noise. One would typically report how many samples failed to converge (if it is a large number). In my case 9,633 replicates were returned out of the 10,000 requested.
Note that this will take some time.
require(lmeresampler)
set.seed(1984)
# Case bootstrap: resample the cases at both levels (subjects and observations).
b <- bootstrap(model = fixed_effect, .f = fixef, type = "case", resample = c(TRUE, TRUE), B = 10000)
boot_data <- as.data.frame(b$replicates)
A case bootstrap approach showed that the effect of politeness was robust (95% CI [-33.09, -4.14], based on the 9,633 replicates that converged).
## Loading required package: skimr
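The summary below was presumably produced with skimr’s skim(); a sketch:

require(skimr)
# Summarise the bootstrap replicates of the fixed effects
skim(boot_data)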
|  |  |
|---|---|
| Name | boot_data |
| Number of rows | 9633 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 0 | 1 | 256.63 | 8.85 | 211.53 | 251.65 | 257.23 | 262.40 | 286.66 | ▁▁▇▇▁ |
| attitudepol | 0 | 1 | -19.36 | 7.29 | -46.85 | -24.31 | -19.64 | -14.79 | 14.07 | ▁▅▇▂▁ |
| genderM | 0 | 1 | -108.15 | 19.15 | -163.16 | -120.10 | -108.80 | -94.84 | -47.26 | ▁▅▇▃▁ |
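The 95% percentile interval for the politeness effect can then be read off directly; a sketch:

# 2.5% and 97.5% quantiles of the bootstrapped politeness effect
quantile(boot_data$attitudepol, c(.025, .975))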
## 2.5% 97.5%
## -33.088800 -4.144622
An alternative is via ‘bootmlm’. You will need to install it first (this assumes you have the ‘remotes’ package already installed):

remotes::install_github("marklhc/bootmlm")
We use a helper function to get the fixed effects:
## Loading required package: bootmlm
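A minimal sketch of how this might look, using bootmlm’s bootstrap_mer() and assuming a case bootstrap to match the write-up below (the helper name get_fixef is ours):

require(bootmlm)
# Hypothetical helper: extract the fixed-effect estimates as a numeric vector
get_fixef <- function(x) fixef(x)
# Case bootstrap of the fixed effects; returns a 'boot'-style object
# (a set.seed() call beforehand would make this reproducible)
boot_data <- bootstrap_mer(fixed_effect, get_fixef, nsim = 10000, type = "case")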
Note that this overwrites our earlier boot_data object! (Also note some warning messages!)
## 644 message(s) : boundary (singular) fit: see help('isSingular')
There are 320 missing replicates, i.e. bootstrap samples that did not converge.
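The replicates sit in the $t matrix of the returned object; a sketch of the summary call:

# V1-V3 correspond to the intercept, attitudepol, and genderM
skim(as.data.frame(boot_data$t))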
|  |  |
|---|---|
| Name | as.data.frame(boot_data$t… |
| Number of rows | 10000 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| V1 | 320 | 0.97 | 256.63 | 6.95 | 234.95 | 252.51 | 257.54 | 261.83 | 269.43 | ▁▁▇▇▅ |
| V2 | 320 | 0.97 | -19.39 | 3.68 | -27.36 | -22.01 | -19.49 | -17.06 | -4.79 | ▃▇▆▂▁ |
| V3 | 320 | 0.97 | -108.33 | 18.93 | -156.01 | -120.83 | -107.67 | -94.50 | -63.06 | ▂▅▇▆▂ |
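The percentile CI for the politeness effect (column 2), dropping the non-converged replicates; a sketch:

quantile(boot_data$t[, 2], c(.025, .975), na.rm = TRUE)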
## 2.5% 97.5%
## -25.94524 -11.99849
A sample write-up:
Using the ‘bootmlm’ package, we followed a case bootstrap approach with 10,000 samples, of which 9,680 converged. This showed that the effect of politeness was robust (95% CI [-25.95, -12.00]).
require(lme4)
fixed_effect <- lmer(frequency ~ attitude + gender + (1|subject), data = politeness)
two_rand_intercept <- lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data = politeness)
# refit = FALSE: the models differ in their random effects, so we compare the REML fits.
anova(fixed_effect, two_rand_intercept, refit = FALSE)
The model with two random intercepts is a superior fit.
We can now also compare all models if we’d like.
As the comparison below shows, the model with random slopes is not a superior fit compared to the model with two random intercepts.
two_rand_slopes <- lmer(frequency ~ attitude + gender + (attitude|subject) + (attitude|scenario), data = politeness)
## boundary (singular) fit: see help('isSingular')
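The comparison itself; since both models share the same fixed effects and differ only in their random effects, we again keep refit = FALSE:

anova(two_rand_intercept, two_rand_slopes, refit = FALSE)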
Again we opt for a case bootstrap. This will generate lots of errors (for some of you): one of the downsides of keeping the random-effects structure ‘maximal’ is that you will run into many convergence errors. Anyway, this is to be expected when bootstrapping.
require(lmeresampler)
set.seed(1987)
# Case bootstrap: resample the cases at all three levels (subjects, scenarios, and observations).
b <- bootstrap(model = two_rand_slopes, .f = fixef, type = "case", resample = c(TRUE, TRUE, TRUE), B = 10000)
boot_data <- as.data.frame(b$replicates)
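As before, a sketch of the summary call for the replicates:

skim(boot_data)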
|  |  |
|---|---|
| Name | boot_data |
| Number of rows | 9677 |
| Number of columns | 3 |
| Column type frequency: numeric | 3 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 0 | 1 | 259.77 | 25.56 | -156.27 | 251.21 | 259.97 | 268.73 | 727.10 | ▁▁▇▁▁ |
| attitudepol | 0 | 1 | -19.70 | 9.49 | -69.43 | -26.10 | -20.19 | -13.90 | 41.93 | ▁▃▇▁▁ |
| genderM | 0 | 1 | -113.73 | 46.01 | -824.63 | -136.32 | -113.96 | -91.66 | 509.88 | ▁▁▇▁▁ |
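And again the percentile interval for the politeness effect; a sketch:

quantile(boot_data$attitudepol, c(.025, .975))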
## 2.5% 97.5%
## -36.9675472 -0.2575359
A case bootstrap approach with 9,677 samples (out of 10,000 requested) showed that the effect of politeness was robust (95% CI [-36.97, -0.26]).
In conclusion, across a range of models we consistently find evidence for the fixed effect of politeness.
Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. http://arxiv.org/pdf/1308.5499.pdf
THE END!