Exercise 5 instructions.

Download the data thechase_R.xlsx. This file contains coding for 50 episodes of ITV’s The Chase (coded by yours truly). Read the data (using the ‘readxl’ package, for example). The codes are largely self-explanatory.

Test the assumptions for a 2 (gender_contestant) x 5 (Opponent) ANOVA on ‘High’ (this is the highest offer made by the chaser).

Conduct the 2 x 5 ANOVA. Calculate the post-hoc contrasts. Report as you would do in a paper.

Make a (beautiful) violin plot.

Conduct a 2 (gender_contestant) x 2 (age) ANOVA on ‘chosen’ (the amount chosen by the contestant).

Make a plot showing the interaction (see the examples from the slides or check ggplot2).

BONUS: Load the Salaries dataset and test the interaction effect of yrs.service x yrs.since.phd on monthly salary.
Calculate Johnson-Neyman interval range.

BONUS: use vegan to conduct a permutation analysis for the 2 x 5 ANOVA you conducted above.

BONUS: use the party package to model ‘high’. (Reduce the dataset via ‘dplyr’, keep a sensible number of variables).

The Chase data.

Load the data and get some descriptives.

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture5")
require(readxl)
## Loading required package: readxl
chase<-read_xlsx("thechase_R.xlsx")
require(skimr)
## Loading required package: skimr
skim(chase)
Data summary
Name chase
Number of rows 200
Number of columns 15
_______________________
Column type frequency:
character 7
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Date 0 1 10 10 0 30 0
Time 0 1 2 2 0 2 0
Opponent 0 1 4 5 0 5 0
gender_contestant 0 1 4 6 0 2 0
Chosen_2 0 1 3 6 0 3 0
success_player 0 1 2 3 0 2 0
Outcome_episode 0 1 3 4 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ID 0 1.00 25.50 14.47 1 13.00 25.5 38.00 50 ▇▇▇▇▇
Number 0 1.00 2.50 1.12 1 1.75 2.5 3.25 4 ▇▇▁▇▇
Age 0 1.00 42.77 14.45 18 31.00 42.0 54.00 79 ▇▇▇▅▂
Earned 0 1.00 5150.00 1919.88 0 4000.00 5000.0 6000.00 10000 ▂▆▇▃▁
Low 1 1.00 290.39 1909.96 -10000 140.00 1000.0 1000.00 3000 ▁▁▁▂▇
High 0 1.00 33040.00 10598.11 10000 25000.00 31000.0 40000.00 65000 ▃▇▆▃▁
Chosen 0 1.00 7806.00 10363.33 -5000 4000.00 5000.0 7000.00 55000 ▇▁▁▁▁
steps_left 77 0.62 1.36 1.17 0 0.00 1.0 2.00 4 ▇▆▆▅▁

Assumptions.

I have also looked at ‘Chosen’ in addition to ‘High’.

Facetted by opponent.

Here is a graph facetted by opponent to explore normality.

require(ggplot2)
## Loading required package: ggplot2
require(ggthemes)
## Loading required package: ggthemes
plot_hist <- ggplot(chase, aes(x=Chosen)) 
plot_hist <- plot_hist+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~Opponent) + theme_gdocs(12) + xlab("Amount Chosen") + ylab("Count")
plot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
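That message is only about presentation; you can silence it by choosing a bin width explicitly (a minimal sketch, where the 2500 GBP bin width is just an illustrative choice):

# Same histogram, but with an explicit bin width instead of the default 30 bins (sketch).
ggplot(chase, aes(x = Chosen)) +
  geom_histogram(binwidth = 2500, colour = "black", fill = "white") +
  facet_wrap(~Opponent) + theme_gdocs(12) +
  xlab("Amount Chosen") + ylab("Count")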

Not great in terms of distribution!

How would you resolve it? (see below for the answer)

Incidentally, the above was for ‘Chosen’ rather than ‘High’; here is the same plot for ‘High’.

require(ggplot2)
require(ggthemes)
plot_hist <- ggplot(chase, aes(x=High)) 
plot_hist <- plot_hist+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~Opponent) + theme_gdocs(12) + xlab("Highest offer") + ylab("Count")
plot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Facetted by gender.

Here is one facetted by gender.

require(ggplot2)
require(ggthemes)
plot_hist2 <- ggplot(chase, aes(x=Chosen)) 
plot_hist2 <- plot_hist2+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~gender_contestant) + theme_gdocs(12) + xlab("Amount Chosen") + ylab("Count")
plot_hist2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This similarly shows some issues: a long right tail.

Let us look at the highest offers. (Note that this overwrites the previous plot object.)

require(ggplot2)
require(ggthemes)
plot_hist2 <- ggplot(chase, aes(x=High)) 
plot_hist2 <- plot_hist2+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~gender_contestant) + theme_gdocs(12) + xlab("Highest offer") + ylab("Count")
plot_hist2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Formal normality test.

Unsurprisingly these tests suggest violations for ‘Chosen’.

# other routes.
require(nortest)
## Loading required package: nortest
# Anderson-Darling test.
ad.test(chase$Chosen)
## 
##  Anderson-Darling normality test
## 
## data:  chase$Chosen
## A = 33.746, p-value < 2.2e-16
# Cramer-von Mises test.
cvm.test(chase$Chosen)
## Warning in cvm.test(chase$Chosen): p-value is smaller than 7.37e-10, cannot be
## computed more accurately
## 
##  Cramer-von Mises normality test
## 
## data:  chase$Chosen
## W = 6.6963, p-value = 7.37e-10
# transform, we need to add a constant to make every value positive. 
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
chase <- chase %>% mutate(Log_chosen=log10(Chosen+10000))
ad.test(chase$Log_chosen)
## 
##  Anderson-Darling normality test
## 
## data:  chase$Log_chosen
## A = 17.74, p-value < 2.2e-16
cvm.test(chase$Log_chosen)
## Warning in cvm.test(chase$Log_chosen): p-value is smaller than 7.37e-10, cannot
## be computed more accurately
## 
##  Cramer-von Mises normality test
## 
## data:  chase$Log_chosen
## W = 3.3068, p-value = 7.37e-10

This reduces the non-normality (the test statistics went down), but there is still strong evidence for violations.

Let us look at some plots, to see what has happened.

require(ggplot2)
qqplot <- ggplot(chase, aes(sample = Chosen))
qqplot + stat_qq()

# log
qqplot_log <- ggplot(chase, aes(sample = Log_chosen))
qqplot_log + stat_qq()

The log-transformed data look better, but there is clearly one outlier. In conclusion, we have clear violations; we will tackle these issues in the ‘bonus’ section.

What happens with the highest offer?

# other routes but now for highest offer .
require(nortest)
# Anderson-Darling test.
ad.test(chase$High)
## 
##  Anderson-Darling normality test
## 
## data:  chase$High
## A = 1.4766, p-value = 0.0008027
# Cramer-von Mises test.
cvm.test(chase$High)
## 
##  Cramer-von Mises normality test
## 
## data:  chase$High
## W = 0.24083, p-value = 0.001648

Also some issues there. Transforming it improves matters somewhat: it is still non-normal according to the tests, but graphically it looks a lot better than ‘Chosen’.

require(dplyr)
chase <- chase %>% mutate(Log_high=log10(High))
ad.test(chase$Log_high)
## 
##  Anderson-Darling normality test
## 
## data:  chase$Log_high
## A = 1.3068, p-value = 0.002106
cvm.test(chase$Log_high)
## 
##  Cramer-von Mises normality test
## 
## data:  chase$Log_high
## W = 0.20094, p-value = 0.005163
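Strictly speaking, the normality assumption concerns the distribution within each cell (or the model residuals), so you could also test per group. A minimal sketch, assuming dplyr is loaded and using the base Shapiro-Wilk test:

# Shapiro-Wilk test of 'High' within each Opponent group (sketch).
chase %>%
  group_by(Opponent) %>%
  summarise(shapiro_p = shapiro.test(High)$p.value)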

Homogeneity test.

The assumption of homogeneity of variances is violated for neither opponent nor gender of the contestant, respectively: F(4,195) = 2.23, p = .067 and F(1,198) = 0.95, p = .331.

Note that there is a trend for ‘Opponent’ (p = .067), but this is not too worrisome.

require(car)
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
leveneTest(Chosen ~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.9507 0.3307
##       198
leveneTest(Chosen ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   4  2.2297 0.06724 .
##       195                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Incidentally, if you use the transformed data, then there is similarly no issue.

leveneTest(Log_chosen ~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.1146 0.7354
##       198
leveneTest(Log_chosen ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   4  1.8052 0.1294
##       195

As an aside, note that the current analyses do not account for the fact that the observations violate the assumption of independence. As you might have noticed, the data are nested within episodes! (The offers, and the amounts chosen, will depend on previous players’ performance.) We will discuss this further when we cover multilevel models; a sketch of what that could look like follows below.
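As a preview of what accounting for that nesting could look like (a minimal sketch, assuming ‘ID’ indexes the episode and that the lme4 package is installed), you could fit a random intercept per episode:

require(lme4)
# Random intercept per episode, to account for contestants being nested within episodes (sketch).
ml_model <- lmer(Chosen ~ gender_contestant * Opponent + (1 | ID), data = chase)
summary(ml_model)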

Similarly no issues with the high offers made, in terms of homogeneity of variance.

require(car)
leveneTest(High~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  1.7813 0.1835
##       198
leveneTest(High ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   4  0.2144 0.9302
##       195

2 x 5 ANOVA

Raw data: ‘Chosen’.

require(ez)
## Loading required package: ez
chase$contest_id<-seq(1:200) # need to make an ID.
ez<-ezANOVA(data=chase, dv=.(Chosen), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez
## $ANOVA
##                       Effect DFn DFd        SSn         SSd         F
## 1                (Intercept)   1 190 8102271319 19616823829 78.475066
## 2          gender_contestant   1 190  199873750 19616823829  1.935890
## 3                   Opponent   4 190  928758248 19616823829  2.248887
## 4 gender_contestant:Opponent   4 190  506667493 19616823829  1.226840
##              p p<.05        ges
## 1 5.747493e-16     * 0.29229927
## 2 1.657424e-01       0.01008613
## 3 6.535730e-02       0.04520477
## 4 3.008518e-01       0.02517791
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd        SSn         SSd       F         p p<.05
## 1   9 190 1158859565 16565540235 1.47685 0.1589511      
## 
## $aov
## Call:
##    aov(formula = formula(aov_formula), data = data)
## 
## Terms:
##                 gender_contestant    Opponent gender_contestant:Opponent
## Sum of Squares          296705239   952116239                  506667493
## Deg. of Freedom                 1           4                          4
##                   Residuals
## Sum of Squares  19616823829
## Deg. of Freedom         190
## 
## Residual standard error: 10161.03
## Estimated effects may be unbalanced

Sample write up:

A two-way ANOVA was conducted to examine the effect of gender and opponent on the amount of the offer chosen. There was no significant interaction effect, F(4,190) = 1.23, p = .301, \(\eta^2_g\) = .025. Neither were there any significant main effects of opponent or gender of the contestant, respectively: F(4,190) = 2.25, p = .065, \(\eta^2_g\) = .045 and F(1,190) = 1.94, p = .166, \(\eta^2_g\) = .010.
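If you prefer, the afex package offers the same type III between-subjects ANOVA with a slightly friendlier interface (a sketch, assuming afex is installed; the results should match the ezANOVA output above):

require(afex)
# Type III ANOVA on 'Chosen' via afex; contest_id is the identifier created above (sketch).
aov_ez(id = "contest_id", dv = "Chosen", data = chase,
       between = c("gender_contestant", "Opponent"), type = 3)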

Raw data: ‘High’.

Now let us look at the highest offers. Similarly no significant effects.

chase$contest_id<-seq(1:200) # need to make an ID.
ez2<-ezANOVA(data=chase, dv=.(High), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez2
## $ANOVA
##                       Effect DFn DFd          SSn         SSd            F
## 1                (Intercept)   1 190 162534903354 21428816079 1441.1263564
## 2          gender_contestant   1 190    192179587 21428816079    1.7039729
## 3                   Opponent   4 190    583194189 21428816079    1.2927324
## 4 gender_contestant:Opponent   4 190    226003466 21428816079    0.5009686
##              p p<.05         ges
## 1 1.212988e-90     * 0.883516075
## 2 1.933477e-01       0.008888563
## 3 2.743612e-01       0.026494363
## 4 7.350533e-01       0.010436636
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn        SSd         F         p p<.05
## 1   9 190 248256846 8055723154 0.6505906 0.7526215      
## 
## $aov
## Call:
##    aov(formula = formula(aov_formula), data = data)
## 
## Terms:
##                 gender_contestant    Opponent gender_contestant:Opponent
## Sum of Squares          112631295   584229160                  226003466
## Deg. of Freedom                 1           4                          4
##                   Residuals
## Sum of Squares  21428816079
## Deg. of Freedom         190
## 
## Residual standard error: 10619.95
## Estimated effects may be unbalanced

A two-way ANOVA was conducted to examine the effect of gender and opponent on the highest offer made. There was no significant interaction effect, F(4,190)= 0.501, p= .735, \(\eta^2_g\)= .01. Neither were there any significant main effects for opponent and gender of the contestant, respectively: F(4,190)= 1.29, p= .274, \(\eta^2_g\)= .026 and F(1,190)= 1.70, p= .193, \(\eta^2_g\)= .009.

Log-transformed data: Chosen.

Incidentally, you would draw the same conclusion if you analyzed the log-transformed data.

require(ez)
ez_log<-ezANOVA(data=chase, dv=.(Log_chosen), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez_log
## $ANOVA
##                       Effect DFn DFd          SSn      SSd            F
## 1                (Intercept)   1 190 2.606907e+03 5.559471 89093.422654
## 2          gender_contestant   1 190 8.806991e-02 5.559471     3.009870
## 3                   Opponent   4 190 2.069874e-01 5.559471     1.768496
## 4 gender_contestant:Opponent   4 190 1.461738e-01 5.559471     1.248906
##               p p<.05        ges
## 1 8.333219e-256     * 0.99787195
## 2  8.438055e-02       0.01559438
## 3  1.368519e-01       0.03589506
## 4  2.917479e-01       0.02561915
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn      SSd        F         p p<.05
## 1   9 190 0.2057234 3.810851 1.139653 0.3367455      
## 
## $aov
## Call:
##    aov(formula = formula(aov_formula), data = data)
## 
## Terms:
##                 gender_contestant Opponent gender_contestant:Opponent Residuals
## Sum of Squares           0.135726 0.213300                   0.146174  5.559471
## Deg. of Freedom                 1        4                          4       190
## 
## Residual standard error: 0.1710566
## Estimated effects may be unbalanced

Post-hoc contrasts.

It makes little sense to report these given that none of the effects are significant, but I have included them nonetheless so that you have an example.

library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
summary(glht(ez$aov, linfct = mcp(Opponent = "Tukey")))
## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
## default contrast might be inappropriate
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = formula(aov_formula), data = data)
## 
## Linear Hypotheses:
##                    Estimate Std. Error t value Pr(>|t|)
## Jenny - Anne == 0  -4125.00    4637.85  -0.889    0.897
## Mark - Anne == 0   -2835.53    3120.25  -0.909    0.889
## Paul - Anne == 0   -2166.67    2933.24  -0.739    0.945
## Shaun - Anne == 0  -2778.85    2876.27  -0.966    0.866
## Mark - Jenny == 0   1289.47    4758.34   0.271    0.999
## Paul - Jenny == 0   1958.33    4637.85   0.422    0.993
## Shaun - Jenny == 0  1346.15    4602.04   0.293    0.998
## Paul - Mark == 0     668.86    3120.25   0.214    1.000
## Shaun - Mark == 0     56.68    3066.76   0.018    1.000
## Shaun - Paul == 0   -612.18    2876.27  -0.213    1.000
## (Adjusted p values reported -- single-step method)
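An alternative route to the pairwise contrasts (a sketch, assuming the emmeans package is installed) is via estimated marginal means, which also cope with the unbalanced cells:

require(emmeans)
# Tukey-adjusted pairwise comparisons of Opponent, based on the aov fit stored by ezANOVA (sketch).
emmeans(ez$aov, pairwise ~ Opponent, adjust = "tukey")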

Violin plot

Raw data.

There are no real outliers for Jenny, but all other chasers have outliers. (You can further improve the graph, for example by sorting the chasers’ names; see the sketch after the plot code below.)

require(ggpubr)
## Loading required package: ggpubr
require(RColorBrewer)
## Loading required package: RColorBrewer
plot_moderation <- ggviolin(chase, x = "Opponent", y = "Chosen",
                color = "Opponent", palette = "Set1",
                add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Amount chosen (GBP)", color="Opponent", shape= "Opponent")
plot_moderation  + facet_wrap(~gender_contestant)
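Here is one way to do that sorting before plotting (a sketch, assuming the forcats package is installed): reorder the Opponent levels by their median amount chosen.

require(forcats)
# Reorder the Opponent levels by median amount chosen, so the violins appear in a sensible order (sketch).
chase <- chase %>% mutate(Opponent = fct_reorder(Opponent, Chosen, .fun = median))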

Raw data: ‘High’.

Note that this overwrites the previous plot object.

require(ggpubr)
require(RColorBrewer)
plot_moderation <- ggviolin(chase, x = "Opponent", y = "High",
                color = "Opponent", palette = "Set1",
                add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Highest offer (GBP)", color="Opponent", shape= "Opponent")
plot_moderation  + facet_wrap(~gender_contestant)

Log-transformed data: Chosen.

plot_moderation2 <- ggviolin(chase, x = "Opponent", y = "Log_chosen",
                color = "Opponent", palette = "Set1",
                add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Transformed amount chosen (log10(x+10000))", color="Opponent", shape= "Opponent")
plot_moderation2  + facet_wrap(~gender_contestant)

Gender x Age interaction models

There is no evidence for an interaction between gender and age.

Note that in the log-transformed data we find some suggestion of effects of gender and age. However, we have not examined all of the OLS regression assumptions; normally you would do so before drawing any firm conclusions (a quick check is sketched after the model output below).

interaction<-lm(Chosen ~ gender_contestant*Age, data=chase)
summary(interaction)
## 
## Call:
## lm(formula = Chosen ~ gender_contestant * Age, data = chase)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10678  -4544  -2227   -538  45461 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                3449.68    3317.95   1.040    0.300
## gender_contestantMale       594.09    4595.45   0.129    0.897
## Age                          69.62      70.23   0.991    0.323
## gender_contestantMale:Age    52.49     102.02   0.515    0.607
## 
## Residual standard error: 10270 on 196 degrees of freedom
## Multiple R-squared:  0.03218,    Adjusted R-squared:  0.01737 
## F-statistic: 2.172 on 3 and 196 DF,  p-value: 0.09256
interaction2<-lm(Log_chosen ~ gender_contestant*Age, data=chase)
summary(interaction2)
## 
## Call:
## lm(formula = Log_chosen ~ gender_contestant * Age, data = chase)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44832 -0.08499 -0.01945  0.02396  0.60996 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                4.0636741  0.0550211  73.857   <2e-16 ***
## gender_contestantMale      0.0840798  0.0762056   1.103    0.271    
## Age                        0.0026130  0.0011646   2.244    0.026 *  
## gender_contestantMale:Age -0.0005152  0.0016918  -0.305    0.761    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1704 on 196 degrees of freedom
## Multiple R-squared:  0.06055,    Adjusted R-squared:  0.04617 
## F-statistic: 4.211 on 3 and 196 DF,  p-value: 0.006502
main_model<-lm(Log_chosen ~ gender_contestant+Age, data=chase)
summary(main_model)
## 
## Call:
## lm(formula = Log_chosen ~ gender_contestant + Age, data = chase)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.45147 -0.08292 -0.02040  0.02647  0.60876 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.0746352  0.0415182  98.141  < 2e-16 ***
## gender_contestantMale 0.0620904  0.0242986   2.555  0.01136 *  
## Age                   0.0023689  0.0008428   2.811  0.00544 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.17 on 197 degrees of freedom
## Multiple R-squared:  0.06011,    Adjusted R-squared:  0.05056 
## F-statistic: 6.299 on 2 and 197 DF,  p-value: 0.00223
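A quick way to eyeball some of the OLS assumptions flagged above is base R’s diagnostic plots (a minimal sketch for the log-transformed interaction model):

# Residuals vs fitted, Q-Q, scale-location and leverage plots for the interaction model (sketch).
par(mfrow = c(2, 2))
plot(interaction2)
par(mfrow = c(1, 1))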

Plot

You might have noticed that the methods from the Lecture 5 slides do not work here, as you have a factor x numeric interaction! So you need to go back to earlier lectures.

Look back at Exercise 2, for example (note that some packages should still be loaded).

The figure shows lots of overlap. Again, you can further beautify this.

require(scales)
## Loading required package: scales
chase_plot<- ggplot(data = chase, aes(Age, Chosen, colour= gender_contestant)) + scale_colour_brewer(palette = "Pastel1") + geom_point(size=2.5, shape=16) + theme_hc(12)  + labs(x="Age (Years)", y="Chosen (GBP)", title="Chase") + scale_x_continuous(limits=c(18,80),breaks = pretty_breaks(4)) + geom_smooth(method="lm")
chase_plot
## `geom_smooth()` using formula = 'y ~ x'

BONUS: Continuous x Continuous interaction.

Load the data.

require(car)
salaries<-carData::Salaries
require(dplyr)
salaries<- salaries %>% mutate(monthly_salary=salary/9)
salaries<- salaries %>% mutate(yrs.since.phd_cent= yrs.since.phd -mean(yrs.since.phd), yrs.service_cent= yrs.service - mean(yrs.service))
interaction_salaries<-lm(monthly_salary ~ yrs.since.phd_cent*yrs.service_cent, data=salaries)
summary(interaction_salaries)
## 
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd_cent * yrs.service_cent, 
##     data = salaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7091  -1921   -282   1462  11889 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         13725.9411   188.7370  72.725  < 2e-16 ***
## yrs.since.phd_cent                    117.3429    26.9972   4.346 1.76e-05 ***
## yrs.service_cent                       27.8365    28.3200   0.983    0.326    
## yrs.since.phd_cent:yrs.service_cent    -7.1797     0.8319  -8.630  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2791 on 393 degrees of freedom
## Multiple R-squared:  0.3177, Adjusted R-squared:  0.3125 
## F-statistic: 60.99 on 3 and 393 DF,  p-value: < 2.2e-16

Note that this is likely not a sensible thing to do!

Why not? These two predictors are very highly correlated!

Note that you would also check this correlation directly:

cor.test(salaries$yrs.since.phd, salaries$yrs.service)
## 
##  Pearson's product-moment correlation
## 
## data:  salaries$yrs.since.phd and salaries$yrs.service
## t = 43.524, df = 395, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8909977 0.9252353
## sample estimates:
##       cor 
## 0.9096491
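You can also quantify the collinearity via variance inflation factors (a sketch using car, which is already loaded; with a correlation of roughly .91 the VIFs will be large):

# VIFs for an additive model with the two centred predictors (sketch).
vif(lm(monthly_salary ~ yrs.since.phd_cent + yrs.service_cent, data = salaries))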

Johnson-Neyman plot

require(jtools)
## Loading required package: jtools
## 
## Attaching package: 'jtools'
## The following object is masked from 'package:mvtnorm':
## 
##     standardize
require(interactions)
## Loading required package: interactions
johnson_neyman(interaction_salaries, pred = yrs.since.phd_cent, modx = yrs.service_cent)
## JOHNSON-NEYMAN INTERVAL 
## 
## When yrs.service_cent is OUTSIDE the interval [8.28, 27.05], the slope of
## yrs.since.phd_cent is p < .05.
## 
## Note: The range of observed values of yrs.service_cent is [-17.61, 42.39]
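To see what that interval means substantively, a simple-slopes plot from the same interactions package is helpful (a sketch):

# Plot the slope of yrs.since.phd_cent at low, mean and high values of yrs.service_cent (sketch).
interact_plot(interaction_salaries, pred = yrs.since.phd_cent,
              modx = yrs.service_cent, interval = TRUE)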

BONUS: use vegan.

The conclusion is, again, that there is no evidence for an interaction effect. Note that it might not work if the data are not ‘attached’.

require(vegan)
## Loading required package: vegan
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
# by = 'margin' corresponds to type III sums of squares.
require(readxl)
chase<-read_xlsx("thechase_R.xlsx")
require(dplyr)
chase <- chase %>% mutate(Log_chosen=log10(Chosen+10000))
attach(chase)
set.seed(123) # seeds to make it replicable
adonis2(Chosen~gender_contestant*Opponent, data = chase, by='margin', method='euclidean') 
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = Chosen ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
##                             Df   SumOfSqs      R2      F Pr(>F)
## gender_contestant:Opponent   4 5.0667e+08 0.02371 1.2268  0.308
## Residual                   190 1.9617e+10 0.91786              
## Total                      199 2.1372e+10 1.00000
set.seed(234)
adonis2(Log_chosen~gender_contestant*Opponent, data = chase, by='margin', method='euclidean')
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = Log_chosen ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
##                             Df SumOfSqs      R2      F Pr(>F)
## gender_contestant:Opponent   4   0.1462 0.02414 1.2489  0.295
## Residual                   190   5.5595 0.91821              
## Total                      199   6.0547 1.00000
set.seed(345)
adonis2(High~gender_contestant*Opponent, data = chase, by='margin', method='euclidean') 
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = High ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
##                             Df   SumOfSqs      R2     F Pr(>F)
## gender_contestant:Opponent   4 2.2600e+08 0.01011 0.501  0.717
## Residual                   190 2.1429e+10 0.95871             
## Total                      199 2.2352e+10 1.00000

lmPerm is also an option (check the vignette; ‘unique’ SS corresponds to type III). There is still no evidence for an interaction.

require(lmPerm)
## Loading required package: lmPerm
## 
## Attaching package: 'lmPerm'
## The following object is masked from 'package:permute':
## 
##     permute
set.seed(123456)
anova(lmp(Chosen~gender_contestant*Opponent,chase))
## [1] "Settings:  unique SS "
## Analysis of Variance Table
## 
## Response: Chosen
##                               Df   R Sum Sq R Mean Sq Iter Pr(Prob)  
## gender_contestant1             1 1.9987e+08 199873750  167   0.3772  
## Opponent1                      4 9.2876e+08 232189562 5000   0.0418 *
## gender_contestant1:Opponent1   4 5.0667e+08 126666873 4766   0.3277  
## Residuals                    190 1.9617e+10 103246441                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
set.seed(789)
anova(lmp(High~gender_contestant*Opponent,chase))
## [1] "Settings:  unique SS "
## Analysis of Variance Table
## 
## Response: High
##                               Df   R Sum Sq R Mean Sq Iter Pr(Prob)
## gender_contestant1             1 1.9218e+08 192179587   51   0.7451
## Opponent1                      4 5.8319e+08 145798547 2944   0.1705
## gender_contestant1:Opponent1   4 2.2600e+08  56500866   51   1.0000
## Residuals                    190 2.1429e+10 112783243

Incidentally, you would find an effect of opponent when conducting permutation tests on the main-effects model.

A median test similarly suggests there are differences based on opponents! (You would then explore these further).

require(lmPerm)
set.seed(01234)
anova(lmp(Chosen~gender_contestant+Opponent,chase))
## [1] "Settings:  unique SS "
## Analysis of Variance Table
## 
## Response: Chosen
##                     Df   R Sum Sq R Mean Sq Iter Pr(Prob)  
## gender_contestant1   1 2.8635e+08 286345985 1177  0.07901 .
## Opponent1            4 9.5212e+08 238029060 2873  0.04316 *
## Residuals          194 2.0123e+10 103729337                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
require(WRS2)
## Loading required package: WRS2
set.seed(1234)
med1way(Chosen~ gender_contestant+Opponent, data=chase, iter=10000)
## Call:
## med1way(formula = Chosen ~ gender_contestant + Opponent, data = chase, 
##     iter = 10000)
## 
## Test statistic F: 5.3079 
## Critical value: 3.855 
## p-value: 0.0226
set.seed(1234)
med1way(Chosen~Opponent, data=chase, iter=10000)
## Call:
## med1way(formula = Chosen ~ Opponent, data = chase, iter = 10000)
## 
## Test statistic F: 2.8567 
## Critical value: 2.269 
## p-value: 0.0223

BONUS: use party.

None of the variables were selected :/.

chase_red<-chase%>%dplyr::select(c(Opponent,Age,High,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
require(party)
## Loading required package: party
## Loading required package: grid
## Loading required package: modeltools
## Loading required package: stats4
## 
## Attaching package: 'modeltools'
## The following object is masked from 'package:car':
## 
##     Predict
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## 
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
## 
##     where
ctree<-ctree(High~., chase_red)
plot(ctree)

Note that this overwrites the previous objects. What happens when we pick ‘Chosen’? Still nothing.

chase_red<-chase%>%dplyr::select(c(Opponent,Age,Chosen,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
require(party)
ctree<-ctree(Chosen~., chase_red)
plot(ctree)

Alas, there is also no effect on success. So unfortunately, not much of interest is going on there.

chase_red<-chase%>%dplyr::select(c(Opponent,Age,success_player,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
chase_red$success_player<-as.factor(chase_red$success_player)
require(party)
ctree<-ctree(success_player~., chase_red)
plot(ctree)

THE END!