Download the data thechase_R.xlsx . This file contains coding for 50 episodes from ITV’s The Chase (coded by yours Truly). Read the data (use the ‘readxl’ package for example). The codes are largely self-explanatory.
Test the assumptions for a 2 (gender_contestant) x 5 (Opponent) ANOVA on ‘High’ (this is the highest offer made by the chaser).
Conduct the 2 x 5 ANOVA. Calculate the post-hoc contrasts. Report as you would do in a paper.
Make a (beautiful) violin plot.
Conduct a 2 (gender_contestant) x 2 (age) ANOVA on ‘chosen’ (the amount chosen by the contestant.).
Make a plot showing the interaction (see the examples from the slides or check ggplot2).
BONUS: Load the salaries dataset, test the interaction effect for
yrs.service x yrs.since.Ph.D. on monthly salary.
Calculate Johnson-Neyman interval range.
BONUS: use vegan to conduct a permutation analysis. for the 2 x 5 ANOVA you conducted above.
BONUS: use the party package to model ‘high’. (Reduce the dataset via ‘dplyr’, keep a sensible number of variables).
Load the data and get some descriptives.
setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture5")
require(readxl)
## Loading required package: readxl
chase<-read_xlsx("thechase_R.xlsx")
require(skimr)
## Loading required package: skimr
skim(chase)
Name | chase |
Number of rows | 200 |
Number of columns | 15 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Date | 0 | 1 | 10 | 10 | 0 | 30 | 0 |
Time | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
Opponent | 0 | 1 | 4 | 5 | 0 | 5 | 0 |
gender_contestant | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Chosen_2 | 0 | 1 | 3 | 6 | 0 | 3 | 0 |
success_player | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Outcome_episode | 0 | 1 | 3 | 4 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
ID | 0 | 1.00 | 25.50 | 14.47 | 1 | 13.00 | 25.5 | 38.00 | 50 | ▇▇▇▇▇ |
Number | 0 | 1.00 | 2.50 | 1.12 | 1 | 1.75 | 2.5 | 3.25 | 4 | ▇▇▁▇▇ |
Age | 0 | 1.00 | 42.77 | 14.45 | 18 | 31.00 | 42.0 | 54.00 | 79 | ▇▇▇▅▂ |
Earned | 0 | 1.00 | 5150.00 | 1919.88 | 0 | 4000.00 | 5000.0 | 6000.00 | 10000 | ▂▆▇▃▁ |
Low | 1 | 1.00 | 290.39 | 1909.96 | -10000 | 140.00 | 1000.0 | 1000.00 | 3000 | ▁▁▁▂▇ |
High | 0 | 1.00 | 33040.00 | 10598.11 | 10000 | 25000.00 | 31000.0 | 40000.00 | 65000 | ▃▇▆▃▁ |
Chosen | 0 | 1.00 | 7806.00 | 10363.33 | -5000 | 4000.00 | 5000.0 | 7000.00 | 55000 | ▇▁▁▁▁ |
steps_left | 77 | 0.62 | 1.36 | 1.17 | 0 | 0.00 | 1.0 | 2.00 | 4 | ▇▆▆▅▁ |
I have also looked at ‘Chosen’ rather than ‘High’.
Here is a graph facetted by opponent to explore normality.
require(ggplot2)
## Loading required package: ggplot2
require(ggthemes)
## Loading required package: ggthemes
plot_hist <- ggplot(chase, aes(x=Chosen))
plot_hist <- plot_hist+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~Opponent) + theme_gdocs(12) + xlab("Amount Chosen") + ylab("Count")
plot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Not great in terms of distribution!
How would you resolve it? (see below for the answer)
Incidentally the above was for chosen rather than high.
require(ggplot2)
require(ggthemes)
plot_hist <- ggplot(chase, aes(x=High))
plot_hist <- plot_hist+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~Opponent) + theme_gdocs(12) + xlab("Highest offer") + ylab("Count")
plot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here is one facetted by gender.
require(ggplot2)
require(ggthemes)
plot_hist2 <- ggplot(chase, aes(x=Chosen))
plot_hist2 <- plot_hist2+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~gender_contestant) + theme_gdocs(12) + xlab("Amount Chosen") + ylab("Count")
plot_hist2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This similarly shows some issues. Long tail… .
Let us look at Highest offers. (note that it overrides previous codes.)
require(ggplot2)
require(ggthemes)
plot_hist2 <- ggplot(chase, aes(x=High))
plot_hist2 <- plot_hist2+ geom_histogram(colour = "black", fill = "white") + facet_wrap(~gender_contestant) + theme_gdocs(12) + xlab("Highest offer") + ylab("Count")
plot_hist2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Unsurprisingly these tests suggest violations for ‘Chosen’.
# other routes.
require(nortest)
## Loading required package: nortest
# Anderson-Darling test.
ad.test(chase$Chosen)
##
## Anderson-Darling normality test
##
## data: chase$Chosen
## A = 33.746, p-value < 2.2e-16
# Cramer-von Mises test.
cvm.test(chase$Chosen)
## Warning in cvm.test(chase$Chosen): p-value is smaller than 7.37e-10, cannot be
## computed more accurately
##
## Cramer-von Mises normality test
##
## data: chase$Chosen
## W = 6.6963, p-value = 7.37e-10
# transform, we need to add a constant to make every value positive.
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
chase <- chase %>% mutate(Log_chosen=log10(Chosen+10000))
ad.test(chase$Log_chosen)
##
## Anderson-Darling normality test
##
## data: chase$Log_chosen
## A = 17.74, p-value < 2.2e-16
cvm.test(chase$Log_chosen)
## Warning in cvm.test(chase$Log_chosen): p-value is smaller than 7.37e-10, cannot
## be computed more accurately
##
## Cramer-von Mises normality test
##
## data: chase$Log_chosen
## W = 3.3068, p-value = 7.37e-10
This reduces non-normality (values went down) but still strong evidence for violations.
Let us look at some plots, to see what has happened.
require(ggplot2)
qqplot <- ggplot(chase, aes(sample = Chosen))
qqplot + stat_qq()
# log
qqplot_log <- ggplot(chase, aes(sample = Log_chosen))
qqplot_log + stat_qq()
The log-transformed data looks better but clearly one outlier. In conclusion, we have clear violations but we will tackle these issues in the ‘bonus’ section.
What happens to the highest offer.
# other routes but now for highest offer .
require(nortest)
# Anderson-Darling test.
ad.test(chase$High)
##
## Anderson-Darling normality test
##
## data: chase$High
## A = 1.4766, p-value = 0.0008027
# Cramer-von Mises test.
cvm.test(chase$High)
##
## Cramer-von Mises normality test
##
## data: chase$High
## W = 0.24083, p-value = 0.001648
Also some issues there… . Transforming it improves it somewhat but still non-normal according to the tests, but graphically it looks a lot better than ‘Chosen’.
require(dplyr)
chase <- chase %>% mutate(Log_high=log10(High))
ad.test(chase$Log_high)
##
## Anderson-Darling normality test
##
## data: chase$Log_high
## A = 1.3068, p-value = 0.002106
cvm.test(chase$Log_high)
##
## Cramer-von Mises normality test
##
## data: chase$Log_high
## W = 0.20094, p-value = 0.005163
The assumption of homogeneity of variances is not violated for neither opponent nor gender of the contestant, respectively: F(4,195) = 2.23, p= .067 and F(1,198) = 0.95, p= .331.
Note that there is a trend with ‘Opponent’ (p= .067) but this is not really too worrisome.
require(car)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
leveneTest(Chosen ~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.9507 0.3307
## 198
leveneTest(Chosen ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 2.2297 0.06724 .
## 195
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Incidentally, if you use the transformed data, then there is similarly no issue.
leveneTest(Log_chosen ~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.1146 0.7354
## 198
leveneTest(Log_chosen ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 1.8052 0.1294
## 195
As an aside, note that in the current analyses we do not account for the fact that observations do not satisfy the assumption of independence. As you might have noticed, the data are nested within episodes! (The offers (and chosen offers) will depend based on previous players’ performance!). We will further discuss this when we cover multilevel models.
Similarly no issues with the high offers made, in terms of homogeneity of variance.
require(car)
leveneTest(High~ gender_contestant, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1.7813 0.1835
## 198
leveneTest(High ~ Opponent, data=chase)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.2144 0.9302
## 195
require(ez)
## Loading required package: ez
chase$contest_id<-seq(1:200) # need to make an ID.
ez<-ezANOVA(data=chase, dv=.(Chosen), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez
## $ANOVA
## Effect DFn DFd SSn SSd F
## 1 (Intercept) 1 190 8102271319 19616823829 78.475066
## 2 gender_contestant 1 190 199873750 19616823829 1.935890
## 3 Opponent 4 190 928758248 19616823829 2.248887
## 4 gender_contestant:Opponent 4 190 506667493 19616823829 1.226840
## p p<.05 ges
## 1 5.747493e-16 * 0.29229927
## 2 1.657424e-01 0.01008613
## 3 6.535730e-02 0.04520477
## 4 3.008518e-01 0.02517791
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 9 190 1158859565 16565540235 1.47685 0.1589511
##
## $aov
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Terms:
## gender_contestant Opponent gender_contestant:Opponent
## Sum of Squares 296705239 952116239 506667493
## Deg. of Freedom 1 4 4
## Residuals
## Sum of Squares 19616823829
## Deg. of Freedom 190
##
## Residual standard error: 10161.03
## Estimated effects may be unbalanced
Sample write up:
A two-way ANOVA was conducted to examine the effect of gender and opponent on the amount of the offer chosen. There was no significant interaction effect, F(4,190)= 1.23, p= .301, \(\eta^2_g\)= .025. Neither were there any significant main effects for opponent and gender of the contestant, respectively: F(4,190)= 2.25, p= .065, \(\eta^2_g\)= .045 and F(1,190)= 2.25, p= .166, \(\eta^2_g\)= .01.
Now let us look at the highest offers. Similarly no significant effects.
chase$contest_id<-seq(1:200) # need to make an ID.
ez2<-ezANOVA(data=chase, dv=.(High), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez2
## $ANOVA
## Effect DFn DFd SSn SSd F
## 1 (Intercept) 1 190 162534903354 21428816079 1441.1263564
## 2 gender_contestant 1 190 192179587 21428816079 1.7039729
## 3 Opponent 4 190 583194189 21428816079 1.2927324
## 4 gender_contestant:Opponent 4 190 226003466 21428816079 0.5009686
## p p<.05 ges
## 1 1.212988e-90 * 0.883516075
## 2 1.933477e-01 0.008888563
## 3 2.743612e-01 0.026494363
## 4 7.350533e-01 0.010436636
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 9 190 248256846 8055723154 0.6505906 0.7526215
##
## $aov
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Terms:
## gender_contestant Opponent gender_contestant:Opponent
## Sum of Squares 112631295 584229160 226003466
## Deg. of Freedom 1 4 4
## Residuals
## Sum of Squares 21428816079
## Deg. of Freedom 190
##
## Residual standard error: 10619.95
## Estimated effects may be unbalanced
A two-way ANOVA was conducted to examine the effect of gender and opponent on the highest offer made. There was no significant interaction effect, F(4,190)= 0.501, p= .735, \(\eta^2_g\)= .01. Neither were there any significant main effects for opponent and gender of the contestant, respectively: F(4,190)= 1.29, p= .274, \(\eta^2_g\)= .026 and F(1,190)= 1.70, p= .193, \(\eta^2_g\)= .009.
Incidentally, you would draw the same conclusion if you analyzed the log-transformed data.
require(ez)
ez_log<-ezANOVA(data=chase, dv=.(Log_chosen), wid=.(contest_id), between = .(gender_contestant,Opponent), return_aov=T, detailed=TRUE, type=3)
## Warning: Converting "contest_id" to factor for ANOVA.
## Warning: Converting "gender_contestant" to factor for ANOVA.
## Warning: Converting "Opponent" to factor for ANOVA.
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
ez_log
## $ANOVA
## Effect DFn DFd SSn SSd F
## 1 (Intercept) 1 190 2.606907e+03 5.559471 89093.422654
## 2 gender_contestant 1 190 8.806991e-02 5.559471 3.009870
## 3 Opponent 4 190 2.069874e-01 5.559471 1.768496
## 4 gender_contestant:Opponent 4 190 1.461738e-01 5.559471 1.248906
## p p<.05 ges
## 1 8.333219e-256 * 0.99787195
## 2 8.438055e-02 0.01559438
## 3 1.368519e-01 0.03589506
## 4 2.917479e-01 0.02561915
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 9 190 0.2057234 3.810851 1.139653 0.3367455
##
## $aov
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Terms:
## gender_contestant Opponent gender_contestant:Opponent Residuals
## Sum of Squares 0.135726 0.213300 0.146174 5.559471
## Deg. of Freedom 1 4 4 190
##
## Residual standard error: 0.1710566
## Estimated effects may be unbalanced
It makes little sense to report these given that none of the effects are significant but I have reported them here nonetheless, so that you have an example.
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
summary(glht(ez$aov, linfct = mcp(Opponent = "Tukey")))
## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found --
## default contrast might be inappropriate
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = formula(aov_formula), data = data)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Jenny - Anne == 0 -4125.00 4637.85 -0.889 0.897
## Mark - Anne == 0 -2835.53 3120.25 -0.909 0.889
## Paul - Anne == 0 -2166.67 2933.24 -0.739 0.945
## Shaun - Anne == 0 -2778.85 2876.27 -0.966 0.866
## Mark - Jenny == 0 1289.47 4758.34 0.271 0.999
## Paul - Jenny == 0 1958.33 4637.85 0.422 0.993
## Shaun - Jenny == 0 1346.15 4602.04 0.293 0.998
## Paul - Mark == 0 668.86 3120.25 0.214 1.000
## Shaun - Mark == 0 56.68 3066.76 0.018 1.000
## Shaun - Paul == 0 -612.18 2876.27 -0.213 1.000
## (Adjusted p values reported -- single-step method)
No outliers really with Jenny but all other chasers have outliers. (You can further improve the graph, by for example sorting the names).
require(ggpubr)
## Loading required package: ggpubr
require(RColorBrewer)
## Loading required package: RColorBrewer
plot_moderation <- ggviolin(chase, x = "Opponent", y = "Chosen",
color = "Opponent", palette = "Set1",
add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Amount chosen (GBP)", color="Opponent", shape= "Opponent")
plot_moderation + facet_wrap(~gender_contestant)
Note that this overrides… .
require(ggpubr)
require(RColorBrewer)
plot_moderation <- ggviolin(chase, x = "Opponent", y = "High",
color = "Opponent", palette = "Set1",
add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Highest offer (GBP)", color="Opponent", shape= "Opponent")
plot_moderation + facet_wrap(~gender_contestant)
plot_moderation2 <- ggviolin(chase, x = "Opponent", y = "Log_chosen",
color = "Opponent", palette = "Set1",
add = c("jitter","boxplot"), shape = "Opponent") + labs(x="Opponent", y= "Transformed amount chosen (log10(x+10000))", color="Opponent", shape= "Opponent")
plot_moderation2 + facet_wrap(~gender_contestant)
There is no evidence for an interaction between gender and age.
Note that in the log. transformed data, we find some suggestions (but note that we have not examined all assumptions from OLS regression - normally you would do so before drawing any firm conclusions.)
interaction<-lm(Chosen ~ gender_contestant*Age, data=chase)
summary(interaction)
##
## Call:
## lm(formula = Chosen ~ gender_contestant * Age, data = chase)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10678 -4544 -2227 -538 45461
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3449.68 3317.95 1.040 0.300
## gender_contestantMale 594.09 4595.45 0.129 0.897
## Age 69.62 70.23 0.991 0.323
## gender_contestantMale:Age 52.49 102.02 0.515 0.607
##
## Residual standard error: 10270 on 196 degrees of freedom
## Multiple R-squared: 0.03218, Adjusted R-squared: 0.01737
## F-statistic: 2.172 on 3 and 196 DF, p-value: 0.09256
interaction2<-lm(Log_chosen ~ gender_contestant*Age, data=chase)
summary(interaction2)
##
## Call:
## lm(formula = Log_chosen ~ gender_contestant * Age, data = chase)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44832 -0.08499 -0.01945 0.02396 0.60996
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0636741 0.0550211 73.857 <2e-16 ***
## gender_contestantMale 0.0840798 0.0762056 1.103 0.271
## Age 0.0026130 0.0011646 2.244 0.026 *
## gender_contestantMale:Age -0.0005152 0.0016918 -0.305 0.761
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1704 on 196 degrees of freedom
## Multiple R-squared: 0.06055, Adjusted R-squared: 0.04617
## F-statistic: 4.211 on 3 and 196 DF, p-value: 0.006502
main_model<-lm(Log_chosen ~ gender_contestant+Age, data=chase)
summary(main_model)
##
## Call:
## lm(formula = Log_chosen ~ gender_contestant + Age, data = chase)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45147 -0.08292 -0.02040 0.02647 0.60876
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0746352 0.0415182 98.141 < 2e-16 ***
## gender_contestantMale 0.0620904 0.0242986 2.555 0.01136 *
## Age 0.0023689 0.0008428 2.811 0.00544 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.17 on 197 degrees of freedom
## Multiple R-squared: 0.06011, Adjusted R-squared: 0.05056
## F-statistic: 6.299 on 2 and 197 DF, p-value: 0.00223
You might have noticed that the methods of the slides from Lecture 5 do not work as here you have a factor x numeric interaction! So you need to go back to earlier lectures.
Look back to exercise 2, for example (note some packages should still be loaded.).
The figure shows lots of overlap… . Again, you can further beautify this… .
require(scales)
## Loading required package: scales
chase_plot<- ggplot(data = chase, aes(Age, Chosen, colour= gender_contestant)) + scale_colour_brewer(palette = "Pastel1") + geom_point(size=2.5, shape=16) + theme_hc(12) + labs(x="Age (Years)", y="Chosen (GBP)", title="Chase") + scale_x_continuous(limits=c(18,80),breaks = pretty_breaks(4)) + geom_smooth(method="lm")
chase_plot
## `geom_smooth()` using formula = 'y ~ x'
Load the data.
require(car)
salaries<-carData::Salaries
require(dplyr)
salaries<- salaries %>% mutate(monthly_salary=salary/9)
salaries<- salaries %>% mutate(yrs.since.phd_cent= yrs.since.phd -mean(yrs.since.phd), yrs.service_cent= yrs.service - mean(yrs.service))
interaction_salaries<-lm(monthly_salary ~ yrs.since.phd_cent*yrs.service_cent, data=salaries)
summary(interaction_salaries)
##
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd_cent * yrs.service_cent,
## data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7091 -1921 -282 1462 11889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13725.9411 188.7370 72.725 < 2e-16 ***
## yrs.since.phd_cent 117.3429 26.9972 4.346 1.76e-05 ***
## yrs.service_cent 27.8365 28.3200 0.983 0.326
## yrs.since.phd_cent:yrs.service_cent -7.1797 0.8319 -8.630 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2791 on 393 degrees of freedom
## Multiple R-squared: 0.3177, Adjusted R-squared: 0.3125
## F-statistic: 60.99 on 3 and 393 DF, p-value: < 2.2e-16
Note that this is likely not sensible to do?!
Why not? : These are very highly correlated!
Note that you would also further check
cor.test(salaries$yrs.since.phd, salaries$yrs.service)
##
## Pearson's product-moment correlation
##
## data: salaries$yrs.since.phd and salaries$yrs.service
## t = 43.524, df = 395, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8909977 0.9252353
## sample estimates:
## cor
## 0.9096491
require(jtools)
## Loading required package: jtools
##
## Attaching package: 'jtools'
## The following object is masked from 'package:mvtnorm':
##
## standardize
require(interactions)
## Loading required package: interactions
johnson_neyman(interaction_salaries, pred = yrs.since.phd_cent, modx = yrs.service_cent)
## JOHNSON-NEYMAN INTERVAL
##
## When yrs.service_cent is OUTSIDE the interval [8.28, 27.05], the slope of
## yrs.since.phd_cent is p < .05.
##
## Note: The range of observed values of yrs.service_cent is [-17.61, 42.39]
The conclusion is that again there is no evidence for an interaction effect. Note that it might not work if the data are not ‘attached’.
require(vegan)
## Loading required package: vegan
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
# margin corresponds to type III error.
require(readxl)
chase<-read_xlsx("thechase_R.xlsx")
require(dplyr)
chase <- chase %>% mutate(Log_chosen=log10(Chosen+10000))
attach(chase)
set.seed(123) # seeds to make it replicable
adonis2(Chosen~gender_contestant*Opponent, data = chase, by='margin', method='euclidean')
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = Chosen ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
## Df SumOfSqs R2 F Pr(>F)
## gender_contestant:Opponent 4 5.0667e+08 0.02371 1.2268 0.308
## Residual 190 1.9617e+10 0.91786
## Total 199 2.1372e+10 1.00000
set.seed(234)
adonis2(Log_chosen~gender_contestant*Opponent, data = chase, by='margin', method='euclidean')
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = Log_chosen ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
## Df SumOfSqs R2 F Pr(>F)
## gender_contestant:Opponent 4 0.1462 0.02414 1.2489 0.295
## Residual 190 5.5595 0.91821
## Total 199 6.0547 1.00000
set.seed(345)
adonis2(High~gender_contestant*Opponent, data = chase, by='margin', method='euclidean')
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = High ~ gender_contestant * Opponent, data = chase, method = "euclidean", by = "margin")
## Df SumOfSqs R2 F Pr(>F)
## gender_contestant:Opponent 4 2.2600e+08 0.01011 0.501 0.717
## Residual 190 2.1429e+10 0.95871
## Total 199 2.2352e+10 1.00000
LmPerm is also an option. (Check the vignette,‘uniquely’ corresponds to type III). There still is no evidence for an interaction.
require(lmPerm)
## Loading required package: lmPerm
##
## Attaching package: 'lmPerm'
## The following object is masked from 'package:permute':
##
## permute
set.seed(123456)
anova(lmp(Chosen~gender_contestant*Opponent,chase))
## [1] "Settings: unique SS "
## Analysis of Variance Table
##
## Response: Chosen
## Df R Sum Sq R Mean Sq Iter Pr(Prob)
## gender_contestant1 1 1.9987e+08 199873750 167 0.3772
## Opponent1 4 9.2876e+08 232189562 5000 0.0418 *
## gender_contestant1:Opponent1 4 5.0667e+08 126666873 4766 0.3277
## Residuals 190 1.9617e+10 103246441
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
set.seed(789)
anova(lmp(High~gender_contestant*Opponent,chase))
## [1] "Settings: unique SS "
## Analysis of Variance Table
##
## Response: High
## Df R Sum Sq R Mean Sq Iter Pr(Prob)
## gender_contestant1 1 1.9218e+08 192179587 51 0.7451
## Opponent1 4 5.8319e+08 145798547 2944 0.1705
## gender_contestant1:Opponent1 4 2.2600e+08 56500866 51 1.0000
## Residuals 190 2.1429e+10 112783243
Incidentally you would find effects when conducting permutation tests… .
A median test similarly suggests there are differences based on opponents! (You would then explore these further).
require(lmPerm)
set.seed(01234)
anova(lmp(Chosen~gender_contestant+Opponent,chase))
## [1] "Settings: unique SS "
## Analysis of Variance Table
##
## Response: Chosen
## Df R Sum Sq R Mean Sq Iter Pr(Prob)
## gender_contestant1 1 2.8635e+08 286345985 1177 0.07901 .
## Opponent1 4 9.5212e+08 238029060 2873 0.04316 *
## Residuals 194 2.0123e+10 103729337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
require(WRS2)
## Loading required package: WRS2
set.seed(1234)
med1way(Chosen~ gender_contestant+Opponent, data=chase, iter=10000)
## Call:
## med1way(formula = Chosen ~ gender_contestant + Opponent, data = chase,
## iter = 10000)
##
## Test statistic F: 5.3079
## Critical value: 3.855
## p-value: 0.0226
set.seed(1234)
med1way(Chosen~Opponent, data=chase, iter=10000)
## Call:
## med1way(formula = Chosen ~ Opponent, data = chase, iter = 10000)
##
## Test statistic F: 2.8567
## Critical value: 2.269
## p-value: 0.0223
None of the variables were selected :/.
chase_red<-chase%>%dplyr::select(c(Opponent,Age,High,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
require(party)
## Loading required package: party
## Loading required package: grid
## Loading required package: modeltools
## Loading required package: stats4
##
## Attaching package: 'modeltools'
## The following object is masked from 'package:car':
##
## Predict
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
##
## Attaching package: 'party'
## The following object is masked from 'package:dplyr':
##
## where
ctree<-ctree(High~., chase_red)
plot(ctree)
Note that this overrides… . What happens when we pick ‘Chosen’. Still nothing… .
chase_red<-chase%>%dplyr::select(c(Opponent,Age,Chosen,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
require(party)
ctree<-ctree(Chosen~., chase_red)
plot(ctree)
Alas, also no effect on success… . So unfortunately not much interesting going on there… .
chase_red<-chase%>%dplyr::select(c(Opponent,Age,success_player,gender_contestant))
chase_red$gender_contestant<-as.factor(chase_red$gender_contestant)
chase_red$Opponent<-as.factor(chase_red$Opponent)
chase_red$success_player<-as.factor(chase_red$success_player)
require(party)
ctree<-ctree(success_player~., chase_red)
plot(ctree)
THE END!