2018-10-18 | disclaimer

PY0782: Advanced Quantitative research methods.

Last week: All about visualization.

Today: ANOVA

Goals (today)

  • ANOVA and its variants
  • Non-parametric alternatives.

Assignment

After today you should be able to complete the following sections:

  • ANOVA / ANCOVA / MANOVA
  • Non-parametric alternatives to these.

ANOVA

Raise your hand if you have never conducted an ANOVA. Analysis of Variance.

Back to basics: Variance.

A note on notation: \[\sigma^2, s^2, var(X)\]

A measure of variability

Can you name other measures?

We can calculate it for a sample but usually we want to infer this for a population.

Remember \[SD=\sqrt{variance}\]

Why would you need an ANOVA?

We want to compare differences between two or more means.

Why not simply run a bunch of t-tests?

Type I error! (a lot of type I errors)

How does it work?

Imagine a study with three groups –> three means: \(\bar x_1, \bar x_2, \bar x_3\)

We want to find out if those significantly differ.

Grand mean: \(\bar x_g\)

Formulae (for your reference)

Within- and Between- Sum of Squares (SS)

Mean square within (with 3 samples): \[M_{WithinSS}=\dfrac{\sigma_1^2+\sigma_2^2 + \sigma_3^2}3\]

Mean square between:\[M_{BetweenSS}= \dfrac{\sum_{i=1}^n (\bar x_i - \bar x_g)^2}{N-1}\] with N-1.

Basic gist: if between > within evidence for an effect!

F-test

\[F= \dfrac{M_{BetweenSS}}{M_{WithinSS}}\]

look up corresponding degrees of freedom.

Compare significance.

Assumptions.

Dependent variable is interval.

Independent observation (more about this when we discuss multilevel models).

Normally distributed for each category of the Independent variable.

Homogeneity of variance

Time for a new dataset?

Load data.

Three groups (Asians (international), Asian Americans, European Americans), NEO PI-R. Read more here

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture3")
require(haven)
data<-read_spss("Iwasaki_Personality_Data.sav")

Skim

require(skimr)
skim(data)
## Skim summary statistics
##  n obs: 205 
##  n variables: 7 
## 
## ── Variable type:character ────────────────────────────────────────────────────────
##  variable missing complete   n min max empty n_unique
##       GRP       2      203 205   1   1     0        3
## 
## ── Variable type:numeric ──────────────────────────────────────────────────────────
##  variable missing complete   n   mean     sd p0   p25 p50   p75 p100
##         A       0      205 205 113.75  20.61  0 103   115 127    192
##         C       0      205 205 111.62  20.81  0  98   112 124    192
##         E       0      205 205 116.53  22.09  0 103   117 130    192
##        ID       2      203 205 369.98 256.78  1  51.5 527 577.5  628
##         N       0      205 205  95.91  23.17  0  81    97 110    192
##         O       0      205 205 116.22  20.57  0 104   115 127    192
##      hist
##  ▁▁▁▂▇▆▁▁
##  ▁▁▁▃▇▆▁▁
##  ▁▁▁▂▇▅▂▁
##  ▇▁▁▁▁▁▅▇
##  ▁▁▃▇▇▂▁▁
##  ▁▁▁▂▇▆▁▁

Reduced set.

2 missings. Remove this, find out the codings.

require(dplyr)
# missings are Not a number.
data_no_miss<-filter(data, GRP!='NaN')
print_labels(data_no_miss$GRP)
## 
## Labels:
##  value                label
##      0   European Americans
##      1 Asian Internationals
##      2      Asian Americans

Openness to Experience.

Test whether groups differ in Openness to Experience (O) based on their culture.

Assumption Checks.

Non-independent observations = check!

Interval dependent = check!

Normality –> Plot

Homogeneity of variance.

Plot

Look back at week 2 to make this a more beautiful plot!

Mostly people would look at the overall plot, but ideally one would check plots for each group.

require(ggplot2)
plot_hist <- ggplot(data_no_miss, aes(x=O)) 
plot_hist <- plot_hist+ geom_histogram(colour = "black", fill = "white")
plot_hist 

Plot facets.

plot_hist_facet <- ggplot(data_no_miss, aes(x=O)) 
plot_hist_facet <- plot_hist_facet+geom_histogram(colour = "black", fill = "white") 
plot_hist_facet + facet_wrap(~GRP)

Normality

What do you think?

Fairly Robust.

Remember central limit theorem!

Other approaches: Shapiro-Wilk

Not significant –> retain null hypothesis, not significantly different from normal distribution.

data_eur<-filter(data_no_miss, GRP=='0')
data_asian_i<-filter(data_no_miss, GRP=='1')
data_asian_a<-filter(data_no_miss, GRP=='2')
# You would do this for all 3!
shapiro.test(data_asian_a$O)
## 
##  Shapiro-Wilk normality test
## 
## data:  data_asian_a$O
## W = 0.96901, p-value = 0.1584

Kolmogorov-Smirnov.

Beware K-S test: easily significant!

ks.test(data_asian_a$O, "pnorm") #pnorm --> normal distribution
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  data_asian_a$O
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided

Visual checks.

Recommendation check normality visually! (histogram / violin plot / …)

Think back to the 'Datasaurus'.

Variety of other options.

??nortest package. Read the vignette and references.

Anderson-Darling test.

Cramer-von Mises test.

Lilliefors test (K-S) (correction of K-S).

Shapiro-Francia test.

Jarque Bera test (+ adjusted).

Homogeneity of variance.

Bartlett's test - assumes normality.

If significant could also point to deviation in normality as opposed to violation of the assumption of homogeneity of variance.

bartlett.test(O ~ GRP, data=data_no_miss)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  O by GRP
## Bartlett's K-squared = 3.1821, df = 2, p-value = 0.2037

Levene's test.

Levene's test does not assume normality.

require(car) # package which does test
require(dplyr)
# need to make it a factor!
data_no_miss$GRP_fact<-as_factor(data_no_miss$GRP)
leveneTest(O ~ GRP_fact, data=data_no_miss)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2  1.7076 0.1839
##       200

Outcome of assumption checks.

Assumptions upheld here!

Sample write up: Visual inspection suggested that the distribution of the dependent variable is close to normal. A Levene's test suggests that the assumption of homogeneity of variances is not violated, F(2,200) = 1.71, p= .18)

We can move on to ANOVA!

Kurtosis/Skewness.

As an aside: you could mention, the platykurtosis in the Asian American group.

(In your assignment I would count both correct in this case).

Try it yourself.

Test the assumptions for an ANOVA with Extraversion (E) and group as independent variable.

ANOVA.

aov()

Analysis of variance.

Anova1<-aov(O ~ GRP_fact, data=data_no_miss)
summary(Anova1)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## GRP_fact      2   8774    4387   15.05 8.13e-07 ***
## Residuals   200  58284     291                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lm(): Linear model.

Anova2<-lm(O~ GRP_fact, data=data_no_miss)
summary(Anova2)
## 
## Call:
## lm(formula = O ~ GRP_fact, data = data_no_miss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.400 -11.982  -0.982  11.828  41.600 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   113.400      1.971  57.529  < 2e-16 ***
## GRP_factAsian Internationals   -2.039      2.817  -0.724     0.47    
## GRP_factAsian Americans        13.582      3.015   4.505 1.13e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.07 on 200 degrees of freedom
## Multiple R-squared:  0.1308, Adjusted R-squared:  0.1222 
## F-statistic: 15.05 on 2 and 200 DF,  p-value: 8.126e-07

F-test.

summary prints parameter tests but should you be after the F-test.

anova(Anova2)
## Analysis of Variance Table
## 
## Response: O
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## GRP_fact    2   8774  4387.0  15.054 8.126e-07 ***
## Residuals 200  58284   291.4                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Drop1.

For your reference. drop1 can also get you some relevant key info.

Don't worry about AIC for now. It is a model fit statistic (lower = better). More about this when we discuss multilevel models.

drop1(Anova2) # some additional info. ??drop1
## Single term deletions
## 
## Model:
## O ~ GRP_fact
##          Df Sum of Sq   RSS    AIC
## <none>                58284 1155.0
## GRP_fact  2      8774 67058 1179.4

ez Package

Best bet for replicating SPSS results!

require(ez)
Ez_ANOVA1<-ezANOVA(data_no_miss, dv=O, wid=ID, between=GRP_fact, detailed=TRUE)
Ez_ANOVA1
## $ANOVA
##     Effect DFn DFd      SSn      SSd        F            p p<.05       ges
## 1 GRP_fact   2 200 8773.973 58283.59 15.05393 8.125554e-07     * 0.1308424
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd        F       p p<.05
## 1   2 200 356.9336 20903.13 1.707561 0.18394

ges?

Effect size measure!

ges= Generalized Eta-Squared. \(\eta^2_g\) . There is also partial \(\eta^2\). Read more here.

How to do Greek symbols and mathematical functions? here or check the .rmd for this lecture.

Omega squared and partial omega squared.

If you are up for a challenge. You can figure out how to calculate this effect size measure on your own!.

write up.

A one-way ANOVA showed a significant effect of cultural group on openness to experience F(2, 200) = 15.05, p < .0001, \(\eta^2_g\) = .13.

Store results

require(apaTables)
apa.aov.table(Anova1, filename="APA_Anova_Table.doc", table.number = 1)
## 
## 
## Table 1 
## 
## ANOVA results using O as the dependent variable
##  
## 
##    Predictor        SS  df        MS       F    p partial_eta2
##  (Intercept) 964467.00   1 964467.00 3309.57 .000             
##     GRP_fact   8773.97   2   4386.98   15.05 .000          .13
##        Error  58283.59 200    291.42                          
##  CI_90_partial_eta2
##                    
##          [.06, .20]
##                    
## 
## Note: Values in square brackets indicate the bounds of the 90% confidence interval for partial eta-squared

Post-hoc tests.

Where does the difference lie?

Post-hoc tests.

resultTukey<-TukeyHSD(Anova1)
resultTukey
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = O ~ GRP_fact, data = data_no_miss)
## 
## $GRP_fact
##                                              diff       lwr       upr
## Asian Internationals-European Americans -2.038889 -8.689634  4.611856
## Asian Americans-European Americans      13.582143  6.463135 20.701151
## Asian Americans-Asian Internationals    15.621032  8.438902 22.803161
##                                             p adj
## Asian Internationals-European Americans 0.7496332
## Asian Americans-European Americans      0.0000335
## Asian Americans-Asian Internationals    0.0000020

Post-hoc tests: corrections.

Have you heard of post-hoc corrections? Why do they exist?

Read more here and use them sensibly.

Why Bonferroni should be abandoned (in medicine and ecology).

Make a report. Stargazer.

You can make a prettier layout. By changing the variable names, etc. You can also change the labels it prints.

Totally check out stargazer!

require(broom)
require(stargazer)
# Broom turns our result into a dataframe!
# We can then 'tidy' it and make a report!
tidy_resultTukey<-tidy(resultTukey)
stargazer(tidy_resultTukey, summary = F, type="html", out="Tukey.html")

apaTables

Alternative. This will also have Cohen's d effect size values.

require(apaTables)
apa.d.table(iv=GRP_fact, dv=O, data=data_no_miss, filename="Table_1_APA_tukey.doc", show.conf.interval=T, landscape=T, table.number = 1)
## 
## 
## Table 1 
## 
## Means, standard deviations, and d-values with confidence intervals
##  
## 
##   Variable                M      SD    1             2           
##   1. European Americans   113.40 17.12                           
##                                                                  
##   2. Asian Internationals 111.36 15.24 0.13                      
##                                        [-0.20, 0.45]             
##                                                                  
##   3. Asian Americans      126.98 19.11 0.75          0.92        
##                                        [0.40, 1.11]  [0.55, 1.28]
##                                                                  
## 
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval for each d-value. 
## The confidence interval is a plausible range of population d-values 
## that could have caused the sample d-value (Cumming, 2014). 
## d-values are estimates calculated using formulas 4.18 and 4.19 
## from Borenstein, Hedges, Higgins, & Rothstein (2009). 
## d-values not calculated if unequal variances prevented pooling.
## 

Try it yourself.

  • Run an ANOVA with 'E' extraversion as the dependent, and group as the independent.
  • Conduct the post-hoc tests and export those as either .html or .docx

What if we were unlucky?

How would you have proceeded, if the assumptions were violated?

Non-parametric tests.

They do not assume a normal distribution but can be less powerful.

Many options exist. You can read and find others (see references).

WRS2 package

The t1way function computes a one-way ANOVA for the trimmed means. Homoscedasticity assumption not required.

lincoln() for the posthoc tests.

#WRS.
require(WRS2)
t1way(O ~ GRP_fact, data=data_no_miss)
## Call:
## t1way(formula = O ~ GRP_fact, data = data_no_miss)
## 
## Test statistic: F = 11.067 
## Degrees of freedom 1: 2 
## Degrees of freedom 2: 74.67 
## p-value: 6e-05 
## 
## Explanatory measure of effect size: 0.44

t1waybt (WRS2)

The t1waybt function computes a one-way ANOVA for the trimmed means.

# Specify Between.
t1waybt(O ~ GRP_fact, data=data_no_miss, nboot=10000)
## Call:
## t1waybt(formula = O ~ GRP_fact, data = data_no_miss, nboot = 10000)
## 
## Effective number of bootstrap samples was 10000.
## 
## Test statistic: 11.067 
## p-value: 1e-04 
## Variance explained 0.192 
## Effect size 0.439

med1way (WRS2)

Computes a one-way ANOVA for the medians. Homoscedasticity assumption not required. Avoid too many ties.

# WRS, ANOVA for Medians. (note iter=)
med1way(O ~ GRP_fact, data=data_no_miss, iter=10000)
## Call:
## med1way(formula = O ~ GRP_fact, data = data_no_miss, iter = 10000)
## 
## Test statistic F: 6.6673 
## Critical value: 3.2621 
## p-value: 0.0036
# analysis of Medians leads to same conclusion.

Read through WRS2 manual or Wilcox (2012) to find out more.

Permutation tests.

Permutation tests use random shuffles of the data to get the correct distribution of a test statistic under a null hypothesis. (No power issue by the way)

Shuffles are not the same as bootstraps. Some assumptions do still apply (e.g., non-independence of observations).

Read more here and here

require(coin)
independence_test(O ~ GRP_fact, data=data_no_miss)
## 
##  Asymptotic General Independence Test
## 
## data:  O by
##   GRP_fact (European Americans, Asian Internationals, Asian Americans)
## maxT = 5.0961, p-value = 8.742e-07
## alternative hypothesis: two.sided

Sample write up.

A permutation test via the 'coin' package showed that there are significant differences between the three groups in Openness to Experience (maxT= 5.10, p< .0001).

(Note post-hoc tests are currently unavailable via 'coin' but you can get them via 'rcompanion', more here)

ANCOVA

The difference is the C. Covariate.

Often we want to control for a potential confound, so suppose that you are testing a new weight loss drug. You could analyse the participant's weight at the end of the trial while partialling out their start weight. This would be an ANCOVA scenario (but beware).

Beware of Lord's paradox. If one researcher calculates a change score and runs an independent samples t-test while the other runs an ANCOVA

Important: order effects.

The order in which you put in things matters!!

# Type I errors
lm_ancova<-lm(O~E+GRP_fact, data=data_no_miss)
anova(lm_ancova)
## Analysis of Variance Table
## 
## Response: O
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## E           1   8349  8348.9  34.339 1.894e-08 ***
## GRP_fact    2  10325  5162.7  21.234 4.379e-09 ***
## Residuals 199  48383   243.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compare to previous slide!

# note that the order matters for the F-tests.
lm_ancova2<-lm(O~GRP_fact+E, data=data_no_miss)
anova(lm_ancova2)
## Analysis of Variance Table
## 
## Response: O
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## GRP_fact    2   8774  4387.0  18.044 6.289e-08 ***
## E           1   9900  9900.3  40.720 1.209e-09 ***
## Residuals 199  48383   243.1                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More about Types of errors later.

Type I,II,III errors.

For now, you should be aware that type of errors matters (I,II,III). read more here and here

SPSS uses type III. So let's aim to emulate (even though that might not always be optimal)

Order invariant: SPSS uses Type III.

# Type III errors
require(car)
require(compute.es)
Ancova<-lm(O~GRP_fact+E, contrasts=list(GRP_fact=contr.sum), data=data_no_miss)
Anova(Ancova, type="III")
## Anova Table (Type III tests)
## 
## Response: O
##             Sum Sq  Df F value    Pr(>F)    
## (Intercept)  25976   1 106.838 < 2.2e-16 ***
## GRP_fact     10325   2  21.234 4.379e-09 ***
## E             9900   1  40.720 1.209e-09 ***
## Residuals    48383 199                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

post-hoc tests.

require(multcomp)
posth=glht(Ancova, linfct=mcp(GRP_fact="Tukey"))  

summary(posth)

summary(posth) ##shows the output in a nice format
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = O ~ GRP_fact + E, data = data_no_miss, contrasts = list(GRP_fact = contr.sum))
## 
## Linear Hypotheses:
##                                                Estimate Std. Error t value
## Asian Internationals - European Americans == 0    4.837      2.789   1.734
## Asian Americans - European Americans == 0        17.870      2.835   6.304
## Asian Americans - Asian Internationals == 0      13.033      2.808   4.642
##                                                Pr(>|t|)    
## Asian Internationals - European Americans == 0    0.195    
## Asian Americans - European Americans == 0        <1e-04 ***
## Asian Americans - Asian Internationals == 0      <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Sample write up.

A one-way ANCOVA was conducted to compare Openness across groups whilst controlling for Extraversion. There was a significant effect of cultural group, F(2,199)=21.23, p<.0001. In addition, the effect of the covariate, Extraversion, was significant, F(1,199)=40.72, p<.0001.

You would then describe the post-hoc differences and also explore how extraversion is associated with (e.g., correlation, plot.)

Non-parametric alternative.

Look at the 'lmPerm' package and this.

MANOVA

All the previous assumptions + (+ multicollinearity (not an assumption but a problem)

Multivariate normality

Homogeneity of co-variances.

Multivariate Normality

Multivariate Shapiro test. (The Univariate's test bigger brother…).

Sensitive to sample size, if large small deviations will lead to significance.

require(RVAideMemoire)
multivariatenorm<-dplyr::select(data_no_miss, -GRP_fact, -GRP,-ID)
mshapiro.test(multivariatenorm)
## 
##  Multivariate Shapiro-Wilk normality test
## 
## data:  (N,E,O,A,C)
## W = 0.98577, p-value = 0.03887

Alternative methods.

A whole host of alternatives. More here.

require(MVN) # Alternative method, one example
mvn(multivariatenorm)
## $multivariateNormality
##              Test        Statistic            p value Result
## 1 Mardia Skewness 57.0115702615851 0.0107887129793524     NO
## 2 Mardia Kurtosis  1.9394290567207 0.0524491152485054    YES
## 3             MVN             <NA>               <NA>     NO
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk     N        0.9942    0.6210    YES   
## 2 Shapiro-Wilk     E        0.9897    0.1552    YES   
## 3 Shapiro-Wilk     O        0.9893    0.1360    YES   
## 4 Shapiro-Wilk     A        0.9915    0.2815    YES   
## 5 Shapiro-Wilk     C        0.9889    0.1158    YES   
## 
## $Descriptives
##     n     Mean  Std.Dev Median Min Max  25th  75th       Skew   Kurtosis
## N 203  95.9064 21.23737     97  36 160  81.0 109.5 -0.1442702 -0.1102764
## E 203 116.7291 19.93406    117  50 175 103.0 129.5 -0.1039661  0.6504353
## O 203 116.4236 18.21999    115  72 166 104.0 127.0  0.3079877 -0.1699643
## A 203 113.9261 18.29136    115  58 161 103.0 127.0 -0.2962256  0.1136037
## C 203 111.7734 18.53971    112  57 171  98.5 124.0  0.1476949  0.4428221

Conclusion: Multivariate normality

Violated (but could be way worse, trust me … )

mvn(multivariatenorm, multivariatePlot = "qq")

## $multivariateNormality
##              Test        Statistic            p value Result
## 1 Mardia Skewness 57.0115702615851 0.0107887129793524     NO
## 2 Mardia Kurtosis  1.9394290567207 0.0524491152485054    YES
## 3             MVN             <NA>               <NA>     NO
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk     N        0.9942    0.6210    YES   
## 2 Shapiro-Wilk     E        0.9897    0.1552    YES   
## 3 Shapiro-Wilk     O        0.9893    0.1360    YES   
## 4 Shapiro-Wilk     A        0.9915    0.2815    YES   
## 5 Shapiro-Wilk     C        0.9889    0.1158    YES   
## 
## $Descriptives
##     n     Mean  Std.Dev Median Min Max  25th  75th       Skew   Kurtosis
## N 203  95.9064 21.23737     97  36 160  81.0 109.5 -0.1442702 -0.1102764
## E 203 116.7291 19.93406    117  50 175 103.0 129.5 -0.1039661  0.6504353
## O 203 116.4236 18.21999    115  72 166 104.0 127.0  0.3079877 -0.1699643
## A 203 113.9261 18.29136    115  58 161 103.0 127.0 -0.2962256  0.1136037
## C 203 111.7734 18.53971    112  57 171  98.5 124.0  0.1476949  0.4428221

Solutions.

Solution: Check for outliers, run analyses again. Report both!

Transformations. Box-Cox transform.

Central limit theorem to the rescue again! (also note Hotelling's T^2 robust)

Homogeneity of variance.

As with the univariate case. So, you would run a series of tests such as Levene's Test and examine those.

Not done here but you can run these on your own.

Homogeneity of covariances: Box's M test.

Approach with care.

Very easily significant. Therefore, usually p=.001 as threshold. More here.

require(heplots) #alternative is 'biotools' package.
boxM(multivariatenorm, data_no_miss$GRP)
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  multivariatenorm
## Chi-Sq (approx.) = 40.668, df = 30, p-value = 0.09256

MANOVA

Manovamodel <- manova(cbind(N,E,A,O,C) ~ GRP_fact, data = data_no_miss)
summary(Manovamodel)
##            Df  Pillai approx F num Df den Df    Pr(>F)    
## GRP_fact    2 0.41862    10.43     10    394 1.106e-15 ***
## Residuals 200                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

MANOVA

Other measures exist but Pillai's Trace is typically most common. You can find out more about them here. Also about the math (also see references).

Report.

Pillai's Trace most commonly reported.

Sample report: A MANOVA was conducted with five dependent variables (Neuroticism, Extraversion, Conscientousness, Agreeableness, and Openness to Experience) and cultural group as the between-subject factor. A statistically significant effect was found (Pillai's Trace= .42, F(10,394)=10.43, p<.0001).

You would then report follow-up tests! (e.g., Univariate ANOVA's)

In conclusion the personality profiles differ between these 3 groups.

Follow up tests

Exercise.

Using last week's Salaries dataset, test the assumptions for an ANOVA with monthly salary as dependent variable and academic rank as independent variable.

Conduct an ANOVA with the appropriate post-hoc tests. What do you conclude?

Conduct an alternative non-parametric test to the ANOVA. What do you conclude.

Conduct an ANCOVA with gender as factor, and years since Ph.D. as covariate. What do you conclude?

Exercise continued.

Load the iris dataset, it is part of the datasets package.

It is a famous dataset with measurements of 3 species of iris flowers.

Test the assumptions for a MANOVA with Species as the between-subject factor and petal length and sepal length as dependent variables.

Run the MANOVA. What do you conclude? Write up the result as you would do for a paper?

BONUS: Plot one of the results from your analyses on the salaries database.

BONUS: Conduct a follow up analysis or plot for the MANOVA.

References (and further reading.)

  • Crawley, M. J. (2013). The R book: Second edition. New York, NY: John Wiley & Sons.
  • Kassambara, A. (2017) STHDA: One way ANOVA in R
  • Kassambara, A. (2017) STHDA: Statistical tests and assumptions
  • Mangiafico, S. (2017) rcompanion
  • Siegel, S. & Castellan, N.J. Jr. (1988). Nonparametric statistics for the behavioral sciences. 2nd Edn. McGraw-Hill, New York.
  • Wickham, H., & Grolemund, G. (2017). R for data science. Sebastopol, CA: O’Reilly.
  • Wilcox, R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). New York, NY: Elsevier.