2018-11-20 | disclaimer

PY0782: Advanced Quantitative research methods.

  • Last lecture: Mediation
  • Today: Exploratory Factor analysis

Goals (today)

Factor analysis

Some ways of visualising factor analysis.

Assignment

After today you should be able to complete the following sections for Assignment II:

Exploratory Factor Analysis and its assumptions.

Factor analysis

Who has run an exploratory factor analysis?

What was the purpose?

Purpose.

We want to study the covariation between a large number of observed variables.

  1. How many latent factors would account for most of the variation among the observed variables?

  2. Which variables appear to define each factor, and what labels could we give to these factors? If the observed covariation can be explained by a small number of factors (e.g., 2-5), this would increase our understanding of the relationships among the variables!

–> Reduce complexity and increase understanding.

–> validate scale (–> ultimately confirmatory factor analysis).

Terminology.

Exploratory vs. confirmatory. Occasionally you will have a very clear idea as to how many factors there should be. In such a case one would usually do a confirmatory analysis.

Terminology.

Principal components vs. Factor analysis.

"The idea of principal components analysis (PCA) is to find a small number of linear combinations of the variables so as to capture most of the variation in the dataframe as a whole. … Principal components analysis finds a set of orthogonal standardized linear combinations which together explain all of the variation in the original data. There are as many principal components as there are variables, but typically it is only the first few of them that explain important amounts of the total variation." Crawley (2013:809-810)

Terminology: Factor analysis

"With principal components analysis we were fundamentally interested in the variables and their contributions. Factor analysis aims to provide usable numerical values for quantities such as intelligence or social status that are not directly measurable. The idea is to use correlations between observable variables in terms of underlying ‘factors’." Crawley (2013:813)

Note that 'factor' here means something fundamentally different from 'factor' as we used it when describing a single variable.

Also some researchers will use the terms interchangeably (even though they are separate techniques).

Principal component analysis

Today we will mostly deal with factor analysis (should you require principal component analysis, have a look here and here)

The mathematics are also described in those sources.
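
As a small illustration of the PCA properties quoted from Crawley above (as many components as variables, orthogonal combinations, all variation accounted for), here is a minimal sketch in base R. The data are simulated purely for illustration and the variable names are hypothetical; this is not the lecture dataset.

```r
# Simulated toy data (an assumption for illustration, not the lecture data):
# two pairs of correlated variables.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # correlated with x1
x3 <- rnorm(n)
x4 <- x3 + rnorm(n, sd = 0.5)   # correlated with x3
toy <- data.frame(x1, x2, x3, x4)

# PCA on standardised variables (i.e. on the correlation matrix).
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# There is one principal component per variable.
n_components <- ncol(pca$rotation)

# Proportion of total variance per component; together they sum to 1.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(prop_var), 2)

# The weight vectors (columns of `rotation`) are orthonormal,
# so their cross-product should be the identity matrix.
orthogonality_check <- crossprod(pca$rotation)
round(orthogonality_check, 10)
```

With data like these, the first two components should carry most of the variance, which is the "only the first few matter" point in the quote.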

Factor analysis assumptions.

In essence you can think of factor analysis as OLS regression, which means similar assumptions apply.

Measurement: All variables should be interval. No dummy variables. No outliers.

Sample size: >200, although some advocate 5-10 participants per variable (but see this on rules of thumb).

Multivariate normality: Though not necessarily required for exploratory factor analysis, useful to check.

Linear: the proposed relationships are linear.

Factorability: There should be some correlations which can be meaningfully grouped together.

More on assumptions here

Data

Data are from here. 240 participants providing self-ratings (1-9) on 32 variables.

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture7")
f_data <- read.table("personality0.txt")
require(stargazer)
stargazer(f_data, type = "html", out= "factor_data.html")

Measurement and sample size

Measurement and sample size are OK. (Though one can always question how 'interval' a 1 to 9 rating really is. A 1 to 7 scale would be worse, and that is commonly used.)

Multivariate normality.

As an aside: this is not looking great. We will ignore it for now, given that we are conducting exploratory factor analysis.

require(MVN)
mvn(f_data)
## $multivariateNormality
##              Test        Statistic              p value Result
## 1 Mardia Skewness 8120.19888493781 1.26964854513512e-69     NO
## 2 Mardia Kurtosis 15.3220135668768                    0     NO
## 3             MVN             <NA>                 <NA>     NO
## 
## $univariateNormality
##            Test  Variable Statistic   p value Normality
## 1  Shapiro-Wilk  distant     0.9259  <0.001      NO    
## 2  Shapiro-Wilk  talkatv     0.9530  <0.001      NO    
## 3  Shapiro-Wilk  carelss     0.9193  <0.001      NO    
## 4  Shapiro-Wilk  hardwrk     0.9102  <0.001      NO    
## 5  Shapiro-Wilk  anxious     0.9571  <0.001      NO    
## 6  Shapiro-Wilk  agreebl     0.9200  <0.001      NO    
## 7  Shapiro-Wilk   tense      0.9521  <0.001      NO    
## 8  Shapiro-Wilk   kind       0.9157  <0.001      NO    
## 9  Shapiro-Wilk  opposng     0.9435  <0.001      NO    
## 10 Shapiro-Wilk  relaxed     0.9644  <0.001      NO    
## 11 Shapiro-Wilk  disorgn     0.9360  <0.001      NO    
## 12 Shapiro-Wilk  outgoin     0.9441  <0.001      NO    
## 13 Shapiro-Wilk  approvn     0.9481  <0.001      NO    
## 14 Shapiro-Wilk    shy       0.9566  <0.001      NO    
## 15 Shapiro-Wilk  discipl     0.9411  <0.001      NO    
## 16 Shapiro-Wilk   harsh      0.9276  <0.001      NO    
## 17 Shapiro-Wilk  persevr     0.9255  <0.001      NO    
## 18 Shapiro-Wilk  friendl     0.9039  <0.001      NO    
## 19 Shapiro-Wilk  worryin     0.9406  <0.001      NO    
## 20 Shapiro-Wilk  respnsi     0.8741  <0.001      NO    
## 21 Shapiro-Wilk  contrar     0.9523  <0.001      NO    
## 22 Shapiro-Wilk  sociabl     0.9300  <0.001      NO    
## 23 Shapiro-Wilk   lazy       0.9582  <0.001      NO    
## 24 Shapiro-Wilk  coopera     0.9277  <0.001      NO    
## 25 Shapiro-Wilk   quiet      0.9602  <0.001      NO    
## 26 Shapiro-Wilk  organiz     0.9457  <0.001      NO    
## 27 Shapiro-Wilk  criticl     0.9571  <0.001      NO    
## 28 Shapiro-Wilk    lax       0.9498  <0.001      NO    
## 29 Shapiro-Wilk  laidbck     0.9640  <0.001      NO    
## 30 Shapiro-Wilk  withdrw     0.9266  <0.001      NO    
## 31 Shapiro-Wilk  givinup     0.8737  <0.001      NO    
## 32 Shapiro-Wilk  easygon     0.9473  <0.001      NO    
## 
## $Descriptives
##           n     Mean  Std.Dev Median Min Max 25th 75th        Skew
## distant 240 3.866667 1.794615      3   1   8 2.00    5  0.21660329
## talkatv 240 5.883333 1.677732      6   2   9 5.00    7 -0.18189899
## carelss 240 3.412500 1.811357      3   1   9 2.00    5  0.66867120
## hardwrk 240 6.925000 1.370108      7   2   9 6.00    8 -0.80831996
## anxious 240 5.129167 1.880305      5   1   9 4.00    7 -0.09210485
## agreebl 240 6.629167 1.372162      7   1   9 6.00    8 -0.78725929
## tense   240 4.616667 1.904337      5   1   9 3.00    6 -0.02518179
## kind    240 6.970833 1.262255      7   2   9 6.00    8 -0.70373663
## opposng 240 3.858333 1.599141      4   1   8 3.00    5  0.46913404
## relaxed 240 5.475000 1.694009      5   1   9 4.00    7 -0.08833270
## disorgn 240 4.083333 2.126082      4   1   9 2.00    6  0.16985907
## outgoin 240 6.020833 1.809894      6   2   9 5.00    7 -0.34291775
## approvn 240 5.858333 1.367867      6   2   9 5.00    7 -0.13555584
## shy     240 4.558333 1.969626      5   1   9 3.00    6  0.06063502
## discipl 240 6.308333 1.725011      7   1   9 5.00    7 -0.56730380
## harsh   240 3.600000 1.683789      3   1   8 2.00    5  0.45079493
## persevr 240 6.804167 1.405006      7   2   9 6.00    8 -0.62497986
## friendl 240 7.250000 1.155304      7   2   9 7.00    8 -0.59175740
## worryin 240 5.212500 2.108126      6   1   9 3.00    7 -0.07134918
## respnsi 240 7.291667 1.395725      8   1   9 7.00    8 -1.21576315
## contrar 240 3.770833 1.500900      4   1   8 3.00    5  0.22186635
## sociabl 240 6.445833 1.567579      7   2   9 5.00    8 -0.64011468
## lazy    240 4.179167 1.893941      4   1   9 3.00    5  0.20658153
## coopera 240 6.695833 1.197619      7   3   9 6.00    7 -0.37705887
## quiet   240 4.604167 1.880750      5   1   9 3.00    6  0.15018658
## organiz 240 6.154167 1.963363      6   1   9 5.00    8 -0.45913660
## criticl 240 5.170833 1.745282      5   1   9 4.00    6 -0.21890441
## lax     240 4.083333 1.664713      4   1   9 3.00    5  0.41571022
## laidbck 240 5.245833 1.790837      5   1   9 4.00    7 -0.13048078
## withdrw 240 3.754167 1.769684      3   1   7 2.00    5  0.17034350
## givinup 240 2.675000 1.553307      2   1   8 1.75    4  1.00112342
## easygon 240 6.066667 1.601429      6   2   9 5.00    7 -0.41865452
##             Kurtosis
## distant -1.115187889
## talkatv -0.748747155
## carelss -0.309281507
## hardwrk  0.614831474
## anxious -0.771926581
## agreebl  1.042140998
## tense   -0.925252081
## kind     0.767591124
## opposng -0.293884606
## relaxed -0.507015321
## disorgn -1.065496081
## outgoin -0.761161418
## approvn -0.154427814
## shy     -0.919119555
## discipl  0.145322024
## harsh   -0.736948487
## persevr  0.642059811
## friendl  1.210961241
## worryin -1.139934280
## respnsi  2.252882721
## contrar -0.472970716
## sociabl  0.061296675
## lazy    -0.597716553
## coopera  0.004767416
## quiet   -0.728162291
## organiz -0.438841538
## criticl -0.556012334
## lax     -0.088209525
## laidbck -0.522778720
## withdrw -1.142037389
## givinup  0.564770534
## easygon -0.324220228
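
To see what a statistic such as the Mardia kurtosis row above is measuring, here is a rough base R sketch of the textbook formula. It is an illustration under stated assumptions (simulated data, a hypothetical helper `mardia_kurtosis`), and the MVN package may differ in implementation details.

```r
# Hypothetical helper: textbook Mardia kurtosis for an n x p data matrix.
mardia_kurtosis <- function(data) {
  X <- as.matrix(data)
  n <- nrow(X)
  p <- ncol(X)
  S  <- cov(X) * (n - 1) / n                 # covariance with divisor n
  d2 <- mahalanobis(X, colMeans(X), S)       # squared Mahalanobis distances
  b2 <- mean(d2^2)                           # multivariate kurtosis
  # Under multivariate normality, b2 is approximately
  # normal with mean p(p + 2) and variance 8p(p + 2)/n.
  z  <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  list(b2 = b2, z = z, p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}

# Simulated examples (assumptions for illustration only).
set.seed(5)
normal_data  <- matrix(rnorm(1000 * 4), ncol = 4)        # multivariate normal
heavy_tailed <- matrix(rt(1000 * 4, df = 3), ncol = 4)   # heavy tails

res_normal <- mardia_kurtosis(normal_data)
res_heavy  <- mardia_kurtosis(heavy_tailed)
res_normal$z   # should be close to 0
res_heavy$z    # should be large and positive
```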

Plot

Linearity

You can do pairwise scatterplots, but with a range of 1-9 these are not very informative. We will simply assume that linearity holds.

require(ggplot2)
require(GGally)
ggpairs(f_data[,1:4]) # example

Factorability.

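Here we want Bartlett's test of sphericity to be significant! Why? (Careful: base R's bartlett.test(), used below, tests homogeneity of variances across the columns. The sphericity test on the correlation matrix is available as cortest.bartlett() in psych.)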

bartlett.test(f_data)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  f_data
## Bartlett's K-squared = 350.08, df = 31, p-value < 2.2e-16

Sample write up

Bartlett's test for sphericity was significant suggesting that factor analysis is appropriate (\(\chi^2\)(31) = 350.1, p < .0001).
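
Note that Bartlett's test of sphericity asks whether the correlation matrix differs from an identity matrix, which is a different question from the homogeneity-of-variances test that base R's bartlett.test() performs. For p variables its degrees of freedom are p(p - 1)/2, so 32 × 31 / 2 = 496 for the 32 items here. Below is a minimal base R sketch of the standard formula on simulated data; the helper name `bartlett_sphericity` and the toy data are assumptions for illustration. In practice psych provides cortest.bartlett() for this.

```r
# Hypothetical helper: Bartlett's test of sphericity from raw data.
# H0: the population correlation matrix is an identity matrix.
bartlett_sphericity <- function(data) {
  R <- cor(data)
  n <- nrow(data)
  p <- ncol(data)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df    <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Simulated toy data (not the lecture data): four items sharing one common source.
set.seed(2)
common <- rnorm(300)
toy <- data.frame(a = common + rnorm(300),
                  b = common + rnorm(300),
                  c = common + rnorm(300),
                  d = common + rnorm(300))

res <- bartlett_sphericity(toy)
res$df        # 4 * 3 / 2 = 6
res$p.value   # very small: the items are clearly correlated
```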

KMO-test

Kaiser-Meyer-Olkin factor adequacy ranges from 0 to 1. All should be >.5 (Kaiser, 1977)

require(psych)
KMO(f_data)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = f_data)
## Overall MSA =  0.84
## MSA for each item = 
## distant talkatv carelss hardwrk anxious agreebl   tense    kind opposng 
##    0.88    0.86    0.82    0.87    0.82    0.73    0.84    0.81    0.79 
## relaxed disorgn outgoin approvn     shy discipl   harsh persevr friendl 
##    0.86    0.75    0.87    0.89    0.87    0.84    0.85    0.86    0.87 
## worryin respnsi contrar sociabl    lazy coopera   quiet organiz criticl 
##    0.81    0.86    0.83    0.90    0.89    0.83    0.87    0.78    0.87 
##     lax laidbck withdrw givinup easygon 
##    0.81    0.73    0.90    0.89    0.78

Interpretations

0.00 to 0.49 unacceptable.

0.50 to 0.59 miserable.

0.60 to 0.69 mediocre.

0.70 to 0.79 middling.

0.80 to 0.89 meritorious.

0.90 to 1.00 marvelous.

Sample write-up

All 32 items showed middling to meritorious adequacy for factor analysis (all MSA\(\geq\).72).
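
For intuition about what the MSA values measure: KMO compares the squared correlations between items with their squared partial correlations (correlations after controlling for all other items). The following base R sketch follows that standard definition on simulated data; `kmo_sketch` is a hypothetical helper, not the psych implementation.

```r
# Hypothetical helper: KMO / MSA from raw data, following the standard definition.
kmo_sketch <- function(data) {
  R <- cor(data)
  # Scaled inverse of R: off-diagonals are minus the partial correlations
  # (the sign does not matter, because they are squared below).
  Q <- cov2cor(solve(R))
  diag(R) <- 0
  diag(Q) <- 0
  r2 <- R^2
  q2 <- Q^2
  list(overall  = sum(r2) / (sum(r2) + sum(q2)),
       per_item = colSums(r2) / (colSums(r2) + colSums(q2)))
}

# Simulated toy data (not the lecture data): four items sharing one common source.
set.seed(6)
common <- rnorm(300)
toy <- data.frame(a = common + rnorm(300),
                  b = common + rnorm(300),
                  c = common + rnorm(300),
                  d = common + rnorm(300))

kmo <- kmo_sketch(toy)
round(kmo$overall, 2)
round(kmo$per_item, 2)
```

Small partial correlations relative to the raw correlations push the index towards 1, which is why high values indicate that the items share common variance.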

Try it yourself.

Download the data from here. Data are ratings of instructors from a study by Sidanius.

Read in the data, make a subset with items 13:24 (Hint: use the select function from dplyr and num_range) and conduct a KMO test.

ggcorrplot

# ggcorrplot, you can then further tweak this, as it is a ggplot.
require(ggcorrplot)
require(dplyr)
# take the absolute, interested in strength.
cormatrix<-abs(cor(f_data[,1:20])) 
corplot<-ggcorrplot(cormatrix, hc.order = TRUE, type = "lower", method = "circle") 

Plot

Some further terms.

See here

Factor: An underlying or latent construct causing the observed variables, to a greater or lesser extent. A factor is estimated by a linear combination of our observed variables. When the ‘best fitting’ factors are found, it should be remembered that these factors are not unique! It can be shown that any rotation of the best-fitting factors is also best-fitting. ‘Interpretability’ is used to select the ‘best’ rotation among the equally ‘good’ rotations: To be useful, factors should be interpretable. Rotation of factors is used to improve the interpretability of factors. So once we have extracted the factors, we will rotate them.

Factor loadings: The degree to which the variable is driven or ‘caused’ by the factor.
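
The claim that any rotation of the best-fitting factors fits equally well can be checked numerically: an orthogonal rotation changes the loadings but leaves the reproduced correlations (loadings times their transpose) untouched. Here is a small sketch using base R's varimax(); the loading matrix is made up for illustration.

```r
# Hypothetical unrotated loadings for six items on two factors (made-up numbers).
L <- cbind(F1 = c(0.70, 0.65, 0.60, 0.55, 0.60, 0.50),
           F2 = c(0.45, 0.50, 0.40, -0.50, -0.45, -0.40))

# Orthogonal varimax rotation from the stats package.
rot   <- varimax(L)
L_rot <- unclass(rot$loadings)

# The reproduced (common) correlations are identical before and after rotation.
reproduced_before <- L %*% t(L)
reproduced_after  <- L_rot %*% t(L_rot)
all.equal(reproduced_before, reproduced_after, check.attributes = FALSE)

# The rotation matrix is orthogonal: t(T) %*% T is the identity.
round(crossprod(rot$rotmat), 10)

# The loadings themselves do change, typically towards simpler structure.
round(L_rot, 2)
```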

More terms.

Factor scores/weights: These can be estimated for each factor and then added to your dataframe: basically a score for each participant on each factor. Occasionally, one would then use those scores in further analyses.

Communality of a variable: The extent to which the variability across participants in a variable is ‘explained’ by the set of factors extracted in the factor analysis. Uniqueness = 1-Communality.
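
To make communality, uniqueness and factor scores concrete, here is a minimal sketch with base R's factanal() on simulated data. Note the assumptions: factanal() uses maximum likelihood rather than the 'minres' extraction used elsewhere in this lecture, and the toy data and variable names are made up.

```r
# Simulated toy data (not the lecture data): two latent factors, three items each.
set.seed(3)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Maximum-likelihood factor analysis with varimax rotation and regression scores.
fit <- factanal(toy, factors = 2, rotation = "varimax", scores = "regression")

# Communality: sum of squared loadings per item.
loadings_mat <- unclass(fit$loadings)
communality  <- rowSums(loadings_mat^2)

# Uniqueness as reported by the model; communality + uniqueness is about 1.
uniqueness <- fit$uniquenesses
round(cbind(communality, uniqueness, total = communality + uniqueness), 3)

# Factor scores: one row per participant, one column per factor,
# which can be added to the original data frame.
scores     <- fit$scores
toy_scored <- cbind(toy, scores)
head(toy_scored)
```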

Basic factor analysis

Varimax rotation. Let's start by getting 8 factors. (Request large number and then trim down!)

require(psych)
fa <- fa(f_data,8, fm = 'minres', rotate='varimax', fa = 'fa')

Output

sink('fa_output.text')
fa
## Factor Analysis using method =  minres
## Call: fa(r = f_data, nfactors = 8, rotate = "varimax", fm = "minres", 
##     fa = "fa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           MR1   MR2   MR4   MR8   MR6   MR5   MR7   MR3   h2   u2 com
## distant  0.60  0.02  0.11 -0.13  0.13  0.24  0.04  0.10 0.48 0.52 1.7
## talkatv -0.76  0.06 -0.01  0.09  0.11  0.14  0.04  0.07 0.63 0.37 1.2
## carelss  0.05 -0.25  0.12 -0.08  0.63  0.19  0.08  0.14 0.54 0.46 1.9
## hardwrk -0.18  0.70  0.14  0.06 -0.21  0.01 -0.04  0.07 0.59 0.41 1.5
## anxious  0.16  0.02  0.75  0.07  0.09  0.15 -0.09  0.06 0.63 0.37 1.3
## agreebl -0.03  0.02 -0.01  0.71  0.08 -0.17  0.13  0.15 0.58 0.42 1.3
## tense    0.15  0.06  0.77 -0.04  0.04  0.21 -0.21  0.09 0.72 0.28 1.5
## kind    -0.12  0.19  0.04  0.64 -0.22 -0.12  0.04 -0.21 0.56 0.44 1.9
## opposng -0.01 -0.07  0.09 -0.14  0.09  0.68  0.00 -0.10 0.51 0.49 1.2
## relaxed -0.03 -0.09 -0.52  0.25  0.05 -0.09  0.49 -0.10 0.61 0.39 2.7
## disorgn  0.02 -0.31  0.02 -0.01  0.82  0.08  0.10 -0.05 0.79 0.21 1.3
## outgoin -0.84  0.12  0.00  0.15 -0.02 -0.03  0.12  0.01 0.75 0.25 1.1
## approvn -0.29  0.13 -0.04  0.48 -0.07 -0.14  0.19  0.11 0.40 0.60 2.6
## shy      0.71 -0.22  0.17  0.00  0.03 -0.07 -0.01 -0.07 0.59 0.41 1.4
## discipl  0.05  0.69  0.04  0.08 -0.19  0.00 -0.03  0.16 0.55 0.45 1.3
## harsh    0.08 -0.04  0.05 -0.18  0.15  0.64 -0.05  0.26 0.55 0.45 1.7
## persevr -0.14  0.64  0.09  0.14 -0.08  0.04 -0.04 -0.18 0.50 0.50 1.5
## friendl -0.52  0.16  0.10  0.48 -0.06 -0.15  0.11 -0.27 0.64 0.36 3.2
## worryin  0.16 -0.03  0.77 -0.04  0.00  0.10 -0.15 -0.20 0.69 0.31 1.4
## respnsi -0.01  0.56 -0.01  0.29 -0.40  0.06 -0.13 -0.16 0.60 0.40 2.8
## contrar  0.06 -0.08  0.17 -0.16  0.12  0.72  0.00  0.06 0.59 0.41 1.3
## sociabl -0.74 -0.06 -0.04  0.22 -0.06 -0.07  0.11 -0.06 0.63 0.37 1.3
## lazy     0.16 -0.63  0.16 -0.02  0.26  0.15  0.16 -0.03 0.57 0.43 1.9
## coopera -0.11  0.13 -0.12  0.63 -0.06 -0.24  0.04 -0.07 0.51 0.49 1.6
## quiet    0.79 -0.14  0.21  0.16 -0.03  0.02  0.03 -0.08 0.71 0.29 1.3
## organiz -0.10  0.40  0.00  0.09 -0.76  0.00 -0.03  0.04 0.76 0.24 1.6
## criticl  0.09  0.12  0.13 -0.11 -0.12  0.62 -0.05 -0.10 0.46 0.54 1.4
## lax      0.02 -0.33 -0.04  0.10  0.25  0.04  0.39  0.01 0.34 0.66 2.9
## laidbck -0.05 -0.07 -0.28  0.03  0.11 -0.04  0.84  0.05 0.80 0.20 1.3
## withdrw  0.74 -0.06  0.17 -0.11  0.12  0.22  0.04  0.13 0.67 0.33 1.5
## givinup  0.34 -0.46  0.28 -0.10  0.12  0.13  0.02  0.22 0.50 0.50 3.6
## easygon -0.16 -0.11 -0.23  0.30 -0.01 -0.06  0.51 -0.03 0.45 0.55 2.5
## 
##                        MR1  MR2  MR4  MR8  MR6  MR5  MR7  MR3
## SS loadings           4.55 2.96 2.50 2.29 2.24 2.23 1.59 0.52
## Proportion Var        0.14 0.09 0.08 0.07 0.07 0.07 0.05 0.02
## Cumulative Var        0.14 0.23 0.31 0.38 0.45 0.52 0.57 0.59
## Proportion Explained  0.24 0.16 0.13 0.12 0.12 0.12 0.08 0.03
## Cumulative Proportion 0.24 0.40 0.53 0.65 0.77 0.89 0.97 1.00
## 
## Mean item complexity =  1.8
## Test of the hypothesis that 8 factors are sufficient.
## 
## The degrees of freedom for the null model are  496  and the objective function was  17.62 with Chi Square of  4009.54
## The degrees of freedom for the model are 268  and the objective function was  1.84 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic number of observations is  240 with the empirical chi square  140.27  with prob <  1 
## The total number of observations was  240  with Likelihood Chi Square =  409.11  with prob <  5.9e-08 
## 
## Tucker Lewis Index of factoring reliability =  0.924
## RMSEA index =  0.052  and the 90 % confidence intervals are  0.038 0.056
## BIC =  -1059.7
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR4  MR8  MR6
## Correlation of (regression) scores with factors   0.96 0.89 0.91 0.88 0.90
## Multiple R square of scores with factors          0.92 0.79 0.83 0.77 0.82
## Minimum correlation of possible factor scores     0.83 0.59 0.66 0.55 0.64
##                                                    MR5  MR7  MR3
## Correlation of (regression) scores with factors   0.88 0.89 0.73
## Multiple R square of scores with factors          0.78 0.79 0.53
## Minimum correlation of possible factor scores     0.56 0.58 0.06
sink()

Number of factors.

We need to determine the number of factors. Many options! From Revelle (2017:37):

  1. Extracting factors until the chi square of the residual matrix is not significant.

  2. Extracting factors until the change in chi square from factor n to factor n+1 is not significant.

  3. Extracting factors until the eigen values of the real data are less than the corresponding eigen values of a random data set of the same size (parallel analysis) fa.parallel (Horn, 1965).

  4. Plotting the magnitude of the successive eigen values and applying the scree test (a sudden drop in eigen values analogous to the change in slope seen when scrambling up the talus slope of a mountain and approaching the rock face) (Cattell, 1966).

Continued… .

  5. Extracting factors as long as they are interpretable.

  6. Using the Very Simple Structure Criterion (vss) (Revelle and Rocklin, 1979).

  7. Using Wayne Velicer’s Minimum Average Partial (MAP) criterion (Velicer, 1976).

  8. Extracting principal components until the eigen value < 1 (Kaiser criterion).

Which one?

Each has advantages and disadvantages. Option 8, although common, is probably the worst.

Read more here

Parallel analysis.

Also part of the output already. Here I used 'minres' as the extraction method. Parallel analysis suggests 5 factors (compare the red line to the blue triangles).

require(psych)
parallel <- fa.parallel(f_data, fm = 'minres', fa = 'fa')

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA
parallel
## Call: fa.parallel(x = f_data, fm = "minres", fa = "fa")
## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA 
## 
##  Eigen Values of 
## 
##  eigen values of factors
##  [1]  6.52  3.66  2.35  1.53  1.02  0.38  0.14  0.02  0.01 -0.02 -0.09
## [12] -0.16 -0.18 -0.21 -0.25 -0.28 -0.33 -0.34 -0.37 -0.39 -0.40 -0.43
## [23] -0.45 -0.48 -0.49 -0.52 -0.55 -0.58 -0.60 -0.62 -0.67 -0.70
## 
##  eigen values of simulated factors
##  [1]  0.83  0.67  0.59  0.53  0.46  0.41  0.36  0.32  0.27  0.23  0.19
## [12]  0.15  0.11  0.08  0.04  0.00 -0.03 -0.06 -0.10 -0.13 -0.16 -0.20
## [23] -0.23 -0.26 -0.29 -0.32 -0.35 -0.38 -0.42 -0.46 -0.50 -0.54
## 
##  eigen values of components 
##  [1] 7.24 4.53 3.12 2.33 1.88 1.19 0.93 0.86 0.80 0.71 0.69 0.64 0.63 0.54
## [15] 0.51 0.47 0.46 0.45 0.44 0.41 0.39 0.37 0.33 0.30 0.29 0.28 0.25 0.23
## [29] 0.22 0.20 0.17 0.13
## 
##  eigen values of simulated components
## [1] NA
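
The logic of parallel analysis can be sketched in a few lines of base R: keep factors for as long as the observed eigenvalue exceeds what random data of the same size would produce. This is a simplified, component-based illustration on simulated data, not a reimplementation of fa.parallel(), and all names in it are hypothetical.

```r
# Simulated toy data (not the lecture data): two latent factors, three items each.
set.seed(4)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Eigenvalues of the observed correlation matrix.
observed <- eigen(cor(toy), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random normal data of the same dimensions (one column per simulation).
n_sims <- 100
simulated <- replicate(n_sims, {
  random_data <- matrix(rnorm(n * ncol(toy)), nrow = n)
  eigen(cor(random_data), symmetric = TRUE, only.values = TRUE)$values
})

# Mean simulated eigenvalue at each position serves as the threshold.
threshold <- rowMeans(simulated)

# Retain the leading run of eigenvalues that exceed their threshold.
exceeds  <- observed > threshold
n_retain <- sum(cumprod(exceeds))

round(rbind(observed, threshold), 2)
n_retain
```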

Kaiser criterion.

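Kaiser criterion is the number of eigenvalues >1.

This can be seen on the graph. Applied to the factor eigenvalues below, that would suggest 5 factors (the component eigenvalues in the parallel analysis output would suggest 6).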

parallel$fa.values
##  [1]  6.518632220  3.663879765  2.352072662  1.529192200  1.019134280
##  [6]  0.375145536  0.143123374  0.017903052  0.008116657 -0.018428192
## [11] -0.092557391 -0.157077647 -0.179716109 -0.214661107 -0.246564997
## [16] -0.279092818 -0.331472781 -0.341440377 -0.366177997 -0.388743904
## [21] -0.397539896 -0.425290978 -0.449044173 -0.483146864 -0.488668863
## [26] -0.516074439 -0.548458000 -0.584031532 -0.602210864 -0.619056368
## [31] -0.674287746 -0.704824040
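
Counting eigenvalues above 1 is a one-liner. The sketch below applies it to the leading eigenvalues printed in the parallel analysis output above (copied by hand and rounded to two decimals; the remaining values are all below 1), for both the factor and the component eigenvalues.

```r
# Leading eigenvalues copied from the fa.parallel output above (rounded).
factor_eigen    <- c(6.52, 3.66, 2.35, 1.53, 1.02, 0.38, 0.14, 0.02, 0.01, -0.02)
component_eigen <- c(7.24, 4.53, 3.12, 2.33, 1.88, 1.19, 0.93, 0.86, 0.80, 0.71)

# Kaiser criterion: number of eigenvalues greater than 1.
kaiser_factors    <- sum(factor_eigen > 1)      # 5
kaiser_components <- sum(component_eigen > 1)   # 6
kaiser_factors
kaiser_components
```

The fifth factor eigenvalue only just clears the cut-off (1.02), which illustrates how sensitive this rule is to small differences.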

Scree plot

Depending on the elbow of the graph (Scree Criterion) you would extract 4 or 5.

VSS / MAP test

The VSS plot suggests 3 factors. Very little improvement with 4. MAP suggests 6 factors. Output printed to console, check here

require(psych)
VSS(f_data, rotate = "varimax", n.obs = 240) # shows plot

## 
## Very Simple Structure
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
##     n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
## Although the VSS complexity 1 shows  5  factors, it is probably more reasonable to think about  3  factors
## VSS complexity 2 achieves a maximimum of 0.81  with  5  factors
## 
## The Velicer MAP achieves a minimum of 0.02  with  6  factors 
## BIC achieves a minimum of  -1194.35  with  6  factors
## Sample Size adjusted BIC achieves a minimum of  -210.21  with  8  factors
## 
## Statistics by number of factors 
##   vss1 vss2   map dof chisq     prob sqresid  fit RMSEA   BIC SABIC
## 1 0.52 0.00 0.045 464  2764 1.2e-322    47.7 0.52 0.149   221  1692
## 2 0.60 0.72 0.033 433  1992 5.7e-198    27.5 0.72 0.127  -381   992
## 3 0.61 0.78 0.024 403  1395 2.3e-109    17.9 0.82 0.106  -814   463
## 4 0.61 0.80 0.018 374   967  3.9e-54    12.5 0.87 0.086 -1082   103
## 5 0.62 0.81 0.015 346   711  2.2e-27     9.1 0.91 0.071 -1186   -89
## 6 0.56 0.79 0.015 319   554  6.8e-15     7.7 0.92 0.060 -1194  -183
## 7 0.54 0.77 0.016 293   467  3.9e-10     7.0 0.93 0.055 -1139  -210
## 8 0.56 0.77 0.018 268   409  5.9e-08     6.3 0.94 0.052 -1060  -210
##   complex eChisq  SRMR eCRMS  eBIC
## 1     1.0   6450 0.165 0.170  3907
## 2     1.3   3021 0.113 0.121   648
## 3     1.4   1517 0.080 0.089  -691
## 4     1.6    767 0.057 0.065 -1283
## 5     1.5    355 0.039 0.046 -1541
## 6     1.7    230 0.031 0.039 -1519
## 7     1.8    176 0.027 0.035 -1430
## 8     1.8    140 0.024 0.033 -1329

Try it yourself.

Conduct a factor analysis extracting a large number of factors (6) on the Sidanius data and store it. Discuss the output with your neighbour.

Run a parallel analysis. Discuss the outcome with your neighbour.

Back to the self-description data: Three- and five-factor solution

Let's extract a solution with three factors, though five could also be workable.

fa_3<-fa(f_data,3, fm = 'minres', rotate='varimax', fa = 'fa')
sink('fa_3_output.txt')
fa_3
## Factor Analysis using method =  minres
## Call: fa(r = f_data, nfactors = 3, rotate = "varimax", fm = "minres", 
##     fa = "fa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           MR2   MR1   MR3   h2   u2 com
## distant -0.13 -0.55  0.27 0.39 0.61 1.6
## talkatv -0.05  0.79  0.04 0.63 0.37 1.0
## carelss -0.61 -0.01  0.23 0.42 0.58 1.3
## hardwrk  0.64  0.23  0.12 0.48 0.52 1.3
## anxious  0.05 -0.18  0.49 0.28 0.72 1.3
## agreebl  0.08  0.08 -0.40 0.17 0.83 1.1
## tense    0.11 -0.18  0.64 0.46 0.54 1.2
## kind     0.40  0.15 -0.34 0.30 0.70 2.2
## opposng -0.21  0.07  0.49 0.28 0.72 1.4
## relaxed -0.22  0.10 -0.64 0.47 0.53 1.3
## disorgn -0.71  0.02  0.06 0.50 0.50 1.0
## outgoin  0.10  0.84 -0.12 0.74 0.26 1.1
## approvn  0.18  0.31 -0.37 0.26 0.74 2.4
## shy     -0.14 -0.73  0.04 0.56 0.44 1.1
## discipl  0.60  0.00  0.06 0.37 0.63 1.0
## harsh   -0.24 -0.02  0.51 0.32 0.68 1.4
## persevr  0.55  0.20  0.06 0.34 0.66 1.3
## friendl  0.27  0.51 -0.28 0.41 0.59 2.1
## worryin  0.09 -0.21  0.52 0.32 0.68 1.4
## respnsi  0.73  0.05 -0.04 0.53 0.47 1.0
## contrar -0.24  0.01  0.58 0.39 0.61 1.3
## sociabl  0.02  0.72 -0.22 0.57 0.43 1.2
## lazy    -0.64 -0.19  0.10 0.46 0.54 1.2
## coopera  0.27  0.14 -0.48 0.32 0.68 1.8
## quiet   -0.04 -0.77  0.04 0.59 0.41 1.0
## organiz  0.73  0.08 -0.06 0.55 0.45 1.0
## criticl  0.07 -0.03  0.48 0.24 0.76 1.1
## lax     -0.48  0.00 -0.20 0.27 0.73 1.3
## laidbck -0.33  0.11 -0.46 0.33 0.67 2.0
## withdrw -0.17 -0.70  0.28 0.60 0.40 1.4
## givinup -0.43 -0.38  0.26 0.39 0.61 2.6
## easygon -0.16  0.20 -0.50 0.31 0.69 1.5
## 
##                        MR2  MR1  MR3
## SS loadings           4.63 4.62 4.00
## Proportion Var        0.14 0.14 0.13
## Cumulative Var        0.14 0.29 0.41
## Proportion Explained  0.35 0.35 0.30
## Cumulative Proportion 0.35 0.70 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  496  and the objective function was  17.62 with Chi Square of  4009.54
## The degrees of freedom for the model are 403  and the objective function was  6.18 
## 
## The root mean square of the residuals (RMSR) is  0.08 
## The df corrected root mean square of the residuals is  0.09 
## 
## The harmonic number of observations is  240 with the empirical chi square  1517.47  with prob <  1.1e-128 
## The total number of observations was  240  with Likelihood Chi Square =  1394.7  with prob <  2.3e-109 
## 
## Tucker Lewis Index of factoring reliability =  0.649
## RMSEA index =  0.106  and the 90 % confidence intervals are  0.096 0.107
## BIC =  -814
## Fit based upon off diagonal values = 0.91
## Measures of factor score adequacy             
##                                                    MR2  MR1  MR3
## Correlation of (regression) scores with factors   0.94 0.95 0.92
## Multiple R square of scores with factors          0.89 0.91 0.85
## Minimum correlation of possible factor scores     0.78 0.81 0.70
sink()
fa_5<-fa(f_data,5, fm = 'minres', rotate='varimax', fa = 'fa')
sink('fa_5_output.txt')
fa_5
## Factor Analysis using method =  minres
## Call: fa(r = f_data, nfactors = 5, rotate = "varimax", fm = "minres", 
##     fa = "fa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##           MR1   MR2   MR4   MR5   MR3   h2   u2 com
## distant  0.59  0.08  0.08 -0.11  0.30 0.46 0.54 1.7
## talkatv -0.76  0.03 -0.02  0.12  0.16 0.62 0.38 1.2
## carelss  0.02  0.60  0.09 -0.03  0.28 0.45 0.55 1.5
## hardwrk -0.20 -0.65  0.14  0.11  0.10 0.50 0.50 1.4
## anxious  0.17  0.08  0.70  0.16  0.22 0.59 0.41 1.5
## agreebl -0.02  0.02 -0.06  0.64 -0.17 0.44 0.56 1.2
## tense    0.16  0.01  0.77  0.01  0.26 0.70 0.30 1.3
## kind    -0.10 -0.31  0.03  0.61 -0.18 0.51 0.49 1.8
## opposng -0.01  0.11  0.09 -0.13  0.63 0.43 0.57 1.2
## relaxed -0.02  0.13 -0.69  0.34 -0.07 0.62 0.38 1.6
## disorgn -0.01  0.72  0.00  0.04  0.15 0.55 0.45 1.1
## outgoin -0.83 -0.08 -0.05  0.22  0.00 0.74 0.26 1.2
## approvn -0.27 -0.13 -0.12  0.51 -0.12 0.37 0.63 1.9
## shy      0.72  0.20  0.16  0.00 -0.09 0.58 0.42 1.3
## discipl  0.03 -0.63  0.04  0.09  0.09 0.42 0.58 1.1
## harsh    0.07  0.13  0.07 -0.22  0.64 0.48 0.52 1.4
## persevr -0.16 -0.54  0.11  0.19  0.09 0.38 0.62 1.6
## friendl -0.50 -0.17  0.06  0.55 -0.16 0.60 0.40 2.4
## worryin  0.18  0.05  0.73  0.05  0.13 0.59 0.41 1.2
## respnsi -0.01 -0.73  0.06  0.23  0.02 0.59 0.41 1.2
## contrar  0.05  0.14  0.15 -0.15  0.72 0.58 0.42 1.3
## sociabl -0.72  0.01 -0.09  0.26 -0.10 0.60 0.40 1.3
## lazy     0.19  0.68  0.07  0.03  0.13 0.51 0.49 1.3
## coopera -0.10 -0.18 -0.11  0.57 -0.28 0.46 0.54 1.9
## quiet    0.80  0.10  0.17  0.16  0.00 0.70 0.30 1.2
## organiz -0.06 -0.76 -0.02  0.06 -0.06 0.58 0.42 1.0
## criticl  0.09 -0.17  0.14 -0.11  0.58 0.40 0.60 1.4
## lax      0.04  0.47 -0.22  0.23  0.10 0.33 0.67 2.1
## laidbck -0.03  0.25 -0.59  0.28  0.09 0.50 0.50 1.9
## withdrw  0.73  0.14  0.13 -0.09  0.27 0.66 0.34 1.5
## givinup  0.36  0.46  0.22 -0.10  0.13 0.42 0.58 2.7
## easygon -0.13  0.13 -0.44  0.43 -0.02 0.42 0.58 2.4
## 
##                        MR1  MR2  MR4  MR5  MR3
## SS loadings           4.51 4.43 2.97 2.52 2.36
## Proportion Var        0.14 0.14 0.09 0.08 0.07
## Cumulative Var        0.14 0.28 0.37 0.45 0.52
## Proportion Explained  0.27 0.26 0.18 0.15 0.14
## Cumulative Proportion 0.27 0.53 0.71 0.86 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 5 factors are sufficient.
## 
## The degrees of freedom for the null model are  496  and the objective function was  17.62 with Chi Square of  4009.54
## The degrees of freedom for the model are 346  and the objective function was  3.17 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic number of observations is  240 with the empirical chi square  355.07  with prob <  0.36 
## The total number of observations was  240  with Likelihood Chi Square =  710.66  with prob <  2.2e-27 
## 
## Tucker Lewis Index of factoring reliability =  0.849
## RMSEA index =  0.071  and the 90 % confidence intervals are  0.059 0.073
## BIC =  -1185.64
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR4  MR5  MR3
## Correlation of (regression) scores with factors   0.96 0.94 0.93 0.90 0.89
## Multiple R square of scores with factors          0.91 0.89 0.86 0.81 0.79
## Minimum correlation of possible factor scores     0.83 0.78 0.72 0.62 0.58
sink()

Diagram.

fa.diagram(fa_5, marg=c(.01,.01,1,.01))

Factors.

How would you label those 5 factors?

require(semPlot)
semplot1<-semPlotModel(fa_5$loadings)
semPaths(semplot1, what="std", layout="circle", nCharNodes = 6)

Try it yourself.

Make a plot for your item scores.

Fit indices.

RMSEA and Tucker-Lewis Index (TLI).

A widely used cut-off for RMSEA is .06 (Hu & Bentler, 1999); others suggest .08 as acceptable. Lower values indicate better fit. But beware of cut-offs: RMSEA depends on sample size more than TLI does.

For the Tucker-Lewis Index (TLI), values >.9 or >.95 are considered a good fit. Again, beware of cut-offs.

Other indices also exist and we will discuss those when we move to SEM.
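
As a rough guide to what the two indices measure, here are the common textbook definitions. This is a sketch under an assumption: psych's fa() may compute them slightly differently, so hand-calculated values need not match the printed output exactly. The subscript M refers to the fitted model, the subscript 0 to the null model, and N is the number of observations.

$$
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\ 0)}{df_M\,(N-1)}},
\qquad
\mathrm{TLI} = \frac{\chi^2_0/df_0 \;-\; \chi^2_M/df_M}{\chi^2_0/df_0 \;-\; 1}
$$

Plugging in the five factor output (null model: 4009.54 on 496 df; model: 710.66 on 346 df; N = 240):

$$
\mathrm{TLI} \approx \frac{8.08 - 2.05}{8.08 - 1} \approx 0.85,
\qquad
\mathrm{RMSEA} \approx \sqrt{\frac{710.66 - 346}{346 \times 239}} \approx 0.066
$$

The TLI is close to the printed .849. The RMSEA is a little below the printed .071; the printed value is consistent with using the objective function (3.17) in place of the chi-square, i.e. sqrt(3.17/346 - 1/239) ≈ 0.071.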

Back to the models.

The five factor model does better than the three factor model (RMSEA .071 vs. .106, TLI .849 vs. .649). But beware: this analysis is exploratory rather than confirmatory.

Sample description: While the five factor model could be considered an acceptable fit in terms of RMSEA (.071), it was not in terms of TLI (.849).
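
To compare models side by side you can pull the indices out of the fitted objects instead of reading them off the printout. The sketch below uses simulated toy data (two hypothetical latent factors with three items each), not the lecture data, and assumes the psych package is installed. The element names RMSEA, TLI and BIC are those stored in the object returned by fa().

```r
library(psych)  # assumed installed; provides fa()

set.seed(1)
n <- 300
# Simulated toy data: two uncorrelated latent factors, three items each
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = f1 + rnorm(n), a2 = f1 + rnorm(n), a3 = f1 + rnorm(n),
  b1 = f2 + rnorm(n), b2 = f2 + rnorm(n), b3 = f2 + rnorm(n)
)

fit_1 <- fa(toy, nfactors = 1, fm = "minres", rotate = "none")
fit_2 <- fa(toy, nfactors = 2, fm = "minres", rotate = "varimax")

# $RMSEA is a vector (estimate, lower, upper, confidence level); keep the estimate
fit_table <- data.frame(
  model = c("1 factor", "2 factors"),
  RMSEA = c(fit_1$RMSEA[[1]], fit_2$RMSEA[[1]]),
  TLI   = c(fit_1$TLI, fit_2$TLI),
  BIC   = c(fit_1$BIC, fit_2$BIC)
)
print(fit_table, digits = 3)
```

With the lecture objects, the same table could be built from the three factor and five factor fits.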

Extraction methods: choice paralysis.

There are many methods. Most of the time you'll get similar results.

The fm argument of fa() selects the method: minres ('minres'), principal axis ('pa'), weighted least squares ('wls'), generalized least squares ('gls') and maximum likelihood ('ml') factor analysis. Minres and Principal Axis factoring are commonly used.
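
One way to check the "similar results" claim on your own data is to fit the same model with two extraction methods and compare the loadings. This is a minimal sketch on simulated toy data (hypothetical items, not the lecture data), assuming psych is installed. It uses factor.congruence() from psych, which returns Tucker congruence coefficients; values near 1 (or -1, if a factor is reflected) mean that two factors are essentially the same.

```r
library(psych)  # assumed installed; provides fa() and factor.congruence()

set.seed(2)
n <- 300
# Simulated toy data: two uncorrelated latent factors, three items each
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = f1 + rnorm(n), a2 = f1 + rnorm(n), a3 = f1 + rnorm(n),
  b1 = f2 + rnorm(n), b2 = f2 + rnorm(n), b3 = f2 + rnorm(n)
)

fit_minres <- fa(toy, nfactors = 2, fm = "minres", rotate = "varimax")
fit_pa     <- fa(toy, nfactors = 2, fm = "pa",     rotate = "varimax")

# Rows: minres factors; columns: principal axis factors
congruence <- factor.congruence(unclass(fit_minres$loadings),
                                unclass(fit_pa$loadings))
congruence
```

Each minres factor should have one principal axis counterpart with a congruence close to 1 in absolute value.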

Principal Axis Factoring

require(psych)
parallel <- fa.parallel(f_data, fm = 'pa', fa = 'fa')

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

Extract loadings.

You can further beautify this by generating labels for those 32 items.

require(stargazer)
require(plyr)
factor_loadings<-as.data.frame(as.matrix.data.frame(fa_5$loadings))
factor_loadings<-plyr::rename(factor_loadings, c("V1"="Factor 1","V2"="Factor 2", "V3"="Factor 3", "V4"="Factor 4", "V5"="Factor 5"))
stargazer(factor_loadings, summary = FALSE,out= "results_loadings.html", header=FALSE, type="html")
Factor 1 Factor 2 Factor 3 Factor 4 Factor 5
1 0.587 0.083 0.075 -0.110 0.301
2 -0.763 0.029 -0.023 0.124 0.163
3 0.022 0.603 0.086 -0.027 0.275
4 -0.196 -0.645 0.139 0.113 0.101
5 0.169 0.080 0.698 0.156 0.217
6 -0.025 0.021 -0.057 0.640 -0.169
7 0.161 0.008 0.775 0.007 0.264
8 -0.103 -0.313 0.032 0.609 -0.184
9 -0.011 0.107 0.091 -0.133 0.630
10 -0.020 0.134 -0.692 0.340 -0.070
11 -0.014 0.723 -0.001 0.041 0.153
12 -0.825 -0.084 -0.055 0.218 0.001
13 -0.268 -0.128 -0.119 0.505 -0.119
14 0.715 0.195 0.161 -0.003 -0.091
15 0.033 -0.630 0.045 0.090 0.090
16 0.067 0.127 0.065 -0.221 0.640
17 -0.159 -0.542 0.112 0.193 0.091
18 -0.500 -0.166 0.061 0.546 -0.158
19 0.177 0.051 0.731 0.046 0.131
20 -0.007 -0.729 0.059 0.234 0.018
21 0.052 0.144 0.146 -0.148 0.719
22 -0.715 0.009 -0.087 0.259 -0.098
23 0.187 0.676 0.072 0.026 0.126
24 -0.102 -0.180 -0.105 0.569 -0.280
25 0.799 0.098 0.171 0.159 0.004
26 -0.060 -0.757 -0.016 0.058 -0.057
27 0.087 -0.169 0.144 -0.111 0.575
28 0.036 0.470 -0.218 0.230 0.101
29 -0.027 0.250 -0.590 0.283 0.088
30 0.734 0.144 0.128 -0.092 0.272
31 0.362 0.459 0.217 -0.103 0.132
32 -0.130 0.132 -0.444 0.430 -0.022
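
The table above shows row numbers rather than item names. One way to keep the item names is sketched below on a small hand-built stand-in for fa_5$loadings (first three items and first two factors, values copied from the five factor output). It assumes the loadings object is a matrix of class "loadings" with dimnames, so that unclass() gives a plain matrix; with the real model you would pass fa_5$loadings instead of toy_loadings.

```r
# Toy stand-in for fa_5$loadings (values copied from the output above)
toy_loadings <- matrix(
  c(0.59, -0.76, 0.02,   # MR1
    0.08,  0.03, 0.60),  # MR2
  nrow = 3,
  dimnames = list(c("distant", "talkatv", "carelss"), c("MR1", "MR2"))
)
class(toy_loadings) <- "loadings"

# unclass() drops the "loadings" class but keeps the dimnames,
# so the item names survive as row names of the data frame
loadings_df <- as.data.frame(unclass(toy_loadings))
names(loadings_df) <- paste("Factor", seq_along(loadings_df))
round(loadings_df, 2)
```

The resulting data frame can be passed to stargazer() with summary = FALSE as before.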

Plot loadings.

require(GPArotation)
plot(fa_5,labels=names(f_data),cex=.7, ylim=c(-.1,1)) 

Too busy and not very useful.

Just plot the first two factors, and those with loadings above .5. Label it.

factor.plot(fa_5, choose=c(1,2), cut=0.5, labels=colnames(f_data))

ggplot2.

Alternative graphs, see here
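
As one possible ggplot2 alternative (a sketch, not necessarily what the linked page shows), loadings can be drawn as bars with one panel per factor. The data frame below is hand-built from the first three items and first two factors of the five factor output; the column names item, factor_name and loading are made up for this illustration. It assumes ggplot2 is installed.

```r
library(ggplot2)  # assumed installed

# Long format: one row per item-by-factor loading (values from the output above)
toy_long <- data.frame(
  item        = rep(c("distant", "talkatv", "carelss"), times = 2),
  factor_name = rep(c("MR1", "MR2"), each = 3),
  loading     = c(0.59, -0.76, 0.02, 0.08, 0.03, 0.60)
)

p <- ggplot(toy_long, aes(x = item, y = loading)) +
  geom_col() +                  # bar length = size of the loading
  facet_wrap(~ factor_name) +   # one panel per factor
  coord_flip() +                # items on the vertical axis for readability
  labs(x = NULL, y = "Loading")
p
```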

Extensions.

I just want Cronbach's Alpha…

Find out how to do it here

Perhaps you should not rely on it too much?
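
To see what the coefficient actually measures, here is a minimal hand computation of (raw) Cronbach's alpha in base R, using the standard formula from the item variances and the variance of the sum score. The helper cronbach_alpha() and the simulated scale are made up for this illustration; in practice the psych package offers alpha(), which reports considerably more.

```r
# alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum score))
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))  # listwise deletion, for simplicity
  k <- ncol(items)
  item_vars <- sapply(items, var)
  total_var <- var(rowSums(items))
  k / (k - 1) * (1 - sum(item_vars) / total_var)
}

set.seed(3)
n <- 200
trait <- rnorm(n)
# Simulated four-item scale: each item = common trait + independent noise
toy_scale <- data.frame(
  i1 = trait + rnorm(n), i2 = trait + rnorm(n),
  i3 = trait + rnorm(n), i4 = trait + rnorm(n)
)
cronbach_alpha(toy_scale)
```

Note that alpha rises with the number of items even when the items are only modestly related, which is one reason not to lean on it too heavily.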

Exercise

Load the BFI data from the 'psych' package (??bfi). This contains data on 2800 participants completing items relating to the 'big five' from the IPIP pool. You'll have to subset the variables for your factor analysis.

Conduct a Bartlett's test & KMO test.

Conduct an exploratory factor analysis (using 'minres' as the method). To decide on the number of factors, run a parallel analysis and discuss the scree plot, Very Simple Structure and Velicer's MAP test.

Exercise (cont'd)

Extract a five factor model (use varimax rotation) and export the factor loadings of these five factors. Discuss the RMSEA and TLI for that five factor model.

Make a plot for the factors.

References (and further reading.)