- Last lecture: Mediation
- Today: Exploratory Factor analysis
2018-11-20 | disclaimer
Factor analysis
Some ways of visualising factor analysis.
After today you should be able to complete the following sections for Assignment II:
Exploratory Factor Analysis and its assumptions.
Who has run an exploratory factor analysis?
What was the purpose?
We want to study the covariation between a large number of observed variables.
How many latent factors would account for most of the variation among the observed variables?
Which variables appear to define each factor? What labels could we give to these factors? If the observed covariation can be explained by a small number of factors (e.g., 2-5), this would increase our understanding of the relationships among the variables!
–> Reduce complexity and increase understanding.
–> validate scale (–> ultimately confirmatory factor analysis).
Exploratory vs. confirmatory. Occasionally you will have a very clear idea as to how many factors there should be. In such a case one would usually do a confirmatory analysis.
Principal components vs. Factor analysis.
"The idea of principal components analysis (PCA) is to find a small number of linear combinations of the variables so as to capture most of the variation in the dataframe as a whole. … Principal components analysis finds a set of orthogonal standardized linear combinations which together explain all of the variation in the original data. There are as many principal components as there are variables, but typically it is only the first few of them that explain important amounts of the total variation." Crawley (2013:809-810)
"With principal components analysis we were fundamentally interested in the variables and their contributions. Factor analysis aims to provide usable numerical values for quantities such as intelligence or social status that are not directly measurable. The idea is to use correlations between observable variables in terms of underlying ‘factors’." Crawley (2013:813)
Note that 'factors' here means something fundamentally different from 'factors' as we used the term when describing a single (categorical) variable.
Also, some researchers use the terms 'principal components analysis' and 'factor analysis' interchangeably, even though they are separate techniques.
Today we will mostly deal with factor analysis (should you require principal component analysis, have a look here and here)
The mathematics are also described in those sources.
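To make the contrast between the two techniques concrete, here is a minimal sketch on simulated data. The data, the variable names and the seed are made up for illustration and are not the lecture's dataset. It uses base R only: prcomp() for PCA and factanal() for a maximum-likelihood factor analysis, which is a different extraction method from the 'minres' used later in this lecture.

```r
# Hypothetical data: two latent factors driving six observed variables.
set.seed(1)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# PCA: as many components as variables; together they explain all the variance.
pca <- prcomp(sim, scale. = TRUE)
pca_var <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(pca_var), 2)   # cumulative proportion, reaches 1 at the last component

# Factor analysis: a few latent factors plus unique variance for each variable.
efa <- factanal(sim, factors = 2, rotation = "varimax")
round(efa$uniquenesses, 2)  # share of each variable NOT explained by the factors
```

The point to take away is that PCA has no notion of unique variance, whereas the factor model leaves part of every variable unexplained.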
In essence you can think of factor analysis as OLS regression of each observed variable on the latent factors, which means similar assumptions apply.
Measurement: All variables should be interval. No dummy variables. No outliers.
Sample size: n > 200, although some advocate 5-10 participants per variable (but see this on rules of thumb).
Multivariate normality: Though not necessarily required for exploratory factor analysis, useful to check.
Linear: the proposed relationships are linear.
Factorability: There should be some correlations which can be meaningfully grouped together.
More on assumptions here
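The regression analogy above can be written out. This is the generic common factor model in standard textbook notation, assuming standardized variables; the symbols are introduced here for illustration and are not used elsewhere in these slides. For participant \(i\), observed variable \(j\) and \(m\) factors:

\[ x_{ij} = \lambda_{j1} f_{i1} + \lambda_{j2} f_{i2} + \dots + \lambda_{jm} f_{im} + \varepsilon_{ij} \]

The loadings \(\lambda\) play the role of regression coefficients and \(\varepsilon_{ij}\) is the unique part of the variable. Unlike in OLS, the predictors \(f\) are not observed and have to be estimated along with the loadings.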
Data are from here. 240 participants providing self-ratings (1-9) on 32 variables.
setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture7")
f_data <- read.table("personality0.txt")
require(stargazer)
stargazer(f_data, type = "html", out = "factor_data.html")
Measurement and sample sizes OK. (Though 1 to 9, one can always question how 'interval' that really is. 1 to 7 would be worse and that is commonly used)
As an aside: multivariate normality is not looking great (see the output below). We will ignore it for now, given that we are conducting exploratory factor analysis.
require(MVN)
mvn(f_data)
## $multivariateNormality ## Test Statistic p value Result ## 1 Mardia Skewness 8120.19888493781 1.26964854513512e-69 NO ## 2 Mardia Kurtosis 15.3220135668768 0 NO ## 3 MVN <NA> <NA> NO ## ## $univariateNormality ## Test Variable Statistic p value Normality ## 1 Shapiro-Wilk distant 0.9259 <0.001 NO ## 2 Shapiro-Wilk talkatv 0.9530 <0.001 NO ## 3 Shapiro-Wilk carelss 0.9193 <0.001 NO ## 4 Shapiro-Wilk hardwrk 0.9102 <0.001 NO ## 5 Shapiro-Wilk anxious 0.9571 <0.001 NO ## 6 Shapiro-Wilk agreebl 0.9200 <0.001 NO ## 7 Shapiro-Wilk tense 0.9521 <0.001 NO ## 8 Shapiro-Wilk kind 0.9157 <0.001 NO ## 9 Shapiro-Wilk opposng 0.9435 <0.001 NO ## 10 Shapiro-Wilk relaxed 0.9644 <0.001 NO ## 11 Shapiro-Wilk disorgn 0.9360 <0.001 NO ## 12 Shapiro-Wilk outgoin 0.9441 <0.001 NO ## 13 Shapiro-Wilk approvn 0.9481 <0.001 NO ## 14 Shapiro-Wilk shy 0.9566 <0.001 NO ## 15 Shapiro-Wilk discipl 0.9411 <0.001 NO ## 16 Shapiro-Wilk harsh 0.9276 <0.001 NO ## 17 Shapiro-Wilk persevr 0.9255 <0.001 NO ## 18 Shapiro-Wilk friendl 0.9039 <0.001 NO ## 19 Shapiro-Wilk worryin 0.9406 <0.001 NO ## 20 Shapiro-Wilk respnsi 0.8741 <0.001 NO ## 21 Shapiro-Wilk contrar 0.9523 <0.001 NO ## 22 Shapiro-Wilk sociabl 0.9300 <0.001 NO ## 23 Shapiro-Wilk lazy 0.9582 <0.001 NO ## 24 Shapiro-Wilk coopera 0.9277 <0.001 NO ## 25 Shapiro-Wilk quiet 0.9602 <0.001 NO ## 26 Shapiro-Wilk organiz 0.9457 <0.001 NO ## 27 Shapiro-Wilk criticl 0.9571 <0.001 NO ## 28 Shapiro-Wilk lax 0.9498 <0.001 NO ## 29 Shapiro-Wilk laidbck 0.9640 <0.001 NO ## 30 Shapiro-Wilk withdrw 0.9266 <0.001 NO ## 31 Shapiro-Wilk givinup 0.8737 <0.001 NO ## 32 Shapiro-Wilk easygon 0.9473 <0.001 NO ## ## $Descriptives ## n Mean Std.Dev Median Min Max 25th 75th Skew ## distant 240 3.866667 1.794615 3 1 8 2.00 5 0.21660329 ## talkatv 240 5.883333 1.677732 6 2 9 5.00 7 -0.18189899 ## carelss 240 3.412500 1.811357 3 1 9 2.00 5 0.66867120 ## hardwrk 240 6.925000 1.370108 7 2 9 6.00 8 -0.80831996 ## anxious 240 5.129167 1.880305 5 1 9 4.00 7 -0.09210485 ## 
agreebl 240 6.629167 1.372162 7 1 9 6.00 8 -0.78725929 ## tense 240 4.616667 1.904337 5 1 9 3.00 6 -0.02518179 ## kind 240 6.970833 1.262255 7 2 9 6.00 8 -0.70373663 ## opposng 240 3.858333 1.599141 4 1 8 3.00 5 0.46913404 ## relaxed 240 5.475000 1.694009 5 1 9 4.00 7 -0.08833270 ## disorgn 240 4.083333 2.126082 4 1 9 2.00 6 0.16985907 ## outgoin 240 6.020833 1.809894 6 2 9 5.00 7 -0.34291775 ## approvn 240 5.858333 1.367867 6 2 9 5.00 7 -0.13555584 ## shy 240 4.558333 1.969626 5 1 9 3.00 6 0.06063502 ## discipl 240 6.308333 1.725011 7 1 9 5.00 7 -0.56730380 ## harsh 240 3.600000 1.683789 3 1 8 2.00 5 0.45079493 ## persevr 240 6.804167 1.405006 7 2 9 6.00 8 -0.62497986 ## friendl 240 7.250000 1.155304 7 2 9 7.00 8 -0.59175740 ## worryin 240 5.212500 2.108126 6 1 9 3.00 7 -0.07134918 ## respnsi 240 7.291667 1.395725 8 1 9 7.00 8 -1.21576315 ## contrar 240 3.770833 1.500900 4 1 8 3.00 5 0.22186635 ## sociabl 240 6.445833 1.567579 7 2 9 5.00 8 -0.64011468 ## lazy 240 4.179167 1.893941 4 1 9 3.00 5 0.20658153 ## coopera 240 6.695833 1.197619 7 3 9 6.00 7 -0.37705887 ## quiet 240 4.604167 1.880750 5 1 9 3.00 6 0.15018658 ## organiz 240 6.154167 1.963363 6 1 9 5.00 8 -0.45913660 ## criticl 240 5.170833 1.745282 5 1 9 4.00 6 -0.21890441 ## lax 240 4.083333 1.664713 4 1 9 3.00 5 0.41571022 ## laidbck 240 5.245833 1.790837 5 1 9 4.00 7 -0.13048078 ## withdrw 240 3.754167 1.769684 3 1 7 2.00 5 0.17034350 ## givinup 240 2.675000 1.553307 2 1 8 1.75 4 1.00112342 ## easygon 240 6.066667 1.601429 6 2 9 5.00 7 -0.41865452 ## Kurtosis ## distant -1.115187889 ## talkatv -0.748747155 ## carelss -0.309281507 ## hardwrk 0.614831474 ## anxious -0.771926581 ## agreebl 1.042140998 ## tense -0.925252081 ## kind 0.767591124 ## opposng -0.293884606 ## relaxed -0.507015321 ## disorgn -1.065496081 ## outgoin -0.761161418 ## approvn -0.154427814 ## shy -0.919119555 ## discipl 0.145322024 ## harsh -0.736948487 ## persevr 0.642059811 ## friendl 1.210961241 ## worryin -1.139934280 ## respnsi 
2.252882721 ## contrar -0.472970716 ## sociabl 0.061296675 ## lazy -0.597716553 ## coopera 0.004767416 ## quiet -0.728162291 ## organiz -0.438841538 ## criticl -0.556012334 ## lax -0.088209525 ## laidbck -0.522778720 ## withdrw -1.142037389 ## givinup 0.564770534 ## easygon -0.324220228
You can do pairwise scatterplots, but with a range of 1-9 this is not wholly useful. We will simply assume that linearity holds.
require(ggplot2)
require(GGally)
ggpairs(f_data[, 1:4]) # example: first four items only
Here we want Bartlett's test to be significant! Why?
bartlett.test(f_data)
## ## Bartlett test of homogeneity of variances ## ## data: f_data ## Bartlett's K-squared = 350.08, df = 31, p-value < 2.2e-16
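Careful: bartlett.test() in base R is Bartlett's test of homogeneity of variances, not Bartlett's test of sphericity. The output above therefore only shows that the 32 item variances differ (K\(^2\)(31) = 350.08, p < .0001). For sphericity, run cortest.bartlett() from the psych package on the correlation matrix; with 32 items that test has 32 × 31 / 2 = 496 degrees of freedom (the null-model chi-square that fa() reports further down, 4009.54 on 496 df, appears to be essentially this statistic). A significant sphericity test suggests that factor analysis is appropriate.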
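Bartlett's test of sphericity can also be computed by hand, which shows what it actually tests. The sketch below uses base R only; the function name bartlett_sphericity and the simulated data are hypothetical and only serve as illustration.

```r
# Bartlett's test of sphericity by hand (base R only).
# H0: the correlation matrix is an identity matrix, i.e. there is nothing to factor.
bartlett_sphericity <- function(data) {
  R <- cor(data)
  n <- nrow(data)
  p <- ncol(data)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Hypothetical illustration: four variables sharing one common source.
set.seed(2)
g <- rnorm(300)
sim <- data.frame(a = g + rnorm(300), b = g + rnorm(300),
                  c = g + rnorm(300), d = g + rnorm(300))
res <- bartlett_sphericity(sim)
res$df       # 4 * 3 / 2 = 6
res$p.value  # small: the variables are clearly correlated
```

With the lecture data the call would be bartlett_sphericity(f_data); psych::cortest.bartlett(cor(f_data), n = nrow(f_data)) should give the same test.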
Kaiser-Meyer-Olkin factor adequacy ranges from 0 to 1. All should be >.5 (Kaiser, 1977)
require(psych)
KMO(f_data)
## Kaiser-Meyer-Olkin factor adequacy ## Call: KMO(r = f_data) ## Overall MSA = 0.84 ## MSA for each item = ## distant talkatv carelss hardwrk anxious agreebl tense kind opposng ## 0.88 0.86 0.82 0.87 0.82 0.73 0.84 0.81 0.79 ## relaxed disorgn outgoin approvn shy discipl harsh persevr friendl ## 0.86 0.75 0.87 0.89 0.87 0.84 0.85 0.86 0.87 ## worryin respnsi contrar sociabl lazy coopera quiet organiz criticl ## 0.81 0.86 0.83 0.90 0.89 0.83 0.87 0.78 0.87 ## lax laidbck withdrw givinup easygon ## 0.81 0.73 0.90 0.89 0.78
0.00 to 0.49 unacceptable.
0.50 to 0.59 miserable.
0.60 to 0.69 mediocre.
0.70 to 0.79 middling.
0.80 to 0.89 meritorious.
0.90 to 1.00 marvelous.
All 32 items showed at least middling adequacy for factor analysis (all MSA \(\geq\) .73; overall MSA = .84).
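To see what the KMO index measures, here is a minimal by-hand sketch in base R. It compares squared correlations with squared partial correlations: if the partial correlations are small relative to the raw correlations, the variables share common variance and factoring makes sense. The function name kmo_by_hand and the simulated data are hypothetical.

```r
# KMO / MSA by hand (base R only).
kmo_by_hand <- function(data) {
  R <- cor(data)
  Rinv <- solve(R)
  # Partial correlations, obtained from the inverse correlation matrix.
  Q <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))
  diag(R) <- 0   # only off-diagonal elements enter the index
  diag(Q) <- 0
  list(overall  = sum(R^2) / (sum(R^2) + sum(Q^2)),
       per_item = colSums(R^2) / (colSums(R^2) + colSums(Q^2)))
}

# Hypothetical illustration: four variables sharing one common source.
set.seed(5)
g <- rnorm(300)
sim <- data.frame(a = g + rnorm(300), b = g + rnorm(300),
                  c = g + rnorm(300), d = g + rnorm(300))
kmo <- kmo_by_hand(sim)
round(kmo$overall, 2)
round(kmo$per_item, 2)
```

On the lecture data, kmo_by_hand(f_data) should agree with the KMO() output above up to rounding.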
Download the data from here. Data are ratings of instructors from a study by Sidanius.
Read in the data, make a subset with items 13:24 (Hint: use the select function from dplyr and num_range) and conduct a KMO test.
# ggcorrplot; you can then tweak this further, as it is a ggplot.
require(ggcorrplot)
require(dplyr)
# Take the absolute values: we are interested in strength, not direction.
cormatrix <- abs(cor(f_data[, 1:20]))
corplot <- ggcorrplot(cormatrix, hc.order = TRUE, type = "lower", method = "circle")
corplot # print the plot
See here
Factor: An underlying or latent construct causing the observed variables, to a greater or lesser extent. A factor is estimated by a linear combination of our observed variables. When the ‘best fitting’ factors are found, it should be remembered that these factors are not unique! It can be shown that any rotation of the best-fitting factors is also best-fitting. ‘Interpretability’ is used to select the ‘best’ rotation among the equally ‘good’ rotations: to be useful, factors should be interpretable. Rotation of factors is used to improve the interpretability of factors. So once we have extracted the factors, we will rotate them.
Factor loadings: The degree to which the variable is driven or ‘caused’ by the factor.
Factor scores/weights: These can be estimated for each factor and then added to your dataframe: basically a score for each participant on each factor. Occasionally, one would then use those scores in further analyses.
Communality of a variable: The extent to which the variability across participants in a variable is ‘explained’ by the set of factors extracted in the factor analysis. Uniqueness = 1-Communality.
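The definitions above can be checked numerically. The sketch below uses simulated data and base R's factanal() (maximum likelihood, not the 'minres' method used in this lecture); all names and data are hypothetical. It shows that rotation changes the loadings but leaves the communalities and the reproduced correlations untouched, which is why every rotation fits equally well.

```r
# Hypothetical data: two latent factors, six observed variables.
set.seed(3)
n  <- 400
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

efa_none <- factanal(sim, factors = 2, rotation = "none")
efa_vmax <- factanal(sim, factors = 2, rotation = "varimax")
L_none <- unclass(loadings(efa_none))
L_vmax <- unclass(loadings(efa_vmax))

# Communality = sum of squared loadings per variable.
h2_none <- rowSums(L_none^2)
h2_vmax <- rowSums(L_vmax^2)

# Rotation leaves communalities and reproduced correlations unchanged.
same_h2  <- isTRUE(all.equal(h2_none, h2_vmax, tolerance = 1e-6))
same_fit <- isTRUE(all.equal(L_none %*% t(L_none), L_vmax %*% t(L_vmax),
                             tolerance = 1e-6))
c(same_h2 = same_h2, same_fit = same_fit)

# Communality and uniqueness side by side (they add up to roughly 1).
round(cbind(communality = h2_vmax, uniqueness = efa_vmax$uniquenesses), 2)
```

In the psych output further down, the same quantities appear as the h2 (communality) and u2 (uniqueness) columns.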
Varimax rotation. Let's start by getting 8 factors. (Request large number and then trim down!)
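require(psych)
# fa = 'fa' belongs to fa.parallel(); fa() has no such documented argument, so it can be dropped.
fa <- fa(f_data, 8, fm = 'minres', rotate = 'varimax', fa = 'fa')
sink('fa_output.text') # divert the printed output to a file until sink() is called
fa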
## Factor Analysis using method = minres ## Call: fa(r = f_data, nfactors = 8, rotate = "varimax", fm = "minres", ## fa = "fa") ## Standardized loadings (pattern matrix) based upon correlation matrix ## MR1 MR2 MR4 MR8 MR6 MR5 MR7 MR3 h2 u2 com ## distant 0.60 0.02 0.11 -0.13 0.13 0.24 0.04 0.10 0.48 0.52 1.7 ## talkatv -0.76 0.06 -0.01 0.09 0.11 0.14 0.04 0.07 0.63 0.37 1.2 ## carelss 0.05 -0.25 0.12 -0.08 0.63 0.19 0.08 0.14 0.54 0.46 1.9 ## hardwrk -0.18 0.70 0.14 0.06 -0.21 0.01 -0.04 0.07 0.59 0.41 1.5 ## anxious 0.16 0.02 0.75 0.07 0.09 0.15 -0.09 0.06 0.63 0.37 1.3 ## agreebl -0.03 0.02 -0.01 0.71 0.08 -0.17 0.13 0.15 0.58 0.42 1.3 ## tense 0.15 0.06 0.77 -0.04 0.04 0.21 -0.21 0.09 0.72 0.28 1.5 ## kind -0.12 0.19 0.04 0.64 -0.22 -0.12 0.04 -0.21 0.56 0.44 1.9 ## opposng -0.01 -0.07 0.09 -0.14 0.09 0.68 0.00 -0.10 0.51 0.49 1.2 ## relaxed -0.03 -0.09 -0.52 0.25 0.05 -0.09 0.49 -0.10 0.61 0.39 2.7 ## disorgn 0.02 -0.31 0.02 -0.01 0.82 0.08 0.10 -0.05 0.79 0.21 1.3 ## outgoin -0.84 0.12 0.00 0.15 -0.02 -0.03 0.12 0.01 0.75 0.25 1.1 ## approvn -0.29 0.13 -0.04 0.48 -0.07 -0.14 0.19 0.11 0.40 0.60 2.6 ## shy 0.71 -0.22 0.17 0.00 0.03 -0.07 -0.01 -0.07 0.59 0.41 1.4 ## discipl 0.05 0.69 0.04 0.08 -0.19 0.00 -0.03 0.16 0.55 0.45 1.3 ## harsh 0.08 -0.04 0.05 -0.18 0.15 0.64 -0.05 0.26 0.55 0.45 1.7 ## persevr -0.14 0.64 0.09 0.14 -0.08 0.04 -0.04 -0.18 0.50 0.50 1.5 ## friendl -0.52 0.16 0.10 0.48 -0.06 -0.15 0.11 -0.27 0.64 0.36 3.2 ## worryin 0.16 -0.03 0.77 -0.04 0.00 0.10 -0.15 -0.20 0.69 0.31 1.4 ## respnsi -0.01 0.56 -0.01 0.29 -0.40 0.06 -0.13 -0.16 0.60 0.40 2.8 ## contrar 0.06 -0.08 0.17 -0.16 0.12 0.72 0.00 0.06 0.59 0.41 1.3 ## sociabl -0.74 -0.06 -0.04 0.22 -0.06 -0.07 0.11 -0.06 0.63 0.37 1.3 ## lazy 0.16 -0.63 0.16 -0.02 0.26 0.15 0.16 -0.03 0.57 0.43 1.9 ## coopera -0.11 0.13 -0.12 0.63 -0.06 -0.24 0.04 -0.07 0.51 0.49 1.6 ## quiet 0.79 -0.14 0.21 0.16 -0.03 0.02 0.03 -0.08 0.71 0.29 1.3 ## organiz -0.10 0.40 0.00 0.09 -0.76 0.00 -0.03 0.04 0.76 0.24 
1.6 ## criticl 0.09 0.12 0.13 -0.11 -0.12 0.62 -0.05 -0.10 0.46 0.54 1.4 ## lax 0.02 -0.33 -0.04 0.10 0.25 0.04 0.39 0.01 0.34 0.66 2.9 ## laidbck -0.05 -0.07 -0.28 0.03 0.11 -0.04 0.84 0.05 0.80 0.20 1.3 ## withdrw 0.74 -0.06 0.17 -0.11 0.12 0.22 0.04 0.13 0.67 0.33 1.5 ## givinup 0.34 -0.46 0.28 -0.10 0.12 0.13 0.02 0.22 0.50 0.50 3.6 ## easygon -0.16 -0.11 -0.23 0.30 -0.01 -0.06 0.51 -0.03 0.45 0.55 2.5 ## ## MR1 MR2 MR4 MR8 MR6 MR5 MR7 MR3 ## SS loadings 4.55 2.96 2.50 2.29 2.24 2.23 1.59 0.52 ## Proportion Var 0.14 0.09 0.08 0.07 0.07 0.07 0.05 0.02 ## Cumulative Var 0.14 0.23 0.31 0.38 0.45 0.52 0.57 0.59 ## Proportion Explained 0.24 0.16 0.13 0.12 0.12 0.12 0.08 0.03 ## Cumulative Proportion 0.24 0.40 0.53 0.65 0.77 0.89 0.97 1.00 ## ## Mean item complexity = 1.8 ## Test of the hypothesis that 8 factors are sufficient. ## ## The degrees of freedom for the null model are 496 and the objective function was 17.62 with Chi Square of 4009.54 ## The degrees of freedom for the model are 268 and the objective function was 1.84 ## ## The root mean square of the residuals (RMSR) is 0.02 ## The df corrected root mean square of the residuals is 0.03 ## ## The harmonic number of observations is 240 with the empirical chi square 140.27 with prob < 1 ## The total number of observations was 240 with Likelihood Chi Square = 409.11 with prob < 5.9e-08 ## ## Tucker Lewis Index of factoring reliability = 0.924 ## RMSEA index = 0.052 and the 90 % confidence intervals are 0.038 0.056 ## BIC = -1059.7 ## Fit based upon off diagonal values = 0.99 ## Measures of factor score adequacy ## MR1 MR2 MR4 MR8 MR6 ## Correlation of (regression) scores with factors 0.96 0.89 0.91 0.88 0.90 ## Multiple R square of scores with factors 0.92 0.79 0.83 0.77 0.82 ## Minimum correlation of possible factor scores 0.83 0.59 0.66 0.55 0.64 ## MR5 MR7 MR3 ## Correlation of (regression) scores with factors 0.88 0.89 0.73 ## Multiple R square of scores with factors 0.78 0.79 0.53 ## Minimum correlation 
of possible factor scores 0.56 0.58 0.06
sink()
We need to determine the number of factors. Many options! From Revelle (2017:37):
1) Extracting factors until the chi square of the residual matrix is not significant.
2) Extracting factors until the change in chi square from factor n to factor n+1 is not significant.
3) Extracting factors until the eigen values of the real data are less than the corresponding eigen values of a random data set of the same size (parallel analysis) fa.parallel (Horn, 1965).
4) Plotting the magnitude of the successive eigen values and applying the scree test (a sudden drop in eigen values analogous to the change in slope seen when scrambling up the talus slope of a mountain and approaching the rock face) (Cattell, 1966).
5) Extracting factors as long as they are interpretable.
6) Using the Very Simple Structure Criterion (vss) (Revelle and Rocklin, 1979).
7) Using Wayne Velicer’s Minimum Average Partial (MAP) criterion (Velicer, 1976).
8) Extracting principal components until the eigen value < 1 (Kaiser criterion).
Each has advantages and disadvantages; 8), although common, is probably the worst.
Read more here
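Two of the criteria listed above, the Kaiser criterion and parallel analysis, are easy to sketch in base R. The example below works on principal component eigenvalues of simulated data. It is a simplified illustration and not a replacement for fa.parallel(), which can also work on factor eigenvalues and offers more options. The data, seed and number of simulations are arbitrary choices.

```r
# Hypothetical data: two latent factors, six observed variables.
set.seed(4)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

obs_eigen <- eigen(cor(sim))$values

# Kaiser criterion: count the eigenvalues above 1.
n_kaiser <- sum(obs_eigen > 1)

# Parallel analysis (component version): compare with eigenvalues of
# random normal data of the same size, averaged over many simulations.
n_sims <- 200
rand_eigen <- replicate(n_sims, {
  noise <- matrix(rnorm(n * ncol(sim)), nrow = n)
  eigen(cor(noise))$values
})
rand_mean <- rowMeans(rand_eigen)

# Keep the leading eigenvalues that beat their random counterparts.
exceeds <- obs_eigen > rand_mean
n_parallel <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1

c(kaiser = n_kaiser, parallel = n_parallel)
```

Plotting obs_eigen against rand_mean, e.g. with plot() and lines(), gives a basic version of the scree plot with the random-data reference line.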
Also part of the output already. Here I used 'minres' as the extraction method. Parallel analysis suggests 5 factors (compare the red line to the blue triangles).
require(psych)
parallel <- fa.parallel(f_data, fm = 'minres', fa = 'fa')
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
parallel
## Call: fa.parallel(x = f_data, fm = "minres", fa = "fa") ## Parallel analysis suggests that the number of factors = 5 and the number of components = NA ## ## Eigen Values of ## ## eigen values of factors ## [1] 6.52 3.66 2.35 1.53 1.02 0.38 0.14 0.02 0.01 -0.02 -0.09 ## [12] -0.16 -0.18 -0.21 -0.25 -0.28 -0.33 -0.34 -0.37 -0.39 -0.40 -0.43 ## [23] -0.45 -0.48 -0.49 -0.52 -0.55 -0.58 -0.60 -0.62 -0.67 -0.70 ## ## eigen values of simulated factors ## [1] 0.83 0.67 0.59 0.53 0.46 0.41 0.36 0.32 0.27 0.23 0.19 ## [12] 0.15 0.11 0.08 0.04 0.00 -0.03 -0.06 -0.10 -0.13 -0.16 -0.20 ## [23] -0.23 -0.26 -0.29 -0.32 -0.35 -0.38 -0.42 -0.46 -0.50 -0.54 ## ## eigen values of components ## [1] 7.24 4.53 3.12 2.33 1.88 1.19 0.93 0.86 0.80 0.71 0.69 0.64 0.63 0.54 ## [15] 0.51 0.47 0.46 0.45 0.44 0.41 0.39 0.37 0.33 0.30 0.29 0.28 0.25 0.23 ## [29] 0.22 0.20 0.17 0.13 ## ## eigen values of simulated components ## [1] NA
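Kaiser criterion: retain as many factors as there are eigenvalues > 1.
This can be seen on the graph. Applied to the factor eigenvalues below, that would suggest 5 factors; applied to the principal component eigenvalues (the usual form of the criterion, see 8) above), six eigenvalues exceed 1.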
parallel$fa.values
## [1] 6.518632220 3.663879765 2.352072662 1.529192200 1.019134280 ## [6] 0.375145536 0.143123374 0.017903052 0.008116657 -0.018428192 ## [11] -0.092557391 -0.157077647 -0.179716109 -0.214661107 -0.246564997 ## [16] -0.279092818 -0.331472781 -0.341440377 -0.366177997 -0.388743904 ## [21] -0.397539896 -0.425290978 -0.449044173 -0.483146864 -0.488668863 ## [26] -0.516074439 -0.548458000 -0.584031532 -0.602210864 -0.619056368 ## [31] -0.674287746 -0.704824040
Depending on the elbow of the graph (Scree Criterion) you would extract 4 or 5.
The VSS plot suggests 3 factors. Very little improvement with 4. MAP suggests 6 factors. Output printed to console, check here
require(psych)
VSS(f_data, rotate = "varimax", n.obs = 240) # shows plot
## ## Very Simple Structure ## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, ## n.obs = n.obs, plot = plot, title = title, use = use, cor = cor) ## Although the VSS complexity 1 shows 5 factors, it is probably more reasonable to think about 3 factors ## VSS complexity 2 achieves a maximimum of 0.81 with 5 factors ## ## The Velicer MAP achieves a minimum of 0.02 with 6 factors ## BIC achieves a minimum of -1194.35 with 6 factors ## Sample Size adjusted BIC achieves a minimum of -210.21 with 8 factors ## ## Statistics by number of factors ## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC ## 1 0.52 0.00 0.045 464 2764 1.2e-322 47.7 0.52 0.149 221 1692 ## 2 0.60 0.72 0.033 433 1992 5.7e-198 27.5 0.72 0.127 -381 992 ## 3 0.61 0.78 0.024 403 1395 2.3e-109 17.9 0.82 0.106 -814 463 ## 4 0.61 0.80 0.018 374 967 3.9e-54 12.5 0.87 0.086 -1082 103 ## 5 0.62 0.81 0.015 346 711 2.2e-27 9.1 0.91 0.071 -1186 -89 ## 6 0.56 0.79 0.015 319 554 6.8e-15 7.7 0.92 0.060 -1194 -183 ## 7 0.54 0.77 0.016 293 467 3.9e-10 7.0 0.93 0.055 -1139 -210 ## 8 0.56 0.77 0.018 268 409 5.9e-08 6.3 0.94 0.052 -1060 -210 ## complex eChisq SRMR eCRMS eBIC ## 1 1.0 6450 0.165 0.170 3907 ## 2 1.3 3021 0.113 0.121 648 ## 3 1.4 1517 0.080 0.089 -691 ## 4 1.6 767 0.057 0.065 -1283 ## 5 1.5 355 0.039 0.046 -1541 ## 6 1.7 230 0.031 0.039 -1519 ## 7 1.8 176 0.027 0.035 -1430 ## 8 1.8 140 0.024 0.033 -1329
Conduct a factor analysis extracting a large number of factors (6) on the Sidanius data and store it. Discuss the output with your neighbour.
Run a parallel analysis. Discuss the outcome with your neighbour.
Let's extract a three-factor solution, though five factors could also be workable.
fa_3 <- fa(f_data, 3, fm = 'minres', rotate = 'varimax', fa = 'fa')
sink('fa_3_output.txt')
fa_3
## Factor Analysis using method = minres ## Call: fa(r = f_data, nfactors = 3, rotate = "varimax", fm = "minres", ## fa = "fa") ## Standardized loadings (pattern matrix) based upon correlation matrix ## MR2 MR1 MR3 h2 u2 com ## distant -0.13 -0.55 0.27 0.39 0.61 1.6 ## talkatv -0.05 0.79 0.04 0.63 0.37 1.0 ## carelss -0.61 -0.01 0.23 0.42 0.58 1.3 ## hardwrk 0.64 0.23 0.12 0.48 0.52 1.3 ## anxious 0.05 -0.18 0.49 0.28 0.72 1.3 ## agreebl 0.08 0.08 -0.40 0.17 0.83 1.1 ## tense 0.11 -0.18 0.64 0.46 0.54 1.2 ## kind 0.40 0.15 -0.34 0.30 0.70 2.2 ## opposng -0.21 0.07 0.49 0.28 0.72 1.4 ## relaxed -0.22 0.10 -0.64 0.47 0.53 1.3 ## disorgn -0.71 0.02 0.06 0.50 0.50 1.0 ## outgoin 0.10 0.84 -0.12 0.74 0.26 1.1 ## approvn 0.18 0.31 -0.37 0.26 0.74 2.4 ## shy -0.14 -0.73 0.04 0.56 0.44 1.1 ## discipl 0.60 0.00 0.06 0.37 0.63 1.0 ## harsh -0.24 -0.02 0.51 0.32 0.68 1.4 ## persevr 0.55 0.20 0.06 0.34 0.66 1.3 ## friendl 0.27 0.51 -0.28 0.41 0.59 2.1 ## worryin 0.09 -0.21 0.52 0.32 0.68 1.4 ## respnsi 0.73 0.05 -0.04 0.53 0.47 1.0 ## contrar -0.24 0.01 0.58 0.39 0.61 1.3 ## sociabl 0.02 0.72 -0.22 0.57 0.43 1.2 ## lazy -0.64 -0.19 0.10 0.46 0.54 1.2 ## coopera 0.27 0.14 -0.48 0.32 0.68 1.8 ## quiet -0.04 -0.77 0.04 0.59 0.41 1.0 ## organiz 0.73 0.08 -0.06 0.55 0.45 1.0 ## criticl 0.07 -0.03 0.48 0.24 0.76 1.1 ## lax -0.48 0.00 -0.20 0.27 0.73 1.3 ## laidbck -0.33 0.11 -0.46 0.33 0.67 2.0 ## withdrw -0.17 -0.70 0.28 0.60 0.40 1.4 ## givinup -0.43 -0.38 0.26 0.39 0.61 2.6 ## easygon -0.16 0.20 -0.50 0.31 0.69 1.5 ## ## MR2 MR1 MR3 ## SS loadings 4.63 4.62 4.00 ## Proportion Var 0.14 0.14 0.13 ## Cumulative Var 0.14 0.29 0.41 ## Proportion Explained 0.35 0.35 0.30 ## Cumulative Proportion 0.35 0.70 1.00 ## ## Mean item complexity = 1.4 ## Test of the hypothesis that 3 factors are sufficient. 
## ## The degrees of freedom for the null model are 496 and the objective function was 17.62 with Chi Square of 4009.54 ## The degrees of freedom for the model are 403 and the objective function was 6.18 ## ## The root mean square of the residuals (RMSR) is 0.08 ## The df corrected root mean square of the residuals is 0.09 ## ## The harmonic number of observations is 240 with the empirical chi square 1517.47 with prob < 1.1e-128 ## The total number of observations was 240 with Likelihood Chi Square = 1394.7 with prob < 2.3e-109 ## ## Tucker Lewis Index of factoring reliability = 0.649 ## RMSEA index = 0.106 and the 90 % confidence intervals are 0.096 0.107 ## BIC = -814 ## Fit based upon off diagonal values = 0.91 ## Measures of factor score adequacy ## MR2 MR1 MR3 ## Correlation of (regression) scores with factors 0.94 0.95 0.92 ## Multiple R square of scores with factors 0.89 0.91 0.85 ## Minimum correlation of possible factor scores 0.78 0.81 0.70
sink()
fa_5 <- fa(f_data, 5, fm = 'minres', rotate = 'varimax', fa = 'fa')
sink('fa_5_output.txt')
fa_5
## Factor Analysis using method = minres ## Call: fa(r = f_data, nfactors = 5, rotate = "varimax", fm = "minres", ## fa = "fa") ## Standardized loadings (pattern matrix) based upon correlation matrix ## MR1 MR2 MR4 MR5 MR3 h2 u2 com ## distant 0.59 0.08 0.08 -0.11 0.30 0.46 0.54 1.7 ## talkatv -0.76 0.03 -0.02 0.12 0.16 0.62 0.38 1.2 ## carelss 0.02 0.60 0.09 -0.03 0.28 0.45 0.55 1.5 ## hardwrk -0.20 -0.65 0.14 0.11 0.10 0.50 0.50 1.4 ## anxious 0.17 0.08 0.70 0.16 0.22 0.59 0.41 1.5 ## agreebl -0.02 0.02 -0.06 0.64 -0.17 0.44 0.56 1.2 ## tense 0.16 0.01 0.77 0.01 0.26 0.70 0.30 1.3 ## kind -0.10 -0.31 0.03 0.61 -0.18 0.51 0.49 1.8 ## opposng -0.01 0.11 0.09 -0.13 0.63 0.43 0.57 1.2 ## relaxed -0.02 0.13 -0.69 0.34 -0.07 0.62 0.38 1.6 ## disorgn -0.01 0.72 0.00 0.04 0.15 0.55 0.45 1.1 ## outgoin -0.83 -0.08 -0.05 0.22 0.00 0.74 0.26 1.2 ## approvn -0.27 -0.13 -0.12 0.51 -0.12 0.37 0.63 1.9 ## shy 0.72 0.20 0.16 0.00 -0.09 0.58 0.42 1.3 ## discipl 0.03 -0.63 0.04 0.09 0.09 0.42 0.58 1.1 ## harsh 0.07 0.13 0.07 -0.22 0.64 0.48 0.52 1.4 ## persevr -0.16 -0.54 0.11 0.19 0.09 0.38 0.62 1.6 ## friendl -0.50 -0.17 0.06 0.55 -0.16 0.60 0.40 2.4 ## worryin 0.18 0.05 0.73 0.05 0.13 0.59 0.41 1.2 ## respnsi -0.01 -0.73 0.06 0.23 0.02 0.59 0.41 1.2 ## contrar 0.05 0.14 0.15 -0.15 0.72 0.58 0.42 1.3 ## sociabl -0.72 0.01 -0.09 0.26 -0.10 0.60 0.40 1.3 ## lazy 0.19 0.68 0.07 0.03 0.13 0.51 0.49 1.3 ## coopera -0.10 -0.18 -0.11 0.57 -0.28 0.46 0.54 1.9 ## quiet 0.80 0.10 0.17 0.16 0.00 0.70 0.30 1.2 ## organiz -0.06 -0.76 -0.02 0.06 -0.06 0.58 0.42 1.0 ## criticl 0.09 -0.17 0.14 -0.11 0.58 0.40 0.60 1.4 ## lax 0.04 0.47 -0.22 0.23 0.10 0.33 0.67 2.1 ## laidbck -0.03 0.25 -0.59 0.28 0.09 0.50 0.50 1.9 ## withdrw 0.73 0.14 0.13 -0.09 0.27 0.66 0.34 1.5 ## givinup 0.36 0.46 0.22 -0.10 0.13 0.42 0.58 2.7 ## easygon -0.13 0.13 -0.44 0.43 -0.02 0.42 0.58 2.4 ## ## MR1 MR2 MR4 MR5 MR3 ## SS loadings 4.51 4.43 2.97 2.52 2.36 ## Proportion Var 0.14 0.14 0.09 0.08 0.07 ## Cumulative Var 
0.14 0.28 0.37 0.45 0.52 ## Proportion Explained 0.27 0.26 0.18 0.15 0.14 ## Cumulative Proportion 0.27 0.53 0.71 0.86 1.00 ## ## Mean item complexity = 1.5 ## Test of the hypothesis that 5 factors are sufficient. ## ## The degrees of freedom for the null model are 496 and the objective function was 17.62 with Chi Square of 4009.54 ## The degrees of freedom for the model are 346 and the objective function was 3.17 ## ## The root mean square of the residuals (RMSR) is 0.04 ## The df corrected root mean square of the residuals is 0.05 ## ## The harmonic number of observations is 240 with the empirical chi square 355.07 with prob < 0.36 ## The total number of observations was 240 with Likelihood Chi Square = 710.66 with prob < 2.2e-27 ## ## Tucker Lewis Index of factoring reliability = 0.849 ## RMSEA index = 0.071 and the 90 % confidence intervals are 0.059 0.073 ## BIC = -1185.64 ## Fit based upon off diagonal values = 0.98 ## Measures of factor score adequacy ## MR1 MR2 MR4 MR5 MR3 ## Correlation of (regression) scores with factors 0.96 0.94 0.93 0.90 0.89 ## Multiple R square of scores with factors 0.91 0.89 0.86 0.81 0.79 ## Minimum correlation of possible factor scores 0.83 0.78 0.72 0.62 0.58
sink()
fa.diagram(fa_5, marg=c(.01,.01,1,.01))
How would you label those 5 factors?
require(semPlot)
semplot1 <- semPlotModel(fa_5$loadings)
semPaths(semplot1, what = "std", layout = "circle", nCharNodes = 6)
Make a plot for your item scores.
RMSEA and Tucker-Lewis Index (TLI).
A widely used cut-off for RMSEA is .06 (Hu & Bentler, 1999); others suggest .08 as acceptable. But beware of cut-offs. RMSEA depends on sample size more than TLI does.
For the Tucker-Lewis Index, >.9 or >.95 is considered a good fit. Again, beware of cut-offs.
Other indices also exist and we will discuss those when we move to SEM.
The five factor model does better than the three factor model. But beware: this is exploratory rather than confirmatory.
Sample description: While the five factor model could be considered an acceptable fit in terms of RMSEA (.071, below .08 but above the stricter .06 cut-off), it was not in terms of TLI (.849).
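To see where the RMSEA values come from, here is a small sketch. It uses the usual RMSEA definition with chi-square = (N - 1) × F, where F is the minimised objective function, so that RMSEA = sqrt(max(F / df - 1 / (N - 1), 0)). Plugging in the objective values and degrees of freedom printed in the fa() outputs above reproduces the reported .106, .071 and .052. Whether psych computes RMSEA in exactly this way internally is an assumption, and the helper name is hypothetical.

```r
# RMSEA from the minimised objective function, model df and sample size.
rmsea_from_objective <- function(objective, df, n_obs) {
  sqrt(max(objective / df - 1 / (n_obs - 1), 0))
}

# Objective values and df copied from the 3-, 5- and 8-factor outputs (N = 240).
rmsea <- c(
  three = rmsea_from_objective(6.18, 403, 240),
  five  = rmsea_from_objective(3.17, 346, 240),
  eight = rmsea_from_objective(1.84, 268, 240)
)
round(rmsea, 3)
```

Note that the likelihood chi-square printed by fa() includes a small-sample correction, so inserting that chi-square into the textbook formula gives slightly different values.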
There are many extraction methods. Most of the time you'll get similar results.
The fm argument selects the method: minres factor analysis ('minres'), principal axis factor analysis ('pa'), weighted least squares factor analysis ('wls'), generalized least squares factor analysis ('gls') and maximum likelihood factor analysis ('ml'). Minres and principal axis factoring are commonly used.
require(psych)
parallel <- fa.parallel(f_data, fm = 'pa', fa = 'fa')
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
You can further beautify this by generating labels for those 32 items.
require(stargazer)
require(plyr)
# Convert the loadings to a data frame (the columns come out as V1 ... V5).
factor_loadings <- as.data.frame(as.matrix.data.frame(fa_5$loadings))
factor_loadings <- plyr::rename(factor_loadings, c("V1" = "Factor 1", "V2" = "Factor 2", "V3" = "Factor 3", "V4" = "Factor 4", "V5" = "Factor 5"))
stargazer(factor_loadings, summary = FALSE, out = "results_loadings.html", header = FALSE, type = "html")
Item | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
1 | 0.587 | 0.083 | 0.075 | -0.110 | 0.301 |
2 | -0.763 | 0.029 | -0.023 | 0.124 | 0.163 |
3 | 0.022 | 0.603 | 0.086 | -0.027 | 0.275 |
4 | -0.196 | -0.645 | 0.139 | 0.113 | 0.101 |
5 | 0.169 | 0.080 | 0.698 | 0.156 | 0.217 |
6 | -0.025 | 0.021 | -0.057 | 0.640 | -0.169 |
7 | 0.161 | 0.008 | 0.775 | 0.007 | 0.264 |
8 | -0.103 | -0.313 | 0.032 | 0.609 | -0.184 |
9 | -0.011 | 0.107 | 0.091 | -0.133 | 0.630 |
10 | -0.020 | 0.134 | -0.692 | 0.340 | -0.070 |
11 | -0.014 | 0.723 | -0.001 | 0.041 | 0.153 |
12 | -0.825 | -0.084 | -0.055 | 0.218 | 0.001 |
13 | -0.268 | -0.128 | -0.119 | 0.505 | -0.119 |
14 | 0.715 | 0.195 | 0.161 | -0.003 | -0.091 |
15 | 0.033 | -0.630 | 0.045 | 0.090 | 0.090 |
16 | 0.067 | 0.127 | 0.065 | -0.221 | 0.640 |
17 | -0.159 | -0.542 | 0.112 | 0.193 | 0.091 |
18 | -0.500 | -0.166 | 0.061 | 0.546 | -0.158 |
19 | 0.177 | 0.051 | 0.731 | 0.046 | 0.131 |
20 | -0.007 | -0.729 | 0.059 | 0.234 | 0.018 |
21 | 0.052 | 0.144 | 0.146 | -0.148 | 0.719 |
22 | -0.715 | 0.009 | -0.087 | 0.259 | -0.098 |
23 | 0.187 | 0.676 | 0.072 | 0.026 | 0.126 |
24 | -0.102 | -0.180 | -0.105 | 0.569 | -0.280 |
25 | 0.799 | 0.098 | 0.171 | 0.159 | 0.004 |
26 | -0.060 | -0.757 | -0.016 | 0.058 | -0.057 |
27 | 0.087 | -0.169 | 0.144 | -0.111 | 0.575 |
28 | 0.036 | 0.470 | -0.218 | 0.230 | 0.101 |
29 | -0.027 | 0.250 | -0.590 | 0.283 | 0.088 |
30 | 0.734 | 0.144 | 0.128 | -0.092 | 0.272 |
31 | 0.362 | 0.459 | 0.217 | -0.103 | 0.132 |
32 | -0.130 | 0.132 | -0.444 | 0.430 | -0.022 |
require(GPArotation)
plot(fa_5, labels = names(f_data), cex = .7, ylim = c(-.1, 1))
Just plot the first two factors, and those with loadings above .5. Label it.
factor.plot(fa_5, choose=c(1,2), cut=0.5, labels=colnames(f_data))
Alternative graphs, see here
Multiple factor analysis (group-component)
Find out how to do it here
Perhaps you should not rely on it too much?
Load the BFI data from the 'psych' package (??bfi; in recent versions the data set has moved to the 'psychTools' package). This contains data on 2800 participants completing items relating to the 'big five' from the IPIP pool. You'll have to subset the variables for your factor analysis.
Conduct a Bartlett's test & KMO test.
Conduct an exploratory factor analysis (using 'minres' as method): run a parallel analysis and discuss the scree plot, Very Simple Structure and Velicer's MAP test.
Extract a five factor model (use varimax rotation), export the factor loadings of these five factors. Discuss the RMSEA and TLI for that five factor model.
Make a plot for the factors.
Also check the reading list! (many more than listed here)