Introduction.

This is a worksheet for use with Lecture 7.

You have a video of me narrating these slides. Note that there may be minor discrepancies between the current set of slides and the set in the video. The slide numbers refer to the current set. I do not cover every single slide, but you can code along!

If you answer correctly the colour of the box will change! (Don't worry about bonus questions, they are very much just that: a bonus!)

Slides

Factor analysis (Slide 5)

I would use factor analysis to explore whether similar items can be grouped together (True or False)

Data (Slide 12)

You can use stargazer to print a summary of the dataframe as on the slides. Or you can calculate these yourself.
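A minimal sketch of both approaches (assuming the data frame is called f_data, as on later slides, and that the stargazer package is installed):

```r
library(stargazer)

# Summary table of the data frame, printed as text
stargazer(as.data.frame(f_data), type = "text", digits = 3)

# Or compute individual statistics directly, e.g.:
round(mean(f_data$tense), 3)
round(sd(f_data$anxious), 3)
```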

Please answer the following (3 decimal places for all responses).

The mean of tense =

The standard deviation of anxious =

The mean of lax =

The standard deviation of quiet =

Measurement and sample size (Slide 13)

What is the difference between data measured on an interval scale and data measured on a ratio scale?

  1. A ratio scale has a true zero point, so zero on the scale corresponds to zero of the concept being measured.
  2. A ratio scale has equal intervals between the points on the scale, whereas an interval scale does not.
  3. An interval scale has a true zero point, so zero on the scale corresponds to zero of the concept being measured.
  4. A ratio scale puts scores into categories, while an interval scale measures on a continuous scale.

My answer:

Have a look at this website

Multivariate normality. (Slide 14)

We covered multivariate normality when we discussed OLS regression assumptions (True or False)

Have a look back at your slides: we covered this assumption when discussing MANOVA, rather than OLS regression.

Bonus

You can also calculate the multivariate normality test. What do you conclude?
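One way to run the test is with the mvn() function used on the next slide (a sketch; mvnTest = "mardia" is one of several test options in the MVN package):

```r
library(MVN)

# Mardia's test of multivariate skewness and kurtosis
mvn(f_data, mvnTest = "mardia")
```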

You should have concluded that the assumption of multivariate normality is not upheld.

Multivariate normality. (Slide 15)

The code isn't shown for the plot. You can get the plot with this:

mvn(f_data, multivariatePlot = "qq")

KMO test (Slide 19)

Some of the output was not printed. Can you complete the following? (2 decimals)

relaxed =

"respnsi" =

"withdrw" =

Try it yourself (Slide 22)

Work out what you need to do from the previous slides. Remember that you can look up what dplyr's select does via the help function.
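For instance (the column dropped here is purely illustrative):

```r
library(dplyr)

# Read the help page for select
?select

# Keep all columns except one, e.g.:
f_sub <- select(f_data, -relaxed)
```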

Some further terms. (Slide 26)

A factor loading of 0.80 means, generally speaking, that:

  1. the item is poorly related to the factor.
  2. the variable is moderately related with the factor.
  3. there is no relationship between that variable and the factor.
  4. the item correlates well with the factor, though not perfectly.

My answer:

A factor loading is:

  1. a correlation coefficient between a variable and a factor (cluster of variables).
  2. empirically based hypothetical variable consisting of items which are strongly associated with each other.
  3. the correlation between a binomial variable and a variable which has a continuous distribution of scores.
  4. the correlation of a variable with a whole score.

My answer:

Which of the following is correct?

  1. In extracting factors we want to account for as little variance as possible while keeping the number of factors extracted as small as possible.
  2. In extracting factors we want to account for as little variance as possible while keeping the number of factors extracted as large as possible.
  3. In extracting factors we want to account for as much variance as possible while keeping the number of factors extracted as small as possible.
  4. In extracting factors we want to account for as much variance as possible while keeping the number of factors extracted as large as possible.

My answer:

Have a look at this website

Number of factors + Continued (Slide 29-30)

Kaiser criterion for retaining factors is:

  1. Retain any factor with an eigenvalue greater than 1.
  2. Retain any factor with an eigenvalue greater than 0.3.
  3. Retain factors before the point of inflexion on a scree plot.
  4. Retain factors with communalities greater than 0.7.

My answer:
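A quick way to check the Kaiser criterion yourself (a sketch, assuming f_data):

```r
# Eigenvalues of the correlation matrix;
# the Kaiser criterion retains factors with eigenvalues > 1
ev <- eigen(cor(f_data))$values
ev
sum(ev > 1)
```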

Parallel analysis. (Slide 32)

What is 'minres' an abbreviation of (2 words, UK spelling)?

Have a look at this website
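Parallel analysis can be run with psych's fa.parallel() (a sketch; the fm and fa arguments mirror the extraction settings used later in the lecture):

```r
library(psych)

# Compare observed eigenvalues with those from random data
fa.parallel(f_data, fm = "minres", fa = "fa")
```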

VSS / MAP (Slide 36)

Have a look at the output: VSS 1 extracts 5 factors, but 3 is more reasonable, as you can see in the plot.

From the output, what is the maximum achieved by VSS 2 (2 decimals)?

What does VSS test stand for? (3 words)

Bonus

What does MAP test stand for? (3 words)

Have a look at this website
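Both statistics come from a single call to psych's VSS() (a sketch, assuming f_data):

```r
library(psych)

# Prints VSS complexity 1 and 2 and Velicer's MAP,
# and draws the plot shown on the slide
VSS(f_data)
```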

Try it yourself (Slide 37)

In case this was unclear, this is based on the 'M255.sav' dataset.

How many factors have you decided on? Do you agree with your neighbour? Why, or why not?

Have a look here
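A sketch for reading the SPSS file, assuming the haven package (foreign::read.spss would also work):

```r
library(haven)

# Read the SPSS dataset used in this exercise
m_data <- read_sav("M255.sav")
```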

Remember to close the sink! (Slide 38)

As you can see, everything is printed to the screen here, but the slide does not show the sink() command. In week 6, we saw an example of closing the sink(). If you don't close the sink, output will continue to be written to that file. This slide only prints the first part of the output.

You cannot see the full code, which was:

fa_5 <- fa(f_data, 3, fm = 'minres', rotate = 'varimax', fa = 'fa')
sink('fa_5_output.txt')
fa_5
sink()

Try it yourself (Slide 41)

How did you draw the plot?

How would you label the factors?

Bonus.

Also conduct VSS and Velicer MAP's test.

Extraction methods: choice paralysis. (Slide 44)

Bonus

Varimax rotation could be used when:

  1. You believe that the underlying factors will be correlated.
  2. You believe that the underlying factors are non-orthogonal.
  3. You believe that the underlying factors are independent.
  4. Kaiser’s criterion is met.

My answer:

Have a look here

Also note that this terminology is not really clear; if you want to go down a rabbit hole, this explains the differences.
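In code, the practical difference is just the rotate argument (a sketch; the factor count of 3 follows the earlier slides):

```r
library(psych)

# Orthogonal rotation: factors kept uncorrelated
fa(f_data, 3, fm = "minres", rotate = "varimax")

# Oblique rotation: factors allowed to correlate
fa(f_data, 3, fm = "minres", rotate = "oblimin")
```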

Extract loadings (Slide 46)

Please complete the following:

Item 3 primarily loads on Factor

Item 5 primarily loads on Factor

Item 6 primarily loads on Factor
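If you need to re-inspect the loadings, a sketch (assuming the fitted model object fa_5 from slide 38; the cutoff hides small loadings):

```r
# Loadings table, suppressing values below 0.3
print(fa_5$loadings, cutoff = 0.3)
```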

Too busy and not very useful. (Slide 48)

Which of the following is true:

  1. The Y-axis differentiates items relating to extraversion.
  2. The Y-axis differentiates items relating to conscientiousness.
  3. The X-axis differentiates items relating to conscientiousness.
  4. The X-axis differentiates items relating to agreeableness.

My answer:

Exercise (Slide 52-53)

Complete the exercise and submit via Blackboard!

Going further.

Session Info.

Thanks to Lisa DeBruine for the webexercises package. Some of the multiple choice questions are from Psychology Express: Research methods, Discovering Statistics by Andy Field and from Statistics Without Maths for Psychology, 7th Edition. Please see general disclaimer.

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] webexercises_1.1.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.33     R6_2.5.1          fastmap_1.1.1     xfun_0.41        
##  [5] cachem_1.0.8      knitr_1.45        htmltools_0.5.7   rmarkdown_2.25   
##  [9] cli_3.6.1         sass_0.4.7        jquerylib_0.1.4   compiler_4.3.2   
## [13] rstudioapi_0.15.0 tools_4.3.2       evaluate_0.23     bslib_0.5.1      
## [17] yaml_2.3.7        jsonlite_1.8.7    rlang_1.1.3

The end...