Exercise 9 instructions.

Use the following code below to generate a correlation matrix to use (you’ll need to load lavaan).

The data comes from this paper. A study testing the Theory of Planned Behaviour when applying to graduate school for 131 students.

require(lavaan)
## Loading required package: lavaan
## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.
data<-lav_matrix_lower2full(c(1,.47,1,.66,.50,1,.77,.41,.46,1,.52,.38,.50,.50,1))
rownames(data)<-colnames(data)<-c("att",'sn','pbc','int','beh') # a matrix.

Reconstruct the model from the article. (Note that there are no (!) latent variables)

Discuss the fit. Make a sem plot.

What happens when you remove the indirect path?

Basic model from the article.

This exercise is taken from Huang 2016.

Have a look at the article this is the basic model they propose. The equations are corresponding to Figure 1, they are all manifest variables, even though denoted with ellipses rather than rectangles in the original paper.

model<-'
int~att+sn+pbc
beh~int+pbc
'

Note: We have a correlation matrix, but the command used refers to covariance matrix.

path_model1<-sem(model,sample.cov=data,sample.nobs=131)

We can also directly request an \(R^2\) measure for this model.

summary(path_model1,standardized=T,fit=T,rsquare=T)
## lavaan 0.6-9 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         7
##                                                       
##   Number of observations                           131
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 1.834
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.400
## 
## Model Test Baseline Model:
## 
##   Test statistic                               177.710
##   Degrees of freedom                                 7
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.003
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -282.820
##   Loglikelihood unrestricted model (H1)       -281.903
##                                                       
##   Akaike (AIC)                                 579.640
##   Bayesian (BIC)                               599.766
##   Sample-size adjusted Bayesian (BIC)          577.626
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.169
##   P-value RMSEA <= 0.05                          0.510
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.019
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   int ~                                                                 
##     att               0.804    0.075   10.713    0.000    0.804    0.804
##     sn                0.090    0.065    1.381    0.167    0.090    0.090
##     pbc              -0.116    0.076   -1.512    0.131   -0.116   -0.116
##   beh ~                                                                 
##     int               0.342    0.080    4.292    0.000    0.342    0.342
##     pbc               0.342    0.080    4.292    0.000    0.342    0.342
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .int               0.394    0.049    8.093    0.000    0.394    0.397
##    .beh               0.653    0.081    8.093    0.000    0.653    0.658
## 
## R-Square:
##                    Estimate
##     int               0.603
##     beh               0.342

Let’s make a plot… . Our plot looks different but the values pretty much map onto to those of the original article.

require(semPlot)
## Loading required package: semPlot
require(qgraph)
## Loading required package: qgraph
semPaths(path_model1,style = "lisrel",what="std",  curvePivot = TRUE)

We could calculate the direct, indirect and total paths from this graph. But we could not formally test the mediation.

direct_model<-'beh~int+pbc+att+sn'

This model actually explains more variance and is a better fit based on AIC and on BIC. Depending on your view, a model with just main effects would thus be parsimonious to one with the indirect pathway.

direct_model1<-sem(direct_model,sample.cov=data,sample.nobs=131)
summary(direct_model1,standardized=T,fit=T,rsquare=T)
## lavaan 0.6-9 ended normally after 11 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         5
##                                                       
##   Number of observations                           131
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                                56.757
##   Degrees of freedom                                 4
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -157.001
##   Loglikelihood unrestricted model (H1)       -157.001
##                                                       
##   Akaike (AIC)                                 324.001
##   Bayesian (BIC)                               338.377
##   Sample-size adjusted Bayesian (BIC)          322.563
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   beh ~                                                                 
##     int               0.266    0.112    2.380    0.017    0.266    0.266
##     pbc               0.270    0.099    2.735    0.006    0.270    0.270
##     att               0.094    0.131    0.718    0.473    0.094    0.094
##     sn                0.092    0.084    1.098    0.272    0.092    0.092
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .beh               0.643    0.080    8.093    0.000    0.643    0.648
## 
## R-Square:
##                    Estimate
##     beh               0.352
anova(path_model1,direct_model1)
## Chi-Squared Difference Test
## 
##               Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## direct_model1  0 324.00 338.38 0.0000                              
## path_model1    2 579.64 599.77 1.8338     1.8337       2     0.3998

Mediation tests.

model2<-'
int~att+sn+a*pbc 
beh~b*int+c*pbc

indirect := a*b # we care about this one most.
direct := c
total := c + a*b'

Unfortunately we cannot do the bootstrapping without the raw data. The indirect effect is not significant but we are basing this on the correlations… . As you might recall the product-moment approach is not very powerful and our sample is comparatively small.

path_model2<-sem(model2, sample.cov=data, sample.nobs=131)
summary(path_model2,standardized=T,fit=T,rsquare=T)
## lavaan 0.6-9 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         7
##                                                       
##   Number of observations                           131
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 1.834
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.400
## 
## Model Test Baseline Model:
## 
##   Test statistic                               177.710
##   Degrees of freedom                                 7
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.003
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -282.820
##   Loglikelihood unrestricted model (H1)       -281.903
##                                                       
##   Akaike (AIC)                                 579.640
##   Bayesian (BIC)                               599.766
##   Sample-size adjusted Bayesian (BIC)          577.626
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.169
##   P-value RMSEA <= 0.05                          0.510
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.019
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   int ~                                                                 
##     att               0.804    0.075   10.713    0.000    0.804    0.804
##     sn                0.090    0.065    1.381    0.167    0.090    0.090
##     pbc        (a)   -0.116    0.076   -1.512    0.131   -0.116   -0.116
##   beh ~                                                                 
##     int        (b)    0.342    0.080    4.292    0.000    0.342    0.342
##     pbc        (c)    0.342    0.080    4.292    0.000    0.342    0.342
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .int               0.394    0.049    8.093    0.000    0.394    0.397
##    .beh               0.653    0.081    8.093    0.000    0.653    0.658
## 
## R-Square:
##                    Estimate
##     int               0.603
##     beh               0.342
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect         -0.040    0.028   -1.426    0.154   -0.040   -0.040
##     direct            0.342    0.080    4.292    0.000    0.342    0.342
##     total             0.303    0.088    3.426    0.001    0.303    0.303

Simulate the original data.

We cannot use the more powerful bootstrapping approach based on the correlation matrix.

We can use the ‘gendata’ package to simulate a raw dataset which would give the correlations below.

library(gendata)
raw<-genmvnorm(data,k=5,n=131,seed=666)
cor(raw)
##           X1        X2        X3        X4        X5
## X1 1.0000000 0.4811203 0.6944940 0.7663960 0.5557357
## X2 0.4811203 1.0000000 0.5816366 0.3858113 0.3136334
## X3 0.6944940 0.5816366 1.0000000 0.4742997 0.5291354
## X4 0.7663960 0.3858113 0.4742997 1.0000000 0.4525513
## X5 0.5557357 0.3136334 0.5291354 0.4525513 1.0000000

Add the labels.

colnames(raw)<-c("att",'sn','pbc','int','beh')

Bootstrap. This will take a lot of time.

Contrary, to Huang, we find no evidence for a significant mediation effect.

results_raw<-sem(model2,data=raw,se="bootstrap",bootstrap=1000) 
summary(results_raw,standardized=T,fit=T,rsquare=T)
## lavaan 0.6-9 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         7
##                                                       
##   Number of observations                           131
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 4.315
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.116
## 
## Model Test Baseline Model:
## 
##   Test statistic                               176.286
##   Degrees of freedom                                 7
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.986
##   Tucker-Lewis Index (TLI)                       0.952
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -281.436
##   Loglikelihood unrestricted model (H1)       -279.279
##                                                       
##   Akaike (AIC)                                 576.873
##   Bayesian (BIC)                               596.999
##   Sample-size adjusted Bayesian (BIC)          574.859
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.094
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.218
##   P-value RMSEA <= 0.05                          0.198
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.020
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   int ~                                                                 
##     att               0.814    0.074   10.964    0.000    0.814    0.834
##     sn                0.066    0.059    1.129    0.259    0.066    0.069
##     pbc        (a)   -0.141    0.071   -1.990    0.047   -0.141   -0.145
##   beh ~                                                                 
##     int        (b)    0.248    0.077    3.219    0.001    0.248    0.260
##     pbc        (c)    0.376    0.068    5.517    0.000    0.376    0.406
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .int               0.408    0.057    7.103    0.000    0.408    0.403
##    .beh               0.617    0.066    9.284    0.000    0.617    0.668
## 
## R-Square:
##                    Estimate
##     int               0.597
##     beh               0.332
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect         -0.035    0.021   -1.642    0.101   -0.035   -0.038
##     direct            0.376    0.068    5.515    0.000    0.376    0.406
##     total             0.341    0.071    4.821    0.000    0.341    0.368

References.

Huang, F. (2016). Path analysis with R. http://web.missouri.edu/~huangf/data/mvnotes/Using_R_for_path_analysis.html

THE END!