Exercise 4 instructions.

Back to the salaries data.

setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture4")
require(carData)
## Loading required package: carData
salaries<-carData::Salaries
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
salaries<- salaries %>% mutate(monthly_salary=salary/9)
require(skimr)
## Loading required package: skimr
skim(salaries)
Data summary
Name salaries
Number of rows 397
Number of columns 7
_______________________
Column type frequency:
factor 3
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
rank 0 1 FALSE 3 Pro: 266, Ass: 67, Ass: 64
discipline 0 1 FALSE 2 B: 216, A: 181
sex 0 1 FALSE 2 Mal: 358, Fem: 39

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
yrs.since.phd 0 1 22.31 12.89 1.00 12.00 21.00 32.00 56.00 ▇▇▆▅▁
yrs.service 0 1 17.61 13.01 0.00 7.00 16.00 27.00 60.00 ▇▅▃▂▁
salary 0 1 113706.46 30289.04 57800.00 91000.00 107300.00 134185.00 231545.00 ▅▇▅▂▁
monthly_salary 0 1 12634.05 3365.45 6422.22 10111.11 11922.22 14909.44 25727.22 ▅▇▅▂▁

Correlation coefficients.

cor(salaries$yrs.since.phd, salaries$monthly_salary)
## [1] 0.4192311
cor(salaries$yrs.since.phd, salaries$monthly_salary, method='spearman')
## [1] 0.4755676

Significance tests for correlations.

Here I have also calculated the 99% confidence interval.

cor.test(salaries$yrs.since.phd, salaries$monthly_salary, conf.level = .99)
## 
##  Pearson's product-moment correlation
## 
## data:  salaries$yrs.since.phd and salaries$monthly_salary
## t = 9.1775, df = 395, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
##  0.3067831 0.5201368
## sample estimates:
##       cor 
## 0.4192311
cor.test(salaries$yrs.since.phd, salaries$monthly_salary, method = 'spearman', conf.level = .99)
## Warning in cor.test.default(salaries$yrs.since.phd, salaries$monthly_salary, :
## Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  salaries$yrs.since.phd and salaries$monthly_salary
## S = 5468989, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4755676

Sample write up: Years since Ph.D. and monthly salary were positively and moderately related (\(r_s\)= .476, p<.0001).

Hierarchical OLS regression.

Build the models.

# rename sex to gender
colnames(salaries)[5] <-"gender"
# Linear models.
lm_1<-lm(monthly_salary ~ yrs.since.phd, data= salaries)
lm_2<-lm(monthly_salary ~ yrs.since.phd + gender, data= salaries)
lm_3<-lm(monthly_salary ~ yrs.since.phd + gender + rank, data= salaries)
summary(lm_1)
## 
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd, data = salaries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9352.3 -2159.1  -317.6  1787.4 11375.9 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   10190.97     307.31  33.162   <2e-16 ***
## yrs.since.phd   109.48      11.93   9.177   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3059 on 395 degrees of freedom
## Multiple R-squared:  0.1758, Adjusted R-squared:  0.1737 
## F-statistic: 84.23 on 1 and 395 DF,  p-value: < 2.2e-16
summary(lm_2)
## 
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender, data = salaries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9351.9 -2192.8  -283.5  1714.2 11336.9 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9464.65     527.59  17.939   <2e-16 ***
## yrs.since.phd   106.45      12.04   8.845   <2e-16 ***
## genderMale      880.40     520.45   1.692   0.0915 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3052 on 394 degrees of freedom
## Multiple R-squared:  0.1817, Adjusted R-squared:  0.1775 
## F-statistic: 43.74 on 2 and 394 DF,  p-value: < 2.2e-16
summary(lm_3)
## 
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender + rank, 
##     data = salaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7470  -1704   -170   1351  11702 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8549.39     495.08  17.269  < 2e-16 ***
## yrs.since.phd   -10.23      14.41  -0.710  0.47801    
## genderMale      571.85     448.76   1.274  0.20332    
## rankAssocProf  1556.98     482.53   3.227  0.00136 ** 
## rankProf       5292.88     490.30  10.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared:  0.3973, Adjusted R-squared:  0.3912 
## F-statistic: 64.61 on 4 and 392 DF,  p-value: < 2.2e-16

Note that in the final model, years since Ph.D. is no longer statistically significant. Bear in mind that we have not yet checked the assumptions or made any graphs.
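
If you want a formal comparison of the nested models, an incremental F-test via anova() is one option; a minimal sketch:

# compare the three nested models with incremental F-tests
anova(lm_1, lm_2, lm_3)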

Export the models.

I chose the American Sociological Review style.

require(stargazer)
## Loading required package: stargazer
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(lm_1,lm_2,lm_3,  dep.var.labels=c("Monthly Salary (USD)"),covariate.labels=c("Years since Ph.D.","Gender: Male", "Rank: Associate Prof.", "Rank: Prof."), style="asr", out="hierarchical_exercise4_stargazer.htm", header=F)
## 
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lccc} 
## \\[-1.8ex]\hline \\[-1.8ex] 
## \\[-1.8ex] & \multicolumn{3}{c}{Monthly Salary (USD)} \\ 
## \\[-1.8ex] & (1) & (2) & (3)\\ 
## \hline \\[-1.8ex] 
##  Years since Ph.D. & 109.482$^{***}$ & 106.453$^{***}$ & $-$10.234 \\ 
##   Gender: Male &  & 880.403 & 571.845 \\ 
##   Rank: Associate Prof. &  &  & 1,556.978$^{**}$ \\ 
##   Rank: Prof. &  &  & 5,292.878$^{***}$ \\ 
##   Constant & 10,190.970$^{***}$ & 9,464.646$^{***}$ & 8,549.387$^{***}$ \\ 
##  \textit{N} & 397 & 397 & 397 \\ 
## R$^{2}$ & 0.176 & 0.182 & 0.397 \\ 
## Adjusted R$^{2}$ & 0.174 & 0.178 & 0.391 \\ 
## Residual Std. Error & 3,059.287 (df = 395) & 3,052.104 (df = 394) & 2,625.931 (df = 392) \\ 
## F Statistic & 84.226$^{***}$ (df = 1; 395) & 43.742$^{***}$ (df = 2; 394) & 64.613$^{***}$ (df = 4; 392) \\ 
## \hline \\[-1.8ex] 
## \multicolumn{4}{l}{$^{*}$p $<$ .05; $^{**}$p $<$ .01; $^{***}$p $<$ .001} \\ 
## \end{tabular} 
## \end{table}

You can also rely on the apaTables package, but I found it buggy when combining multiple models (issues with rbind()). (There is also apaStyle, but that does not play nicely with rJava on Mac.) Here is an example.

require(apaTables)
## Loading required package: apaTables
apa.reg.table(lm_1,filename = 'APA_exercise4.doc',table.number = 1)
## 
## 
## Table 1 
## 
## Regression results using monthly_salary as the criterion
##  
## 
##      Predictor          b            b_95%_CI beta  beta_95%_CI sr2 sr2_95%_CI
##    (Intercept) 10190.97** [9586.80, 10795.13]                                 
##  yrs.since.phd   109.48**     [86.03, 132.94] 0.42 [0.33, 0.51] .18 [.11, .24]
##                                                                               
##                                                                               
##                                                                               
##      r             Fit
##                       
##  .42**                
##            R2 = .176**
##        95% CI[.11,.24]
##                       
## 
## Note. A significant b-weight indicates the beta-weight and semi-partial correlation are also significant.
## b represents unstandardized regression weights. beta indicates the standardized regression weights. 
## sr2 represents the semi-partial correlation squared. r represents the zero-order correlation.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
## 

Assumptions.

Linearity.

Note that here we only checked the assumptions for the final model.

plot(lm_3, which = 1)

Plot 1: Not curvilinear, but also not very nicely distributed; not as cleanly linear as we would want. The data show clumping, likely because there are pay grades. There are no data between roughly 11,000 and 13,000 USD, yet the model interpolates linearly across that gap. That could be wrong: in reality people move up in steps, something our model does not account for. Although one would lose information, a solution could be to explore ordinal methods (including tobit regression), as these more advanced methods can handle censoring issues. There is a trade-off between keeping things 'simple' (in reality most relationships are not wholly linear) and easily interpretable, versus describing the data more faithfully (while also running the risk of overfitting). Within its remits we can still use our model, bearing in mind that it is a linear representation of the relationships between the variables.
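
As a very rough sketch of the ordinal idea (the band choice and object names are purely illustrative, not part of the exercise), one could cut monthly salary into ordered pay bands and fit a proportional-odds model with MASS::polr:

require(MASS)
# cut monthly salary into quartile-based ordered bands (illustrative choice)
salaries$salary_band <- cut(salaries$monthly_salary,
                            breaks = quantile(salaries$monthly_salary, probs = seq(0, 1, 0.25)),
                            include.lowest = TRUE, ordered_result = TRUE)
# proportional-odds (ordinal logistic) model with the same predictors as lm_3
ordinal_3 <- polr(salary_band ~ yrs.since.phd + gender + rank, data = salaries, Hess = TRUE)
summary(ordinal_3)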

Normality of residuals.

plot(lm_3, which=2)

Plot 2: Generally OK, but at the top end our model 'derails'. There is also one extreme case (44); we could consider running the analysis with and without that case. Another potential solution is to log-transform our data. What happens then?

lm_3_log<-lm(log10(monthly_salary) ~ yrs.since.phd + gender + rank, data= salaries)
plot(lm_3_log, which=2)

This looks much nicer, does it not! Bear in mind, though, that a shift in X now corresponds to a shift in log10(Y), which can make the coefficients harder to interpret directly (but the distribution of the residuals is less of a worry now).

Homoscedasticity.

plot(lm_3, which=3)

Plot 3: Ideally, we would want a roughly flat line here; that is not the case. Again notice the issue with the gap in X. The 'funnel' widens as X increases, which points to possible heteroscedasticity. The plot is slightly better for our log-transformed data. One solution is to work with heteroscedasticity-consistent errors (a short sketch follows after the Breusch-Pagan tests below).

plot(lm_3_log, which=3)

A Breusch-Pagan test shows that in both cases there would be issues with the assumption of homoscedasticity. (It also suggests that the problem is just as bad, if not worse, in the 'log' model.)

require(lmtest)
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
bptest(lm_3)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm_3
## BP = 59.457, df = 4, p-value = 3.772e-12
bptest(lm_3_log)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm_3_log
## BP = 62.284, df = 4, p-value = 9.599e-13
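
As a sketch of the heteroscedasticity-consistent errors mentioned above (HC3 is one common choice; the HAC example further below also corrects for autocorrelation):

require(sandwich)
# recompute the standard errors of lm_3 with an HC3 covariance matrix
coeftest(lm_3, vcov = vcovHC(lm_3, type = "HC3"))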

Cook’s distance.

Cook's distance gives an indication of influential cases in the regression model. Some put the cut-off at 4/(n - k - 1), but it is best to just check the plots.
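
A rough sketch of that cut-off (conventions for k differ between sources, so treat the threshold as a guide rather than a rule):

# Cook's distances for the final model and the 4/(n - k - 1) threshold
cooks_d <- cooks.distance(lm_3)
cutoff  <- 4 / (nrow(salaries) - length(coef(lm_3)) - 1)
which(cooks_d > cutoff) # cases flagged as potentially influential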

plot(lm_3, which=4)

Plot 4: The plot is largely fine; there are some influential cases (351, 283), though perhaps one would ignore 331 as it is close to other peaks. One could run the analyses with and without those cases (a quick sketch of this follows below). Alternatively, one could bootstrap the relevant statistics if worried about robustness, or rely on 'robust' techniques (an example is given further below).
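
A quick sketch of the 'with and without' check (the model name is illustrative; row labels match row positions in this data set):

# refit the final model without the flagged cases and compare the coefficients
lm_3_trim <- lm(monthly_salary ~ yrs.since.phd + gender + rank,
                data = salaries[-c(283, 351), ])
summary(lm_3_trim)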

Incidentally, in our transformed model, 283 (and perhaps 351) are again potential 'monkey wrenches'. You could again consider bootstrapping your models, or presenting models with and without those 'outliers'.

plot(lm_3_log, which=4)

Autocorrelation.

In this case both models exhibit autocorrelation.

require(lmtest)
dwtest(lm_3)
## 
##  Durbin-Watson test
## 
## data:  lm_3
## DW = 1.7452, p-value = 0.005363
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(lm_3_log)
## 
##  Durbin-Watson test
## 
## data:  lm_3_log
## DW = 1.6538, p-value = 0.0002655
## alternative hypothesis: true autocorrelation is greater than 0

Heteroscedasticity- and autocorrelation-corrected errors.

Not that I expect you to run this, but here is an example of a model that correctly deals with this issue. Notice the difference it makes for the gender coefficient's SE and significance level: whereas before we might have ruled out the existence of an effect, this model suggests a trend (still not strong evidence, though).

require(sandwich)
## Loading required package: sandwich
summary(lm_3)
## 
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender + rank, 
##     data = salaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7470  -1704   -170   1351  11702 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8549.39     495.08  17.269  < 2e-16 ***
## yrs.since.phd   -10.23      14.41  -0.710  0.47801    
## genderMale      571.85     448.76   1.274  0.20332    
## rankAssocProf  1556.98     482.53   3.227  0.00136 ** 
## rankProf       5292.88     490.30  10.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared:  0.3973, Adjusted R-squared:  0.3912 
## F-statistic: 64.61 on 4 and 392 DF,  p-value: < 2.2e-16
vcv <- vcovHAC(lm_3) # variance covariance matrix (which we might want to store)
coeftest(lm_3, vcov = vcovHAC(lm_3)) 
## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)   8549.387    302.314 28.2798 < 2.2e-16 ***
## yrs.since.phd  -10.234     17.789 -0.5753   0.56543    
## genderMale     571.845    321.820  1.7769   0.07636 .  
## rankAssocProf 1556.978    270.749  5.7506 1.791e-08 ***
## rankProf      5292.878    404.665 13.0796 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\beta\) for model 1

Here we calculate the \(\beta\)'s for model 1. I used dplyr to z-score (and the mosaic package as an alternative method for z-scoring, since scale() can run into problems).

require(dplyr)
require(mosaic)
## Loading required package: mosaic
## Registered S3 method overwritten by 'mosaic':
##   method                           from   
##   fortify.SpatialPolygonsDataFrame ggplot2
## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.
## 
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
## 
##     mean
## The following object is masked from 'package:ggplot2':
## 
##     stat
## The following object is masked from 'package:skimr':
## 
##     n_missing
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum
salaries <- salaries %>% mutate(monthly_salary_stand=zscore(monthly_salary), yrs.since.phd_stand=zscore(yrs.since.phd))
lm_stand1<- lm(monthly_salary_stand ~ yrs.since.phd_stand-1, data= salaries)
summary(lm_stand1)
## 
## Call:
## lm(formula = monthly_salary_stand ~ yrs.since.phd_stand - 1, 
##     data = salaries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7789 -0.6415 -0.0944  0.5311  3.3802 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## yrs.since.phd_stand  0.41923    0.04562   9.189   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9079 on 396 degrees of freedom
## Multiple R-squared:  0.1758, Adjusted R-squared:  0.1737 
## F-statistic: 84.44 on 1 and 396 DF,  p-value: < 2.2e-16
# alternative
require(QuantPsyc)
## Loading required package: QuantPsyc
## Loading required package: boot
## 
## Attaching package: 'boot'
## The following object is masked from 'package:mosaic':
## 
##     logit
## The following object is masked from 'package:lattice':
## 
##     melanoma
## Loading required package: purrr
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:mosaic':
## 
##     cross
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:Matrix':
## 
##     norm
## The following object is masked from 'package:base':
## 
##     norm
lm.beta(lm_1)
## yrs.since.phd 
##     0.4192311

The \(\beta\) is 0.42: this implies that a shift of one standard deviation in years since Ph.D. corresponds to a shift of .42 standard deviations in monthly salary. (Notice that this is the same value as the Pearson correlation coefficient!)

Bootstrap

# Bootstrap 95% CI for the standardized coefficient (beta).
library(boot)
# function to return the standardized coefficient (beta) from each bootstrap sample
coeff_B <- function(data, indices) {
  data_boot <- data[indices, ] # allows boot to select a sample
  B_boot <- lm(monthly_salary_stand ~ yrs.since.phd_stand - 1, data = data_boot)
  return(B_boot$coeff[1]) # position of years since Ph.D.
}
# bootstrapping with 10000 replications
set.seed(1983) # slightly worse year.
results_coeff_B <- boot(data = salaries, statistic = coeff_B, R = 10000)

Now let us calculate the confidence interval.

plot(results_coeff_B)

# get 95% confidence interval 
boot.ci(results_coeff_B, type='bca')
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results_coeff_B, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.3085,  0.5205 )  
## Calculations and Intervals on Original Scale

Sample write up: The 95% confidence interval for \(\beta\) using 10,000 samples and bias-corrected accelerated bootstrapping is [.31, .52]. (If you had done a bootstrap of the Pearson correlation coefficient you should get similar values).
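
A hedged sketch of that check: bootstrap the Pearson correlation directly and compare the interval (object names are illustrative).

# function returning the Pearson correlation for each bootstrap sample
cor_boot <- function(data, indices) {
  d <- data[indices, ]
  cor(d$yrs.since.phd, d$monthly_salary)
}
set.seed(1983)
results_cor <- boot(data = salaries, statistic = cor_boot, R = 10000)
boot.ci(results_cor, type = "bca")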

Model prediction.

Best to center everything (there is no point in centering categorical variables); note that we are calculating shifts here relative to the reference categories (women and assistant professors). A shift of 3 years is associated with around $31 less a month, all else being equal. That is not very impressive, which is partly explained by the fact that academic rank does so much of the 'heavy lifting', and we are now looking at a shift away from the means (!). For a shift at the start of the career, you could use the initial model. A shift from assistant to associate professor implies another $1,557 a month (all else being equal). (For men, add another $572 a month (!))

require(dplyr)
# create centered variables
salaries<-dplyr::mutate(salaries, monthly_salary_cent=monthly_salary-mean(monthly_salary),yrs.since.phd_cent=yrs.since.phd-mean(yrs.since.phd))
# lm.
lm_3_cent<-lm(monthly_salary_cent ~ yrs.since.phd_cent + gender + rank, data=salaries)
summary(lm_3_cent)
## 
## Call:
## lm(formula = monthly_salary_cent ~ yrs.since.phd_cent + gender + 
##     rank, data = salaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7470  -1704   -170   1351  11702 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -4313.03     564.20  -7.645 1.63e-13 ***
## yrs.since.phd_cent   -10.23      14.41  -0.710  0.47801    
## genderMale           571.85     448.76   1.274  0.20332    
## rankAssocProf       1556.98     482.53   3.227  0.00136 ** 
## rankProf            5292.88     490.30  10.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared:  0.3973, Adjusted R-squared:  0.3912 
## F-statistic: 64.61 on 4 and 392 DF,  p-value: < 2.2e-16
# a shift of 3 for all coefficients
3*lm_3_cent$coefficients
##        (Intercept) yrs.since.phd_cent         genderMale      rankAssocProf 
##       -12939.08800          -30.70138         1715.53631         4670.93252 
##           rankProf 
##        15878.63255
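
If you prefer predictions on the original scale, predict() on the uncentered model is a minimal sketch (the example case is purely illustrative):

# predicted monthly salary, with a confidence interval, for a hypothetical case:
# a female associate professor 10 years after her Ph.D.
new_case <- data.frame(yrs.since.phd = 10, gender = "Female", rank = "AssocProf")
predict(lm_3, newdata = new_case, interval = "confidence")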

German credit data.

require(caret)
## Loading required package: caret
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
## The following object is masked from 'package:mosaic':
## 
##     dotPlot
data(GermanCredit)
data_credit<-GermanCredit
require(skimr)
skim(data_credit)
Data summary
Name data_credit
Number of rows 1000
Number of columns 62
_______________________
Column type frequency:
factor 1
numeric 61
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Class 0 1 FALSE 2 Goo: 700, Bad: 300

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Duration 0 1 20.90 12.06 4 12.0 18.0 24.00 72 ▇▇▂▁▁
Amount 0 1 3271.26 2822.74 250 1365.5 2319.5 3972.25 18424 ▇▂▁▁▁
InstallmentRatePercentage 0 1 2.97 1.12 1 2.0 3.0 4.00 4 ▂▃▁▂▇
ResidenceDuration 0 1 2.85 1.10 1 2.0 3.0 4.00 4 ▂▆▁▃▇
Age 0 1 35.55 11.38 19 27.0 33.0 42.00 75 ▇▆▃▁▁
NumberExistingCredits 0 1 1.41 0.58 1 1.0 1.0 2.00 4 ▇▅▁▁▁
NumberPeopleMaintenance 0 1 1.16 0.36 1 1.0 1.0 1.00 2 ▇▁▁▁▂
Telephone 0 1 0.60 0.49 0 0.0 1.0 1.00 1 ▆▁▁▁▇
ForeignWorker 0 1 0.96 0.19 0 1.0 1.0 1.00 1 ▁▁▁▁▇
CheckingAccountStatus.lt.0 0 1 0.27 0.45 0 0.0 0.0 1.00 1 ▇▁▁▁▃
CheckingAccountStatus.0.to.200 0 1 0.27 0.44 0 0.0 0.0 1.00 1 ▇▁▁▁▃
CheckingAccountStatus.gt.200 0 1 0.06 0.24 0 0.0 0.0 0.00 1 ▇▁▁▁▁
CheckingAccountStatus.none 0 1 0.39 0.49 0 0.0 0.0 1.00 1 ▇▁▁▁▅
CreditHistory.NoCredit.AllPaid 0 1 0.04 0.20 0 0.0 0.0 0.00 1 ▇▁▁▁▁
CreditHistory.ThisBank.AllPaid 0 1 0.05 0.22 0 0.0 0.0 0.00 1 ▇▁▁▁▁
CreditHistory.PaidDuly 0 1 0.53 0.50 0 0.0 1.0 1.00 1 ▇▁▁▁▇
CreditHistory.Delay 0 1 0.09 0.28 0 0.0 0.0 0.00 1 ▇▁▁▁▁
CreditHistory.Critical 0 1 0.29 0.46 0 0.0 0.0 1.00 1 ▇▁▁▁▃
Purpose.NewCar 0 1 0.23 0.42 0 0.0 0.0 0.00 1 ▇▁▁▁▂
Purpose.UsedCar 0 1 0.10 0.30 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Furniture.Equipment 0 1 0.18 0.39 0 0.0 0.0 0.00 1 ▇▁▁▁▂
Purpose.Radio.Television 0 1 0.28 0.45 0 0.0 0.0 1.00 1 ▇▁▁▁▃
Purpose.DomesticAppliance 0 1 0.01 0.11 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Repairs 0 1 0.02 0.15 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Education 0 1 0.05 0.22 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Vacation 0 1 0.00 0.00 0 0.0 0.0 0.00 0 ▁▁▇▁▁
Purpose.Retraining 0 1 0.01 0.09 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Business 0 1 0.10 0.30 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Purpose.Other 0 1 0.01 0.11 0 0.0 0.0 0.00 1 ▇▁▁▁▁
SavingsAccountBonds.lt.100 0 1 0.60 0.49 0 0.0 1.0 1.00 1 ▅▁▁▁▇
SavingsAccountBonds.100.to.500 0 1 0.10 0.30 0 0.0 0.0 0.00 1 ▇▁▁▁▁
SavingsAccountBonds.500.to.1000 0 1 0.06 0.24 0 0.0 0.0 0.00 1 ▇▁▁▁▁
SavingsAccountBonds.gt.1000 0 1 0.05 0.21 0 0.0 0.0 0.00 1 ▇▁▁▁▁
SavingsAccountBonds.Unknown 0 1 0.18 0.39 0 0.0 0.0 0.00 1 ▇▁▁▁▂
EmploymentDuration.lt.1 0 1 0.17 0.38 0 0.0 0.0 0.00 1 ▇▁▁▁▂
EmploymentDuration.1.to.4 0 1 0.34 0.47 0 0.0 0.0 1.00 1 ▇▁▁▁▅
EmploymentDuration.4.to.7 0 1 0.17 0.38 0 0.0 0.0 0.00 1 ▇▁▁▁▂
EmploymentDuration.gt.7 0 1 0.25 0.43 0 0.0 0.0 1.00 1 ▇▁▁▁▃
EmploymentDuration.Unemployed 0 1 0.06 0.24 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Personal.Male.Divorced.Seperated 0 1 0.05 0.22 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Personal.Female.NotSingle 0 1 0.31 0.46 0 0.0 0.0 1.00 1 ▇▁▁▁▃
Personal.Male.Single 0 1 0.55 0.50 0 0.0 1.0 1.00 1 ▆▁▁▁▇
Personal.Male.Married.Widowed 0 1 0.09 0.29 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Personal.Female.Single 0 1 0.00 0.00 0 0.0 0.0 0.00 0 ▁▁▇▁▁
OtherDebtorsGuarantors.None 0 1 0.91 0.29 0 1.0 1.0 1.00 1 ▁▁▁▁▇
OtherDebtorsGuarantors.CoApplicant 0 1 0.04 0.20 0 0.0 0.0 0.00 1 ▇▁▁▁▁
OtherDebtorsGuarantors.Guarantor 0 1 0.05 0.22 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Property.RealEstate 0 1 0.28 0.45 0 0.0 0.0 1.00 1 ▇▁▁▁▃
Property.Insurance 0 1 0.23 0.42 0 0.0 0.0 0.00 1 ▇▁▁▁▂
Property.CarOther 0 1 0.33 0.47 0 0.0 0.0 1.00 1 ▇▁▁▁▃
Property.Unknown 0 1 0.15 0.36 0 0.0 0.0 0.00 1 ▇▁▁▁▂
OtherInstallmentPlans.Bank 0 1 0.14 0.35 0 0.0 0.0 0.00 1 ▇▁▁▁▁
OtherInstallmentPlans.Stores 0 1 0.05 0.21 0 0.0 0.0 0.00 1 ▇▁▁▁▁
OtherInstallmentPlans.None 0 1 0.81 0.39 0 1.0 1.0 1.00 1 ▂▁▁▁▇
Housing.Rent 0 1 0.18 0.38 0 0.0 0.0 0.00 1 ▇▁▁▁▂
Housing.Own 0 1 0.71 0.45 0 0.0 1.0 1.00 1 ▃▁▁▁▇
Housing.ForFree 0 1 0.11 0.31 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Job.UnemployedUnskilled 0 1 0.02 0.15 0 0.0 0.0 0.00 1 ▇▁▁▁▁
Job.UnskilledResident 0 1 0.20 0.40 0 0.0 0.0 0.00 1 ▇▁▁▁▂
Job.SkilledEmployee 0 1 0.63 0.48 0 0.0 1.0 1.00 1 ▅▁▁▁▇
Job.Management.SelfEmp.HighlyQualified 0 1 0.15 0.36 0 0.0 0.0 0.00 1 ▇▁▁▁▂

You could also use the ‘summarytools’ package.
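
A minimal sketch, assuming summarytools is installed:

require(summarytools)
dfSummary(data_credit) # data frame summary; view(dfSummary(data_credit)) gives an HTML version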

Corrplot and Correlation matrix output.

You might have used different variables. Because they are all scored between 0 and 1, you could use them all in these plots.

require(apaTables)
require(dplyr)
cor_table<-dplyr::select(data_credit, c(Age,Amount,Telephone))
apa.cor.table(cor_table, filename = "correlations-credit.doc")
## 
## 
## Means, standard deviations, and correlations with confidence intervals
##  
## 
##   Variable     M       SD      1            2           
##   1. Age       35.55   11.38                            
##                                                         
##   2. Amount    3271.26 2822.74 .03                      
##                                [-.03, .09]              
##                                                         
##   3. Telephone 0.60    0.49    -.15**       -.28**      
##                                [-.21, -.08] [-.33, -.22]
##                                                         
## 
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval.
## The confidence interval is a plausible range of population correlations 
## that could have caused the sample correlation (Cumming, 2014).
##  * indicates p < .05. ** indicates p < .01.
## 
# ggcorrplot, you can then further tweak this, as it is a ggplot.
require(ggcorrplot)
## Loading required package: ggcorrplot
cormatrix<-cor(cor_table)
ggcorrplot(cormatrix, hc.order = TRUE, type = "lower", lab = TRUE)

Hierarchical logistic regression

We built 4 hierarchical models.

levels(data_credit$Class) # Bad is the reference category.
## [1] "Bad"  "Good"
data_credit$Class_ref_good<-relevel(data_credit$Class, ref="Good") # recode, as this is perhaps more sensible if we want to know who will default.
log1<-glm(Class_ref_good~ Amount, data=data_credit, family = binomial)
log2<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly, data=data_credit, family = binomial)
log3<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly+Age, data=data_credit, family = binomial)
log4<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly+Age+Telephone, data=data_credit, family = binomial)
summary(log1)
## 
## Call:
## glm(formula = Class_ref_good ~ Amount, family = binomial, data = data_credit)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.229e+00  1.083e-01 -11.348  < 2e-16 ***
## Amount       1.119e-04  2.355e-05   4.751 2.02e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1221.7  on 999  degrees of freedom
## Residual deviance: 1199.1  on 998  degrees of freedom
## AIC: 1203.1
## 
## Number of Fisher Scoring iterations: 4
summary(log2)
## 
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly, 
##     family = binomial, data = data_credit)
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.383e+00  1.390e-01  -9.949  < 2e-16 ***
## Amount                  1.162e-04  2.376e-05   4.890 1.01e-06 ***
## CreditHistory.PaidDuly  2.579e-01  1.415e-01   1.823   0.0683 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1221.7  on 999  degrees of freedom
## Residual deviance: 1195.7  on 997  degrees of freedom
## AIC: 1201.7
## 
## Number of Fisher Scoring iterations: 4
summary(log3)
## 
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly + 
##     Age, family = binomial, data = data_credit)
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -7.000e-01  2.730e-01  -2.564  0.01035 *  
## Amount                  1.194e-04  2.394e-05   4.988 6.11e-07 ***
## CreditHistory.PaidDuly  1.975e-01  1.436e-01   1.375  0.16917    
## Age                    -1.889e-02  6.623e-03  -2.852  0.00434 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1221.7  on 999  degrees of freedom
## Residual deviance: 1187.2  on 996  degrees of freedom
## AIC: 1195.2
## 
## Number of Fisher Scoring iterations: 4
summary(log4)
## 
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly + 
##     Age + Telephone, family = binomial, data = data_credit)
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.030e+00  3.123e-01  -3.298 0.000972 ***
## Amount                  1.357e-04  2.518e-05   5.388 7.14e-08 ***
## CreditHistory.PaidDuly  1.942e-01  1.440e-01   1.349 0.177475    
## Age                    -1.686e-02  6.659e-03  -2.531 0.011359 *  
## Telephone               3.383e-01  1.543e-01   2.193 0.028315 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1221.7  on 999  degrees of freedom
## Residual deviance: 1182.3  on 995  degrees of freedom
## AIC: 1192.3
## 
## Number of Fisher Scoring iterations: 4

Export the results.

This exports the results. Note that the odds ratio for Amount is close to 1. This is because the base unit is 1 DM (Deutsche Mark), which might not make a lot of sense! (Rescore via log10 or standardize? A small rescaling sketch follows below.)
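
A small sketch of the rescaling idea (the model name is illustrative): express Amount in thousands of DM so the odds ratio becomes readable.

# same model as log1, but with Amount per 1,000 DM
log1_per1000 <- glm(Class_ref_good ~ I(Amount / 1000), data = data_credit, family = binomial)
exp(coef(log1_per1000)) # odds ratio per 1,000 DM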

require(stargazer)
stargazer(log1,log2,log3,log4,dep.var.labels=c("Credit status (reference= Good)"), covariate.labels=c("Amount (in DM)", "Credit History (Duly paid)","Age", "Telephone (yes/no)"), report = "vc*", omit=c("Constant"), apply.coef=exp, t.auto=F, p.auto=F, style="asr", out="logistic_base_exercise4.htm", header=FALSE)
## 
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lcccc} 
## \\[-1.8ex]\hline \\[-1.8ex] 
## \\[-1.8ex] & \multicolumn{4}{c}{Credit status (reference= Good)} \\ 
## \\[-1.8ex] & (1) & (2) & (3) & (4)\\ 
## \hline \\[-1.8ex] 
##  Amount (in DM) & 1.000$^{***}$ & 1.000$^{***}$ & 1.000$^{***}$ & 1.000$^{***}$ \\ 
##   & & & & \\ 
##  Credit History (Duly paid) &  & 1.294 & 1.218 & 1.214 \\ 
##   & & & & \\ 
##  Age &  &  & 0.981$^{**}$ & 0.983$^{*}$ \\ 
##   & & & & \\ 
##  Telephone (yes/no) &  &  &  & 1.402$^{*}$ \\ 
##   & & & & \\ 
## \textit{N} & 1,000 & 1,000 & 1,000 & 1,000 \\ 
## Log Likelihood & $-$599.532 & $-$597.860 & $-$593.606 & $-$591.163 \\ 
## AIC & 1,203.064 & 1,201.721 & 1,195.213 & 1,192.326 \\ 
## \hline \\[-1.8ex] 
## \multicolumn{5}{l}{$^{*}$p $<$ .05; $^{**}$p $<$ .01; $^{***}$p $<$ .001} \\ 
## \end{tabular} 
## \end{table}

Note that a value below 1 means the odds decrease. For ease of interpretation, one could take the inverse of the odds ratio.
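
For example, inverting the Age odds ratio from model 4:

1 / exp(coef(log4)["Age"]) # each extra year of age multiplies the odds of 'Good' (rather than 'Bad') credit by this factor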

Here are the Pseudo- \(R^2\) values.

require(rcompanion)
## Loading required package: rcompanion
nagelkerke(log1, restrictNobs=T)
## $Models
##                                                             
## Model: "glm, Class_ref_good ~ Amount, binomial, data_credit"
## Null:  "glm, Class_ref_good ~ 1, binomial, fit$model"       
## 
## $Pseudo.R.squared.for.model.vs.null
##                              Pseudo.R.squared
## McFadden                            0.0185513
## Cox and Snell (ML)                  0.0224098
## Nagelkerke (Cragg and Uhler)        0.0317743
## 
## $Likelihood.ratio.test
##  Df.diff LogLik.diff  Chisq    p.value
##       -1     -11.332 22.665 1.9288e-06
## 
## $Number.of.observations
##            
## Model: 1000
## Null:  1000
## 
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
## 
## $Warnings
## [1] "None"
nagelkerke(log2, restrictNobs=T)
## $Models
##                                                                                      
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly, binomial, data_credit"
## Null:  "glm, Class_ref_good ~ 1, binomial, fit$model"                                
## 
## $Pseudo.R.squared.for.model.vs.null
##                              Pseudo.R.squared
## McFadden                            0.0212879
## Cox and Snell (ML)                  0.0256727
## Nagelkerke (Cragg and Uhler)        0.0364008
## 
## $Likelihood.ratio.test
##  Df.diff LogLik.diff  Chisq    p.value
##       -2     -13.004 26.008 2.2513e-06
## 
## $Number.of.observations
##            
## Model: 1000
## Null:  1000
## 
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
## 
## $Warnings
## [1] "None"
nagelkerke(log3, restrictNobs=T)
## $Models
##                                                                                            
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly + Age, binomial, data_credit"
## Null:  "glm, Class_ref_good ~ 1, binomial, fit$model"                                      
## 
## $Pseudo.R.squared.for.model.vs.null
##                              Pseudo.R.squared
## McFadden                            0.0282516
## Cox and Snell (ML)                  0.0339269
## Nagelkerke (Cragg and Uhler)        0.0481042
## 
## $Likelihood.ratio.test
##  Df.diff LogLik.diff  Chisq    p.value
##       -3     -17.258 34.516 1.5417e-07
## 
## $Number.of.observations
##            
## Model: 1000
## Null:  1000
## 
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
## 
## $Warnings
## [1] "None"
nagelkerke(log4, restrictNobs=T)
## $Models
##                                                                                                        
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly + Age + Telephone, binomial, data_credit"
## Null:  "glm, Class_ref_good ~ 1, binomial, fit$model"                                                  
## 
## $Pseudo.R.squared.for.model.vs.null
##                              Pseudo.R.squared
## McFadden                            0.0322516
## Cox and Snell (ML)                  0.0386365
## Nagelkerke (Cragg and Uhler)        0.0547819
## 
## $Likelihood.ratio.test
##  Df.diff LogLik.diff  Chisq    p.value
##       -4     -19.701 39.403 5.7518e-08
## 
## $Number.of.observations
##            
## Model: 1000
## Null:  1000
## 
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
## 
## $Warnings
## [1] "None"

Mosaic plot.

Here is an example of a mosaic plot. It becomes a bit more difficult with more variables added.

require(RColorBrewer)
## Loading required package: RColorBrewer
# pick some colours.
colours<-c(color = brewer.pal(2, "Set2"))
## Warning in brewer.pal(2, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
fill_colours<-rep(colours, each=2) # 4 quadrants to fill
require(vcd)
## Loading required package: vcd
## Loading required package: grid
## 
## Attaching package: 'vcd'
## The following object is masked from 'package:mosaic':
## 
##     mplot
mosaicplot1<-mosaic(~ Class_ref_good + Telephone, zero_size = 0, data=data_credit, gp = gpar(fill = fill_colours, col = 0), legend=F, labeling = labeling_values)

Another example, now with rental status and owning a telephone. You can further tinker with the settings and variables. You can even have tests included already; explore the vcd package vignette for more!

require(RColorBrewer)
# pick some colours.
colours<-c(color = brewer.pal(2, "Set2"))
## Warning in brewer.pal(2, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
fill_colours<-rep(colours, each=4) # repeat the colours across the 2 x 2 x 2 cells
require(vcd)
mosaicplot2<-mosaic(~ Class_ref_good + Housing.Rent + Telephone, zero_size = 0, data=data_credit, gp = gpar(fill = fill_colours, col = 0), legend=F, labeling = labeling_values)

Odds ratios.

A shift of 1,000 DM increases the odds of bad credit (rather than good credit) by a factor of 1.145, holding everything else constant. (Note that the other predictors are set to their reference values here: average age, no phone, and not having duly paid.)

require(oddsratio)
## Loading required package: oddsratio
require(dplyr)
require(questionr)
## Loading required package: questionr
## 
## Attaching package: 'questionr'
## The following object is masked from 'package:mosaic':
## 
##     prop
data_credit<- data_credit %>% mutate(Age_cent=Age-mean(Age), Amount_cent= Amount-mean(Amount))
log_4_cent<- glm(Class_ref_good~ Amount_cent+CreditHistory.PaidDuly+Age_cent+Telephone, data=data_credit, family = binomial)
odds.ratio(log_4_cent)
## Waiting for profiling to be done...
##                             OR   2.5 % 97.5 %         p    
## (Intercept)            0.30559 0.22964 0.4027 < 2.2e-16 ***
## Amount_cent            1.00014 1.00009 1.0002 7.142e-08 ***
## CreditHistory.PaidDuly 1.21432 0.91630 1.6119   0.17747    
## Age_cent               0.98328 0.97031 0.9960   0.01136 *  
## Telephone              1.40250 1.03885 1.9027   0.02832 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# shift of 1,000 DM -- everything else put to 0
or_glm(data_credit, incr= list(Amount_cent=1000, CreditHistory.PaidDuly=0, Age_cent=0, Telephone=0), log_4_cent)
##                predictor oddsratio ci_low (2.5) ci_high (97.5) increment
## 1            Amount_cent     1.145        1.091          1.204      1000
## 2 CreditHistory.PaidDuly     1.000        1.000          1.000         0
## 3               Age_cent     1.000        1.000          1.000         0
## 4              Telephone     1.000        1.000          1.000         0
# alternative, which gives the same core result (the other coefficients are not interpretable)
exp(log_4_cent$coefficients*1000)
##            (Intercept)            Amount_cent CreditHistory.PaidDuly 
##           0.000000e+00           1.145310e+00           2.147984e+84 
##               Age_cent              Telephone 
##           4.774967e-08          7.982351e+146

BONUS: ggmosaic()

Faceted by Housing: rental status. This could be further improved by relabelling the 0's and 1's (see the sketch after the plot below).

You might run into errors, as ggmosaic has had compatibility issues with newer ggplot2 releases. The solution is to load the 'devtools' package and install the GitHub version of ggmosaic; this line will do the trick: install_github('cran/ggmosaic').

require(ggmosaic) # install development version as ggplot came with major updates breaking old ggmosaic.
## Loading required package: ggmosaic
## 
## Attaching package: 'ggmosaic'
## The following objects are masked from 'package:vcd':
## 
##     mosaic, spine
require(ggthemes)
## Loading required package: ggthemes
## 
## Attaching package: 'ggthemes'
## The following object is masked from 'package:mosaic':
## 
##     theme_map
ggplot(data=data_credit) +
  geom_mosaic(aes(x=product(Class_ref_good,Telephone), fill=Class_ref_good)) + facet_grid(Housing.Rent~.) + labs(x='Telephone') + scale_fill_brewer(palette="Dark2") + theme_gdocs()
## Warning: `unite_()` was deprecated in tidyr 1.2.0.
## ℹ Please use `unite()` instead.
## ℹ The deprecated feature was likely used in the ggmosaic package.
##   Please report the issue at <https://github.com/haleyjeppson/ggmosaic>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
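
A quick sketch of the relabelling mentioned above (the new variable names and labels are illustrative):

# label the 0/1 dummies before plotting so the mosaic panels are self-explanatory
data_credit <- data_credit %>%
  mutate(Telephone_lab = factor(Telephone, levels = c(0, 1), labels = c("No phone", "Phone")),
         Housing.Rent_lab = factor(Housing.Rent, levels = c(0, 1), labels = c("Not renting", "Renting")))
ggplot(data = data_credit) +
  geom_mosaic(aes(x = product(Class_ref_good, Telephone_lab), fill = Class_ref_good)) +
  facet_grid(Housing.Rent_lab ~ .) + labs(x = "Telephone") +
  scale_fill_brewer(palette = "Dark2") + theme_gdocs()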

BONUS: ROC Curve.

ROC curves are super cool, and they are a tool used across a broad range of sciences. You can read up on them in the Agresti reference from the reading list, or in the other readings on the list. Anyway, have a look at what is out there and try to have a go at making one yourself.

require(pROC)
## Loading required package: pROC
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:mosaic':
## 
##     cov, var
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
resRoc <- roc(data_credit$Class_ref_good ~ log4$fitted)
## Setting levels: control = Good, case = Bad
## Setting direction: controls < cases
plot(resRoc, legacy.axes = TRUE)

auc(resRoc)
## Area under the curve: 0.616

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] pROC_1.18.5        ggthemes_4.2.4     ggmosaic_0.3.3     questionr_0.7.8   
##  [5] oddsratio_2.0.1    vcd_1.4-11         RColorBrewer_1.1-3 rcompanion_2.4.34 
##  [9] ggcorrplot_0.1.4.1 caret_6.0-94       QuantPsyc_1.6      MASS_7.3-60       
## [13] purrr_1.0.2        boot_1.3-28.1      mosaic_1.9.0       mosaicData_0.20.4 
## [17] ggformula_0.12.0   Matrix_1.6-1.1     ggplot2_3.4.4      lattice_0.21-9    
## [21] sandwich_3.0-2     lmtest_0.9-40      zoo_1.8-12         apaTables_2.0.8   
## [25] stargazer_5.2.3    skimr_2.1.5        dplyr_1.1.3        carData_3.0-5     
## 
## loaded via a namespace (and not attached):
##   [1] libcoin_1.0-10       rstudioapi_0.15.0    jsonlite_1.8.7      
##   [4] magrittr_2.0.3       TH.data_1.1-2        modeltools_0.2-23   
##   [7] farver_2.1.1         rmarkdown_2.25       vctrs_0.6.4         
##  [10] base64enc_0.1-3      htmltools_0.5.7      forcats_1.0.0       
##  [13] haven_2.5.3          broom_1.0.5          cellranger_1.1.0    
##  [16] sass_0.4.7           parallelly_1.36.0    bslib_0.5.1         
##  [19] htmlwidgets_1.6.2    plyr_1.8.9           plotly_4.10.3       
##  [22] rootSolve_1.8.2.4    lubridate_1.9.3      cachem_1.0.8        
##  [25] mime_0.12            lifecycle_1.0.3      iterators_1.0.14    
##  [28] pkgconfig_2.0.3      R6_2.5.1             fastmap_1.1.1       
##  [31] shiny_1.7.5.1        future_1.33.0        digest_0.6.33       
##  [34] Exact_3.2            colorspace_2.1-0     productplots_0.1.1  
##  [37] labeling_0.4.3       fansi_1.0.5          timechange_0.2.0    
##  [40] httr_1.4.7           mgcv_1.9-0           compiler_4.3.2      
##  [43] proxy_0.4-27         withr_2.5.2          backports_1.4.1     
##  [46] highr_0.10           lava_1.7.3           gld_2.6.6           
##  [49] ModelMetrics_1.2.2.2 tools_4.3.2          httpuv_1.6.12       
##  [52] future.apply_1.11.1  nnet_7.3-19          glue_1.6.2          
##  [55] promises_1.2.1       nlme_3.1-163         reshape2_1.4.4      
##  [58] generics_0.1.3       recipes_1.0.9        gtable_0.3.4        
##  [61] MBESS_4.9.3          nortest_1.0-4        labelled_2.12.0     
##  [64] class_7.3-22         tidyr_1.3.0          data.table_1.14.8   
##  [67] lmom_3.0             hms_1.1.3            coin_1.4-3          
##  [70] utf8_1.2.4           ggrepel_0.9.4        foreach_1.5.2       
##  [73] pillar_1.9.0         stringr_1.5.0        later_1.3.1         
##  [76] splines_4.3.2        survival_3.5-7       tidyselect_1.2.0    
##  [79] miniUI_0.1.1.1       knitr_1.45           stats4_4.3.2        
##  [82] xfun_0.41            expm_0.999-9         hardhat_1.3.0       
##  [85] mosaicCore_0.9.4.0   timeDate_4022.108    matrixStats_1.0.0   
##  [88] stringi_1.7.12       lazyeval_0.2.2       yaml_2.3.7          
##  [91] evaluate_0.23        codetools_0.2-19     tibble_3.2.1        
##  [94] multcompView_0.1-9   cli_3.6.1            rpart_4.1.21        
##  [97] xtable_1.8-4         DescTools_0.99.52    repr_1.1.6          
## [100] munsell_0.5.0        jquerylib_0.1.4      Rcpp_1.0.11         
## [103] readxl_1.4.3         globals_0.16.2       parallel_4.3.2      
## [106] ellipsis_0.3.2       gower_1.0.1          listenv_0.9.0       
## [109] viridisLite_0.4.2    mvtnorm_1.2-3        ipred_0.9-14        
## [112] scales_1.2.1         prodlim_2023.08.28   ggridges_0.5.5      
## [115] e1071_1.7-13         crayon_1.5.2         rlang_1.1.1         
## [118] multcomp_1.4-25