Load last week’s Salaries dataset. If necessary, calculate monthly_salary again.
Calculate a Spearman and a Pearson correlation between years since Ph.D. and monthly salary. Conduct a significance test for one of them; pick one and discuss the outcome.
Build a hierarchical regression model: in step 1, use monthly salary as the dependent variable and years since Ph.D. as the independent variable; in step 2, add gender (keep all variables from step 1); in step 3, add rank (keep all variables from step 2). Export the hierarchical regression table.
Test the assumptions for the final model. How would you combat any issues that arise in terms of violations of these assumptions?
Calculate the standardized coefficient (\(\beta\)) for model 1. Interpret this coefficient. Bootstrap this coefficient and report it as you would in a paper.
What does the final model predict in terms of a monthly salary shift for a shift of 3 years post-Ph.D.?
Install and load the ‘caret’ package and load the GermanCredit dataset. Use ??GermanCredit to find out more. The dataset contains credit scores for 1,000 cases and has 62 variables. As done before, you might want to explore the descriptive statistics.
Make a corrplot() for up to 10 continuous variables in the GermanCredit dataset. Export a correlation matrix (up to 10 variables).
Conduct a hierarchical logistic regression with ‘Class’ as the dependent variable, keeping the previous variables at each step. In step 1, add Amount. In step 2, add CreditHistory.PaidDuly. In step 3, add Age. In step 4, add Telephone. Export a table with your key results.
Choose some categorical variables and make a (beautiful) mosaic plot with credit class as dependent.
Calculate all odds ratios for model 4. Calculate the odds ratio for a shift of 1,000 DM (Deutsche Mark, pre-euro era) on credit status for model 4.
BONUS: Make a mosaic plot via the ggmosaic package with credit class as dependent.
BONUS: Have a look at ROC curves and plot one for the credit data.
setwd("~/Dropbox/Teaching_MRes_Northumbria/Lecture4")
require(carData)
## Loading required package: carData
salaries<-carData::Salaries
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
salaries<- salaries %>% mutate(monthly_salary=salary/9)
require(skimr)
## Loading required package: skimr
skim(salaries)
Name | salaries |
Number of rows | 397 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
rank | 0 | 1 | FALSE | 3 | Pro: 266, Ass: 67, Ass: 64 |
discipline | 0 | 1 | FALSE | 2 | B: 216, A: 181 |
sex | 0 | 1 | FALSE | 2 | Mal: 358, Fem: 39 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
yrs.since.phd | 0 | 1 | 22.31 | 12.89 | 1.00 | 12.00 | 21.00 | 32.00 | 56.00 | ▇▇▆▅▁ |
yrs.service | 0 | 1 | 17.61 | 13.01 | 0.00 | 7.00 | 16.00 | 27.00 | 60.00 | ▇▅▃▂▁ |
salary | 0 | 1 | 113706.46 | 30289.04 | 57800.00 | 91000.00 | 107300.00 | 134185.00 | 231545.00 | ▅▇▅▂▁ |
monthly_salary | 0 | 1 | 12634.05 | 3365.45 | 6422.22 | 10111.11 | 11922.22 | 14909.44 | 25727.22 | ▅▇▅▂▁ |
cor(salaries$yrs.since.phd, salaries$monthly_salary)
## [1] 0.4192311
cor(salaries$yrs.since.phd, salaries$monthly_salary, method='spearman')
## [1] 0.4755676
Here I have also calculated the 99% confidence interval.
cor.test(salaries$yrs.since.phd, salaries$monthly_salary, conf.level = .99)
##
## Pearson's product-moment correlation
##
## data: salaries$yrs.since.phd and salaries$monthly_salary
## t = 9.1775, df = 395, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
## 0.3067831 0.5201368
## sample estimates:
## cor
## 0.4192311
cor.test(salaries$yrs.since.phd, salaries$monthly_salary, method = 'spearman', conf.level = .99)
## Warning in cor.test.default(salaries$yrs.since.phd, salaries$monthly_salary, :
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: salaries$yrs.since.phd and salaries$monthly_salary
## S = 5468989, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.4755676
Sample write up: Years since Ph.D. and monthly salary were positively and moderately related (\(r_s\) = .476, p < .0001).
Build the models.
# rename sex to gender
colnames(salaries)[5] <-"gender"
# Linear models.
lm_1<-lm(monthly_salary ~ yrs.since.phd, data= salaries)
lm_2<-lm(monthly_salary ~ yrs.since.phd + gender, data= salaries)
lm_3<-lm(monthly_salary ~ yrs.since.phd + gender + rank, data= salaries)
summary(lm_1)
##
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd, data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9352.3 -2159.1 -317.6 1787.4 11375.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10190.97 307.31 33.162 <2e-16 ***
## yrs.since.phd 109.48 11.93 9.177 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3059 on 395 degrees of freedom
## Multiple R-squared: 0.1758, Adjusted R-squared: 0.1737
## F-statistic: 84.23 on 1 and 395 DF, p-value: < 2.2e-16
summary(lm_2)
##
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender, data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9351.9 -2192.8 -283.5 1714.2 11336.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9464.65 527.59 17.939 <2e-16 ***
## yrs.since.phd 106.45 12.04 8.845 <2e-16 ***
## genderMale 880.40 520.45 1.692 0.0915 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3052 on 394 degrees of freedom
## Multiple R-squared: 0.1817, Adjusted R-squared: 0.1775
## F-statistic: 43.74 on 2 and 394 DF, p-value: < 2.2e-16
summary(lm_3)
##
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender + rank,
## data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7470 -1704 -170 1351 11702
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8549.39 495.08 17.269 < 2e-16 ***
## yrs.since.phd -10.23 14.41 -0.710 0.47801
## genderMale 571.85 448.76 1.274 0.20332
## rankAssocProf 1556.98 482.53 3.227 0.00136 **
## rankProf 5292.88 490.30 10.795 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared: 0.3973, Adjusted R-squared: 0.3912
## F-statistic: 64.61 on 4 and 392 DF, p-value: < 2.2e-16
Note that in the final model years since Ph.D. is no longer statistically significant. It is important to note that we have not yet looked into the assumptions or made graphs.
I chose the American Sociological Review (ASR) style.
require(stargazer)
## Loading required package: stargazer
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(lm_1,lm_2,lm_3, dep.var.labels=c("Monthly Salary (USD)"),covariate.labels=c("Years since Ph.D.","Gender: Male", "Rank: Associate Prof.", "Rank: Prof."), style="asr", out="hierarchical_exercise4_stargazer.htm", header=F)
##
## \begin{table}[!htbp] \centering
## \caption{}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}}lccc}
## \\[-1.8ex]\hline \\[-1.8ex]
## \\[-1.8ex] & \multicolumn{3}{c}{Monthly Salary (USD)} \\
## \\[-1.8ex] & (1) & (2) & (3)\\
## \hline \\[-1.8ex]
## Years since Ph.D. & 109.482$^{***}$ & 106.453$^{***}$ & $-$10.234 \\
## Gender: Male & & 880.403 & 571.845 \\
## Rank: Associate Prof. & & & 1,556.978$^{**}$ \\
## Rank: Prof. & & & 5,292.878$^{***}$ \\
## Constant & 10,190.970$^{***}$ & 9,464.646$^{***}$ & 8,549.387$^{***}$ \\
## \textit{N} & 397 & 397 & 397 \\
## R$^{2}$ & 0.176 & 0.182 & 0.397 \\
## Adjusted R$^{2}$ & 0.174 & 0.178 & 0.391 \\
## Residual Std. Error & 3,059.287 (df = 395) & 3,052.104 (df = 394) & 2,625.931 (df = 392) \\
## F Statistic & 84.226$^{***}$ (df = 1; 395) & 43.742$^{***}$ (df = 2; 394) & 64.613$^{***}$ (df = 4; 392) \\
## \hline \\[-1.8ex]
## \multicolumn{4}{l}{$^{*}$p $<$ .05; $^{**}$p $<$ .01; $^{***}$p $<$ .001} \\
## \end{tabular}
## \end{table}
You can also rely on the apaTables package, but I found it buggy when combining multiple models (issues with rbind()). (There is also apaStyle, but that does not play nice with rJava on Mac.) Here is an example.
require(apaTables)
## Loading required package: apaTables
apa.reg.table(lm_1,filename = 'APA_exercise4.doc',table.number = 1)
##
##
## Table 1
##
## Regression results using monthly_salary as the criterion
##
##
## Predictor b b_95%_CI beta beta_95%_CI sr2 sr2_95%_CI
## (Intercept) 10190.97** [9586.80, 10795.13]
## yrs.since.phd 109.48** [86.03, 132.94] 0.42 [0.33, 0.51] .18 [.11, .24]
##
##
##
## r Fit
##
## .42**
## R2 = .176**
## 95% CI[.11,.24]
##
##
## Note. A significant b-weight indicates the beta-weight and semi-partial correlation are also significant.
## b represents unstandardized regression weights. beta indicates the standardized regression weights.
## sr2 represents the semi-partial correlation squared. r represents the zero-order correlation.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## * indicates p < .05. ** indicates p < .01.
##
Note that here we only checked the assumptions for the final model.
plot(lm_3, which = 1)
Plot 1: Not curvilinear, but also not very nicely distributed; not as linear as we would want. The data show clumping, likely because there are pay grades. There are no data between roughly 11,000 and 13,000 USD, yet the model interpolates linearly across that gap. That could be wrong: in reality people move up in steps, which our model does not account for. While one would lose information, a solution could be to explore ordinal methods (including tobit regression, which can handle censoring issues; a sketch follows below). There is a trade-off between keeping things ‘simple’ and easily interpretable (in reality most relationships are not wholly linear) versus describing the data more faithfully (while running the risk of overfitting). Within its remits we can still use our model, bearing in mind that it is a linear representation of the relationships between the variables.
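Not something you need to run, but if you wanted to explore the tobit route mentioned above, here is a minimal sketch. It assumes the AER package is installed, and the censoring point of 7,000 USD a month is purely hypothetical, chosen for illustration.
require(AER)
# hypothetical tobit model: treat monthly salaries at or below 7,000 USD as left-censored
tobit_3 <- tobit(monthly_salary ~ yrs.since.phd + gender + rank,
                 left = 7000, data = salaries)
summary(tobit_3)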
plot(lm_3, which=2)
Plot 2: Generally OK, but at the top end our model ‘derails’. There is also one extreme case (44); we could consider running the analysis with and without that case. Another potential solution is to log-transform our data. What happens then?
lm_3_log<-lm(log10(monthly_salary) ~ yrs.since.phd + gender + rank, data= salaries)
plot(lm_3_log, which=2)
This looks much nicer, does it not? But bear in mind that a shift in X now represents a shift in log10(Y), which makes the coefficients harder to interpret directly (though the distribution of the residuals is now less of a worry). A sketch of how to back-transform the coefficients follows below.
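If you do continue with the log model, you can back-transform: a coefficient b on the log10 scale means that a one-unit shift in the predictor multiplies the outcome by 10^b. A quick sketch, using the rank coefficient as an example:
# back-transform a log10 coefficient into a multiplicative effect on salary
10^coef(lm_3_log)["rankProf"]               # multiplicative effect of a full professorship
(10^coef(lm_3_log)["rankProf"] - 1) * 100   # the same effect, as a percentage change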
plot(lm_3, which=3)
Plot 3: Ideally, we would want a flat line here; that is not the case. Again, notice the issue with the gap in X. The ‘funnel’ widens as X increases, which points to issues with heteroscedasticity. The plot is slightly better for our log-transformed data. A solution is to work with heteroscedasticity-consistent standard errors (an example follows below).
plot(lm_3_log, which=3)
A Breusch-Pagan test shows that both models have issues with the assumption of homoscedasticity. (It also suggests that the problem is just as strong, if not stronger, in the ‘log’ model.)
require(lmtest)
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(lm_3)
##
## studentized Breusch-Pagan test
##
## data: lm_3
## BP = 59.457, df = 4, p-value = 3.772e-12
bptest(lm_3_log)
##
## studentized Breusch-Pagan test
##
## data: lm_3_log
## BP = 62.284, df = 4, p-value = 9.599e-13
Cook’s distance gives an indication of influential cases in the regression model. Some put the cut-off at 4/(N - k - 1) (check here), but it is best to also check the plots.
plot(lm_3, which=4)
Plot 4: The plot is largely fine. There are some influential cases (351, 283); perhaps one would ignore 331, as it is close to other peaks. One could run the analyses with and without those cases. Alternatively, one could bootstrap the relevant statistics if worried about robustness, or rely on ‘robust’ techniques (an example below).
Incidentally, in our transformed model 283 (and perhaps 351) are again potential ‘monkey wrenches’. You could again consider bootstrapping your models, or presenting models with and without those ‘outliers’.
plot(lm_3_log, which=4)
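If you prefer a numeric check over eyeballing the plot, here is a minimal sketch applying the 4/(N - k - 1) rule of thumb to the final model:
# flag cases whose Cook's distance exceeds the rule-of-thumb cut-off
cooks <- cooks.distance(lm_3)
cutoff <- 4 / (nobs(lm_3) - length(coef(lm_3)))  # equals 4/(N - k - 1) with k predictors
which(cooks > cutoff)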
Durbin-Watson tests suggest that both models exhibit autocorrelation.
require(lmtest)
dwtest(lm_3)
##
## Durbin-Watson test
##
## data: lm_3
## DW = 1.7452, p-value = 0.005363
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(lm_3_log)
##
## Durbin-Watson test
##
## data: lm_3_log
## DW = 1.6538, p-value = 0.0002655
## alternative hypothesis: true autocorrelation is greater than 0
Not that I expect you to run this, but here is an example of a model that deals with this issue. Notice the difference it makes for the gender coefficient’s SE and significance level: whereas before we might have ruled out the existence of an effect, this model suggests a trend (still not strong evidence, though).
require(sandwich)
## Loading required package: sandwich
summary(lm_3)
##
## Call:
## lm(formula = monthly_salary ~ yrs.since.phd + gender + rank,
## data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7470 -1704 -170 1351 11702
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8549.39 495.08 17.269 < 2e-16 ***
## yrs.since.phd -10.23 14.41 -0.710 0.47801
## genderMale 571.85 448.76 1.274 0.20332
## rankAssocProf 1556.98 482.53 3.227 0.00136 **
## rankProf 5292.88 490.30 10.795 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared: 0.3973, Adjusted R-squared: 0.3912
## F-statistic: 64.61 on 4 and 392 DF, p-value: < 2.2e-16
vcv <- vcovHAC(lm_3) # variance covariance matrix (which we might want to store)
coeftest(lm_3, vcov = vcovHAC(lm_3))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8549.387 302.314 28.2798 < 2.2e-16 ***
## yrs.since.phd -10.234 17.789 -0.5753 0.56543
## genderMale 571.845 321.820 1.7769 0.07636 .
## rankAssocProf 1556.978 270.749 5.7506 1.791e-08 ***
## rankProf 5292.878 404.665 13.0796 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we calculate the \(\beta\)’s for model 1. I used dplyr to z-score (and the mosaic package’s zscore() as an alternative, since scale() can run into problems).
require(dplyr)
require(mosaic)
## Loading required package: mosaic
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
##
## mean
## The following object is masked from 'package:ggplot2':
##
## stat
## The following object is masked from 'package:skimr':
##
## n_missing
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
salaries <- salaries %>% mutate(monthly_salary_stand=zscore(monthly_salary), yrs.since.phd_stand=zscore(yrs.since.phd))
lm_stand1<- lm(monthly_salary_stand ~ yrs.since.phd_stand-1, data= salaries)
summary(lm_stand1)
##
## Call:
## lm(formula = monthly_salary_stand ~ yrs.since.phd_stand - 1,
## data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7789 -0.6415 -0.0944 0.5311 3.3802
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## yrs.since.phd_stand 0.41923 0.04562 9.189 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9079 on 396 degrees of freedom
## Multiple R-squared: 0.1758, Adjusted R-squared: 0.1737
## F-statistic: 84.44 on 1 and 396 DF, p-value: < 2.2e-16
# alternative
require(QuantPsyc)
## Loading required package: QuantPsyc
## Loading required package: boot
##
## Attaching package: 'boot'
## The following object is masked from 'package:mosaic':
##
## logit
## The following object is masked from 'package:lattice':
##
## melanoma
## Loading required package: purrr
##
## Attaching package: 'purrr'
## The following object is masked from 'package:mosaic':
##
## cross
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
##
## Attaching package: 'QuantPsyc'
## The following object is masked from 'package:Matrix':
##
## norm
## The following object is masked from 'package:base':
##
## norm
lm.beta(lm_1)
## yrs.since.phd
## 0.4192311
The \(\beta\) is 0.42: a shift of one standard deviation in years since Ph.D. corresponds to a shift of .42 standard deviations in monthly salary. (Notice how this is the same value as your Pearson correlation coefficient, as it must be in a simple regression!)
# Bootstrap 95% CI for B coefficient.
library(boot)
# function to obtain the standardized coefficient (beta) from the data
coeff_B <- function(data, indices) {
data_boot <- data[indices, ] # allows boot to select sample
B_boot <- lm(monthly_salary_stand ~ yrs.since.phd_stand-1, data = data_boot)
return(B_boot$coeff[1]) # first (and only) coefficient: years since Ph.D.
}
# bootstrapping with 10000 replications
set.seed(1983) # slightly worse year.
results_coeff_B <- boot(data = salaries, statistic = coeff_B, R = 10000)
Now let us calculate the confidence interval.
plot(results_coeff_B)
# get 95% confidence interval
boot.ci(results_coeff_B, type='bca')
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = results_coeff_B, type = "bca")
##
## Intervals :
## Level BCa
## 95% ( 0.3085, 0.5205 )
## Calculations and Intervals on Original Scale
Sample write up: The 95% confidence interval for \(\beta\) using 10,000 samples and bias-corrected accelerated bootstrapping is [.31, .52]. (If you had done a bootstrap of the Pearson correlation coefficient you should get similar values).
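To check that claim, here is a minimal sketch that bootstraps the Pearson correlation with the same settings (reusing the boot package loaded above):
# bootstrap the Pearson correlation for comparison with the beta bootstrap
cor_pearson <- function(data, indices) {
  data_boot <- data[indices, ]
  cor(data_boot$yrs.since.phd, data_boot$monthly_salary)
}
set.seed(1983)
results_cor <- boot(data = salaries, statistic = cor_pearson, R = 10000)
boot.ci(results_cor, type = 'bca')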
Best to center everything (there is no point in centering categorical variables; note that we are calculating the shift relative to the reference categories: women and assistant professors). A shift of 3 years is associated with around $31 less a month, all else being equal. That is not very impressive, which is partly explained by the fact that academic rank does so much of the ‘heavy lifting’, and we are now looking at a shift away from the means (!). For a shift at the start of the career, you could use the initial model. A shift from assistant to associate professor implies another $1,557 a month (all else being equal). (For men, add another $572 a month (!))
require(dplyr)
# create centered variables
salaries<-dplyr::mutate(salaries, monthly_salary_cent=monthly_salary-mean(monthly_salary),yrs.since.phd_cent=yrs.since.phd-mean(yrs.since.phd))
# lm.
lm_3_cent<-lm(monthly_salary_cent ~ yrs.since.phd_cent + gender + rank, data=salaries)
summary(lm_3_cent)
##
## Call:
## lm(formula = monthly_salary_cent ~ yrs.since.phd_cent + gender +
## rank, data = salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7470 -1704 -170 1351 11702
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4313.03 564.20 -7.645 1.63e-13 ***
## yrs.since.phd_cent -10.23 14.41 -0.710 0.47801
## genderMale 571.85 448.76 1.274 0.20332
## rankAssocProf 1556.98 482.53 3.227 0.00136 **
## rankProf 5292.88 490.30 10.795 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2626 on 392 degrees of freedom
## Multiple R-squared: 0.3973, Adjusted R-squared: 0.3912
## F-statistic: 64.61 on 4 and 392 DF, p-value: < 2.2e-16
# a shift of 3 applied to all coefficients; only the yrs.since.phd_cent entry is meaningful here
3*lm_3_cent$coefficients
## (Intercept) yrs.since.phd_cent genderMale rankAssocProf
## -12939.08800 -30.70138 1715.53631 4670.93252
## rankProf
## 15878.63255
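The same 3-year shift can also be read off via predict(), holding gender and rank at their reference categories. A sketch, assuming the level names from carData::Salaries (‘Female’ and ‘AsstProf’):
# predicted salary difference for a 3-year shift at the reference categories
newdata <- data.frame(yrs.since.phd_cent = c(0, 3),
                      gender = "Female", rank = "AsstProf")
diff(predict(lm_3_cent, newdata))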
require(caret)
## Loading required package: caret
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
## The following object is masked from 'package:mosaic':
##
## dotPlot
data(GermanCredit)
data_credit<-GermanCredit
require(skimr)
skim(data_credit)
Name | data_credit |
Number of rows | 1000 |
Number of columns | 62 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 61 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Class | 0 | 1 | FALSE | 2 | Goo: 700, Bad: 300 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Duration | 0 | 1 | 20.90 | 12.06 | 4 | 12.0 | 18.0 | 24.00 | 72 | ▇▇▂▁▁ |
Amount | 0 | 1 | 3271.26 | 2822.74 | 250 | 1365.5 | 2319.5 | 3972.25 | 18424 | ▇▂▁▁▁ |
InstallmentRatePercentage | 0 | 1 | 2.97 | 1.12 | 1 | 2.0 | 3.0 | 4.00 | 4 | ▂▃▁▂▇ |
ResidenceDuration | 0 | 1 | 2.85 | 1.10 | 1 | 2.0 | 3.0 | 4.00 | 4 | ▂▆▁▃▇ |
Age | 0 | 1 | 35.55 | 11.38 | 19 | 27.0 | 33.0 | 42.00 | 75 | ▇▆▃▁▁ |
NumberExistingCredits | 0 | 1 | 1.41 | 0.58 | 1 | 1.0 | 1.0 | 2.00 | 4 | ▇▅▁▁▁ |
NumberPeopleMaintenance | 0 | 1 | 1.16 | 0.36 | 1 | 1.0 | 1.0 | 1.00 | 2 | ▇▁▁▁▂ |
Telephone | 0 | 1 | 0.60 | 0.49 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▆▁▁▁▇ |
ForeignWorker | 0 | 1 | 0.96 | 0.19 | 0 | 1.0 | 1.0 | 1.00 | 1 | ▁▁▁▁▇ |
CheckingAccountStatus.lt.0 | 0 | 1 | 0.27 | 0.45 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
CheckingAccountStatus.0.to.200 | 0 | 1 | 0.27 | 0.44 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
CheckingAccountStatus.gt.200 | 0 | 1 | 0.06 | 0.24 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
CheckingAccountStatus.none | 0 | 1 | 0.39 | 0.49 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▅ |
CreditHistory.NoCredit.AllPaid | 0 | 1 | 0.04 | 0.20 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
CreditHistory.ThisBank.AllPaid | 0 | 1 | 0.05 | 0.22 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
CreditHistory.PaidDuly | 0 | 1 | 0.53 | 0.50 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▇▁▁▁▇ |
CreditHistory.Delay | 0 | 1 | 0.09 | 0.28 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
CreditHistory.Critical | 0 | 1 | 0.29 | 0.46 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
Purpose.NewCar | 0 | 1 | 0.23 | 0.42 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
Purpose.UsedCar | 0 | 1 | 0.10 | 0.30 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Furniture.Equipment | 0 | 1 | 0.18 | 0.39 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
Purpose.Radio.Television | 0 | 1 | 0.28 | 0.45 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
Purpose.DomesticAppliance | 0 | 1 | 0.01 | 0.11 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Repairs | 0 | 1 | 0.02 | 0.15 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Education | 0 | 1 | 0.05 | 0.22 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Vacation | 0 | 1 | 0.00 | 0.00 | 0 | 0.0 | 0.0 | 0.00 | 0 | ▁▁▇▁▁ |
Purpose.Retraining | 0 | 1 | 0.01 | 0.09 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Business | 0 | 1 | 0.10 | 0.30 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Purpose.Other | 0 | 1 | 0.01 | 0.11 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
SavingsAccountBonds.lt.100 | 0 | 1 | 0.60 | 0.49 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▅▁▁▁▇ |
SavingsAccountBonds.100.to.500 | 0 | 1 | 0.10 | 0.30 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
SavingsAccountBonds.500.to.1000 | 0 | 1 | 0.06 | 0.24 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
SavingsAccountBonds.gt.1000 | 0 | 1 | 0.05 | 0.21 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
SavingsAccountBonds.Unknown | 0 | 1 | 0.18 | 0.39 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
EmploymentDuration.lt.1 | 0 | 1 | 0.17 | 0.38 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
EmploymentDuration.1.to.4 | 0 | 1 | 0.34 | 0.47 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▅ |
EmploymentDuration.4.to.7 | 0 | 1 | 0.17 | 0.38 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
EmploymentDuration.gt.7 | 0 | 1 | 0.25 | 0.43 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
EmploymentDuration.Unemployed | 0 | 1 | 0.06 | 0.24 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Personal.Male.Divorced.Seperated | 0 | 1 | 0.05 | 0.22 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Personal.Female.NotSingle | 0 | 1 | 0.31 | 0.46 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
Personal.Male.Single | 0 | 1 | 0.55 | 0.50 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▆▁▁▁▇ |
Personal.Male.Married.Widowed | 0 | 1 | 0.09 | 0.29 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Personal.Female.Single | 0 | 1 | 0.00 | 0.00 | 0 | 0.0 | 0.0 | 0.00 | 0 | ▁▁▇▁▁ |
OtherDebtorsGuarantors.None | 0 | 1 | 0.91 | 0.29 | 0 | 1.0 | 1.0 | 1.00 | 1 | ▁▁▁▁▇ |
OtherDebtorsGuarantors.CoApplicant | 0 | 1 | 0.04 | 0.20 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
OtherDebtorsGuarantors.Guarantor | 0 | 1 | 0.05 | 0.22 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Property.RealEstate | 0 | 1 | 0.28 | 0.45 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
Property.Insurance | 0 | 1 | 0.23 | 0.42 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
Property.CarOther | 0 | 1 | 0.33 | 0.47 | 0 | 0.0 | 0.0 | 1.00 | 1 | ▇▁▁▁▃ |
Property.Unknown | 0 | 1 | 0.15 | 0.36 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
OtherInstallmentPlans.Bank | 0 | 1 | 0.14 | 0.35 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
OtherInstallmentPlans.Stores | 0 | 1 | 0.05 | 0.21 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
OtherInstallmentPlans.None | 0 | 1 | 0.81 | 0.39 | 0 | 1.0 | 1.0 | 1.00 | 1 | ▂▁▁▁▇ |
Housing.Rent | 0 | 1 | 0.18 | 0.38 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
Housing.Own | 0 | 1 | 0.71 | 0.45 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▃▁▁▁▇ |
Housing.ForFree | 0 | 1 | 0.11 | 0.31 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Job.UnemployedUnskilled | 0 | 1 | 0.02 | 0.15 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
Job.UnskilledResident | 0 | 1 | 0.20 | 0.40 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
Job.SkilledEmployee | 0 | 1 | 0.63 | 0.48 | 0 | 0.0 | 1.0 | 1.00 | 1 | ▅▁▁▁▇ |
Job.Management.SelfEmp.HighlyQualified | 0 | 1 | 0.15 | 0.36 | 0 | 0.0 | 0.0 | 0.00 | 1 | ▇▁▁▁▂ |
You could also use the ‘summarytools’ package.
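For instance, a one-line sketch (assuming summarytools is installed):
require(summarytools)
dfSummary(data_credit) # descriptive summary of the whole dataset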
You might have used different variables. Because they are all scored between 0 and 1, you could use them all in these plots.
require(apaTables)
require(dplyr)
cor_table<-dplyr::select(data_credit, c(Age,Amount,Telephone))
apa.cor.table(cor_table, filename = "correlations-credit.doc")
##
##
## Means, standard deviations, and correlations with confidence intervals
##
##
## Variable M SD 1 2
## 1. Age 35.55 11.38
##
## 2. Amount 3271.26 2822.74 .03
## [-.03, .09]
##
## 3. Telephone 0.60 0.49 -.15** -.28**
## [-.21, -.08] [-.33, -.22]
##
##
## Note. M and SD are used to represent mean and standard deviation, respectively.
## Values in square brackets indicate the 95% confidence interval.
## The confidence interval is a plausible range of population correlations
## that could have caused the sample correlation (Cumming, 2014).
## * indicates p < .05. ** indicates p < .01.
##
# ggcorrplot, you can then further tweak this, as it is a ggplot.
require(ggcorrplot)
## Loading required package: ggcorrplot
cormatrix<-cor(cor_table)
ggcorrplot(cormatrix, hc.order = TRUE, type = "lower", lab = TRUE)
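The exercise literally asks for corrplot(); here is a minimal sketch with the same correlation matrix, assuming the corrplot package is installed:
require(corrplot)
corrplot(cormatrix, method = "circle", type = "lower")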
We built 4 hierarchical models.
levels(data_credit$Class) # Bad is the reference category.
## [1] "Bad" "Good"
data_credit$Class_ref_good<-relevel(data_credit$Class, ref="Good") # recode, as this is perhaps more sensible if we want to know who will default.
log1<-glm(Class_ref_good~ Amount, data=data_credit, family = binomial)
log2<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly, data=data_credit, family = binomial)
log3<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly+Age, data=data_credit, family = binomial)
log4<-glm(Class_ref_good~ Amount+CreditHistory.PaidDuly+Age+Telephone, data=data_credit, family = binomial)
summary(log1)
##
## Call:
## glm(formula = Class_ref_good ~ Amount, family = binomial, data = data_credit)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.229e+00 1.083e-01 -11.348 < 2e-16 ***
## Amount 1.119e-04 2.355e-05 4.751 2.02e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1221.7 on 999 degrees of freedom
## Residual deviance: 1199.1 on 998 degrees of freedom
## AIC: 1203.1
##
## Number of Fisher Scoring iterations: 4
summary(log2)
##
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly,
## family = binomial, data = data_credit)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.383e+00 1.390e-01 -9.949 < 2e-16 ***
## Amount 1.162e-04 2.376e-05 4.890 1.01e-06 ***
## CreditHistory.PaidDuly 2.579e-01 1.415e-01 1.823 0.0683 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1221.7 on 999 degrees of freedom
## Residual deviance: 1195.7 on 997 degrees of freedom
## AIC: 1201.7
##
## Number of Fisher Scoring iterations: 4
summary(log3)
##
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly +
## Age, family = binomial, data = data_credit)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.000e-01 2.730e-01 -2.564 0.01035 *
## Amount 1.194e-04 2.394e-05 4.988 6.11e-07 ***
## CreditHistory.PaidDuly 1.975e-01 1.436e-01 1.375 0.16917
## Age -1.889e-02 6.623e-03 -2.852 0.00434 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1221.7 on 999 degrees of freedom
## Residual deviance: 1187.2 on 996 degrees of freedom
## AIC: 1195.2
##
## Number of Fisher Scoring iterations: 4
summary(log4)
##
## Call:
## glm(formula = Class_ref_good ~ Amount + CreditHistory.PaidDuly +
## Age + Telephone, family = binomial, data = data_credit)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.030e+00 3.123e-01 -3.298 0.000972 ***
## Amount 1.357e-04 2.518e-05 5.388 7.14e-08 ***
## CreditHistory.PaidDuly 1.942e-01 1.440e-01 1.349 0.177475
## Age -1.686e-02 6.659e-03 -2.531 0.011359 *
## Telephone 3.383e-01 1.543e-01 2.193 0.028315 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1221.7 on 999 degrees of freedom
## Residual deviance: 1182.3 on 995 degrees of freedom
## AIC: 1192.3
##
## Number of Fisher Scoring iterations: 4
This exports the results. Note that the odds ratio for Amount is close to 1. This is because the base unit is 1 DM (Deutsche Mark), which might not make a lot of sense! (Rescale, e.g. via log10 or standardizing; see the sketch below.)
require(stargazer)
stargazer(log1,log2,log3,log4,dep.var.labels=c("Credit status (reference= Good)"), covariate.labels=c("Amount (in DM)", "Credit History (Duly paid)","Age", "Telephone (yes/no)"), report = "vc*", omit=c("Constant"), apply.coef=exp, t.auto=F, p.auto=F, style="asr", out="logistic_base_exercise4.htm", header=FALSE)
##
## \begin{table}[!htbp] \centering
## \caption{}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}}lcccc}
## \\[-1.8ex]\hline \\[-1.8ex]
## \\[-1.8ex] & \multicolumn{4}{c}{Credit status (reference= Good)} \\
## \\[-1.8ex] & (1) & (2) & (3) & (4)\\
## \hline \\[-1.8ex]
## Amount (in DM) & 1.000$^{***}$ & 1.000$^{***}$ & 1.000$^{***}$ & 1.000$^{***}$ \\
## & & & & \\
## Credit History (Duly paid) & & 1.294 & 1.218 & 1.214 \\
## & & & & \\
## Age & & & 0.981$^{**}$ & 0.983$^{*}$ \\
## & & & & \\
## Telephone (yes/no) & & & & 1.402$^{*}$ \\
## & & & & \\
## \textit{N} & 1,000 & 1,000 & 1,000 & 1,000 \\
## Log Likelihood & $-$599.532 & $-$597.860 & $-$593.606 & $-$591.163 \\
## AIC & 1,203.064 & 1,201.721 & 1,195.213 & 1,192.326 \\
## \hline \\[-1.8ex]
## \multicolumn{5}{l}{$^{*}$p $<$ .05; $^{**}$p $<$ .01; $^{***}$p $<$ .001} \\
## \end{tabular}
## \end{table}
Note that a value <1 means that the odds become lower; for ease of interpretation, one would take the inverse of the odds ratio (see the sketch below).
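A sketch of both points (not part of the exercise): refit model 4 with Amount expressed in units of 1,000 DM so its odds ratio becomes readable, and invert the Age odds ratio so it reads per year younger.
# hypothetical refit: Amount rescaled to 1,000-DM units
log4_k <- glm(Class_ref_good ~ I(Amount/1000) + CreditHistory.PaidDuly + Age + Telephone,
              data = data_credit, family = binomial)
exp(coef(log4_k)) # odds ratios; the one for Amount is now per 1,000 DM
1/exp(coef(log4)["Age"]) # inverted odds ratio for Age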
Here are the Pseudo- \(R^2\) values.
require(rcompanion)
## Loading required package: rcompanion
nagelkerke(log1, restrictNobs=T)
## $Models
##
## Model: "glm, Class_ref_good ~ Amount, binomial, data_credit"
## Null: "glm, Class_ref_good ~ 1, binomial, fit$model"
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.0185513
## Cox and Snell (ML) 0.0224098
## Nagelkerke (Cragg and Uhler) 0.0317743
##
## $Likelihood.ratio.test
## Df.diff LogLik.diff Chisq p.value
## -1 -11.332 22.665 1.9288e-06
##
## $Number.of.observations
##
## Model: 1000
## Null: 1000
##
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
##
## $Warnings
## [1] "None"
nagelkerke(log2, restrictNobs=T)
## $Models
##
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly, binomial, data_credit"
## Null: "glm, Class_ref_good ~ 1, binomial, fit$model"
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.0212879
## Cox and Snell (ML) 0.0256727
## Nagelkerke (Cragg and Uhler) 0.0364008
##
## $Likelihood.ratio.test
## Df.diff LogLik.diff Chisq p.value
## -2 -13.004 26.008 2.2513e-06
##
## $Number.of.observations
##
## Model: 1000
## Null: 1000
##
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
##
## $Warnings
## [1] "None"
nagelkerke(log3, restrictNobs=T)
## $Models
##
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly + Age, binomial, data_credit"
## Null: "glm, Class_ref_good ~ 1, binomial, fit$model"
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.0282516
## Cox and Snell (ML) 0.0339269
## Nagelkerke (Cragg and Uhler) 0.0481042
##
## $Likelihood.ratio.test
## Df.diff LogLik.diff Chisq p.value
## -3 -17.258 34.516 1.5417e-07
##
## $Number.of.observations
##
## Model: 1000
## Null: 1000
##
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
##
## $Warnings
## [1] "None"
nagelkerke(log4, restrictNobs=T)
## $Models
##
## Model: "glm, Class_ref_good ~ Amount + CreditHistory.PaidDuly + Age + Telephone, binomial, data_credit"
## Null: "glm, Class_ref_good ~ 1, binomial, fit$model"
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.0322516
## Cox and Snell (ML) 0.0386365
## Nagelkerke (Cragg and Uhler) 0.0547819
##
## $Likelihood.ratio.test
## Df.diff LogLik.diff Chisq p.value
## -4 -19.701 39.403 5.7518e-08
##
## $Number.of.observations
##
## Model: 1000
## Null: 1000
##
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
##
## $Warnings
## [1] "None"
Here is an example of a mosaic plot. It becomes a bit more difficult with more variables added.
require(RColorBrewer)
## Loading required package: RColorBrewer
# pick some colours.
colours<-c(color = brewer.pal(2, "Set2"))
## Warning in brewer.pal(2, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
fill_colours<-rep(colours, each=2) # 4 quadrants to fill
require(vcd)
## Loading required package: vcd
## Loading required package: grid
##
## Attaching package: 'vcd'
## The following object is masked from 'package:mosaic':
##
## mplot
mosaicplot1<-mosaic(~ Class_ref_good + Telephone, zero_size = 0, data=data_credit, gp = gpar(fill = fill_colours, col = 0), legend=F, labeling = labeling_values)
Another example, now with rental status and owning a telephone. You can further tinker with settings and variables. You can even have tests included; explore the vcd package vignette for more!
require(RColorBrewer)
# pick some colours.
colours<-c(color = brewer.pal(2, "Set2"))
## Warning in brewer.pal(2, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
fill_colours<-rep(colours, each=4) # enough fills for the 2 x 2 x 2 table
require(vcd)
mosaicplot2<-mosaic(~ Class_ref_good + Housing.Rent + Telephone, zero_size = 0, data=data_credit, gp = gpar(fill = fill_colours, col = 0), legend=F, labeling = labeling_values)
A shift of 1,000 DM increases the odds of bad credit (rather than good credit) by a factor of 1.145, keeping everything else constant. (Note that this estimate applies to people of average age who do not own a phone and did not pay duly.)
require(oddsratio)
## Loading required package: oddsratio
require(dplyr)
require(questionr)
## Loading required package: questionr
##
## Attaching package: 'questionr'
## The following object is masked from 'package:mosaic':
##
## prop
data_credit<- data_credit %>% mutate(Age_cent=Age-mean(Age), Amount_cent= Amount-mean(Amount))
log_4_cent<- glm(Class_ref_good~ Amount_cent+CreditHistory.PaidDuly+Age_cent+Telephone, data=data_credit, family = binomial)
odds.ratio(log_4_cent)
## Waiting for profiling to be done...
## OR 2.5 % 97.5 % p
## (Intercept) 0.30559 0.22964 0.4027 < 2.2e-16 ***
## Amount_cent 1.00014 1.00009 1.0002 7.142e-08 ***
## CreditHistory.PaidDuly 1.21432 0.91630 1.6119 0.17747
## Age_cent 0.98328 0.97031 0.9960 0.01136 *
## Telephone 1.40250 1.03885 1.9027 0.02832 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# shift of 1,000 DM -- everything else put to 0
or_glm(data_credit, incr= list(Amount_cent=1000, CreditHistory.PaidDuly=0, Age_cent=0, Telephone=0), log_4_cent)
## predictor oddsratio ci_low (2.5) ci_high (97.5) increment
## 1 Amount_cent 1.145 1.091 1.204 1000
## 2 CreditHistory.PaidDuly 1.000 1.000 1.000 0
## 3 Age_cent 1.000 1.000 1.000 0
## 4 Telephone 1.000 1.000 1.000 0
# alternative, which gives the same core result (the other coefficients are not interpretable this way)
exp(log_4_cent$coefficients*1000)
## (Intercept) Amount_cent CreditHistory.PaidDuly
## 0.000000e+00 1.145310e+00 2.147984e+84
## Age_cent Telephone
## 4.774967e-08 7.982351e+146
Faceted by housing rental status. The plot could be further improved by relabeling the 0’s and 1’s.
You might run into errors (see this). The solution is to load the ‘devtools’ package and install the GitHub version of ggmosaic; this line will do the trick: install_github(‘cran/ggmosaic’).
require(ggmosaic) # install development version, as ggplot2 came with major updates breaking old ggmosaic.
## Loading required package: ggmosaic
##
## Attaching package: 'ggmosaic'
## The following objects are masked from 'package:vcd':
##
## mosaic, spine
require(ggthemes)
## Loading required package: ggthemes
##
## Attaching package: 'ggthemes'
## The following object is masked from 'package:mosaic':
##
## theme_map
ggplot(data=data_credit) +
geom_mosaic(aes(x=product(Class_ref_good,Telephone), fill=Class_ref_good)) + facet_grid(Housing.Rent~.) + labs(x='Telephone') + scale_fill_brewer(palette="Dark2") + theme_gdocs()
## Warning: `unite_()` was deprecated in tidyr 1.2.0.
## ℹ Please use `unite()` instead.
## ℹ The deprecated feature was likely used in the ggmosaic package.
## Please report the issue at <https://github.com/haleyjeppson/ggmosaic>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
ROC curves are super cool and are used across a broad range of sciences. You can read up on them in the Agresti reference from the reading list; you can also read more in the reading lists or here. Anyway, have a look at what’s out there and have a go at making one yourself.
require(pROC)
## Loading required package: pROC
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:mosaic':
##
## cov, var
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
resRoc <- roc(data_credit$Class_ref_good ~ log4$fitted)
## Setting levels: control = Good, case = Bad
## Setting direction: controls < cases
plot(resRoc, legacy.axes = TRUE)
auc(resRoc)
## Area under the curve: 0.616
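Two things you might add, sketched here with pROC functions: a confidence interval for the AUC, and the threshold that maximizes Youden’s J.
ci.auc(resRoc) # 95% CI for the AUC
coords(resRoc, "best", best.method = "youden") # optimal threshold by Youden's J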
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] pROC_1.18.5 ggthemes_4.2.4 ggmosaic_0.3.3 questionr_0.7.8
## [5] oddsratio_2.0.1 vcd_1.4-11 RColorBrewer_1.1-3 rcompanion_2.4.34
## [9] ggcorrplot_0.1.4.1 caret_6.0-94 QuantPsyc_1.6 MASS_7.3-60
## [13] purrr_1.0.2 boot_1.3-28.1 mosaic_1.9.0 mosaicData_0.20.4
## [17] ggformula_0.12.0 Matrix_1.6-1.1 ggplot2_3.4.4 lattice_0.21-9
## [21] sandwich_3.0-2 lmtest_0.9-40 zoo_1.8-12 apaTables_2.0.8
## [25] stargazer_5.2.3 skimr_2.1.5 dplyr_1.1.3 carData_3.0-5
##
## loaded via a namespace (and not attached):
## [1] libcoin_1.0-10 rstudioapi_0.15.0 jsonlite_1.8.7
## [4] magrittr_2.0.3 TH.data_1.1-2 modeltools_0.2-23
## [7] farver_2.1.1 rmarkdown_2.25 vctrs_0.6.4
## [10] base64enc_0.1-3 htmltools_0.5.7 forcats_1.0.0
## [13] haven_2.5.3 broom_1.0.5 cellranger_1.1.0
## [16] sass_0.4.7 parallelly_1.36.0 bslib_0.5.1
## [19] htmlwidgets_1.6.2 plyr_1.8.9 plotly_4.10.3
## [22] rootSolve_1.8.2.4 lubridate_1.9.3 cachem_1.0.8
## [25] mime_0.12 lifecycle_1.0.3 iterators_1.0.14
## [28] pkgconfig_2.0.3 R6_2.5.1 fastmap_1.1.1
## [31] shiny_1.7.5.1 future_1.33.0 digest_0.6.33
## [34] Exact_3.2 colorspace_2.1-0 productplots_0.1.1
## [37] labeling_0.4.3 fansi_1.0.5 timechange_0.2.0
## [40] httr_1.4.7 mgcv_1.9-0 compiler_4.3.2
## [43] proxy_0.4-27 withr_2.5.2 backports_1.4.1
## [46] highr_0.10 lava_1.7.3 gld_2.6.6
## [49] ModelMetrics_1.2.2.2 tools_4.3.2 httpuv_1.6.12
## [52] future.apply_1.11.1 nnet_7.3-19 glue_1.6.2
## [55] promises_1.2.1 nlme_3.1-163 reshape2_1.4.4
## [58] generics_0.1.3 recipes_1.0.9 gtable_0.3.4
## [61] MBESS_4.9.3 nortest_1.0-4 labelled_2.12.0
## [64] class_7.3-22 tidyr_1.3.0 data.table_1.14.8
## [67] lmom_3.0 hms_1.1.3 coin_1.4-3
## [70] utf8_1.2.4 ggrepel_0.9.4 foreach_1.5.2
## [73] pillar_1.9.0 stringr_1.5.0 later_1.3.1
## [76] splines_4.3.2 survival_3.5-7 tidyselect_1.2.0
## [79] miniUI_0.1.1.1 knitr_1.45 stats4_4.3.2
## [82] xfun_0.41 expm_0.999-9 hardhat_1.3.0
## [85] mosaicCore_0.9.4.0 timeDate_4022.108 matrixStats_1.0.0
## [88] stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7
## [91] evaluate_0.23 codetools_0.2-19 tibble_3.2.1
## [94] multcompView_0.1-9 cli_3.6.1 rpart_4.1.21
## [97] xtable_1.8-4 DescTools_0.99.52 repr_1.1.6
## [100] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.11
## [103] readxl_1.4.3 globals_0.16.2 parallel_4.3.2
## [106] ellipsis_0.3.2 gower_1.0.1 listenv_0.9.0
## [109] viridisLite_0.4.2 mvtnorm_1.2-3 ipred_0.9-14
## [112] scales_1.2.1 prodlim_2023.08.28 ggridges_0.5.5
## [115] e1071_1.7-13 crayon_1.5.2 rlang_1.1.1
## [118] multcomp_1.4-25