Questions.

  1. The ‘T-Swift’ protein was quantified in the blood of the two groups. Women were found to have higher levels of the protein ( M = 1.062, SD = 0.339, N = 60) than men ( M = 0.528; SD = 0.382, N = 60). Calculate the raw difference effect size (D) and its variance + SE.

  2. Calculate the Cohen’s d for the example above and its variance and SE. Calculate Hedges’ g and its SE.

  3. Convert the Cohen’s d you calculated in step 3 to a Pearson r and its SE. The formula for SE is not provided on the slides but you can get it from Borenstein et al. (2009) chapter 7 (see here)

  4. In a study you see the following table for and association between hay fever and eczema in 11 year old children.

Hayfever
Yes No Total
Eczema Yes 141 420 561
No 928 13525 14453
Total 1069 13945 15522

What is the probability that a child with eczema will also have hay fever? (proportion and/or %). For children without eczema, what is the probability of having hay fever (proportion and/or %). What is the risk difference? Look up the formula for the SE of risk difference, here.

Calculate the Odds Ratio. The ln(OR) and its SE.

  1. In a paper, you find a reported risk difference of 12% between men and women in ever having taken LSD. The authors did not report the SE or raw data but did provide the 95 confidence interval fo their estimate (8.2% to 15.8%). Work out the SE for this estimate, so you can use it in your meta-analysis (tip: use the Cochrane Handbook, alternatively use google).

  2. Bonus : You are reading a paper on the effects of an Angiotensin-Converting–Enzyme Inhibitor, Ramipril, on Cardiovascular Events (simply put: heart attacks) in High-Risk Patients and come across some interesting stats. From the table below calculate the ‘Number Needed to Treat’. Calculate the 95% CI of NNT.

Cardiovascular Event
Yes No Total
Ramipril Yes 651 3994 4645
No 826 3826 4652

Proteins,… .

  1. We use the code from the slides… . Subscript t= Women , subscript c = Men. The raw mean difference is .534.
y_t <- 1.062; sd_t <- 0.339; n_t <- 60
y_c <- 0.528; sd_c <- 0.382; n_c <- 60
y_t - y_c ## D
## [1] 0.534

Next we calculate the pooled SD. (0.361).

## If we assume that at population level sd_t = sd_c, then
numerator <- (((n_t - 1)*sd_t^2) + ((n_c - 1)*sd_c^2))
sd_pooled <- sqrt(numerator / (n_t + n_c - 2))
sd_pooled
## [1] 0.3611406

The variance of the raw difference (D) is .004, the standard error is .066. So, for our meta-analysis based on raw differences, we would store .534(+/-.066).

## Variance of D and se
(var_d <- ((n_t + n_c)/(n_t * n_c)) * sd_pooled^2)
## [1] 0.004347417
sqrt(var_d)
## [1] 0.06593494
  1. Now we calculate Cohen’s d and Hedges’g.

This is a large effect size, sensu Cohen (1988), 1.48+/-.21.

## Cohen's d:
d <- (y_t - y_c)/sd_pooled
d
## [1] 1.478649
## Variance of Cohen's d and SE of Cohen's d
var_d <- ((n_t + n_c)/(n_t * n_c)) + ((d^2) / (2 * (n_t + n_c)))
var_d
## [1] 0.04244334
sqrt(var_d) # SE
## [1] 0.2060178

We calculate J the correction factor.

## Hedges' g is based on Cohen's d
## Calculate correction factor J
J <- (1 - (3/(4 * (n_t + n_c - 2) - 1)))
J
## [1] 0.9936306

As you can see Hedges’ g is lower, 1.47+/-0.20, but perhaps regardless this remains a sizable effect based on effect size guidelines.

## So, Hedges' g is
g <- d * J 
g
## [1] 1.469231
## Variance and SE
var_g <- var_d * (J *J)
sqrt(var_g) # SE
## [1] 0.2047056
  1. Remember:

\[r=\frac{d}{\sqrt{d^2+A}}\]

and \[A= \frac{(n_1+n_2)^2}{n_1n_2}\]

whereby A is a correction factor for cases where ‘group sizes’ ( \(n_1\) and \(n_2\)) are not equal. If group sizes are equal we can assume \(n_1=n_2\) and then A=4. This is the case here.

r<-d/(sqrt((d*d)+4))
r
## [1] 0.5944919

You can find the formula at 7.9 (page 5 of here). Remember A=4 when sample sizes are equal.

This gives us Pearson r = .594+/-.054.

var_r<-(16*var_d)/(((d^2)+4)^3)
var_r
## [1] 0.002868238
# Se is
se_r<-sqrt(var_r)
se_r
## [1] 0.05355593

Hayfever… .

  1. The probability that a child with eczema will also have hay fever is 25.1%. The odds is estimated by 141/420. Similarly, for children without eczema the probability of having hay fever is 6.4% and the odds is 928/13525. The risk difference is 18.7%.
141/561
## [1] 0.2513369
928/14453
## [1] 0.06420812
# Risk difference
(141/561)-(928/14453)
## [1] 0.1871288

We can get the standard error (se) for the risk difference as follows:

\[se = \sqrt{\frac{p_1*(1-p_1)}{n_1}+\frac{p_2*(1-p_2)}{n_2}}\]

# note that this overrides our previous code for the hayfever example
prop_1<-(141/561)
one_minus_prop_1<-1-prop_1
prop_2<-(928/14453)
one_minus_prop_2<-1-prop_2
n_1<-561
n_2<-14453
# Following the formula
part1<-(prop_1*one_minus_prop_1)/n_1 #
part2<-(prop_2*one_minus_prop_2)/n_2
se<-sqrt(part1+part2)
se
## [1] 0.01842743

So, for our meta-analysis (based on absolute risk difference), we have an (absolute) risk difference of 18.7% +/- 1.8%.

Now let’s move on to Odds Ratios (OR):

Remember:

Treatment Control
Event \(n_{11}\) \(n_{12}\)
Non-event \(n_{21}\) \(n_{22}\)

\[OR= \frac{n_{11}n_{22}}{n_{12}n_{21}}\]

For simplicity:

a= \(n_{11}\), b = \(n_{12}\), c= \(n_{21}\) , d= \(n_{22}\)

# odds ratio
OR<-(141*13525)/(420*928)
OR
## [1] 4.892819
inv_OR<-1/OR
inv_OR
## [1] 0.2043812

Remember we want to use the ln(OR) as it has better statistical properties… . Note how it behaves when you take the (natural) logarithm of the inverse of our odds ratio.

log(OR)
## [1] 1.587769
log(inv_OR)
## [1] -1.587769

the standard error, will be the square root of the variance, given by this formula

\[Var(ln(OR)) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}\]

variance_OR<-(1/141)+(1/420)+(1/928)+(1/13525)
variance_OR
## [1] 0.01062467
se_OR<-sqrt(variance_OR)
se_OR
## [1] 0.1030761

So the statistics we would need for our meta-analysis based on ln(OR), would be 1.588 +/- .103.

LSD,… .

  1. You can find the formula in chapter 7 of the Cochrane handbook.

\[se = (upper limit – lower limit) / 3.92\]

(15.8-8.2)/3.92
## [1] 1.938776

For our meta-analysis we thus have 12% (+/-1.93%) - Note that you might want to have a think about precision, so one might want to input 12%(+/-2%)… .

Heart attacks,…

  1. Let’s work through this ‘heart attacks’ example,… .

NNT is nothing more or less than 1/(risk difference). Note that we are talking about the absolute risk difference.

# Proportion of patients with CV events in the ramipril group
651/4645
## [1] 0.1401507
# Proportion of patients with CV events in the placebo group
826/4652
## [1] 0.177558

The (absolute) risk difference is the difference between those 2 values (3.74%).

abs((651/4645)-(826/4652))
## [1] 0.03740734

Incidentally the Relative Risk is .789. So patients treated with Ramipril had a lower risk of a Cardiovascular (CV) event. This is a 21% decrease in the risk of CV events (Risk reduction).

# Relative Risk
(651/4645)/(826/4652)
## [1] 0.7893233
# Risk reduction
1-((651/4645)/(826/4652))
## [1] 0.2106767

The NNT is 1/(Risk Difference). A large treatment effect, in the absolute scale, leads to a small number needed to treat (NNT). The best possible treatment would save every life, i.e. NNT=1. A treatment saving one life for every 10 patients treated outperforms a competing treatment that saves just one life for every 50 patients treated. Assuming this study ran for 5 years, this would mean one life saved for every 27 patients (rounded up). Suppose that the study only had 65 events (as opposed to 651) in the Ramipril group and 83 events (as opposed to 826) in the control group. In such a case the NNT would be 260 (!), but the relative risk remains roughly the same .7843 (!).

# NNT
1/abs((651/4645)-(826/4652))
## [1] 26.73272
# Now: fewer events... 
# absolute risk.
abs((65/4645)-(83/4652))
## [1] 0.003848247
# NNT
1/abs((65/4645)-(83/4652))
## [1] 259.8586
# relative risk
(65/4645)/(83/4652)
## [1] 0.7843127

95% CI

The solution lies in calculating the reciprocals (i.e., 1/x) of the 95% CI of the (absolute) risk difference.

As described above, we can get the standard error (se) for the risk difference as follows:

\[se = \sqrt{\frac{p_1*(1-p_1)}{n_1}+\frac{p_2*(1-p_2)}{n_2}}\]

# note that this overrides our previous code for the hayfever example
prop_1<-(651/4645)
one_minus_prop_1<-1-prop_1
prop_2<-(826/4652)
one_minus_prop_2<-1-prop_2
n_1<-4645
n_2<-4652
# Following the formula
part1<-(prop_1*one_minus_prop_1)/n_1 #
part2<-(prop_2*one_minus_prop_2)/n_2
se<-sqrt(part1+part2)

If we assume normality, then +/-1.96*se gives us the 95% confidence interval.

# note rounding below.
abs((651/4645)-(826/4652))+1.96*se
## [1] 0.0522484
abs((651/4645)-(826/4652))-1.96*se
## [1] 0.02256628

So the 95% CI of our absolute risk difference, 3.74%, is [2.26% to 5.22%].

Now we convert those values to NNT.

# rounded.
1/(abs((651/4645)-(826/4652))+1.96*se)
## [1] 19.13934
1/(abs((651/4645)-(826/4652))-1.96*se)
## [1] 44.31391

The 95% CI of the NNT ranges from 19 to 44.

Now imagine the scenario described above: the study only had 65 events (as opposed to 651) in the ramipril group and 83 events (as opposed to 826) in the control group.

# note that this overrides our previous code
prop_1<-(65/4645)
one_minus_prop_1<-1-prop_1
prop_2<-(83/4652)
one_minus_prop_2<-1-prop_2
n_1<-4645
n_2<-4652
# Following the formula
part1<-(prop_1*one_minus_prop_1)/n_1 #
part2<-(prop_2*one_minus_prop_2)/n_2
se<-sqrt(part1+part2)

If we assume normality then +/-1.96*se gives us the 95% confidence interval. This includes 0.

abs((65/4645)-(83/4652)) # risk difference
## [1] 0.003848247
abs((65/4645)-(83/4652))+1.96*se
## [1] 0.008935688
abs((65/4645)-(83/4652))-1.96*se
## [1] -0.001239194

We get a negative number for our 95% CI for NNT, this is peculiar: find about it more here. So we could conclude, NNTB=260 (NNTH= 807 to to \(-infty\) to NNTB= 111). Confused? Read the article by Altman.

1/(abs((65/4645)-(83/4652))+1.96*se)
## [1] 111.9108
1/(abs((65/4645)-(83/4652))-1.96*se)
## [1] -806.9761

The end.

The end.

Acknowledgments and further reading… .

The protein example is from here.

The ‘hayfever exczema’ Odds Ratio example is from here.

The ‘Ramipril’ NNT example is from here. Note that some of the numbers differ due to rounding.

You can find a nice visualisation on Cohen’s d here.

Want to find out more about how d compares to estimates of d , have a look here (note also the correction).

Find out more about odds ratios here, here, here and here. How big is a big odds ratio?

More about Number Needed to Treat (NNT): here, here and here. For time to event data see this. This script seems useful to convert a Cohen’s d into a NNT.

Note that throughout I have varied the rounding when I reported, you should make your own decisions on how precise you believe your results to be.

Cited literature

Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009). Introduction to meta-analysis. New York, NY: John Wiley & Sons.

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  magrittr_1.5    tools_3.5.1     htmltools_0.3.6
##  [5] yaml_2.2.0      Rcpp_1.0.2      stringi_1.4.3   rmarkdown_1.11 
##  [9] knitr_1.24      stringr_1.4.0   xfun_0.9        digest_0.6.20  
## [13] evaluate_0.14