class: center, middle, inverse, title-slide # Meta-analysis course: part 3: Fixed vs. random effect meta-analyses ### Thomas Pollet (
@tvpollet
), Northumbria University ### 2019-09-16 |
disclaimer
---

## Outline Course.

* Principles of systematic reviews / meta-analysis
* Effect sizes
* **Fixed vs. random effects meta-analyses** --> effect size distributions.
* Publication bias
* Moderators and metaregression
* Advanced 'stuff'.

---

## Structure of this section

* Problem: how to combine studies
* Models for summarizing effect-size distributions
* Random-effects model
* Fixed-effect model
* Distinction between fixed- and random-effects
* Deciding between fixed vs. random effect models

---

## Modeling study outcomes I

* In this section we discuss how to summarize (combine) the effect sizes computed for each study.
* The mean effect size across those studies is computed by applying a weighting scheme.
* The majority of such analyses are weighted least squares (WLS); in a univariate meta-analysis we simply calculate a weighted arithmetic mean.
* Weighting schemes will vary depending on the specific model that we adopt (and "our assumptions" of the world). --> _'All models are wrong but some are useful'_ (Box, 1978)

---

## Modeling study outcomes II

- 2 main reasons to use weights in meta-analysis:
  * Assumption of equal variances typically violated (i.e., heteroscedasticity)
  * Larger studies (i.e., those with larger sample sizes) should carry more weight than small studies.

--

- Criteria for choosing a particular weighting scheme:
  * Statistical test(s).
  * Conceptual/theoretical reasoning.
  * Generalization.

---

## Modeling study outcomes III

* Let `\(T_1...T_K\)` be estimates of the (observed) effect sizes (e.g., _d_ or Pearson's _r_) from _K_ independent studies, where each `\(T_i\)` is an estimate of the 'true' effect size ( `\(\theta_i\)` ).

--

* Thus, we will have one effect size ( `\(T_i\)` ) for each study, and these effect sizes are considered to be statistically independent (!).

--

* Our goal is to model the study outcome across those studies (estimate the mean of the `\(\theta_i\)`, i.e. `\(\mu\)` ).

???

The 'true' effect size is the effect size in the underlying population, and is the effect size that we would observe if the study had an infinitely large sample size (and therefore no sampling error).

---

## Modeling study outcomes IV (Borenstein et al. 2009:70)

We estimate the mean ( `\(\mu\)` ). If we had managed to capture the entire 'population', there would be no variation and we'd have one single point ( `\(\theta\)` ): the population parameter.

<img src="mu_distribution.png" width="600px" style="display: block; margin: auto;" />

---

## Modeling study outcomes V (Borenstein et al. 2009:70)

The circles represent `\(\theta_i\)`, i.e. `\(\theta_1\)` , `\(\theta_2\)` & `\(\theta_3\)` .

<img src="true_effects.png" width="600px" style="display: block; margin: auto;" />

---

## Modeling study outcomes VI (Borenstein et al. 2009:71)

Our observed effect is the square ( `\(T_3\)` ). `\(\epsilon_3\)` is _within_-study error, `\(\zeta_3\)` refers to true variation.

<img src="study_3.png" width="600px" style="display: block; margin: auto;" />

---

## Modeling study outcomes VII (Borenstein et al. 2009:72)

<img src="between-within_variation.png" width="600px" style="display: block; margin: auto;" />

---

## Modeling study outcomes VIII (Weiss & Daikeler, 2017:191).

<img src="Weiss.png" width="600px" style="display: block; margin: auto;" />

---

## Notation

- `\(\theta_i\)` is the population parameter for study _i_ (and `\(T_i\)` is its estimate)

--

- `\(\theta\)` is the single population parameter ( `\(\theta_1 = \theta_2 = ... = \theta\)` )

--

- `\(\mu\)` is the mean of the effect-size distribution (the mean population effect size) for all `\(\theta_i\)`s.

--

- `\(e_i\)` or `\(\epsilon_i\)` is the within-study error.

--

- `\({\sigma_i^2}\)` is the sampling variance of the _i_ th effect size.

--

- `\(u_i\)` is the between-study error.

--

- `\(\tau^2\)` is the variance of our effect size distribution ( `\(Var(\theta_i)\)` ).

<img src="https://media.giphy.com/media/3o6Yg4GUVgIUg3bf7W/giphy.gif" width="275px" style="display: block; margin: auto;" />

???

A single population parameter is assumed to be fixed or take only one value. Population parameters are unknown and almost always unknowable, because they "belong" to populations and we almost never observe whole populations. Common population parameters in a study are those used to describe the distributions of variables, e.g., the mean `\(\mu\)`; but in principle we can estimate any parameter we are interested in.

---

## Let's look at it again.

<img src="Weiss.png" width="600px" style="display: block; margin: auto;" />

---

## Random effects model I

The random-effects model acknowledges two sources of variation:

1. within-study sampling error (dependent on sampling variance `\({\sigma_i^2}\)` ) and
2. between-studies variability ( `\(\tau^2\)` ) (e.g., due to varying study characteristics).

The random-effects model can be represented as:

`$$T_i= \overbrace{\mu+u_i}^{\theta_i}+e_i$$`

---

## Let's look at that equation... .

`$$T_i= \overbrace{\mu+u_i}^{\theta_i}+e_i$$`

whereby:

- `\(e_i\)` is the difference between the true mean `\(\theta_i\)` for study _i_ and the observed mean effect size `\(T_i\)` for study _i_ , `\((e_i = T_i - \theta_i)\)`
- `\(u_i\)` is the difference between the grand mean `\(\mu\)` and the true mean for the _i_ th study `\(\theta_i\)` , `\((u_i = \theta_i - \mu)\)`.
- `\(e_i\sim~N(0,{\sigma_i^2})\)`
- `\(u_i\sim~N(0,\tau^2)\)`

???

Think of it like this: a large sampling variance means an imprecise estimate of the effect --> lots of error. And you'll recall that sampling variances are a function of N, so smaller samples tend to be more imprecise. Suppose that you wanted to find the average height of people in Natal: a sample of 25 will give a less precise estimate than one of 2,500.
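---

## Simulating the model (illustration)

Before moving on, a minimal simulation sketch of this data-generating process may help. All names and values below are made up for illustration (hypothetical `mu`, `tau2`, and sampling variances); they do not come from any real dataset.

```r
set.seed(123)
k      <- 10                    # number of studies (hypothetical)
mu     <- 0.5                   # grand mean effect (hypothetical)
tau2   <- 0.04                  # between-study variance (hypothetical)
sigma2 <- runif(k, 0.01, 0.09)  # within-study sampling variances

u_i     <- rnorm(k, mean = 0, sd = sqrt(tau2))    # u_i ~ N(0, tau^2)
theta_i <- mu + u_i                               # true study effects
e_i     <- rnorm(k, mean = 0, sd = sqrt(sigma2))  # e_i ~ N(0, sigma_i^2)
T_i     <- theta_i + e_i                          # observed effect sizes
```

Setting `tau2` to 0 collapses this into the fixed-effect model: every `\(\theta_i\)` then equals `\(\mu\)`.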
---

## Random effects model II

* Under the random-effects model we have two goals:
  - To estimate the mean of the population effect sizes from which the observed studies are sampled.
  - To estimate the between-studies variability `\(\tau^2\)` .

* In practice we compute `\({\sigma_i^2}\)` from the data and treat the within-study error variance as known.

* Thus, under the random-effects model the variance of `\(T_i\)` is equal to `\({\sigma_i^2}+\tau^2\)`.

---

## Fixed effect model I

* Now imagine a case where there is **no** between-studies heterogeneity, `\(\tau^2 = 0\)`: the random-effects model reduces to the _fixed-effect model._

* In such a case, the fixed-effect model has only one source of variation (within-study sampling error `\(e_i\)` ).

* The fixed-effect model can be represented as:

`$$T_i=\theta + e_i$$`

---

## Fixed effect model II

* Recall that `\(e_i\)` is the error estimate, which is assumed to be normally distributed with a mean of 0 and variance `\({\sigma_i^2}\)` .

* Thus, under a fixed-effect model, the only source of variability is _within-study_ sampling error.

* Under the fixed-effect model all studies are modeled as sharing the same effect, i.e. `\(\theta_1 = \theta_2 = ... = \theta\)`.
---

## Fixed effect model III (Borenstein et al. 2009: 64)

<img src="Weiss2.png" width="600px" style="display: block; margin: auto;" />

---

## Fixed and random effects model

* **Fixed-effect**:

`$$T_i = \theta + e_i$$`

Each effect size estimates a single mean effect `\(\theta\)` , and differs from this mean effect by sampling error.

* **Random-effects**:

`$$T_i = \mu + u_i + e_i$$`

Each effect size differs from the underlying population mean due to both sampling error and the underlying population variance.

???

i.e. we model the heterogeneity between studies.

---

## Distribution of effect sizes I.

* For fixed-effect models, we estimate a "common" effect.

* For random-effects models, a key difference is that each population may have a different effect, and we estimate the amount of uncertainty (variation) due to those differences: we estimate an "average" effect.

---

## Distribution of effect sizes II.

Fixed effects

`$$T_i\sim~N(\theta,{\sigma_i^2})$$`

and weights:

`$$W_i=\frac{1}{\sigma_i^2}$$`

Random effects

`$$T_i\sim~N(\mu,{\sigma_i^2}+\tau^2)$$`

and weights:

`$$W_i^*=\frac{1}{\sigma_i^2+\tau^2}$$`

---

## How do weights differ between fixed and random effect models?

**Thomas opens gosoapbox question.**

<img src="https://media.giphy.com/media/dXICCcws9oxxK/giphy.gif" width="550px" style="display: block; margin: auto;" />

---

## Distribution of effect sizes III.

Under the random-effects model, and due to

`$$W_i^*=\frac{1}{\sigma_i^2+\tau^2}$$`

larger studies are downweighted and smaller studies are upweighted.

--

Remember, under a random-effects model the goal is not to estimate one 'true' effect ( `\(\theta\)` ), but to estimate the mean of a distribution of effects. As each study provides information about a different effect size, we want to be sure that all these effect sizes are represented in our summary estimate. Therefore, we cannot discount a small study by giving it a very small weight (as in a fixed-effect meta-analysis).

--

The larger `\(\tau^2\)` is compared to `\({\sigma_i^2}\)`, the more influence larger studies lose and the more influence smaller studies gain.

???

Remember we are trying to estimate the distribution: if there is comparatively a lot of heterogeneity (large `\(\tau^2\)` ), then larger studies will indeed comparatively lose influence.
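---

## Weights in practice (illustration)

A minimal sketch of the two weighting schemes, using made-up standard errors and a made-up `\(\tau^2\)` (all values hypothetical, purely to show the mechanics):

```r
se   <- c(0.10, 0.20, 0.40)  # one precise, one medium, one imprecise study (hypothetical)
tau2 <- 0.09                 # between-study variance (hypothetical)

w_fixed  <- 1 / se^2            # fixed-effect weights
w_random <- 1 / (se^2 + tau2)   # random-effects weights

# Normalise to proportions to compare the two schemes
round(w_fixed / sum(w_fixed), 2)    # c. 0.76, 0.19, 0.05
round(w_random / sum(w_random), 2)  # c. 0.46, 0.35, 0.18
```

Note how the random-effects weights are spread far more evenly: the most precise study no longer dominates the summary estimate.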
---

## Deciding between fixed and random effects models I

The fixed-effects model is considered to be an appropriate choice if the central goal of a meta-analysis is to make inferences **only** about the effect sizes of the observed effect-size distribution (conditional inference).

--

As Hedges (2009:38) emphasizes, "conditional inference" applies to this collection of studies only. It says nothing about future and past studies, or studies which may have already been done but are not included among the observed studies.

<img src="https://media.giphy.com/media/Uni2jYCihB3fG/giphy.gif" width="450px" style="display: block; margin: auto;" />

---

## Deciding between fixed and random effects models II

According to Mengersen et al. (2013:94): _"the random-effects model is in general conceptually applicable to most meta-analysis settings, apart from very carefully designed and similar experiments."_

--

However, there are issues with random-effects estimation when N is small and we have few studies (e.g., Röver et al., 2015). Moreover, note that small studies (!), which likely have more bias, tend to receive larger weights. This leads some authors to suggest (e.g., Poole & Greenland, 1999) that we should therefore favour fixed effect models in such contexts.

--

Read more [here](http://sci-hub.tw/https://ebmh.bmj.com/content/17/2/53.full) and [here](http://sci-hub.tw/https://academic.oup.com/aje/article/150/5/469/123590).

Use your judgment,... . (Could opt to present both and let the reader decide, or pre-register your choice.)

---

## Example.

Example data from [here](https://github.com/MathiasHarrer/Doing-Meta-Analysis-in-R/blob/master/Meta_Analysis_Data.xlsx) (Harrer et al., 2019). These are data on mindfulness interventions.

```r
library(readxl) # read in Excel data
library(tidyverse)
madata <- read_xlsx('Meta_Analysis_Data.xlsx')
head(madata) # Quick browse
```

```
## # A tibble: 6 x 17
##   Author    TE  seTE RoB   Control `intervention d… `intervention t…
##   <chr>  <dbl> <dbl> <chr> <chr>   <chr>            <chr>
## 1 Call … 0.709 0.261 low   WLC     short            mindfulness
## 2 Cavan… 0.355 0.196 low   WLC     short            mindfulness
## 3 Danit… 1.79  0.346 high  WLC     short            ACT
## 4 de Vi… 0.182 0.118 low   no int… short            mindfulness
## 5 Frazi… 0.422 0.145 low   inform… short            PCI
## 6 Froge… 0.63  0.196 low   no int… short            ACT
## # … with 10 more variables: population <chr>, `type of students` <chr>,
## #   `prevention type` <chr>, gender <chr>, `mode of delivery` <chr>, `ROB
## #   streng` <chr>, `ROB superstreng` <chr>, compensation <chr>,
## #   instruments <chr>, guidance <chr>
```

---

## Meta::metagen

The effect sizes are already calculated, so we can rely on the metagen function from [meta](https://cran.r-project.org/web/packages/meta/index.html) (Schwarzer et al., 2015).

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Parameter </th>
   <th style="text-align:left;"> Function </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> TE </td>
   <td style="text-align:left;"> This tells R to use the TE column to retrieve the effect sizes for each study </td>
  </tr>
  <tr>
   <td style="text-align:left;"> seTE </td>
   <td style="text-align:left;"> This tells R to use the seTE column to retrieve the standard error for each study </td>
  </tr>
  <tr>
   <td style="text-align:left;"> data= </td>
   <td style="text-align:left;"> After =, paste the name of your dataset here </td>
  </tr>
  <tr>
   <td style="text-align:left;"> studlab=paste() </td>
   <td style="text-align:left;"> This tells the function where the labels for each study are stored. If you named the spreadsheet columns as advised, this should be studlab=paste(Author) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> comb.fixed= </td>
   <td style="text-align:left;"> Whether to use a fixed-effect-model </td>
  </tr>
  <tr>
   <td style="text-align:left;"> comb.random </td>
   <td style="text-align:left;"> Whether to use a random-effects-model </td>
  </tr>
  <tr>
   <td style="text-align:left;"> prediction= </td>
   <td style="text-align:left;"> Whether to print a prediction interval for the effect of future studies based on present evidence </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sm= </td>
   <td style="text-align:left;"> The summary measure we want to calculate. We can either calculate the mean difference (MD) or Hedges' g/Cohen's d (SMD) </td>
  </tr>
</tbody>
</table>

???

Table gives the most important parameters for this function.

---

## Fixed effect meta-analysis

```r
library(meta)
library(metafor) # we'll use this later
Fixed <- metagen(TE,
                 seTE,
                 data = madata,
                 studlab = paste(Author),
                 comb.fixed = TRUE,
                 comb.random = FALSE,
                 prediction = TRUE,
                 sm = "SMD")
```

---

## Result

```r
Fixed
```

```
##                           SMD            95%-CI %W(fixed)
## Call et al.            0.7091 [ 0.1979; 1.2203]       3.6
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]       6.3
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]       2.0
## de Vibe et al.         0.1825 [-0.0484; 0.4133]      17.5
## Frazier et al.         0.4219 [ 0.1380; 0.7057]      11.6
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]       6.3
## Gallego et al.         0.7249 [ 0.2846; 1.1652]       4.8
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]       5.5
## Hintz et al.           0.2840 [-0.0453; 0.6133]       8.6
## Kang et al.            1.2751 [ 0.6142; 1.9360]       2.1
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]       6.4
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]       4.6
## Phang et al.           0.5407 [ 0.0619; 1.0196]       4.1
## Rasanen et al.         0.4262 [-0.0794; 0.9317]       3.6
## Ratanasiripong         0.5154 [-0.1731; 1.2039]       2.0
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]       2.4
## SongLindquist          0.6126 [ 0.1683; 1.0569]       4.7
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]       3.9
##
## Number of studies combined: k = 18
##
##                         SMD            95%-CI    z  p-value
## Fixed effect model   0.4805 [ 0.3840; 0.5771] 9.75 < 0.0001
## Prediction interval         [-0.0344; 1.1826]
##
## Quantifying heterogeneity:
## tau^2 = 0.0752; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
##
## Test of heterogeneity:
##      Q d.f.  p-value
##  45.50   17   0.0002
##
## Details on meta-analytical method:
## - Inverse variance method
```

---

## Too much output!

```r
sink("Fixed_results.txt")
print(Fixed)
sink()
```

---

## Output.

In the results of our Meta-Analysis, we find:

* The **individual effect sizes** for each study, and their weight
* The total **number of included studies** (k)
* The **overall effect** (in our case, *g* = 0.4805) and its 95% confidence interval and p-value
* Measures of **between-study heterogeneity**, such as
  * `\(\tau^2\)` or
  * `\(I^2\)`
  * and a *Q*-test of heterogeneity. --> We'll return to these later.

---

## Random effect meta-analysis via meta::metagen

Similar code to what we had for fixed effects, but we have to specify our estimator for `\(\tau^2\)`.

---

## Estimators of `\(\large\tau^2\)`

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Code </th>
   <th style="text-align:left;"> Estimator </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> DL </td>
   <td style="text-align:left;"> DerSimonian-Laird </td>
  </tr>
  <tr>
   <td style="text-align:left;"> PM </td>
   <td style="text-align:left;"> Paule-Mandel </td>
  </tr>
  <tr>
   <td style="text-align:left;"> REML </td>
   <td style="text-align:left;"> Restricted Maximum-Likelihood </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ML </td>
   <td style="text-align:left;"> Maximum-likelihood </td>
  </tr>
  <tr>
   <td style="text-align:left;"> HS </td>
   <td style="text-align:left;"> Hunter-Schmidt </td>
  </tr>
  <tr>
   <td style="text-align:left;"> SJ </td>
   <td style="text-align:left;"> Sidik-Jonkman </td>
  </tr>
  <tr>
   <td style="text-align:left;"> HE </td>
   <td style="text-align:left;"> Hedges </td>
  </tr>
  <tr>
   <td style="text-align:left;"> EB </td>
   <td style="text-align:left;"> Empirical Bayes </td>
  </tr>
</tbody>
</table>

---

## Which estimator of `\(\large\tau^2\)` to use?

All of these estimators derive `\(\tau^{2}\)` using a slightly different approach, which leads to (slightly) different pooled effect sizes and confidence intervals.

--

Whether an estimator is more or less biased often depends on the context and on parameters such as the number of studies `\(k\)`, the number of participants `\(n\)`, how much `\(n\)` varies from study to study, and how large `\(\tau^{2}\)` is.

--

An overview paper by [Veroniki and colleagues](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4950030/) reviews all measures.

--

Especially in medical and psychological research, the **DerSimonian-Laird estimator** is the most common. This is in part due to *RevMan* and (older versions of) *Comprehensive Meta-Analysis*, which relied only on this estimator. It is also the default option in our `meta` package in R.

--

Simulation studies show that the **(Restricted) Maximum-Likelihood**, **Sidik-Jonkman** (with Hartung-Knapp adjustment), and **Empirical Bayes** estimators have better properties in estimating the between-study variance (Viechtbauer, 2005; Schwarzer et al., 2015; Veroniki et al., 2016).
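---

## Comparing `\(\large\tau^2\)` estimators (illustration)

A minimal sketch looping over the estimator codes from the table above. This assumes `madata` is still loaded; the `tau` slot stores the estimated between-study SD in the `meta` version used here, so check `str()` in your version:

```r
estimators <- c("DL", "PM", "REML", "ML", "HS", "SJ", "HE", "EB")
tau2_by_estimator <- sapply(estimators, function(est) {
  m <- metagen(TE, seTE, data = madata, studlab = paste(Author),
               comb.fixed = FALSE, comb.random = TRUE,
               method.tau = est, sm = "SMD")
  m$tau^2 # square the between-study SD to get tau^2
})
round(tau2_by_estimator, 4)
```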
---

## The Hartung-Knapp-Sidik-Jonkman (HKSJ) method

The **DerSimonian-Laird** method is very prone to producing false positives ([Inthout et al., 2014](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-14-25)), especially when the **number of studies is small** and when there is substantial **heterogeneity** (e.g., Makambi, 2004). --> Quite common in the behavioral sciences.

The **Hartung-Knapp-Sidik-Jonkman (HKSJ) method** produces more robust estimates of the variance of the random effects estimator. The HKSJ usually leads to more **conservative** results, indicated by wider confidence intervals.

---

## Issues with the Hartung-Knapp-Sidik-Jonkman method

The HKSJ method is not uncontroversial. Albeit rarely, when effect sizes are homogeneous (i.e. a 'fixed effect situation') HKSJ can be anti-conservative (Wiksten et al., 2016).

Some authors argue that other (standard) pooling models should also be used **in addition** to the HKSJ as a **sensitivity analysis** ([Wiksten et al., 2016](http://sci-hub.tw/https://onlinelibrary.wiley.com/doi/full/10.1002/sim.6879)).

Jackson and colleagues (2017) present 4 residual concerns with the HKSJ method for you to consider (see [here](https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7411)).

An alternative is REML (Viechtbauer, 2015; but some prefer Paule-Mandel, Empirical Bayes or just stick with HKSJ, see Schwarzer et al., 2015).

---

## Example of DerSimonian-Laird (DL, default).

```r
Random_dl <- metagen(TE,
                     seTE,
                     data = madata,
                     studlab = paste(Author),
                     comb.fixed = FALSE,
                     comb.random = TRUE,
                     hakn = FALSE,
                     sm = "SMD")
Random_dl
```

```
##                           SMD            95%-CI %W(random)
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.0
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.3
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        3.7
## de Vibe et al.         0.1825 [-0.0484; 0.4133]        8.0
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        7.4
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        6.3
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        5.7
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        6.0
## Hintz et al.           0.2840 [-0.0453; 0.6133]        6.9
## Kang et al.            1.2751 [ 0.6142; 1.9360]        3.8
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        6.3
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.6
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.3
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.1
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        3.6
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        4.1
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.7
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.2
##
## Number of studies combined: k = 18
##
##                         SMD           95%-CI    z  p-value
## Random effects model 0.5741 [0.4082; 0.7399] 6.78 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0752; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
##
## Test of heterogeneity:
##      Q d.f.  p-value
##  45.50   17   0.0002
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
```

---

## Sink (output)

```r
sink("Random_dl_results.txt")
print(Random_dl)
sink()
```

---

## Example of Hartung-Knapp-Sidik-Jonkman method

```r
model_hksj <- metagen(TE,
                      seTE,
                      data = madata,
                      studlab = paste(Author),
                      comb.fixed = FALSE,
                      comb.random = TRUE,
                      method.tau = "SJ",
                      hakn = TRUE,
                      prediction = TRUE,
                      sm = "SMD")
model_hksj
```

```
##                           SMD            95%-CI %W(random)
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.2
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.1
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        4.2
## de Vibe et al.         0.1825 [-0.0484; 0.4133]        7.1
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        6.8
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        6.1
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        5.7
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        5.9
## Hintz et al.           0.2840 [-0.0453; 0.6133]        6.5
## Kang et al.            1.2751 [ 0.6142; 1.9360]        4.3
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        6.1
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.6
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.4
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.3
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        4.1
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        4.5
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.7
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.4
##
## Number of studies combined: k = 18
##
##                         SMD            95%-CI    t  p-value
## Random effects model 0.5935 [ 0.3891; 0.7979] 6.13 < 0.0001
## Prediction interval         [-0.2084; 1.3954]
##
## Quantifying heterogeneity:
## tau^2 = 0.1337; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
##
## Test of heterogeneity:
##      Q d.f.  p-value
##  45.50   17   0.0002
##
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
```

???

Notice `hakn = TRUE` and `method.tau`.
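---

## Sensitivity check: with vs. without Hartung-Knapp (illustration)

As suggested earlier, standard pooling models can be reported alongside the HKSJ as a sensitivity analysis (Wiksten et al., 2016). A minimal sketch, assuming `madata` is still loaded; the two fits below differ only in the `hakn` argument:

```r
m_plain <- metagen(TE, seTE, data = madata, studlab = paste(Author),
                   comb.fixed = FALSE, comb.random = TRUE,
                   method.tau = "SJ", hakn = FALSE, sm = "SMD")
m_hksj  <- metagen(TE, seTE, data = madata, studlab = paste(Author),
                   comb.fixed = FALSE, comb.random = TRUE,
                   method.tau = "SJ", hakn = TRUE, sm = "SMD")

# Compare the random-effects confidence limits: HKSJ is usually wider
rbind(plain = c(m_plain$lower.random, m_plain$upper.random),
      hksj  = c(m_hksj$lower.random, m_hksj$upper.random))
```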
---

## Exercise.

Depending on where we are at, we'll work on [this exercise](https://tvpollet.github.io/Meta-analysis_3/Exercise_3_questions.html) or the previous one.

<img src="https://media.giphy.com/media/3qQ329Rgle89i/giphy.gif" width="400px" style="display: block; margin: auto;" />

---

## Forest plots.

Forest plots are commonly used to display and compare effect sizes across studies. Under the assumption of effect-size homogeneity, point estimates are expected to be similar and have overlapping confidence interval bands.

--

A forest plot allows us to inspect heterogeneity among effect sizes.

* Simple forest plots show effect-size estimates with their confidence intervals on one axis and study indicators on the other axis.
* Slightly more sophisticated plots include information about the weights, as well as estimates of overall effect sizes.

--

So, forest plots: descriptive tools useful for exploring effect-size heterogeneity.

--

Studies can be ordered using variables such as N, year of publication, or other study features.

---

## Forest plots II

Fixed-effect estimates are shown as small vertical lines perpendicular to the centers of the horizontal lines (i.e., confidence intervals) for each study. These fixed-effect estimates are surrounded by grey boxes that are proportional to the study's fixed-effect weight; larger boxes correspond to larger weights.

--

At the bottom of the figure, both the fixed-effect and random-effects means are plotted as diamonds, with the width of the diamond being proportional to the amount of variability present in the estimate (i.e., a larger diamond implies more variability).

---

## Example forest plot

```r
forest(model_hksj)
```

![](Meta-analysis_3_files/figure-html/unnamed-chunk-21-1.svg)<!-- -->

---

## Example forest plot.

Look at the shiny-shiny. A forest plot with a **diamond** (i.e. the overall effect and its confidence interval) and a **prediction interval**.

What is a [prediction interval](https://bmjopen.bmj.com/content/6/7/e010247)?

--

Plenty of **other parameters** within the `meta::forest` function which we can use to modify the forest plot.

???

The 95% prediction interval estimates where the true effects are to be expected for 95% of similar (exchangeable) studies that might be conducted in the future.

"The 95% prediction interval gives the range in which the point estimate of 95% of future studies will fall, assuming that true effect sizes are **normally** distributed through the domain."

Because selection bias is more likely than not, it is recommended to interpret the prediction interval as a description of the range of **observed** effect sizes, _rather than as a prediction of the range of effect sizes that will be observed in future studies_.

---

## Parameters

For all settings, type `?meta::forest` in your **console** to see more (or look at the manual).

```r
pdf(file='modified_forestplot.pdf', width=10, height=8)
forest(model_hksj,
       sortvar=TE,
       xlim = c(-0.5,2.5),
       rightlabs = c("g","95% CI","weight"),
       leftlabs = c("Author", "N","Mean","SD","N","Mean","SD"),
       lab.e = "Intervention",
       pooled.totals = FALSE,
       smlab = "",
       text.random = "Overall effect",
       print.tau2 = FALSE,
       col.diamond = "blue",
       col.diamond.lines = "black",
       col.predict = "black",
       print.I2.ci = TRUE,
       digits.sd = 2
)
dev.off()
```

???

Note the redundancy of N, Mean, etc., as we used metagen rather than metacont!

---

## Modified forest plot

<img src="Meta-analysis_3_files/figure-html/unnamed-chunk-23-1.svg" width="500px" style="display: block; margin: auto;" />

---

## Layout types

The `meta::forest` function also has two pre-installed **layouts** which we can use. Those layouts can be accessed with the `layout=` parameter.

* **"RevMan5"**. Used for Cochrane reviews and generated by *Review Manager 5* .
* **"JAMA"**. Output according to the guidelines of the *Journal of the American Medical Association* (see details [here](https://jamanetwork.com/journals/jama/pages/instructions-for-authors)).

---

## Revman

```r
pdf(file='Revman_forestplot.pdf', width=10, height=8)
forest(model_hksj,
       layout='RevMan5'
)
dev.off()
```

---

## JAMA

```r
pdf(file='JAMA_forestplot.pdf', width=10, height=8)
forest(model_hksj,
       layout='JAMA'
)
dev.off()
```

---

## Saving your forest plot in different formats.

Don't forget `dev.off()`: the device call tells R where to 'print', and `dev.off()` closes the device so the file is actually written. More on improving your .svg [here](https://www.smoothterminal.com/articles/svg-output-from-r). Some options for .svg and .png below... .

```r
# Scalable vector graphics
svg(file='forestplot.svg')
# Image (.png)
png(file='forestplot.png', width = 480, height = 480, units = "px",
    pointsize = 12, bg = "white", type = "cairo")
```

---

## Heterogeneity vs. homogeneity.

* Back to heterogeneity vs. homogeneity.

* In a homogeneous distribution, the dispersion of the effect sizes around their mean will not be greater than expected from sampling error alone (Lipsey & Wilson, 2001) (fixed-effect model).

* If a collection of effect sizes is determined to possess dispersion beyond what is expected from sampling error, the effect-size distribution is regarded as heterogeneous. This case typically calls for the use of a random-effects model, which accounts for effect-size heterogeneity. --> In the old days one would first calculate this and/or look at the plot and then decide between fixed effects and random effects,... . However, the consensus seems to be that one should **decide based on theory** before looking at heterogeneity statistics.

---

## Heterogeneity.

I have shown you how to pool effect sizes in a meta-analysis. In meta-analytic pooling we **synthesize the effects of many different studies into one single effect**.
However, this is only sensible if we are not comparing **apples and oranges**.

For example, suppose the overall effect in our meta-analysis is **small**, but a few studies report **very high** effect sizes. Such information is lost when we synthesize to one aggregate effect. It is very important to know if ***all*** studies yield small effect sizes, or if exceptions exist.

Another example: very **extreme effect sizes** ( **outliers** ) were included in the meta-analysis. Such outliers could distort our overall effect --> it is important to know how our overall effect would have looked without them.

The extent to which effect sizes vary within a meta-analysis is called **heterogeneity** (forest plot). High heterogeneity could also be caused by two or more **subgroups** each with a different true effect. From a statistical standpoint, high heterogeneity is also **problematic**. Very high heterogeneity could also mean that the studies have nothing in common, and that there is no **"real" true effect behind our data**.

???

It makes no sense to report the pooled effect at all if there is no real effect... .

---

## Understanding heterogeneity I

[Rücker and colleagues (2008)](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-8-79): 3 types of heterogeneity in meta-analyses in a clinical context:

--

1. **Clinical baseline heterogeneity**. These are differences in sample characteristics between the studies. For example, Study A included a general population sample, while Study B recruited study participants who were students.

--

2. **Statistical heterogeneity**. The statistical heterogeneity we find in our collected effect-size data. Such heterogeneity might be important either from a clinical standpoint (e.g., we don't know if a treatment is very effective or only marginally effective, because the effects vary so much from study to study) or from a statistical standpoint (because it dilutes the confidence we have in our pooled effect).

--

3. **Other sources of heterogeneity**, such as design-related heterogeneity (e.g., RCT vs. observational).

--

- For 1. and 3., the solution is restricting the scope of our search for studies to certain well-defined intervention types, populations, and outcomes (subgroup analysis/meta-regression).
- 2. has to be assessed once we have conducted the pooling of studies. This is what we focus on here.

---

## Understanding heterogeneity II: 3 key measures.

1. `\(\tau^2\)` : already covered (remember the different ways to estimate it, e.g., DerSimonian-Laird).

--

2. Cochran's *Q*-statistic is the **difference between the observed effect sizes and the fixed-effect model estimate** of the effect size, which is then **squared, weighted and summed**:

`$$Q = \sum\limits_{k=1}^K w_k (\hat\theta_k - \frac{\sum\limits_{k=1}^K w_k \hat\theta_k}{\sum\limits_{k=1}^K w_k})^{2}$$`

--

3. `\(I^{2}\)` by [Higgins and Thompson (2002)](https://sci-hub.tw/https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.1186) is the **percentage of variability** in the effect sizes which is not caused by sampling error. It is derived from `\(Q\)`:

`$$I^{2} = max \left\{0, \frac{Q-(K-1)}{Q} \right\}$$`

???

Don't panic about the formulae, they are mostly there for your reference. A hand-computed sketch follows on the next slide.
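---

## Computing Q and `\(\large{I^2}\)` by hand (illustration)

A minimal sketch instantiating the formulas above directly, assuming the `madata` object from the earlier example is still loaded. The result should match the Q and `\(I^2\)` values printed by `metagen` above:

```r
w       <- 1 / madata$seTE^2                # fixed-effect weights
theta_f <- sum(w * madata$TE) / sum(w)      # fixed-effect pooled estimate
Q       <- sum(w * (madata$TE - theta_f)^2) # Cochran's Q
K       <- length(madata$TE)                # number of studies
I2      <- max(0, (Q - (K - 1)) / Q)        # I^2, as a proportion
c(Q = Q, df = K - 1, I2 = I2)
```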
---

## Cochran's Q.

A statistically significant _p_-value suggests that the true effects vary, but the converse is not true!

--

A non-significant p-value cannot be taken as evidence that the effect sizes are consistent. It is well known that the Q test has low statistical power.

--

With a small number of studies and/or large within-study variance (small studies), even substantial between-studies dispersion might yield a non-significant p-value (Borenstein et al., 2009: 115).

--

The usual `\(\alpha\)` level for Cochran's Q is 10% (rather than the beloved 5%). --> The decision between the fixed and the random model should not be based on Q alone!

---

## What's wrong with `\(\large\tau^2\)` and Q?

* `\(\tau^2\)` is not scale invariant (it depends on the particular effect size metric), so it cannot be compared across meta-analyses.

--

* Q has low statistical power.

--

According to Higgins and Thompson (2002), an improved measure should fulfill the following conditions:

- Dependence on the extent of heterogeneity
- Scale invariance
- Size invariance

---

## `\(\large{H^2}\)` statistic

`$$H^2=\frac{Q}{K-1}$$`

It can be seen as a standardised `\(Q\)` statistic.

---

## `\(\large{I^2}\)` Statistic I

`$$I^{2} = max \left\{0, \frac{Q-(K-1)}{Q} \right\}$$`

The `\(I^2\)` statistic can be interpreted as the proportion of total variation in the estimates of treatment effect that is due to heterogeneity between studies. It is similar in concept to the [intraclass correlation coefficient (ICC)](https://en.wikipedia.org/wiki/Intraclass_correlation) in multilevel modeling.

---

## `\(\large{I^2}\)` Statistic II

A **"rule of thumb"** [(Higgins et al., 2003)](https://sci-hub.tw/https://www.bmj.com/content/327/7414/557.short):

- `\(I^2\)` = 25%: **low heterogeneity**
- `\(I^2\)` = 50%: **moderate heterogeneity**
- `\(I^2\)` = 75%: **substantial heterogeneity**

<img src="https://media.giphy.com/media/9REDYwIQktQCA/giphy.gif" width="500px" style="display: block; margin: auto;" />

---

## `\(\large{I^2}\)` Statistic III

`\(I^2\)` is **not an absolute** measure of heterogeneity. Borenstein et al. (2017:11):

_"In fact, `\(I^2\)` does not tell us how much the effect size varies. [...] it tells us what proportion of the observed variance would remain if we could eliminate the sampling error – if we could somehow observe the true effect size for all studies in the analysis. `\(I^2\)` can be used together with the observed effects to give us a sense of the true effects."_

---

## What to report?

Generally, when we assess and report heterogeneity in a meta-analysis, we need a measure which is **robust, and not too easily influenced by statistical power**.

--

* ** `\(\tau^2\)` ** is **insensitive** to the number of studies **and** the precision. Yet, it is often hard to interpret how relevant our `\(\tau^2\)` is from a practical standpoint, and it cannot (easily) be compared across meta-analyses (as it depends on the effect size metric,... .)
* **Cochran's *Q* ** increases both when the **number of studies** ( `\(k\)` ) increases, and when the **precision** (i.e., the sample size `\(N\)` of a study) increases. Therefore, `\(Q\)` and whether it is **significant** highly depend on the size of your meta-analysis, and thus its statistical power. Therefore, do not rely only on `\(Q\)` when assessing heterogeneity. (Also note it has low statistical power.)
* ** `\(I^2\)` **, on the other hand, is not sensitive to changes in the number of studies in the analyses, and is therefore used extensively in medical and psychological research.
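---

## Heterogeneity measures in `meta` (illustration)

Rather than computing these by hand, we can pull them from a fitted `meta` object. A minimal sketch assuming the `model_hksj` object from earlier; the slot names below are as in the `meta` version used here, so check `str(model_hksj)` in your version:

```r
model_hksj$tau^2 # tau^2 (the 'tau' slot stores the between-study SD)
model_hksj$Q     # Cochran's Q
model_hksj$df.Q  # its degrees of freedom (K - 1)
model_hksj$I2    # I^2, as a proportion
model_hksj$H     # H (the square root of H^2)
```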
---

## A note on `\(\large{I^2}\)`

However, `\(I^2\)` is not always adequate, because it still heavily depends on the **precision** of the included studies (Rücker et al., 2008). As said before, `\(I^{2}\)` is simply the amount of variability **not caused by sampling error**. If our studies become increasingly large, this sampling error tends to **zero**, while at the same time `\(I^{2}\)` tends to 100%, simply because the individual studies have greater `\(N\)`. Relying only on `\(I^2\)` is therefore not a good option either.

---

## Prediction intervals.

**Prediction intervals** are a good way to overcome this limitation of `\(I^{2}\)` ([IntHout et al., 2016](http://dx.doi.org/10.1136/bmjopen-2015-010247)).

--

These take our between-study variance into account.

--

Again: prediction intervals give us a range in which we can **expect the effect of a future study to fall** based on **our present evidence in the meta-analysis**.

--

If our prediction interval, for example, lies completely on the positive side favoring the intervention, we can be quite confident in saying that **despite varying effects, the intervention might be at least in some way beneficial in all contexts we study in the future**.

--

If the interval includes **zero**, then we are less sure about this, but **broad prediction intervals are quite common, especially in medicine and psychology**.

???

Again note that we are assuming here that there is no selection bias and that the distribution is normal.
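---

## Prediction interval by hand (illustration)

A minimal sketch of how a 95% prediction interval is computed: the pooled estimate plus/minus a t-quantile (with k - 2 degrees of freedom) times the square root of `\(\hat\tau^2\)` plus the squared standard error of the pooled estimate. The values below are placeholders loosely resembling the HKSJ output earlier, not values taken from a fitted object:

```r
mu_hat <- 0.59 # pooled random-effects estimate (placeholder)
se_mu  <- 0.10 # its standard error (placeholder)
tau2   <- 0.13 # estimated between-study variance (placeholder)
k      <- 18   # number of studies

t_crit <- qt(0.975, df = k - 2)         # t-quantile with k - 2 df
half   <- t_crit * sqrt(tau2 + se_mu^2) # half-width of the interval
c(lower = mu_hat - half, upper = mu_hat + half)
```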
---

## Technical issues... .

- Extreme N studies
- Negative residual variance
- Effect size independence

<img src="https://media.giphy.com/media/l0Hlxr9SGCcw4wHS0/giphy.gif" width="350px" style="display: block; margin: auto;" />

---

## Extreme N studies.

* Sometimes, a single study is dramatically larger than the other studies in your meta-analysis.
* This could lead to the large N study completely dominating the results of the meta-analysis.
* Two approaches:
  – Examine results with and without the study
  – Weigh the large N study by the median sample size of the other studies, rather than its own sample size

<img src="https://media.giphy.com/media/uaOqWaZpebl8k/giphy.gif" width="300px" style="display: block; margin: auto;" />

---

## 'Negative residual variance.'

* Sometimes, you might get an estimate for the residual variance that is less than 0.
  – How is this possible?
* The value you computed for `\(\tau^2\)` is but an estimate. Sometimes, a set of studies will randomly have less sampling error than expected.
  - This is most likely when you have a small number of small-N studies.
* If this happens, try manually setting the residual variance to 0.

---

## Effect size independence.

* Meta-analysis requires that each effect size comes from a different, independent sample.
* Often, studies report multiple effect sizes for the same construct relationships
  – e.g., imagine a meta-analysis on errors made in math exams: some studies will have coded both the number of errors and the severity of errors.
* Including both will bias meta-analytic means, variances, and standard errors.

<img src="https://media.giphy.com/media/H22R4SWhjfEQCnueDI/giphy.gif" width="400px" style="display: block; margin: auto;" />

---

## Dealing with multiple effect sizes...

From best to worst (Wiernik, 2015:75).

1. Compute the composite correlation from the study intercorrelation matrix
  - Bonus: compute the composite reliability ( `\(\alpha\)` from the intercorrelation matrix or [Mosier reliability](https://cran.r-project.org/web/packages/psychmeta/vignettes/overview.html), Schmidt & Hunter, 2014:446)
2. Compute the composite correlation using an intercorrelation matrix from another source
3. Choose the one best measure... .
  - Only if substantially better (e.g., the one with the better reliability, or based on theory (e.g., implicit vs. explicit measures)), not based on effect size! Be consistent with coding across studies.
4. Average the correlations.

???

Admit guilt when using suboptimal solutions.

---

## Solution.

Multiple dependent effect sizes: https://cran.r-project.org/web/packages/psychmeta/vignettes/overview.html

Explore this package. It can handle dependencies. We'll cover an alternative route for handling dependencies as well, when we cover advanced topics (`metaSEM`).

---

## Exercise.

Please see the exercise posted [here](https://tvpollet.github.io/Meta-analysis_3/Exercise_3_questions.html)

<img src="https://media.giphy.com/media/oShObTfbg3S5G/giphy.gif" width="600px" style="display: block; margin: auto;" />

---

## Any Questions?

[http://tvpollet.github.io](http://tvpollet.github.io)

Twitter: @tvpollet

<img src="https://media.giphy.com/media/3ohzdRoOp1FUYbtGDu/giphy.gif" width="600px" style="display: block; margin: auto;" />

---

## Acknowledgments

* Numerous students and colleagues. Any mistakes are my own.

* My colleagues who helped me with regard to meta-analysis: Nexhmedin Morina, Stijn Peperkoorn, Gert Stulp, Mirre Simons, Johannes Honekopp.

* [HBES](www.hbes.com) and [LECH](https://www.lechufrn.com/) for funding this workshop. Those who have funded me (not these studies per se): [NWO](www.nwo.nl), [Templeton](www.templeton.org), [NIAS](http://nias.knaw.nl).

* You for listening!

<img src="https://media.giphy.com/media/10avZ0rqdGFyfu/giphy.gif" width="300px" style="display: block; margin: auto;" />

---

## References and further reading (errors = blame RefManageR)

<p><cite>Aert, R. C. M. van, J. M. Wicherts, and M. A. L. M. van Assen (2016). “Conducting Meta-Analyses Based on p Values: Reservations and Recommendations for Applying p-Uniform and p-Curve”. In: <em>Perspectives on Psychological Science</em> 11.5, pp. 713-729. DOI: <a href="https://doi.org/10.1177/1745691616650874">10.1177/1745691616650874</a>. eprint: https://doi.org/10.1177/1745691616650874.</cite></p>
<p><cite>Aloe, A. M. and C. G. Thompson (2013). “The Synthesis of Partial Effect Sizes”. In: <em>Journal of the Society for Social Work and Research</em> 4.4, pp. 390-405. DOI: <a href="https://doi.org/10.5243/jsswr.2013.24">10.5243/jsswr.2013.24</a>. eprint: https://doi.org/10.5243/jsswr.2013.24.</cite></p>
<p><cite>Assink, M. and C. J. Wibbelink (2016). “Fitting Three-Level Meta-Analytic Models in R: A Step-by-Step Tutorial”. In: <em>The Quantitative Methods for Psychology</em> 12.3, pp. 154-174. ISSN: 2292-1354.</cite></p>
<p><cite>Barendregt, J. J, S. A. Doi, Y. Y. Lee, et al. (2013). “Meta-Analysis of Prevalence”. In: <em>Journal of Epidemiology and Community Health</em> 67.11, pp. 974-978. ISSN: 0143-005X. DOI: <a href="https://doi.org/10.1136/jech-2013-203104">10.1136/jech-2013-203104</a>.</cite></p>
<p><cite>Becker, B. J. and M. Wu (2007). “The Synthesis of Regression Slopes in Meta-Analysis”. In: <em>Statistical science</em> 22.3, pp. 414-429. ISSN: 0883-4237.</cite></p>

---

## More refs 1.

<p><cite>Borenstein, M, L. V. Hedges, J. P. Higgins, et al. (2009). <em>Introduction to Meta-Analysis</em>. John Wiley & Sons. ISBN: 1-119-96437-7.</cite></p>
<p><cite>Burnham, K. P. and D. R. Anderson (2002). <em>Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach</em>. New York, NY: Springer. ISBN: 0-387-95364-7.</cite></p>
<p><cite>Burnham, K. P. and D. R. Anderson (2004). “Multimodel Inference: Understanding AIC and BIC in Model Selection”. In: <em>Sociological Methods & Research</em> 33.2, pp. 261-304.
ISSN: 0049-1241. DOI: <a href="https://doi.org/10.1177/0049124104268644">10.1177/0049124104268644</a>.</cite></p> <p><cite>Carter, E. C, F. D. Schönbrodt, W. M. Gervais, et al. (2019). “Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods”. In: <em>Advances in Methods and Practices in Psychological Science</em> 2.2, pp. 115-144. DOI: <a href="https://doi.org/10.1177/2515245919847196">10.1177/2515245919847196</a>.</cite></p> <p><cite>Chen, D. D. and K. E. Peace (2013). <em>Applied Meta-Analysis with R</em>. Chapman and Hall/CRC. ISBN: 1-4665-0600-8.</cite></p> --- ## More refs 2. <p><cite>Cheung, M. W. (2015a). “metaSEM: An R Package for Meta-Analysis Using Structural Equation Modeling”. In: <em>Frontiers in Psychology</em> 5, p. 1521. ISSN: 1664-1078. DOI: <a href="https://doi.org/10.3389/fpsyg.2014.01521">10.3389/fpsyg.2014.01521</a>.</cite></p> <p><cite>Cheung, M. W. (2015b). <em>Meta-Analysis: A Structural Equation Modeling Approach</em>. New York, NY: John Wiley & Sons. ISBN: 1-119-99343-1.</cite></p> <p><cite>Cooper, H. (2010). <em>Research Synthesis and Meta-Analysis: A Step-by-Step Approach</em>. 4th. Sage publications. ISBN: 1-4833-4704-4.</cite></p> <p><cite>Cooper, H, L. V. Hedges, and J. C. Valentine (2009). <em>The Handbook of Research Synthesis and Meta-Analysis</em>. New York: Russell Sage Foundation. ISBN: 1-61044-138-9.</cite></p> <p><cite>Cooper, H. and E. A. Patall (2009). “The Relative Benefits of Meta-Analysis Conducted with Individual Participant Data versus Aggregated Data.” In: <em>Psychological Methods</em> 14.2, pp. 165-176. ISSN: 1433806886. DOI: <a href="https://doi.org/10.1037/a0015565">10.1037/a0015565</a>.</cite></p> --- ## More refs 3. <p><cite>Crawley, M. J. (2013). <em>The R Book: Second Edition</em>. New York, NY: John Wiley & Sons. ISBN: 1-118-44896-0.</cite></p> <p><cite>Cumming, G. (2014). “The New Statistics”. In: <em>Psychological Science</em> 25.1, pp. 7-29. ISSN: 0956-7976. DOI: <a href="https://doi.org/10.1177/0956797613504966">10.1177/0956797613504966</a>.</cite></p> <p><cite>Dickersin, K. (2005). “Publication Bias: Recognizing the Problem, Understanding Its Origins and Scope, and Preventing Harm”. In: <em>Publication Bias in Meta-Analysis Prevention, Assessment and Adjustments</em>. Ed. by H. R. Rothstein, A. J. Sutton and M. Borenstein. Chichester, UK: John Wiley.</cite></p> <p><cite>Fisher, R. A. (1946). <em>Statistical Methods for Research Workers</em>. 10th ed. Edinburgh, UK: Oliver and Boyd.</cite></p> <p><cite>Flore, P. C. and J. M. Wicherts (2015). “Does Stereotype Threat Influence Performance of Girls in Stereotyped Domains? A Meta-Analysis”. In: <em>Journal of School Psychology</em> 53.1, pp. 25-44. ISSN: 0022-4405. DOI: <a href="https://doi.org/10.1016/j.jsp.2014.10.002">10.1016/j.jsp.2014.10.002</a>.</cite></p> --- ## More refs 4. <p><cite>Galbraith, R. F. (1994). “Some Applications of Radial Plots”. In: <em>Journal of the American Statistical Association</em> 89.428, pp. 1232-1242. ISSN: 0162-1459. DOI: <a href="https://doi.org/10.1080/01621459.1994.10476864">10.1080/01621459.1994.10476864</a>.</cite></p> <p><cite>Glass, G. V. (1976). “Primary, Secondary, and Meta-Analysis of Research”. In: <em>Educational researcher</em> 5.10, pp. 3-8. ISSN: 0013-189X. DOI: <a href="https://doi.org/10.3102/0013189X005010003">10.3102/0013189X005010003</a>.</cite></p> <p><cite>Goh, J. X, J. A. Hall, and R. Rosenthal (2016). “Mini Meta-Analysis of Your Own Studies: Some Arguments on Why and a Primer on How”. 
In: <em>Social and Personality Psychology Compass</em> 10.10, pp. 535-549. ISSN: 1751-9004. DOI: <a href="https://doi.org/10.1111/spc3.12267">10.1111/spc3.12267</a>.</cite></p> <p><cite>Harrell, F. E. (2015). <em>Regression Modeling Strategies</em>. 2nd. Springer Series in Statistics. New York, NY: Springer New York. ISBN: 978-1-4419-2918-1. DOI: <a href="https://doi.org/10.1007/978-1-4757-3462-1">10.1007/978-1-4757-3462-1</a>.</cite></p> <p><cite>Harrer, M., P. Cuijpers, and D. D. Ebert (2019). <em>Doing Meta-Analysis in R: A Hands-on Guide</em>. https://bookdown.org/MathiasHarrer/Doing\_ Meta\_ Analysis\_ in\_ R/.</cite></p> --- ## More refs 5. <p><cite>Hartung, J. and G. Knapp (2001). “On Tests of the Overall Treatment Effect in Meta-Analysis with Normally Distributed Responses”. In: <em>Statistics in Medicine</em> 20.12, pp. 1771-1782. DOI: <a href="https://doi.org/10.1002/sim.791">10.1002/sim.791</a>.</cite></p> <p><cite>Hayes, A. F. and K. Krippendorff (2007). “Answering the Call for a Standard Reliability Measure for Coding Data”. In: <em>Communication Methods and Measures</em> 1.1, pp. 77-89. ISSN: 1931-2458. DOI: <a href="https://doi.org/10.1080/19312450709336664">10.1080/19312450709336664</a>.</cite></p> <p><cite>Hedges, L. V. (1981). “Distribution Theory for Glass's Estimator of Effect Size and Related Estimators”. In: <em>Journal of Educational Statistics</em> 6.2, pp. 107-128. DOI: <a href="https://doi.org/10.3102/10769986006002107">10.3102/10769986006002107</a>.</cite></p> <p><cite>Hedges, L. V. (1984). “Estimation of Effect Size under Nonrandom Sampling: The Effects of Censoring Studies Yielding Statistically Insignificant Mean Differences”. In: <em>Journal of Educational Statistics</em> 9.1, pp. 61-85. ISSN: 0362-9791. DOI: <a href="https://doi.org/10.3102/10769986009001061">10.3102/10769986009001061</a>.</cite></p> <p><cite>Hedges, L. V. and I. Olkin (1980). “Vote-Counting Methods in Research Synthesis.” In: <em>Psychological bulletin</em> 88.2, pp. 359-369. ISSN: 1939-1455. DOI: <a href="https://doi.org/10.1037/0033-2909.88.2.359">10.1037/0033-2909.88.2.359</a>.</cite></p> --- ## More refs 6. <p><cite>Higgins, J. P. T. and S. G. Thompson (2002). “Quantifying Heterogeneity in a Meta-Analysis”. In: <em>Statistics in Medicine</em> 21.11, pp. 1539-1558. DOI: <a href="https://doi.org/10.1002/sim.1186">10.1002/sim.1186</a>.</cite></p> <p><cite>Higgins, J. P. T, S. G. Thompson, J. J. Deeks, et al. (2003). “Measuring Inconsistency in Meta-Analyses”. In: <em>BMJ</em> 327.7414, pp. 557-560. ISSN: 0959-8138. DOI: <a href="https://doi.org/10.1136/bmj.327.7414.557">10.1136/bmj.327.7414.557</a>.</cite></p> <p><cite>Higgins, J, S. Thompson, J. Deeks, et al. (2002). “Statistical Heterogeneity in Systematic Reviews of Clinical Trials: A Critical Appraisal of Guidelines and Practice”. In: <em>Journal of Health Services Research & Policy</em> 7.1, pp. 51-61. DOI: <a href="https://doi.org/10.1258/1355819021927674">10.1258/1355819021927674</a>.</cite></p> <p><cite>Hirschenhauser, K. and R. F. Oliveira (2006). “Social Modulation of Androgens in Male Vertebrates: Meta-Analyses of the Challenge Hypothesis”. In: <em>Animal Behaviour</em> 71.2, pp. 265-277. ISSN: 0003-3472. DOI: <a href="https://doi.org/10.1016/j.anbehav.2005.04.014">10.1016/j.anbehav.2005.04.014</a>.</cite></p> <p><cite>Ioannidis, J. P. (2008). “Why Most Discovered True Associations Are Inflated”. In: <em>Epidemiology</em> 19.5, pp. 640-648. ISSN: 1044-3983.</cite></p> --- ## More refs 7. <p><cite>Jackson, D, M. Law, G. 
Rücker, et al. (2017). “The Hartung-Knapp Modification for Random-Effects Meta-Analysis: A Useful Refinement but Are There Any Residual Concerns?” In: <em>Statistics in Medicine</em> 36.25, pp. 3923-3934. DOI: <a href="https://doi.org/10.1002/sim.7411">10.1002/sim.7411</a>. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7411.</cite></p> <p><cite>Jacobs, P. and W. Viechtbauer (2016). “Estimation of the Biserial Correlation and Its Sampling Variance for Use in Meta-Analysis”. In: <em>Research Synthesis Methods</em> 8.2, pp. 161-180. DOI: <a href="https://doi.org/10.1002/jrsm.1218">10.1002/jrsm.1218</a>.</cite></p> <p><cite>Koricheva, J, J. Gurevitch, and K. Mengersen (2013). <em>Handbook of Meta-Analysis in Ecology and Evolution</em>. Princeton, NJ: Princeton University Press. ISBN: 0-691-13729-3.</cite></p> <p><cite>Kovalchik, S. (2013). <em>Tutorial On Meta-Analysis In R - R useR! Conference 2013</em>.</cite></p> <p><cite>Lipsey, M. W. and D. B. Wilson (2001). <em>Practical Meta-Analysis.</em> London: SAGE publications, Inc. ISBN: 0-7619-2167-2.</cite></p> --- ## More refs 8. <p><cite>Littell, J. H, J. Corcoran, and V. Pillai (2008). <em>Systematic Reviews and Meta-Analysis</em>. Oxford, UK: Oxford University Press. ISBN: 0-19-532654-7.</cite></p> <p><cite>McShane, B. B, U. Böckenholt, and K. T. Hansen (2016). “Adjusting for Publication Bias in Meta-Analysis: An Evaluation of Selection Methods and Some Cautionary Notes”. In: <em>Perspectives on Psychological Science</em> 11.5, pp. 730-749. DOI: <a href="https://doi.org/10.1177/1745691616662243">10.1177/1745691616662243</a>. eprint: https://doi.org/10.1177/1745691616662243.</cite></p> <p><cite>Mengersen, K, C. Schmidt, M. Jennions, et al. (2013). “Statistical Models and Approaches to Inference”. In: <em>Handbook of Meta-Analysis in Ecology and Evolution</em>. Ed. by Koricheva, J, J. Gurevitch and Mengersen, Kerrie. Princeton, NJ: Princeton University Press, pp. 89-107.</cite></p> <p><cite>Methley, A. M, S. Campbell, C. Chew-Graham, et al. (2014). “PICO, PICOS and SPIDER: A Comparison Study of Specificity and Sensitivity in Three Search Tools for Qualitative Systematic Reviews”. Eng. In: <em>BMC health services research</em> 14, pp. 579-579. ISSN: 1472-6963. DOI: <a href="https://doi.org/10.1186/s12913-014-0579-0">10.1186/s12913-014-0579-0</a>.</cite></p> <p><cite>Morina, N, K. Stam, T. V. Pollet, et al. (2018). “Prevalence of Depression and Posttraumatic Stress Disorder in Adult Civilian Survivors of War Who Stay in War-Afflicted Regions. A Systematic Review and Meta-Analysis of Epidemiological Studies”. In: <em>Journal of Affective Disorders</em> 239, pp. 328-338. ISSN: 0165-0327. DOI: <a href="https://doi.org/10.1016/j.jad.2018.07.027">10.1016/j.jad.2018.07.027</a>.</cite></p> --- ## More refs 9. <p><cite>Nakagawa, S, D. W. A. Noble, A. M. Senior, et al. (2017). “Meta-Evaluation of Meta-Analysis: Ten Appraisal Questions for Biologists”. In: <em>BMC Biology</em> 15.1, p. 18. ISSN: 1741-7007. DOI: <a href="https://doi.org/10.1186/s12915-017-0357-7">10.1186/s12915-017-0357-7</a>.</cite></p> <p><cite>Pastor, D. A. and R. A. Lazowski (2018). “On the Multilevel Nature of Meta-Analysis: A Tutorial, Comparison of Software Programs, and Discussion of Analytic Choices”. In: <em>Multivariate Behavioral Research</em> 53.1, pp. 74-89. DOI: <a href="https://doi.org/10.1080/00273171.2017.1365684">10.1080/00273171.2017.1365684</a>.</cite></p> <p><cite>Poole, C. and S. Greenland (1999). 
“Random-Effects Meta-Analyses Are Not Always Conservative”. In: <em>American Journal of Epidemiology</em> 150.5, pp. 469-475. ISSN: 0002-9262. DOI: <a href="https://doi.org/10.1093/oxfordjournals.aje.a010035">10.1093/oxfordjournals.aje.a010035</a>. eprint: http://oup.prod.sis.lan/aje/article-pdf/150/5/469/286690/150-5-469.pdf.</cite></p> <p><cite>Popper, K. (1959). <em>The Logic of Scientific Discovery</em>. London, UK: Hutchinson. ISBN: 1-134-47002-9.</cite></p> <p><cite>Roberts, P. D, G. B. Stewart, and A. S. Pullin (2006). “Are Review Articles a Reliable Source of Evidence to Support Conservation and Environmental Management? A Comparison with Medicine”. In: <em>Biological conservation</em> 132.4, pp. 409-423. ISSN: 0006-3207.</cite></p> --- ## More refs 10. <p><cite>Rosenberg, M. S, H. R. Rothstein, and J. Gurevitch (2013). “Effect Sizes: Conventional Choices and Calculations”. In: <em>Handbook of Meta-analysis in Ecology and Evolution</em>, pp. 61-71.</cite></p> <p><cite>Röver, C, G. Knapp, and T. Friede (2015). “Hartung-Knapp-Sidik-Jonkman Approach and Its Modification for Random-Effects Meta-Analysis with Few Studies”. In: <em>BMC Medical Research Methodology</em> 15.1, p. 99. ISSN: 1471-2288. DOI: <a href="https://doi.org/10.1186/s12874-015-0091-1">10.1186/s12874-015-0091-1</a>.</cite></p> <p><cite>Schwarzer, G, J. R. Carpenter, and G. Rücker (2015). <em>Meta-Analysis with R</em>. New York, NY: Springer. ISBN: 3-319-21415-2.</cite></p> <p><cite>Schwarzer, G, H. Chemaitelly, L. J. Abu-Raddad, et al. “Seriously Misleading Results Using Inverse of Freeman-Tukey Double Arcsine Transformation in Meta-Analysis of Single Proportions”. In: <em>Research Synthesis Methods</em> 0.0. DOI: <a href="https://doi.org/10.1002/jrsm.1348">10.1002/jrsm.1348</a>. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/jrsm.1348.</cite></p> <p><cite>Simmons, J. P, L. D. Nelson, and U. Simonsohn (2011). “False-Positive Psychology”. In: <em>Psychological Science</em> 22.11, pp. 1359-1366. ISSN: 0956-7976. DOI: <a href="https://doi.org/10.1177/0956797611417632">10.1177/0956797611417632</a>.</cite></p> --- ## More refs 11. <p><cite>Simonsohn, U, L. D. Nelson, and J. P. Simmons (2014). “P-Curve: A Key to the File-Drawer.” In: <em>Journal of Experimental Psychology: General</em> 143.2, pp. 534-547. ISSN: 1939-2222. DOI: <a href="https://doi.org/10.1037/a0033242">10.1037/a0033242</a>.</cite></p> <p><cite>Sterne, J. A. C, A. J. Sutton, J. P. A. Ioannidis, et al. (2011). “Recommendations for Examining and Interpreting Funnel Plot Asymmetry in Meta-Analyses of Randomised Controlled Trials”. In: <em>BMJ</em> 343.jul22 1, pp. d4002-d4002. ISSN: 0959-8138. DOI: <a href="https://doi.org/10.1136/bmj.d4002">10.1136/bmj.d4002</a>.</cite></p> <p><cite>Veroniki, A. A, D. Jackson, W. Viechtbauer, et al. (2016). “Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis”. Eng. In: <em>Research synthesis methods</em> 7.1, pp. 55-79. ISSN: 1759-2887. DOI: <a href="https://doi.org/10.1002/jrsm.1164">10.1002/jrsm.1164</a>.</cite></p> <p><cite>Viechtbauer, W. (2015). “Package ‘metafor’: Meta-Analysis Package for R”. </p></cite></p> <p><cite>Weiss, B. and J. Daikeler (2017). <em>Syllabus for Course: “Meta-Analysis in Survey Methodology", 6th Summer Workshop (GESIS)</em>.</cite></p> --- ## More refs 12. <p><cite>Wickham, H. and G. Grolemund (2016). <em>R for Data Science</em>. Sebastopol, CA: O'Reilly..</cite></p> <p><cite>Wiernik, B. (2015). 
<em>A Brief Introduction to Meta-Analysis</em>.</cite></p>
<p><cite>Wiksten, A, G. Rücker, and G. Schwarzer (2016). “Hartung-Knapp Method Is Not Always Conservative Compared with Fixed-Effect Meta-Analysis”. In: <em>Statistics in Medicine</em> 35.15, pp. 2503-2515. DOI: <a href="https://doi.org/10.1002/sim.6879">10.1002/sim.6879</a>.</cite></p>
<p><cite>Wingfield, J. C, R. E. Hegner, A. M. Dufty Jr, et al. (1990). “The ‘Challenge Hypothesis’: Theoretical Implications for Patterns of Testosterone Secretion, Mating Systems, and Breeding Strategies”. In: <em>American Naturalist</em> 136, pp. 829-846. ISSN: 0003-0147.</cite></p>
<p><cite>Yeaton, W. H. and P. M. Wortman (1993). “On the Reliability of Meta-Analytic Reviews: The Role of Intercoder Agreement”. In: <em>Evaluation Review</em> 17.3, pp. 292-309. ISSN: 0193-841X. DOI: <a href="https://doi.org/10.1177/0193841X9301700303">10.1177/0193841X9301700303</a>.</cite></p>