class: center, middle, inverse, title-slide # Meta-analysis course: part 5: Subgroup analysis & Meta-regression ### Thomas Pollet (
@tvpollet
), Northumbria University ### 2019-10-02 |
disclaimer
---
## Where are we at?

* Principles of systematic reviews / meta-analysis
* Effect sizes
* Fixed vs. random effects meta-analyses
* Publication bias
* **Moderators and metaregression**
* Advanced 'stuff'.

---
## This section.

Remember heterogeneity? Many causes could exist. We could try to explain it! --> Search for potential moderators.

---
## When to look for moderators

- If you run a meta-analysis and there is substantial heterogeneity left (between-study variance).
- Several potential explanations for the remaining heterogeneity:
  * Sampling error
  * Uncorrected artifacts (e.g., issues with measurement)
  * Moderators

--> If your number of studies is relatively small, then what remains is likely just sampling error. Better to understand how wide the uncertainty is.

---
## Moderators: a priori vs. post-hoc

- Consider any relevant moderators _before_ doing the analyses:
  * Theory
  * Methodology
- Avoid looking for moderators post-hoc --> likely a fishing expedition.

---
## Testing moderators.

- Compared to original research studies, meta-analysis can have much greater statistical power to uncover moderators.

--

- Each primary study (of, e.g., N=40) might not have sufficient subgroup members to examine an effect. But the pooled data (20 studies, totaling N=800) would allow us to detect it.

--

- However, even in meta-analysis there are considerable constraints on the power to detect effects.

---
## Statistical power to detect moderators.

- The effective sample size is the number of studies:
  * Not sensible to test a large number of moderators when one has only 20-30 studies!
- The potential of moderators to explain heterogeneity is limited by the real variation in the effect --> So, if something is inherently 'noisy' (or poorly measured), then we don't have much chance of capturing some of that variance with a moderator.

---
## Two methods

- Subset analysis
- Meta-regression

---
## Subset analysis

- We can conduct a separate meta-analysis on each subset (e.g., men vs. women, student sample versus non-student sample).
- If the moderator matters, there will be observable differences:
  * differences between the mean effect sizes across the subsets (do confidence intervals overlap?);
  * shrinkage of variance in each subset.
- If we evaluate multiple moderators, we need to be really wary of correlated predictors.

---
## Three approaches to subgroup analysis.

* fixed effect models,
* random effects models with a common estimate for `\(\tau^2\)`,
* random effects models with separate estimates for the between-study variance `\(\tau^2\)` across the subgroups.

<img src="https://media.giphy.com/media/FHzemFzwkyRfq/giphy.gif" width="500px" style="display: block; margin: auto;" />

---
## When to use a fixed effect?

* If you assume that **all studies in a subgroup** stem from the same population, and all have **one shared true effect**, you may use the **fixed-effect-model**. However, it is unlikely that this assumption is ever **true in psychological** and **medical research**, even when we divide our studies into subgroups.

--

* Therefore, we typically use a **random-effect-model**, which assumes that the studies within a subgroup are drawn from a **universe** of populations following its own distribution.

---
## A fixed effects example: Subgroup 1.

The control conditions consist of waiting list (WLC), information only, and no intervention. Here we start with the subgroup of studies that used a waiting-list control.
```r
model_subgroup1<-metagen(TE, seTE, data=madata,
                         studlab=paste(Author),
                         comb.random = FALSE,
                         method.tau = "SJ",
                         hakn = TRUE,
                         prediction=TRUE,
                         sm="SMD",
                         subset = Control =='WLC')
```

---
## Subgroup 2.

```r
model_subgroup2<-metagen(TE, seTE, data=madata,
                         studlab=paste(Author),
                         comb.random = FALSE,
                         method.tau = "SJ",
                         hakn = TRUE,
                         prediction=TRUE,
                         sm="SMD",
                         subset = Control =='information only')
```

---
## Subgroup 3

```r
model_subgroup3<-metagen(TE, seTE, data=madata,
                         studlab=paste(Author),
                         comb.random = FALSE,
                         method.tau = "SJ",
                         hakn = TRUE,
                         prediction=TRUE,
                         sm="SMD",
                         subset = Control =='no intervention')
```

---
## Create vectors needed.

We need vectors of the estimated treatment effects and the corresponding standard errors.

```r
# Subgroup treatment effects (fixed effect model)
TE.control <- c(model_subgroup1$TE.fixed, model_subgroup2$TE.fixed, model_subgroup3$TE.fixed)
# Corresponding standard errors (fixed effect model)
seTE.control <- c(model_subgroup1$seTE.fixed, model_subgroup2$seTE.fixed, model_subgroup3$seTE.fixed)
```

---
## Fixed effect meta-analysis.

This uses the generic inverse variance method. The test suggests there is no significant difference between the subgroups ( `\(Q=5.49\)`, `\(p=0.064\)` ).

```r
model_control<-metagen(TE.control, seTE.control, comb.random = FALSE, sm="SMD")
sink("model_control.txt")
model_control
```

```
##      SMD           95%-CI %W(fixed)
## 1 0.6688 [0.4839; 0.8537]      27.3
## 2 0.4016 [0.2048; 0.5983]      24.1
## 3 0.4141 [0.2757; 0.5525]      48.7
## 
## Number of studies combined: k = 3
## 
##                       SMD           95%-CI    z  p-value
## Fixed effect model 0.4805 [0.3840; 0.5771] 9.75 < 0.0001
## 
## Quantifying heterogeneity:
## tau^2 = 0.0134; H = 1.66 [1.00; 3.10]; I^2 = 63.5% [0.0%; 89.6%]
## 
## Test of heterogeneity:
##     Q d.f. p-value
##  5.49    2  0.0644
## 
## Details on meta-analytical method:
## - Inverse variance method
```

```r
sink()
```

--

## Shorter route...

```r
model_control_u<-update.meta(model_hksj, byvar=Control, comb.random = FALSE, comb.fixed = TRUE)
sink("model_control_u.txt")
model_control_u
```

```
##                           SMD            95%-CI %W(fixed)          Control
## Call et al.            0.7091 [ 0.1979; 1.2203]       3.6              WLC
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]       6.3              WLC
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]       2.0              WLC
## de Vibe et al.         0.1825 [-0.0484; 0.4133]      17.5  no intervention
## Frazier et al.         0.4219 [ 0.1380; 0.7057]      11.6 information only
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]       6.3  no intervention
## Gallego et al.         0.7249 [ 0.2846; 1.1652]       4.8  no intervention
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]       5.5  no intervention
## Hintz et al.           0.2840 [-0.0453; 0.6133]       8.6 information only
## Kang et al.            1.2751 [ 0.6142; 1.9360]       2.1  no intervention
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]       6.4  no intervention
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]       4.6              WLC
## Phang et al.           0.5407 [ 0.0619; 1.0196]       4.1  no intervention
## Rasanen et al.         0.4262 [-0.0794; 0.9317]       3.6              WLC
## Ratanasiripong         0.5154 [-0.1731; 1.2039]       2.0  no intervention
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]       2.4              WLC
## SongLindquist          0.6126 [ 0.1683; 1.0569]       4.7              WLC
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]       3.9 information only
## 
## Number of studies combined: k = 18
## 
##                       SMD            95%-CI    z  p-value
## Fixed effect model 0.4805 [ 0.3840; 0.5771] 9.75 < 0.0001
## Prediction interval       [-0.2084; 1.3954]
## 
## Quantifying heterogeneity:
## tau^2 = 0.1337; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
## 
## Quantifying residual heterogeneity:
## H = 1.63 [1.25; 2.14]; I^2 = 62.5% [35.7%; 78.2%]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  45.50   17  0.0002
## 
## Results for subgroups (fixed effect model):
##                             k    SMD           95%-CI     Q  tau^2   I^2
## Control = WLC               7 0.6688 [0.4839; 0.8537] 22.17 0.2501 72.9%
## Control = no intervention   8 0.4141 [0.2757; 0.5525] 16.70 0.0789 58.1%
## Control = information only  3 0.4016 [0.2048; 0.5983]  1.14 0.0068  0.0%
## 
## Test for subgroup differences (fixed effect model):
##                     Q d.f. p-value
## Between groups   5.49    2  0.0644
## Within groups   40.02   15  0.0005
## 
## Details on meta-analytical method:
## - Inverse variance method
```

```r
sink()
```

---
## A random effects example: Region.

```r
region<-c("Netherlands","Netherlands","Netherlands","USA","USA","USA","USA",
          "Argentina","Argentina","Argentina","Australia","Australia","Australia",
          "China","China","China","China","China")
madata$region<-region
```

---
## Subgroup Analyses using the Random-Effects-Model: 'common'.

This model assumes a common `\(\tau^2\)`; we take our previous model and update it!

```r
region_subgroup_common<-update.meta(model_hksj, byvar=region, comb.random = TRUE, comb.fixed = FALSE, tau.common=TRUE)
sink("region_subgroup_common.txt")
region_subgroup_common
```

```
##                           SMD            95%-CI %W(random)      region
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.2 Netherlands
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.1 Netherlands
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        4.2 Netherlands
## de Vibe et al.         0.1825 [-0.0484; 0.4133]        7.1         USA
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        6.8         USA
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        6.1         USA
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        5.7         USA
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        5.9   Argentina
## Hintz et al.           0.2840 [-0.0453; 0.6133]        6.5   Argentina
## Kang et al.            1.2751 [ 0.6142; 1.9360]        4.3   Argentina
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        6.1   Australia
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.6   Australia
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.4   Australia
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.3       China
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        4.1       China
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        4.5       China
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.7       China
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.4       China
## 
## Number of studies combined: k = 18
## 
##                         SMD            95%-CI    t  p-value
## Random effects model 0.5935 [ 0.3891; 0.7979] 6.13 < 0.0001
## Prediction interval         [-0.2084; 1.3954]
## 
## Quantifying heterogeneity:
## tau^2 = 0.1337; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
## 
## Quantifying residual heterogeneity:
## tau^2 = 0.1416; H = 2.00; I^2 = 75.1%
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  45.50   17  0.0002
## 
## Results for subgroups (random effects model):
##                        k    SMD            95%-CI     Q  tau^2   I^2
## region = Netherlands   3 0.8631 [-0.9173; 2.6435] 13.06 0.1416 84.7%
## region = USA           4 0.4730 [ 0.0877; 0.8583]  6.87 0.1416 56.3%
## region = Argentina     3 0.6263 [-0.5807; 1.8333]  6.95 0.1416 71.2%
## region = Australia     3 0.3355 [-0.2205; 0.8914]  2.13 0.1416  6.1%
## region = China         5 0.7122 [ 0.2008; 1.2236]  7.81 0.1416 48.8%
## 
## Test for subgroup differences (random effects model):
##                     Q d.f. p-value
## Between groups   3.96    4  0.4121
## Within groups   36.83   13  0.0004
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2 (assuming common tau^2 in subgroups)
## - Hartung-Knapp adjustment for random effects model
```

```r
sink()
```
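---
## Visualising the subgroups.

A quick way to inspect these subgroup results is a forest plot grouped by region. A minimal sketch, not in the original output above; `meta`'s `forest()` picks up the grouping from `byvar`:

```r
library(meta)
# Draws the study-level estimates within each region, plus the subgroup summaries.
forest(region_subgroup_common)
```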
---
## Result.

**Pooled effect for each subgroup** (country).

Under `Test for subgroup differences (random effects model)` we find the **test for subgroup differences using the random-effects-model**, which is **not significant** ( `\(Q=3.96\)`, `\(p=0.4121\)` ). This means that we did not find differences in the overall effect between regions.

---
## Separate heterogeneity estimates for country.

```r
region_subgroup_sep<-update.meta(model_hksj, byvar=region, comb.random = TRUE, comb.fixed = FALSE, tau.common=FALSE)
sink("region_subgroup_sep.txt")
region_subgroup_sep
```

```
##                           SMD            95%-CI %W(random)      region
## Call et al.            0.7091 [ 0.1979; 1.2203]        5.2 Netherlands
## Cavanagh et al.        0.3549 [-0.0300; 0.7397]        6.1 Netherlands
## DanitzOrsillo          1.7912 [ 1.1139; 2.4685]        4.2 Netherlands
## de Vibe et al.         0.1825 [-0.0484; 0.4133]        7.1         USA
## Frazier et al.         0.4219 [ 0.1380; 0.7057]        6.8         USA
## Frogeli et al.         0.6300 [ 0.2458; 1.0142]        6.1         USA
## Gallego et al.         0.7249 [ 0.2846; 1.1652]        5.7         USA
## Hazlett-Stevens & Oren 0.5287 [ 0.1162; 0.9412]        5.9   Argentina
## Hintz et al.           0.2840 [-0.0453; 0.6133]        6.5   Argentina
## Kang et al.            1.2751 [ 0.6142; 1.9360]        4.3   Argentina
## Kuhlmann et al.        0.1036 [-0.2781; 0.4853]        6.1   Australia
## Lever Taylor et al.    0.3884 [-0.0639; 0.8407]        5.6   Australia
## Phang et al.           0.5407 [ 0.0619; 1.0196]        5.4   Australia
## Rasanen et al.         0.4262 [-0.0794; 0.9317]        5.3       China
## Ratanasiripong         0.5154 [-0.1731; 1.2039]        4.1       China
## Shapiro et al.         1.4797 [ 0.8618; 2.0977]        4.5       China
## SongLindquist          0.6126 [ 0.1683; 1.0569]        5.7       China
## Warnecke et al.        0.6000 [ 0.1120; 1.0880]        5.4       China
## 
## Number of studies combined: k = 18
## 
##                         SMD            95%-CI    t  p-value
## Random effects model 0.5935 [ 0.3891; 0.7979] 6.13 < 0.0001
## Prediction interval         [-0.2084; 1.3954]
## 
## Quantifying heterogeneity:
## tau^2 = 0.1337; H = 1.64 [1.27; 2.11]; I^2 = 62.6% [37.9%; 77.5%]
## 
## Quantifying residual heterogeneity:
## H = 1.68 [1.27; 2.24]; I^2 = 64.7% [37.6%; 80.0%]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  45.50   17  0.0002
## 
## Results for subgroups (random effects model):
##                        k    SMD            95%-CI     Q  tau^2   I^2
## region = Netherlands   3 0.9142 [-0.9150; 2.7433] 13.06 0.4508 84.7%
## region = USA           4 0.4456 [ 0.0600; 0.8312]  6.87 0.0357 56.3%
## region = Argentina     3 0.6371 [-0.5837; 1.8580]  6.95 0.1826 71.2%
## region = Australia     3 0.3194 [-0.2427; 0.8815]  2.13 0.0204  6.1%
## region = China         5 0.7098 [ 0.2018; 1.2177]  7.81 0.1110 48.8%
## 
## Test for subgroup differences (random effects model):
##                     Q d.f. p-value
## Between groups   4.52    4  0.3405
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Sidik-Jonkman estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
```

```r
sink()
```

---
## Interpretation and note.

* This model now allows each country to have its own `\(\tau^2\)` estimate. As before, we find no evidence for subgroup differences, `\(Q=4.52\)`, `\(p=0.3405\)`.

--

* Ideally, you'd want lots of studies (!), as currently some of the groups have very few cases. As with regression (Harrell, 2015), we can question how good the estimates are when you have <10 cases per predictor.

---
## Meta-regression.

There isn't much of a conceptual difference between a subset analysis and a meta-regression (dummy coded).

--

* In a conventional regression, we estimate a parameter `\(y\)` using covariates `\(x_i\)`, with `\(n\)` regression coefficients `\(b\)` and an intercept `\(b_0\)`.
Equation: `$$y=b_0 + b_1x_1 + ...+b_nx_n$$`

--

In a meta-regression, we want to estimate the **effect size** `\(\theta\)` for different values of the covariate(s), so our regression looks like this:

`$$\hat \theta_k = \theta + b_1x_{1k} + ... + b_nx_{nk} + \epsilon_k + \zeta_k$$`

---
## Meta-regression: two extra terms.

Two **extra terms in the equation**: `\(\epsilon_k\)` and `\(\zeta_k\)`. Think of these as two types of **independent errors** which cause our regression prediction to be **imperfect**.

--

1. `\(\epsilon_k\)` is the sampling error through which the effect size of the study deviates from its "true" effect. It is assumed to follow a normal distribution.

--

2. `\(\zeta_k\)` denotes that even the true effect size of the study is itself only sampled from **an overarching distribution of effect sizes** (think back to the section on fixed vs. random effects).

In a **fixed-effect-model**, we assume that all studies actually share the **same true effect size** and that the **between-study heterogeneity** `\(\tau^2 = 0\)`. In this case, we do not consider `\(\zeta_k\)` in our equation, but only `\(\epsilon_k\)`.

???
Note I use b's here rather than `\(\beta\)`'s in order to avoid confusion based on standardisation. Also note `\(\zeta_k\)` captures true variation: it is the difference between estimating an effect size **distribution** (random effects) and a single true effect (fixed effect).

---
## Terminology (Harrer, 2019 on meta-regression).

* Because the equation includes **fixed effects** (the `\(b\)` coefficients) as well as **random effects** ( `\(\zeta_k\)` ), the model used in meta-regression is often called **a mixed-effects-model**.

--

* **Subgroup analyses with more than two subgroups** are nothing else than a **meta-regression** with a **categorical predictor**. For meta-regression, these subgroups are then **dummy-coded**. An effect is a shift up or down relative to the reference group.

--

<img src="dummy.png" width="400px" style="display: block; margin: auto;" />

???
Note that on the graph it says `\(\beta\)` rather than b's.

---
## Assessing regression models: significance.

To evaluate the **statistical significance of a predictor**, a **t-test** of its `\(b\)`-weight is performed, as in OLS regression.

$$ t=\frac{b}{SE_{b}}$$

This gives us a `\(p\)`-value telling us if a variable significantly predicts effect size differences in our regression model.

<div class="figure" style="text-align: center">
<img src="p_values_2x.png" alt="https://xkcd.com/1478/" width="175px" />
<p class="caption">https://xkcd.com/1478/</p>
</div>

---
## Assessing regression models: variance.

Our aim is to find a model **which explains as much as possible of the current variability in effect sizes** we find in our data.

--

In conventional regression, `\(R^2\)` is commonly used to quantify the **goodness of fit** of our model in percent (0-100%). As this measure is commonly used, and many researchers know how to interpret it, we can also calculate an `\(R^2\)` analog for meta-regression using this formula:

`$$R^2=\frac{\hat\tau^2_{Random}-\hat\tau^2_{Regress}}{\hat\tau^2_{Random}}$$`

Where `\(\hat\tau^2_{Random}\)` is the estimated total heterogeneity based on the random-effects-model and `\(\hat\tau^2_{Regress}\)` the residual heterogeneity of our mixed-effects regression model.
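---
## Computing the `\(R^2\)` analog.

A minimal sketch (not from the original slides) of this formula in `metafor`, assuming the `madata` columns `TE`, `seTE` and `Control` used earlier; the estimator choice (`"ML"`) is just for illustration:

```r
library(metafor)

# Random-effects model: total heterogeneity (tau^2_Random)
m_random <- rma(yi = TE, sei = seTE, data = madata, method = "ML")

# Mixed-effects model with a moderator: residual heterogeneity (tau^2_Regress)
m_regress <- rma(yi = TE, sei = seTE, mods = ~ Control, data = madata, method = "ML")

# R^2 analog: proportion of heterogeneity accounted for by the moderator
(m_random$tau2 - m_regress$tau2) / m_random$tau2
m_regress$R2  # metafor's own value (reported in percent) should be close to this
```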
---
## Metaregression (categorical)

```r
sink("metareg_control.txt")
metareg(model_hksj,Control)
```

```
## 
## Mixed-Effects Model (k = 18; tau^2 estimator: SJ)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.1343 (SE = 0.0536)
## tau (square root of estimated tau^2 value):             0.3665
## I^2 (residual heterogeneity / unaccounted variability): 73.92%
## H^2 (unaccounted variability / sampling variability):   3.84
## R^2 (amount of heterogeneity accounted for):            0.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 15) = 40.0161, p-val = 0.0005
## 
## Test of Moderators (coefficients 2:3):
## F(df1 = 2, df2 = 15) = 0.9467, p-val = 0.4100
## 
## Model Results:
## 
##                        estimate     se   tval   pval    ci.lb  ci.ub
## intrcpt                  0.4252 0.2250 1.8899 0.0782  -0.0543 0.9048
## Controlno intervention   0.1003 0.2678 0.3744 0.7134  -0.4706 0.6711
## ControlWLC               0.3380 0.2765 1.2224 0.2404  -0.2514 0.9274
## 
## intrcpt                .
## Controlno intervention
## ControlWLC
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
sink()
```

???
We see in the output that the `metareg` function uses the values of "Control" (i.e., the three different types of control groups) as a **moderator**. It takes **"information only"** as a dummy-coded *reference group*, and **"no intervention"** and **"WLC"** as dummy-coded **predictors**. If you wanted to swap those, you could relevel the factor (i.e., change its reference category) and refit the model.

Under `Test of Moderators`, we can see that control groups are not significantly associated with effect size differences, `\(F_{2,15}=0.947\)`, `\(p=0.41\)`. Our regression model does not explain any of the variability in our effect size data ($R^2=0\%$).

Below `Model Results`, we can also see the `\(b\)`-values (`estimate`) of both predictors, and their significance level `pval`. As we can see, neither predictor was significant.

---
## Metaregression

Imagine that we wanted to test whether publication year affected the estimates of effect sizes.

```r
madata$pub_year<-c(2001,2002,2011,2013,2013,2014,1999,2018,2001,2002,2011,2013,2013,2014,1999,2018,2003,2005)
madata$pub_year<-as.numeric(madata$pub_year)
model_pub_year<-metagen(TE,seTE,studlab = Author,comb.fixed = FALSE,data=madata)
```

---
## Run the model.

```r
output_pub_year<-metareg(model_pub_year,pub_year)
sink("metareg_pub_year.txt")
output_pub_year
```

```
## 
## Mixed-Effects Model (k = 18; tau^2 estimator: DL)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0831 (SE = 0.0488)
## tau (square root of estimated tau^2 value):             0.2883
## I^2 (residual heterogeneity / unaccounted variability): 64.69%
## H^2 (unaccounted variability / sampling variability):   2.83
## R^2 (amount of heterogeneity accounted for):            0.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 16) = 45.3076, p-val = 0.0001
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 0.0054, p-val = 0.9412
## 
## Model Results:
## 
##           estimate      se    zval   pval     ci.lb   ci.ub
## intrcpt    -1.4580 27.6151 -0.0528 0.9579  -55.5825 52.6666
## pub_year    0.0010  0.0137  0.0737 0.9412   -0.0259  0.0280
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
sink()
```

---
## Basic plotting.

```r
bubble.metareg(output_pub_year,
               xlab = "Publication Year",
               col.line = "blue",
               studlab = TRUE)
```

<img src="Meta-analysis_5_files/figure-html/bubbleplot-1.svg" width="400px" style="display: block; margin: auto;" />

???
The size of the plotting symbol is inversely proportional to the variance of the estimated treatment effect (Thompson & Higgins, 2002).
Thompson SG, Higgins JP (2002): How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559--73

---
## Interactions,... .

Thus far, we have only considered the case where we have multiple predictor variables `\(x_1,x_2, ... x_n\)` and, along with their predictor estimates `\(b_n\)`, **add them together** to calculate our estimate of the true effect size `\(\hat \theta_k\)` for each study `\(k\)`.

--

In multiple meta-regression models, however, we can not only model such **additive relationships**. We can also model so-called **interactions** (multiplicative relationships), as in OLS regression.

--

Interactions mean that the **relationship** between one **predictor variable** (e.g., `\(x_1\)`) and the **estimated effect size** is different for different values of another predictor variable (e.g., `\(x_2\)`).

---
## Interaction: example

Imagine a scenario where we want to model two predictors and their relationship to the estimated effect size `\(\hat\theta\)`: the **publication year** ( `\(x_1\)` ) of a study and the **quality** ( `\(x_2\)` ) of a study, which we rate like this:

`$$x_2 = \left\{\begin{array}{ll} 0: bad \\ 1: moderate \\ 2: good \end{array}\right.$$`

As we described before, we can now imagine a meta-regression model in which we combine these two predictors `\(x_1\)` and `\(x_2\)` and assume an additive relationship. We can do this by simply adding them:

`$$\hat \theta_k = \theta + b_1x_{1k} + b_2x_{2k} + \epsilon_k + \zeta_k$$`

---
## Example: continued.

Let's assume that, overall, a higher publication year ( `\(x_1\)` ) is associated with higher effect sizes (i.e., reported effect sizes have risen over the years). We could now ask ourselves if this positive relationship **varies** depending on the quality of the studies ( `\(x_2\)` ). For example, the rise in effect sizes could be strongest for high-quality studies, while effect sizes stayed mostly the same over the years for studies of lower quality.

We can visualise the assumed relationship between effect size ( `\(\hat \theta_k\)` ), publication year ( `\(x_1\)` ) and study quality ( `\(x_2\)` ) as on the next slide.

---
## Illustration.

<div class="figure" style="text-align: center">
<img src="Interaction_illustration.png" alt="illustration of interaction from Harrer" width="500px" />
<p class="caption">illustration of interaction from Harrer</p>
</div>

---
## Interaction equation.

We add an **interaction term** to our meta-regression model. This interaction term allows the predictions based on `\(x_1\)` to vary for different values of `\(x_2\)` (and vice versa). This means introducing a third regression coefficient, `\(b_3\)`, capturing the interaction `\(x_{1k}x_{2k}\)`:

`$$\hat \theta_k = \theta + b_1x_{1k} + b_2x_{2k} + b_3x_{1k}x_{2k}+ \epsilon_k + \zeta_k$$`

--

Word of caution before we proceed... .

<img src="https://media.giphy.com/media/3o7btRT1IUGb8YjPlS/giphy.gif" width="250px" style="display: block; margin: auto;" />

---
## Common pitfalls: Overfitting I

* **Overfitting: seeing a signal when there is none.** A statistical model which fits the data **too closely** (Harrell, 2015: 72-ff).

<div class="figure" style="text-align: center">
<img src="overfitting.png" alt="Illustration from Harrer" width="550px" />
<p class="caption">Illustration from Harrer</p>
</div>

???
To better understand the risks of (multiple) meta-regression models, we have to understand the concept of **overfitting**.

---
## Common pitfalls: Overfitting II

The risk of building a **non-robust model, which produces false-positive results**, is **even higher** once we go from conventional regression to **meta-regression**.

--

Several reasons exist:

1. Our **datasets are mostly small**, as we can only use the synthesized data of all analyzed studies `\(k\)`.
2. As a meta-analysis is a **comprehensive overview of all evidence**, we have no **additional data** on which we can "test" how well our regression model predicts high or low effect sizes.
3. Heterogeneity causes problems. Every variable might seem a potential explanation for the heterogeneity we find, but **most such explanations are spurious** (Higgins & Thompson, 2004).
4. Meta-regression makes it very easy to **"play around" with predictors**. This massively **increases the risk of spurious findings** (Higgins & Thompson, 2004), because we can try several predictors indefinitely until we find a 'significant' model, which is then very likely to be overfitted (i.e., it mostly models statistical noise).

???
We can test numerous meta-regression models, include many more predictors or remove them in an attempt to explain the heterogeneity in our data. Such an approach is of course tempting, and often found in practice, because we, as meta-analysts, want to find a significant model which explains why effect sizes differ (Higgins et al., 2002).

---
## Potential solutions.

**Some guidelines have been proposed to avoid an excessive false positive rate when building meta-regression models:**

- Minimize the number of investigated predictors. In multiple meta-regression, this translates to the concept of **parsimony**, or simplicity: when evaluating the fit of a meta-regression model, we prefer models which achieve a good fit with fewer predictors. Information criteria such as the *Akaike* and *Bayesian information criterion* can help with such decisions (more to follow).

--

- Predictor selection should be based on **predefined scientific or theoretical questions** we want to answer in our meta-regression.

--

- When the number of studies is low (which is very likely to be the case), and we want to compute the significance of a predictor, the Knapp-Hartung adjustment (continuous outcomes) is recommended to obtain more reliable estimates (Higgins et al., 2002); a sketch follows on the next slide.

--

- We can use **permutation** to assess the robustness of our model in resampled data. --> We'll talk about this later.
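---
## Requesting the Knapp-Hartung adjustment.

A minimal sketch of how the adjustment is switched on in the two packages used in this course, with the same arguments used elsewhere in this deck:

```r
library(meta)
library(metafor)

metagen(TE, seTE, data = madata, sm = "SMD", hakn = TRUE)  # meta: hakn = TRUE
rma(yi = TE, sei = seTE, data = madata, test = "knha")     # metafor: test = "knha"
```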
---
## Common pitfalls: Multicollinearity

Multicollinearity means that one or more predictors in our regression model can be (linearly) **predicted from another predictor** in our model with relatively high accuracy (Harrell, 2015: 78-ff). This basically means that we have two or more predictors in our model which are **highly correlated**.

Most of the dangers of multicollinearity are associated with the problem of **overfitting**, which we described above. High collinearity can cause our predictor coefficient estimates `\(b\)` to behave erratically and change considerably with minor changes in our data. It also limits the amount of variance the model can explain, in our case the `\(R^2\)` analog.

--

**Multicollinearity in meta-regression is common**. Although multiple regression can handle lower degrees of collinearity, we should **check** and, if necessary, **control for very highly correlated predictors**.

--

There is **no simple yes-no rule** for the presence of multicollinearity. A crude, but often effective, way is to check for very high correlations (i.e., `\(r\geq0.8\)`) before fitting the model.

Possible solutions for multicollinearity: (1) removing one of the close-to-redundant predictors, or (2) trying to combine the predictors into one single predictor.

---
## Example.

We will rely on the `metafor` package rather than the `meta` package.

```r
library(metafor)
library(tidyverse)
```

For our multiple meta-regression examples, we will use the `mvreg.data` dataset from [Harrer (2019)](https://github.com/MathiasHarrer/Doing-Meta-Analysis-in-R/blob/master/mvreg_data.rda), a "toy" dataset simulated for illustrative purposes.

```r
load("mvreg_data.rda")
levels(mvreg.data$continent)[levels(mvreg.data$continent)==0] = "Europe"
levels(mvreg.data$continent)[levels(mvreg.data$continent)==1] = "North America"
mvreg.data$continent = as.character(mvreg.data$continent)
```

---
## Data

**Let's have a look at the structure of the data:**

```r
head(mvreg.data)
```

```
##           yi       sei reputation quality    pubyear     continent
## 1 0.09437543 0.1959031        -11       6 -0.8547536 North America
## 2 0.09981923 0.1918510          0       9 -0.7527718        Europe
## 3 0.16931607 0.1193179        -11       5 -0.6604835 North America
## 4 0.17511107 0.1161592          4       9 -0.5630484        Europe
## 5 0.27301641 0.1646946        -10       2 -0.4308793 North America
## 6 0.28594668 0.1704299         -9      10 -0.3582629        Europe
```

???
We see that there are 6 variables in our dataset. The `yi` and `sei` columns store the **effect size** and **standard error** of a particular study. Thus, these columns correspond to the `TE` and `seTE` columns we used before. We have named these variables this way because this is the standard notation that `metafor` uses: `yi` corresponds to the effect size `\(y_i\)` we want to predict in our meta-regression, while `sei` is `\(SE_i\)`, the standard error. To designate the variance of an effect size, `metafor` uses `vi`, or `\(v_i\)` in mathematical notation, which we do not need here because `yi` and `sei` contain all the information we need.

The other four variables we have in our dataset are potential predictors for our meta-regression. We want to check if `reputation`, the (mean-centered) impact score of the journal the study was published in, `quality`, the quality of the study rated from 0 to 10, `pubyear`, the (standardized) publication year, and `continent`, the continent in which the study was performed, are associated with different effect sizes.

For `continent`, note that we store information as a predictor with 2 labels: `Europe` and `North America`, meaning that this predictor is a **dummy variable**. Always remember that such dummy variables have to be converted from a `chr` to a factor vector before we can proceed.

---
## Collinearity check

Multicollinearity could be an issue. A quick way to check for high intercorrelation is to calculate an **intercorrelation matrix** for all continuous variables with the following code:

```r
cor_table<-cor(mvreg.data[,3:5])
cor_table
```

```
##            reputation    quality    pubyear
## reputation  1.0000000  0.3015694  0.3346594
## quality     0.3015694  1.0000000 -0.1551123
## pubyear     0.3346594 -0.1551123  1.0000000
```
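---
## Flagging high correlations.

A small sketch (not in the original deck) applying the crude `\(r\geq0.8\)` rule of thumb to the matrix we just computed:

```r
# Which pairs (upper triangle only) exceed the threshold? None do here.
which(abs(cor_table) >= 0.8 & upper.tri(cor_table), arr.ind = TRUE)
```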
---
## Correlation plot

The `ggcorrplot` package allows visualising the intercorrelations. Make sure to install the `ggcorrplot` package first, and then use this code:

```r
require(ggcorrplot)
ggcorrplot(cor_table, hc.order = TRUE, type = "lower", lab = TRUE)
```

<img src="Meta-analysis_5_files/figure-html/unnamed-chunk-23-1.svg" width="325px" style="display: block; margin: auto;" />

???
Some correlations, but not very large.

---
## Fitting a meta-regression model without interaction terms

```r
model_qual <- rma(yi=yi, sei=sei, data=mvreg.data, method = "ML", mods = ~ quality, test="knha")
sink("model_qual.txt")
model_qual
```

```
## 
## Mixed-Effects Model (k = 36; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0667 (SE = 0.0275)
## tau (square root of estimated tau^2 value):             0.2583
## I^2 (residual heterogeneity / unaccounted variability): 60.04%
## H^2 (unaccounted variability / sampling variability):   2.50
## R^2 (amount of heterogeneity accounted for):            7.37%
## 
## Test for Residual Heterogeneity:
## QE(df = 34) = 88.6130, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 34) = 3.5330, p-val = 0.0688
## 
## Model Results:
## 
##          estimate     se   tval   pval    ci.lb  ci.ub
## intrcpt    0.3429 0.1354 2.5318 0.0161   0.0677 0.6181  *
## quality    0.0356 0.0189 1.8796 0.0688  -0.0029 0.0740  .
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
sink()
```

???
We see that the `\(p\)`-value for our predictor is non-significant ( `\(p=0.0688\)` ), but only marginally so.

Under `Test of Moderators (coefficient 2)`, we can see the overall test results for our regression model ($F_{1,34}=3.53, p=0.0688$). Because we only included one predictor, the `\(p\)`-value reported there is identical to the one we saw before.

In total, our model explains `\(R^2=7.37\%\)` of the heterogeneity in our data, which we can see next to the `R^2 (amount of heterogeneity accounted for)` line in our output.

---
## Adding reputation as a moderator.

```r
model_qual_rep <- rma(yi=yi, sei=sei, data=mvreg.data, method = "ML", mods =~ quality + reputation, test="knha")
sink("model_qual_rep.txt")
model_qual_rep
```

```
## 
## Mixed-Effects Model (k = 36; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0238 (SE = 0.0161)
## tau (square root of estimated tau^2 value):             0.1543
## I^2 (residual heterogeneity / unaccounted variability): 34.62%
## H^2 (unaccounted variability / sampling variability):   1.53
## R^2 (amount of heterogeneity accounted for):            66.95%
## 
## Test for Residual Heterogeneity:
## QE(df = 33) = 58.3042, p-val = 0.0042
## 
## Test of Moderators (coefficients 2:3):
## F(df1 = 2, df2 = 33) = 12.2476, p-val = 0.0001
## 
## Model Results:
## 
##             estimate     se   tval    pval    ci.lb  ci.ub
## intrcpt       0.5005 0.1090 4.5927  <.0001   0.2788 0.7222  ***
## quality       0.0110 0.0151 0.7312  0.4698  -0.0197 0.0417
## reputation    0.0343 0.0075 4.5435  <.0001   0.0189 0.0496  ***
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
sink()
```

---
## Comparing models via a Likelihood Ratio Test

```r
anova(model_qual, model_qual_rep)
```

```
## 
##         df     AIC     BIC    AICc   logLik     LRT   pval      QE  tau^2
## Full     4 19.4816 25.8157 20.7720  -5.7408                58.3042 0.0238
## Reduced  3 36.9808 41.7314 37.7308 -15.4904 19.4992 <.0001 88.6130 0.0667
##              R^2
## Full
## Reduced 64.3197%
```

???
The test is highly significant ( `\(\chi^2_1=19.50, p<0.001\)` ), which means that our full model indeed provides a better fit. Another important statistic is reported in the `AICc` column.
This provides us with *Akaike's Information Criterion*, corrected for small samples. As we mentioned before, AICc penalizes complex models with more predictors to avoid overfitting.

It is important to note that **lower values of AIC(c) mean that a model performs better**. Interestingly, in our output, we see that the full model ( `\(AICc=20.77\)` ) has a better AIC value than our reduced model ( `\(AICc=37.73\)` ), despite having more predictors. All of this suggests that our multiple regression **does indeed provide a good fit** to our data.

---
## AIC(c) / BIC

Too much to cover here. But these are information criteria which allow comparing models (derived via Maximum Likelihood, ML).

--

Smaller AIC/BIC is better.

--

We should be careful with rules of thumb, but these have been proposed (Burnham & Anderson, 2002, 2004):

--

_Models having `\(\Delta \leq 2\)` are on a par; where `\(4 \leq \Delta \leq 7\)`, the model with the lowest value has moderate support over the other; and where `\(\Delta \geq 10\)`, it has strong support over the other._

--

AICc is a correction for small samples (Burnham & Anderson, 2002). AIC and BIC differ in how they value model complexity, with BIC favouring simple models more strongly.

---
## Modeling interaction terms

Model an **interaction hypothesis** with predictors `pubyear` (publication year) and `continent`: we examine whether the relationship between publication year and effect size differs between Europe and North America. To model this in our `rma` function, we have to **connect our predictors** with `*` in the `mods` parameter.

--

Because we do not compare the models directly using the `anova` function here, we can specify the `\(\tau^2\)` estimator to be `"REML"` (restricted maximum likelihood) this time:

--

```r
interaction_model <- rma(yi=yi, sei=sei, data=mvreg.data, method = "REML", mods =~ pubyear*continent, test="knha")
sink("interaction_model.txt")
interaction_model
```

```
## 
## Mixed-Effects Model (k = 36; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0 (SE = 0.0098)
## tau (square root of estimated tau^2 value):             0
## I^2 (residual heterogeneity / unaccounted variability): 0.00%
## H^2 (unaccounted variability / sampling variability):   1.00
## R^2 (amount of heterogeneity accounted for):            100.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 32) = 24.8408, p-val = 0.8124
## 
## Test of Moderators (coefficients 2:4):
## F(df1 = 3, df2 = 32) = 28.7778, p-val < .0001
## 
## Model Results:
## 
##                                 estimate     se   tval   pval    ci.lb
## intrcpt                           0.3892 0.0421 9.2472 <.0001   0.3035
## pubyear                           0.1683 0.0834 2.0184 0.0520  -0.0015
## continentNorth America            0.3986 0.0658 6.0539 <.0001   0.2645
## pubyear:continentNorth America    0.6323 0.1271 4.9754 <.0001   0.3734
##                                  ci.ub
## intrcpt                         0.4750  ***
## pubyear                         0.3380  .
## continentNorth America          0.5327  ***
## pubyear:continentNorth America  0.8911  ***
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
sink()
```

???
Note that `metafor` automatically **includes not only the interaction term**, but also both `pubyear` and `continent` as **"normal" lower-order predictors** (as one should do). Also note that, as `continent` is a factor, `rma` detected that this is a **dummy predictor**, and used our category `Europe` as the `\(D=0\)` dummy against which the `North America` category is compared.
We see that our interaction term `pubyear:continentNorth America` has a positive coefficient ($b=0.6323$), and that it is highly significant ($p<0.0001$), meaning that the assumed interaction effect might in fact exist: there is an increase in effect sizes in recent years, and it is stronger for studies conducted in North America.

We also see that the model we fitted explains `\(R^2=100\%\)` of our heterogeneity. This is because our data were simulated for illustrative purposes. In practice, you will hardly ever explain **all** of the heterogeneity in your data (in fact, one should rather be concerned if one finds such results in real-life data, as this might mean we have overfitted our model).

---
## Permutation test.

Permutation is a process where we **rearrange**, or **shuffle**, the order of our data. As an example, imagine we have a set `\(S\)` containing **3 numbers**: `\(S=\{1,2,3 \}\)`. One possible permutation of this set is `\((2,1,3)\)`; another is `\((3,2,1)\)`. Both permuted results contain **all 3 numbers from before**, but in a different order.

--

Permutation can also be used to perform **permutation tests**, which are a specific form of **resampling method**. These can be used to validate the **robustness** of a statistical model by providing it with (slightly) different data sampled from the same data source or generative process (Good, 2013). This allows us **to assess if the coefficients capture a true pattern underlying our data**, or if we falsely assumed patterns where there is only statistical noise.

--

Technical details (Good, 2013; Viechtbauer, 2015): in brief, we **re-calculate** the **p-values** of our overall meta-regression model and its coefficients based on the test statistics obtained across all possible, or many randomly selected, permutations. **How often is the test statistic we obtain from our permuted data equal to or greater than our original test statistic?**

---
## Permutation test via `metafor`

The default is 1,000 permutations. You can change this with, e.g., `iter=10000`. If you have a lot of time, you can also ask for an exact permutation test (a sketch follows after the output).

```r
permutest(model_qual_rep, iter=10000, progbar=F) # hide progress.
```

<img src="https://media.giphy.com/media/x4QKAmG3mjqQo/giphy.gif" width="350px" style="display: block; margin: auto;" />

???
We again see our **familiar output**, including the **results for all predictors**. Looking at the `pval*` column, we see that the permuted p-value for the `reputation` predictor, like that of the overall model, is `\(p^*=0.0001\)`. As both of these p-values are still highly significant, this indicates that our model might indeed capture a real pattern underlying our data.

It has been **recommended** to always use this **permutation test on our meta-regression model before we report it** to be significant in our research (Higgins & Thompson, 2004).

---
## Output.

```r
permutest(model_qual_rep, iter=10000, progbar=F) # hide progress.
```

```
## 
## Test of Moderators (coefficients 2:3):
## F(df1 = 2, df2 = 33) = 12.2476, p-val* = 0.0001
## 
## Model Results:
## 
##             estimate     se   tval  pval*    ci.lb  ci.ub
## intrcpt       0.5005 0.1090 4.5927 0.2057   0.2788 0.7222
## quality       0.0110 0.0151 0.7312 0.4395  -0.0197 0.0417
## reputation    0.0343 0.0075 4.5435 0.0001   0.0189 0.0496  ***
## 
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
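---
## Exact permutation test.

As mentioned above, with enough time you can enumerate all possible permutations rather than sampling them. A one-line sketch using the `exact` argument of `metafor`'s `permutest`:

```r
permutest(model_qual_rep, exact = TRUE) # exhaustive; can be very slow for larger k
```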
---
## Permutation test:

It is a **common recommendation** to use this **permutation test on a meta-regression model before reporting it**.

--

If the **number of studies** `\(k\)` included in our model is **small**, conventionally used thresholds for statistical significance (i.e., p < 0.05) **cannot be reached**. A permutation test using `permutest` can only reach statistical significance if `\(k \geq 5\)`.

Details on the `permutest` function can be found [here](https://www.rdocumentation.org/packages/metafor/versions/2.1-0/topics/permutest). Note that you can, for example, also obtain confidence intervals.

---
## Review: Pros/cons of meta-regression.

Pros:

- We don't have to dichotomize continuous moderators.
- We can test more than one predictor in a uniform framework (controlling for one another if we have a large number of studies, but [beware](http://www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/)).

--

Cons:

- Low statistical power (small samples and possibly reliability issues). It's regression... .
- Often more predictors than studies (small N, large p problem).
- Overfitting... .

---
## Some recommendations...

- Hypothesize about a limited set of moderators _a priori_.
- Analyse categorical moderators with subgroups (but beware of correlated moderators).
- Be very cautious with meta-regression.
- If there is evidence for a strong moderator effect, avoid interpreting the overall effect size found and remove emphasis from it in your write-up.

<img src="https://media.giphy.com/media/NSn97BHtWn3Tq/giphy.gif" width="350px" style="display: block; margin: auto;" />

???
Readers will tend to focus on the headline and ignore the effects of moderators, i.e. the overall effect being found.

---
## Exercise.

Please see the exercise posted [here](https://tvpollet.github.io/Meta-analysis_5/Exercise_5_questions.html)

<img src="https://media.giphy.com/media/Pk20jMIe44bHa/giphy.gif" width="500px" style="display: block; margin: auto;" />

---
## Any Questions?

[http://tvpollet.github.io](http://tvpollet.github.io)

Twitter: @tvpollet

<img src="https://media.giphy.com/media/3ohzdRoOp1FUYbtGDu/giphy.gif" width="600px" style="display: block; margin: auto;" />

---
## Acknowledgments

* Numerous students and colleagues. Any mistakes are my own.
* My colleagues who helped me with regards to meta-analysis: Nexhmedin Morina, Stijn Peperkoorn, Gert Stulp, Mirre Simons, and Johannes Honekopp.
* HBES for funding this. Those who have funded me (not these studies per se): [NWO](www.nwo.nl), [Templeton](www.templeton.org), [NIAS](http://nias.knaw.nl).
* You for listening!

<img src="https://media.giphy.com/media/10avZ0rqdGFyfu/giphy.gif" width="300px" style="display: block; margin: auto;" />

---
## References and further reading (errors = blame RefManageR)

<p><cite>Aert, R. C. M. van, J. M. Wicherts, and M. A. L. M. van Assen (2016). “Conducting Meta-Analyses Based on p Values: Reservations and Recommendations for Applying p-Uniform and p-Curve”. In: <em>Perspectives on Psychological Science</em> 11.5, pp. 713-729. DOI: <a href="https://doi.org/10.1177/1745691616650874">10.1177/1745691616650874</a>. eprint: https://doi.org/10.1177/1745691616650874.</cite></p>
<p><cite>Aloe, A. M. and C. G. Thompson (2013). “The Synthesis of Partial Effect Sizes”. In: <em>Journal of the Society for Social Work and Research</em> 4.4, pp. 390-405. DOI: <a href="https://doi.org/10.5243/jsswr.2013.24">10.5243/jsswr.2013.24</a>.
eprint: https://doi.org/10.5243/jsswr.2013.24.</cite></p> <p><cite>Assink, M. and C. J. Wibbelink (2016). “Fitting Three-Level Meta-Analytic Models in R: A Step-by-Step Tutorial”. In: <em>The Quantitative Methods for Psychology</em> 12.3, pp. 154-174. ISSN: 2292-1354.</cite></p> <p><cite>Barendregt, J. J., S. A. Doi, Y. Y. Lee, et al. (2013). “Meta-Analysis of Prevalence”. In: <em>Journal of Epidemiology and Community Health</em> 67.11, pp. 974-978. ISSN: 0143-005X. DOI: <a href="https://doi.org/10.1136/jech-2013-203104">10.1136/jech-2013-203104</a>.</cite></p> <p><cite>Becker, B. J. and M. Wu (2007). “The Synthesis of Regression Slopes in Meta-Analysis”. In: <em>Statistical science</em> 22.3, pp. 414-429. ISSN: 0883-4237.</cite></p> --- ## More refs 1. <p><cite>Borenstein, M., L. V. Hedges, J. P. Higgins, et al. (2009). <em>Introduction to Meta-Analysis</em>. John Wiley & Sons. ISBN: 1-119-96437-7.</cite></p> <p><cite>Burnham, K. P. and D. R. Anderson (2002). <em>Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach</em>. New York, NY: Springer. ISBN: 0-387-95364-7.</cite></p> <p><cite>Burnham, K. P. and D. R. Anderson (2004). “Multimodel Inference: Understanding AIC and BIC in Model Selection”. In: <em>Sociological Methods & Research</em> 33.2, pp. 261-304. ISSN: 0049-1241. DOI: <a href="https://doi.org/10.1177/0049124104268644">10.1177/0049124104268644</a>.</cite></p> <p><cite>Carter, E. C., F. D. Schönbrodt, W. M. Gervais, et al. (2019). “Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods”. In: <em>Advances in Methods and Practices in Psychological Science</em> 2.2, pp. 115-144. DOI: <a href="https://doi.org/10.1177/2515245919847196">10.1177/2515245919847196</a>.</cite></p> <p><cite>Chen, D. D. and K. E. Peace (2013). <em>Applied Meta-Analysis with R</em>. Chapman and Hall/CRC. ISBN: 1-4665-0600-8.</cite></p> --- ## More refs 2. <p><cite>Cheung, M. W. (2015a). “metaSEM: An R Package for Meta-Analysis Using Structural Equation Modeling”. In: <em>Frontiers in Psychology</em> 5, p. 1521. ISSN: 1664-1078. DOI: <a href="https://doi.org/10.3389/fpsyg.2014.01521">10.3389/fpsyg.2014.01521</a>.</cite></p> <p><cite>Cheung, M. W. (2015b). <em>Meta-Analysis: A Structural Equation Modeling Approach</em>. New York, NY: John Wiley & Sons. ISBN: 1-119-99343-1.</cite></p> <p><cite>Cooper, H. (2010). <em>Research Synthesis and Meta-Analysis: A Step-by-Step Approach</em>. 4th. Sage publications. ISBN: 1-4833-4704-4.</cite></p> <p><cite>Cooper, H., L. V. Hedges, and J. C. Valentine (2009). <em>The Handbook of Research Synthesis and Meta-Analysis</em>. New York: Russell Sage Foundation. ISBN: 1-61044-138-9.</cite></p> <p><cite>Cooper, H. and E. A. Patall (2009). “The Relative Benefits of Meta-Analysis Conducted with Individual Participant Data versus Aggregated Data.” In: <em>Psychological Methods</em> 14.2, pp. 165-176. ISSN: 1433806886. DOI: <a href="https://doi.org/10.1037/a0015565">10.1037/a0015565</a>.</cite></p> --- ## More refs 3. <p><cite>Crawley, M. J. (2013). <em>The R Book: Second Edition</em>. New York, NY: John Wiley & Sons. ISBN: 1-118-44896-0.</cite></p> <p><cite>Cumming, G. (2014). “The New Statistics”. In: <em>Psychological Science</em> 25.1, pp. 7-29. ISSN: 0956-7976. DOI: <a href="https://doi.org/10.1177/0956797613504966">10.1177/0956797613504966</a>.</cite></p> <p><cite>Dickersin, K. (2005). “Publication Bias: Recognizing the Problem, Understanding Its Origins and Scope, and Preventing Harm”. 
In: <em>Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments</em>. Ed. by H. R. Rothstein, A. J. Sutton and M. Borenstein. Chichester, UK: John Wiley.</cite></p>
<p><cite>Fisher, R. A. (1946). <em>Statistical Methods for Research Workers</em>. 10th ed. Edinburgh, UK: Oliver and Boyd.</cite></p>
<p><cite>Flore, P. C. and J. M. Wicherts (2015). “Does Stereotype Threat Influence Performance of Girls in Stereotyped Domains? A Meta-Analysis”. In: <em>Journal of School Psychology</em> 53.1, pp. 25-44. ISSN: 0022-4405. DOI: <a href="https://doi.org/10.1016/j.jsp.2014.10.002">10.1016/j.jsp.2014.10.002</a>.</cite></p>

---
## More refs 4.

<p><cite>Galbraith, R. F. (1994). “Some Applications of Radial Plots”. In: <em>Journal of the American Statistical Association</em> 89.428, pp. 1232-1242. ISSN: 0162-1459. DOI: <a href="https://doi.org/10.1080/01621459.1994.10476864">10.1080/01621459.1994.10476864</a>.</cite></p>
<p><cite>Glass, G. V. (1976). “Primary, Secondary, and Meta-Analysis of Research”. In: <em>Educational researcher</em> 5.10, pp. 3-8. ISSN: 0013-189X. DOI: <a href="https://doi.org/10.3102/0013189X005010003">10.3102/0013189X005010003</a>.</cite></p>
<p><cite>Goh, J. X., J. A. Hall, and R. Rosenthal (2016). “Mini Meta-Analysis of Your Own Studies: Some Arguments on Why and a Primer on How”. In: <em>Social and Personality Psychology Compass</em> 10.10, pp. 535-549. ISSN: 1751-9004. DOI: <a href="https://doi.org/10.1111/spc3.12267">10.1111/spc3.12267</a>.</cite></p>
<p><cite>Harrell, F. E. (2015). <em>Regression Modeling Strategies</em>. 2nd. Springer Series in Statistics. New York, NY: Springer New York. ISBN: 978-1-4419-2918-1. DOI: <a href="https://doi.org/10.1007/978-1-4757-3462-1">10.1007/978-1-4757-3462-1</a>.</cite></p>
<p><cite>Harrer, M., P. Cuijpers, and D. D. Ebert (2019). <em>Doing Meta-Analysis in R: A Hands-on Guide</em>. https://bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/.</cite></p>

---
## More refs 5.

<p><cite>Hartung, J. and G. Knapp (2001). “On Tests of the Overall Treatment Effect in Meta-Analysis with Normally Distributed Responses”. In: <em>Statistics in Medicine</em> 20.12, pp. 1771-1782. DOI: <a href="https://doi.org/10.1002/sim.791">10.1002/sim.791</a>.</cite></p>
<p><cite>Hayes, A. F. and K. Krippendorff (2007). “Answering the Call for a Standard Reliability Measure for Coding Data”. In: <em>Communication Methods and Measures</em> 1.1, pp. 77-89. ISSN: 1931-2458. DOI: <a href="https://doi.org/10.1080/19312450709336664">10.1080/19312450709336664</a>.</cite></p>
<p><cite>Hedges, L. V. (1981). “Distribution Theory for Glass's Estimator of Effect Size and Related Estimators”. In: <em>Journal of Educational Statistics</em> 6.2, pp. 107-128. DOI: <a href="https://doi.org/10.3102/10769986006002107">10.3102/10769986006002107</a>.</cite></p>
<p><cite>Hedges, L. V. (1984). “Estimation of Effect Size under Nonrandom Sampling: The Effects of Censoring Studies Yielding Statistically Insignificant Mean Differences”. In: <em>Journal of Educational Statistics</em> 9.1, pp. 61-85. ISSN: 0362-9791. DOI: <a href="https://doi.org/10.3102/10769986009001061">10.3102/10769986009001061</a>.</cite></p>
<p><cite>Hedges, L. V. and I. Olkin (1980). “Vote-Counting Methods in Research Synthesis.” In: <em>Psychological bulletin</em> 88.2, pp. 359-369. ISSN: 1939-1455. DOI: <a href="https://doi.org/10.1037/0033-2909.88.2.359">10.1037/0033-2909.88.2.359</a>.</cite></p>

---
## More refs 6.

<p><cite>Higgins, J. P. T. and S. G. Thompson (2002).
“Quantifying Heterogeneity in a Meta-Analysis”. In: <em>Statistics in Medicine</em> 21.11, pp. 1539-1558. DOI: <a href="https://doi.org/10.1002/sim.1186">10.1002/sim.1186</a>.</cite></p> <p><cite>Higgins, J. P. T., S. G. Thompson, J. J. Deeks, et al. (2003). “Measuring Inconsistency in Meta-Analyses”. In: <em>BMJ</em> 327.7414, pp. 557-560. ISSN: 0959-8138. DOI: <a href="https://doi.org/10.1136/bmj.327.7414.557">10.1136/bmj.327.7414.557</a>.</cite></p> <p><cite>Higgins, J., S. Thompson, J. Deeks, et al. (2002). “Statistical Heterogeneity in Systematic Reviews of Clinical Trials: A Critical Appraisal of Guidelines and Practice”. In: <em>Journal of Health Services Research & Policy</em> 7.1, pp. 51-61. DOI: <a href="https://doi.org/10.1258/1355819021927674">10.1258/1355819021927674</a>.</cite></p> <p><cite>Hirschenhauser, K. and R. F. Oliveira (2006). “Social Modulation of Androgens in Male Vertebrates: Meta-Analyses of the Challenge Hypothesis”. In: <em>Animal Behaviour</em> 71.2, pp. 265-277. ISSN: 0003-3472. DOI: <a href="https://doi.org/10.1016/j.anbehav.2005.04.014">10.1016/j.anbehav.2005.04.014</a>.</cite></p> <p><cite>Ioannidis, J. P. (2008). “Why Most Discovered True Associations Are Inflated”. In: <em>Epidemiology</em> 19.5, pp. 640-648. ISSN: 1044-3983.</cite></p> --- ## More refs 7. <p><cite>Jackson, D., M. Law, G. Rücker, et al. (2017). “The Hartung-Knapp Modification for Random-Effects Meta-Analysis: A Useful Refinement but Are There Any Residual Concerns?” In: <em>Statistics in Medicine</em> 36.25, pp. 3923-3934. DOI: <a href="https://doi.org/10.1002/sim.7411">10.1002/sim.7411</a>. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7411.</cite></p> <p><cite>Jacobs, P. and W. Viechtbauer (2016). “Estimation of the Biserial Correlation and Its Sampling Variance for Use in Meta-Analysis”. In: <em>Research Synthesis Methods</em> 8.2, pp. 161-180. DOI: <a href="https://doi.org/10.1002/jrsm.1218">10.1002/jrsm.1218</a>.</cite></p> <p><cite>Koricheva, J., J. Gurevitch, and K. Mengersen (2013). <em>Handbook of Meta-Analysis in Ecology and Evolution</em>. Princeton, NJ: Princeton University Press. ISBN: 0-691-13729-3.</cite></p> <p><cite>Kovalchik, S. (2013). <em>Tutorial On Meta-Analysis In R - R useR! Conference 2013</em>.</cite></p> <p><cite>Lipsey, M. W. and D. B. Wilson (2001). <em>Practical Meta-Analysis.</em> London: SAGE publications, Inc. ISBN: 0-7619-2167-2.</cite></p> --- ## More refs 8. <p><cite>Littell, J. H., J. Corcoran, and V. Pillai (2008). <em>Systematic Reviews and Meta-Analysis</em>. Oxford, UK: Oxford University Press. ISBN: 0-19-532654-7.</cite></p> <p><cite>McShane, B. B., U. Böckenholt, and K. T. Hansen (2016). “Adjusting for Publication Bias in Meta-Analysis: An Evaluation of Selection Methods and Some Cautionary Notes”. In: <em>Perspectives on Psychological Science</em> 11.5, pp. 730-749. DOI: <a href="https://doi.org/10.1177/1745691616662243">10.1177/1745691616662243</a>. eprint: https://doi.org/10.1177/1745691616662243.</cite></p> <p><cite>Mengersen, K., C. Schmidt, M. Jennions, et al. (2013). “Statistical Models and Approaches to Inference”. In: <em>Handbook of Meta-Analysis in Ecology and Evolution</em>. Ed. by Koricheva, J, J. Gurevitch and Mengersen, Kerrie. Princeton, NJ: Princeton University Press, pp. 89-107.</cite></p> <p><cite>Methley, A. M., S. Campbell, C. Chew-Graham, et al. (2014). “PICO, PICOS and SPIDER: A Comparison Study of Specificity and Sensitivity in Three Search Tools for Qualitative Systematic Reviews”. Eng. 
In: <em>BMC health services research</em> 14, pp. 579-579. ISSN: 1472-6963. DOI: <a href="https://doi.org/10.1186/s12913-014-0579-0">10.1186/s12913-014-0579-0</a>.</cite></p> <p><cite>Morina, N., K. Stam, T. V. Pollet, et al. (2018). “Prevalence of Depression and Posttraumatic Stress Disorder in Adult Civilian Survivors of War Who Stay in War-Afflicted Regions. A Systematic Review and Meta-Analysis of Epidemiological Studies”. In: <em>Journal of Affective Disorders</em> 239, pp. 328-338. ISSN: 0165-0327. DOI: <a href="https://doi.org/10.1016/j.jad.2018.07.027">10.1016/j.jad.2018.07.027</a>.</cite></p> --- ## More refs 9. <p><cite>Nakagawa, S., D. W. A. Noble, A. M. Senior, et al. (2017). “Meta-Evaluation of Meta-Analysis: Ten Appraisal Questions for Biologists”. In: <em>BMC Biology</em> 15.1, p. 18. ISSN: 1741-7007. DOI: <a href="https://doi.org/10.1186/s12915-017-0357-7">10.1186/s12915-017-0357-7</a>.</cite></p> <p><cite>Pastor, D. A. and R. A. Lazowski (2018). “On the Multilevel Nature of Meta-Analysis: A Tutorial, Comparison of Software Programs, and Discussion of Analytic Choices”. In: <em>Multivariate Behavioral Research</em> 53.1, pp. 74-89. DOI: <a href="https://doi.org/10.1080/00273171.2017.1365684">10.1080/00273171.2017.1365684</a>.</cite></p> <p><cite>Poole, C. and S. Greenland (1999). “Random-Effects Meta-Analyses Are Not Always Conservative”. In: <em>American Journal of Epidemiology</em> 150.5, pp. 469-475. ISSN: 0002-9262. DOI: <a href="https://doi.org/10.1093/oxfordjournals.aje.a010035">10.1093/oxfordjournals.aje.a010035</a>. eprint: http://oup.prod.sis.lan/aje/article-pdf/150/5/469/286690/150-5-469.pdf.</cite></p> <p><cite>Popper, K. (1959). <em>The Logic of Scientific Discovery</em>. London, UK: Hutchinson. ISBN: 1-134-47002-9.</cite></p> <p><cite>Roberts, P. D., G. B. Stewart, and A. S. Pullin (2006). “Are Review Articles a Reliable Source of Evidence to Support Conservation and Environmental Management? A Comparison with Medicine”. In: <em>Biological conservation</em> 132.4, pp. 409-423. ISSN: 0006-3207.</cite></p> --- ## More refs 10. <p><cite>Rosenberg, M. S., H. R. Rothstein, and J. Gurevitch (2013). “Effect Sizes: Conventional Choices and Calculations”. In: <em>Handbook of Meta-analysis in Ecology and Evolution</em>, pp. 61-71.</cite></p> <p><cite>Röver, C., G. Knapp, and T. Friede (2015). “Hartung-Knapp-Sidik-Jonkman Approach and Its Modification for Random-Effects Meta-Analysis with Few Studies”. In: <em>BMC Medical Research Methodology</em> 15.1, p. 99. ISSN: 1471-2288. DOI: <a href="https://doi.org/10.1186/s12874-015-0091-1">10.1186/s12874-015-0091-1</a>.</cite></p> <p><cite>Schwarzer, G., J. R. Carpenter, and G. Rücker (2015). <em>Meta-Analysis with R</em>. New York, NY: Springer. ISBN: 3-319-21415-2.</cite></p> <p><cite>Schwarzer, G., H. Chemaitelly, L. J. Abu-Raddad, et al. “Seriously Misleading Results Using Inverse of Freeman-Tukey Double Arcsine Transformation in Meta-Analysis of Single Proportions”. In: <em>Research Synthesis Methods</em> 0.0. DOI: <a href="https://doi.org/10.1002/jrsm.1348">10.1002/jrsm.1348</a>. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/jrsm.1348.</cite></p> <p><cite>Simmons, J. P., L. D. Nelson, and U. Simonsohn (2011). “False-Positive Psychology”. In: <em>Psychological Science</em> 22.11, pp. 1359-1366. ISSN: 0956-7976. DOI: <a href="https://doi.org/10.1177/0956797611417632">10.1177/0956797611417632</a>.</cite></p> --- ## More refs 11. <p><cite>Simonsohn, U., L. D. Nelson, and J. P. Simmons (2014). 
“P-Curve: A Key to the File-Drawer.” In: <em>Journal of Experimental Psychology: General</em> 143.2, pp. 534-547. ISSN: 1939-2222. DOI: <a href="https://doi.org/10.1037/a0033242">10.1037/a0033242</a>.</cite></p>
<p><cite>Sterne, J. A. C., A. J. Sutton, J. P. A. Ioannidis, et al. (2011). “Recommendations for Examining and Interpreting Funnel Plot Asymmetry in Meta-Analyses of Randomised Controlled Trials”. In: <em>BMJ</em> 343.jul22 1, pp. d4002-d4002. ISSN: 0959-8138. DOI: <a href="https://doi.org/10.1136/bmj.d4002">10.1136/bmj.d4002</a>.</cite></p>
<p><cite>Veroniki, A. A., D. Jackson, W. Viechtbauer, et al. (2016). “Methods to Estimate the Between-Study Variance and Its Uncertainty in Meta-Analysis”. Eng. In: <em>Research synthesis methods</em> 7.1, pp. 55-79. ISSN: 1759-2887. DOI: <a href="https://doi.org/10.1002/jrsm.1164">10.1002/jrsm.1164</a>.</cite></p>
<p><cite>Viechtbauer, W. (2015). “Package ‘metafor’: Meta-Analysis Package for R”.</cite></p>
<p><cite>Weiss, B. and J. Daikeler (2017). <em>Syllabus for Course: “Meta-Analysis in Survey Methodology”, 6th Summer Workshop (GESIS)</em>.</cite></p>

---
## More refs 12.

<p><cite>Wickham, H. and G. Grolemund (2016). <em>R for Data Science</em>. Sebastopol, CA: O'Reilly.</cite></p>
<p><cite>Wiernik, B. (2015). <em>A Brief Introduction to Meta-Analysis</em>.</cite></p>
<p><cite>Wiksten, A., G. Rücker, and G. Schwarzer (2016). “Hartung-Knapp Method Is Not Always Conservative Compared with Fixed-Effect Meta-Analysis”. In: <em>Statistics in Medicine</em> 35.15, pp. 2503-2515. DOI: <a href="https://doi.org/10.1002/sim.6879">10.1002/sim.6879</a>.</cite></p>
<p><cite>Wingfield, J. C., R. E. Hegner, A. M. Dufty Jr, et al. (1990). “The ‘Challenge Hypothesis’: Theoretical Implications for Patterns of Testosterone Secretion, Mating Systems, and Breeding Strategies”. In: <em>American Naturalist</em> 136, pp. 829-846. ISSN: 0003-0147.</cite></p>
<p><cite>Yeaton, W. H. and P. M. Wortman (1993). “On the Reliability of Meta-Analytic Reviews: The Role of Intercoder Agreement”. In: <em>Evaluation Review</em> 17.3, pp. 292-309. ISSN: 0193-841X. DOI: <a href="https://doi.org/10.1177/0193841X9301700303">10.1177/0193841X9301700303</a>.</cite></p>