I was at NIAS working on machine learning applications.
Mathijs van Dijk is a Professor of Finance. He wants to understand the functions of financial systems for society.
Institutional economists (e.g., Nobel Prize winner Douglass North) stress the importance of institutions for economic development.
Similar questions are addressed in anthropology (e.g., Service, 1975).
Similar questions in sociology (e.g., the functionalism of Parsons (1951), Lenski's Ecological-Evolutionary Theory (1974/2006); see also Sanderson (1990)).
What are the necessary preconditions for societal complexity / state formation?
Can we predict 'societal complexity' (or a proxy thereof)? How well does 'AI' do?
Relative importance of 'ecology' vs. 'culture' (a distinction which is potentially quite a misnomer...).
Macro-economists emphasize the importance of institutions. What is the evidence for this?
Technical variables (e.g., ID variable for the SCCS) were removed from the dataset.
Variables derived from other variables were also removed.
Variables that proved problematic in a preliminary screening were also removed (for example, due to a high percentage of missing cases, or nominal variables with nearly unique codes).
Analyses were run both with and without 7 variables (country_Codes, Area_Region, SubContinent_Region, Continent, Language_Continent, Old_New_Class, Region).
Two variables flagged in our preliminary analyses ('Mean_Size_of_Local_Communities' and 'Jurisdictional_Hierarchy_Beyond_Local_Community') --> excluded, as they overlap to a large degree with our dependent variable.
‘settlement complexity’ (Settlement_Patterns).
Treated as an ordinal variable with 8 categories in the key analyses; treated as continuous for some figures.
The eight categories (codebook order):
- Fully migratory or nomadic bands.
- Seminomadic communities whose members wander in bands for at least half of the year but occupy a fixed settlement at some season or seasons, e.g., recurrently occupied winter quarters.
- Semisedentary communities whose members shift from one to another fixed settlement at different seasons, or who occupy more or less permanently a single settlement from which a substantial proportion of the population departs seasonally to occupy shifting camps, e.g., during transhumance.
- Compact but impermanent settlements, i.e., villages whose location is shifted every few years.
- Neighborhoods of dispersed family homesteads.
- Separated hamlets where several such form a more or less permanent single community.
- Compact and relatively permanent settlements, i.e., nucleated villages or towns.
- Complex settlements consisting of a nucleated village or town with outlying homesteads or satellite hamlets.
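As an illustration only, a minimal sketch of this coding in R (the data frame name `sccs` and column names are assumptions, not the actual analysis code):

```r
# Hypothetical coding sketch: Settlement_Patterns as an ordered factor
# with 8 levels for the key analyses, plus a numeric copy for figures.
sccs$Settlement_Patterns <- factor(sccs$Settlement_Patterns,
                                   levels = 1:8, ordered = TRUE)
sccs$Settlement_Patterns_num <- as.numeric(sccs$Settlement_Patterns)
```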
Broad categories: Culture / Ecology / none (e.g., Language class)
Narrower categories: Culture: Culture / Culture: Ecology / Ecology: Climate / Ecology: Geography / Ecology: Both
Narrower coding of cultural variables ('none' = not a cultural variable).
"Conditional inference trees estimate a regression relationship by binary recursive partitioning in a conditional inference framework. Roughly, the algorithm works as follows:
1) Test the global null hypothesis of independence between any of the input variables and the response (which may be multivariate as well). Stop if this hypothesis cannot be rejected. Otherwise select the input variable with strongest association to the response. This association is measured by a p-value corresponding to a test for the partial null hypothesis of a single input variable and the response.
2) Implement a binary split in the selected input variable.
3) Recursively repeat steps 1) and 2)." (Hothorn et al., 2018, p. 8)
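A minimal sketch of a single conditional inference tree in R, assuming a data frame `sccs` with Settlement_Patterns as the response (illustrative only, not the authors' exact code):

```r
library(party)

# Fit one conditional inference tree: splits are chosen via the
# permutation tests described above, and growing stops when the
# global null hypothesis can no longer be rejected.
ct <- ctree(Settlement_Patterns ~ ., data = sccs)
plot(ct)  # inspect the selected splits and terminal nodes
```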
Some key benefits:
automatically detects interactions and non-linearities
no overfitting (no pruning)
BUT: individual trees are 'truly random' (they vary strongly from run to run) --> Forests.
Build a large number of these trees! (N=500).
A commonly used tool for small-N, large-P problems (here 137 predictors, with a random subset of 30 considered at each split: 'mtry' = 30).
Analyses suggest that random forest models are on a par with, or regularly outperform, other machine learning methods (e.g., Caruana et al., 2008). A lay explanation is available online.
We used the 'party' package in R.
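A hedged sketch of such a forest fit with party::cforest, using the ntree and mtry values reported above (the data frame name and seed are assumptions):

```r
library(party)
set.seed(1)  # arbitrary seed; see the seed-robustness check below

# 500 conditional inference trees; 30 candidate predictors per split.
cf <- cforest(Settlement_Patterns ~ .,
              data     = sccs,
              controls = cforest_unbiased(ntree = 500, mtry = 30))
```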
"More precisely, it measures the difference between the OOB error rate after and before permuting the values of the predictor of interest. ... The idea underlying this VIM is the following: If the predictor is not associated with the response, the permutation of its values has no influence on the classification, and thus also no influence on the error rate. The error rate of the forest is not substantially affected by the permutation and the VI of the predictor takes a value close to zero, indicating no association between the predictor and the response. In contrast, if response and predictor are associated, the permutation of the predictor values destroys this association. “Knocking out” this predictor by permuting its values results in a worse classification leading to an increased error rate. The difference in error rates before and after randomly permuting the predictor thus takes a positive value reflecting the high importance of this predictor" Janitza et al. 2013, BMC Bioinformatics: p. 3
"More precisely, it measures the difference between the OOB error rate after and before permuting the values of the predictor of interest. ... The idea underlying this VIM is the following: If the predictor is not associated with the response, the permutation of its values has no influence on the classification, and thus also no influence on the error rate. The error rate of the forest is not substantially affected by the permutation and the VI of the predictor takes a value close to zero, indicating no association between the predictor and the response. In contrast, if response and predictor are associated, the permutation of the predictor values destroys this association. “Knocking out” this predictor by permuting its values results in a worse classification leading to an increased error rate. The difference in error rates before and after randomly permuting the predictor thus takes a positive value reflecting the high importance of this predictor" Janitza et al. 2013, BMC Bioinformatics: p. 3
More information is available in the 'party' package manual.
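In 'party', this permutation importance is computed by varimp(); a minimal sketch, reusing the forest fit `cf` from the sketch above:

```r
# Permutation variable importance: mean change in OOB prediction
# accuracy after permuting each predictor, as described above.
vi <- varimp(cf)
head(sort(vi, decreasing = TRUE), 10)  # ten strongest predictors
```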
The Spearman \(\rho\) correlations between the observed and predicted values were .815 for run 1 and .812 for run 2 of the full model. For the reduced models, these correlations were .810 for run 1 and .808 for run 2, respectively. Is that good or bad?
Different starting seeds yield similar results: the correspondence between two runs for variable importance was very high (Full: Pearson r = .997, Spearman \(\rho\) = .938; Reduced: Pearson r = .996, Spearman \(\rho\) = .927).
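A sketch of how such observed-vs-predicted correlations can be computed from out-of-bag predictions (again reusing `cf` and `sccs` from the earlier sketches; the names are assumptions):

```r
# OOB predictions: each case is predicted only by trees that did not
# see it during fitting, giving an honest performance estimate.
pred <- predict(cf, OOB = TRUE)
obs  <- sccs$Settlement_Patterns
cor(as.numeric(obs), as.numeric(pred), method = "spearman")
```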
Varying 'mtry' (e.g., 46) and 'ntree' (e.g., N = 1,000) yields qualitatively the same results.
Forests are not readily interpretable; the model remains a black box.
A sample tree --> note: exploratory only. (Alternative: partial dependence plots.)
https://www.explainxkcd.com/wiki/index.php/1838:_Machine_Learning
Subsistence economy/agriculture are the best predictors. An unsurprising conclusion, perhaps?
Property rights and 'institutions' (or proxies thereof) have relatively little predictive influence.
'Ecology' is on a par with 'culture' (provided we can meaningfully disentangle the two).
Random forest: a useful tool in our toolbox?
Are agriculture/subsistence economy just the 'same thing' as settlement complexity?
'Garbage in, garbage out'.
Problems with modelling in these types of datasets (see Towner et al., 2016)
Machine learning challenge? How do other methods (e.g., support vector machines) do?
'Qualitative Comparative Analysis' (necessary and sufficient conditions for causal inference).
Phylogenetic work (Mathijs van Dijk was working with Andrew Meade)
The out-of-bag (oob) error estimate: "In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:
Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n averaged over all cases is the oob error estimate. This has proven to be unbiased in many tests." (Breiman & Cutler)
This approach does not overfit.
"This statistical approach ensures that the right sized tree is grown and no form of pruning or cross-validation or whatsoever is needed." (from here)