class: center, middle, inverse, title-slide .title[ # Preference graphs: A tool to help select pre-rated stimuli? ] .author[ ### Prof. Thomas Pollet, Northumbria University (
thomas.pollet@northumbria.ac.uk
) ] .date[ ### 2026-02-25 |
disclaimer
] ---
## Outline

* Minimal theory - just framing the problem...

--

* OMG --> GRAPHS --> Look at the Shiny-Shiny

--

* The end

<img src="summary.gif" alt="" width="550px" style="display: block; margin: auto;" />

---

## Small goals...

This is a very small project dealing with a rather specific problem.

--

--> Selecting stimuli based on some criterion

--

--> Also relevant to some other problems --> e.g., pseudoreplication: see <a name=cite-Bovet2022></a>[Bovet, Tognetti, and Pollet (2022)](https://doi.org/10.1017/ehs.2022.25)

<img src="puffin.jpg" alt="" width="400px" style="display: block; margin: auto;" />

---

## Stimuli selection

* Many designs --> some form of stimulus selection (e.g., "attractive vs. unattractive" faces)

--

* For example, <a name=cite-Massar2010></a>[Massar and Buunk (2010)](https://doi.org/10.1016/j.paid.2010.05.037) used an attractive versus an unattractive stimulus in a priming paradigm.

--

* Vignettes / "base faces" which are then morphed / priming ...

<img src="stimulus.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## However, also relevant to other designs...

* Some designs examine 'attractive' people and then look at correlates. Some examples: <a name=cite-Jokela2009b></a><a name=cite-Kalick1998></a>([Jokela, 2009b](#bib-Jokela2009b); [Kalick, Zebrowitz, Langlois, and Johnson, 1998](#bib-Kalick1998))

--

* Not limited to evolutionary psychology... Such designs are common in cognitive, developmental, and social psychology.

<img src="brain_explosion.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Pre-rated stimuli

* Traits (attractiveness, health, etc.)

--

* No real information or irreproducible information (e.g., pre-test data classified into categories).

--

For example, <a name=cite-Jung2012></a>[Jung, Ruthruff, Tybur, Gaspelin, and Miller (2012)](https://doi.org/10.1016/j.evolhumbehav.2011.10.001) _"The original picture pool consisted of 49 male faces.
Based on ratings of attractiveness in a pilot study (not shown here), we selected six pictures for each of the following four attractiveness levels: very attractive, somewhat attractive, somewhat unattractive, and very unattractive."_ (p. 244)

--

Focus on three rule sets:

* Mean±SD rule <a name=cite-Langlois1990></a>([Langlois and Roggman, 1990](https://doi.org/10.1111/j.1467-9280.1990.tb00079.x))
* top % rule ([Kalick, Zebrowitz, Langlois et al., 1998](#bib-Kalick1998))
* top n rule <a name=cite-Little2012b></a>([Little, Roberts, Jones, and DeBruine, 2012](https://doi.org/10.1080/17470218.2012.677048))

???

Other rules: Median split

---

## Limitations of selection rules: I

* **Selection rules can be misleading**: they rely on rating distributions and may not reflect the true clustering of preferences or participant agreement.

--

--> Can mask small, highly preferred sets or falsely inflate selection numbers (no true difference).

--

--> Also implicit is a symmetry. BUT: agreement could be greater for low-rated as opposed to high-rated stimuli (or vice versa).

<img src="limitations.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Limitations of selection rules: II

* **Reversal paradoxes**: Any form of aggregation of ratings can lead to counterintuitive results.

--

--> If we look at a stimulus pair, there could be a mean difference A > B in rating

--

BUT the majority actually prefer B > A.

--

--> Especially plausible in diverse samples. But even seemingly homogeneous groups can exhibit this; consensus on attractiveness is not guaranteed, and selected stimuli are not necessarily interchangeable.

<img src="reverse.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Extremely simple simulation...

* You can ask me more about the simulations if we have time (_n_ = 1,000 sims, 20 faces, 60 raters).
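--

A toy illustration of the reversal (hypothetical ratings, not the simulation itself): one enthusiastic rater pulls A's mean above B's, yet most raters prefer B head-to-head.

```r
# Hypothetical 1-7 ratings from five raters for two stimuli
a <- c(7, 2, 2, 2, 3)  # one outlier rates A very highly
b <- c(3, 3, 3, 3, 2)

mean(a)      # 3.2 -> mean rule picks A
mean(b)      # 2.8
sum(b > a)   # 3 of 5 raters prefer B head-to-head
sum(a > b)   # 2 of 5 prefer A -> majority rule picks B
```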
--

* The model includes some confounding, and selective use of the scale can give rise to reversals (in this case, a swap of which stimulus you would have picked under one rule versus another).

--

* The Mean/SD rule offers some protection, but you still find some reversals.

--

* You would have picked another face if you had compared head-to-head (majority rule) rather than via an aggregate rule. --> It is at least plausible that you can observe reversals.

---

## Simulation results

<img src="reversal_sims.png" alt="" width="800px" style="display: block; margin: auto;" />

---

## More fundamentally: No consideration of agreement.

The rule-based heuristics employed do **not** capture agreement in any meaningful way...

<img src="agreement.gif" alt="" width="600px" style="display: block; margin: auto;" />

???

And as we will show if you employ a Mean +/- SD heuristic then

---

## An illustration of a potential solution

* London Face Research Database <a name=cite-DeBruine2017></a>([DeBruine and Jones, 2017](https://doi.org/10.6084/m9.figshare.5047666.v3))

--

* 1-7 Likert ratings for each stimulus (102 faces) from 2,513 raters.

--

* Two faces (IDs 031 and 135) excluded from the analyses (no age provided).

--

* Mimics a "typical" evolutionary psychology design: limited age range (18-35 years old for both stimuli and raters) and focussed on heterosexual preferences

--

* 499 men rating 41 women / 838 women rating 44 men

<img src="ratings.gif" alt="" width="250px" style="display: block; margin: auto;" />

---

## Mean and SD rule in this set

* Men rating women: only 1 high-rated stimulus (ID 124).

--

* Women rating men: NO STIMULI

<img src="oh_geez.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Graphs!

My potential solution to some of this: GRAPHS.
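A minimal base-R sketch of the core idea (hypothetical data; the actual analysis used custom tidyverse/`igraph` functions): one directed preference matrix per rater, averaged into a consensus matrix, then thresholded.

```r
set.seed(1)
n_faces  <- 4
n_raters <- 30
# Hypothetical Likert ratings: one row per rater, one column per face
ratings <- matrix(sample(1:7, n_raters * n_faces, replace = TRUE),
                  nrow = n_raters)

# For one rater: entry [i, j] = 1 if face j was rated above face i
# (an arrow from the less preferred to the more preferred face)
pref_graph <- function(r) outer(r, r, function(lo, hi) as.numeric(hi > lo))

# Consensus: proportion of raters whose graph contains each edge
consensus <- Reduce(`+`, lapply(seq_len(n_raters),
                    function(k) pref_graph(ratings[k, ]))) / n_raters

# Retain edges supported by at least two-thirds of raters
consensus_edges <- consensus >= 2/3
```

The thresholded matrix could then be handed to `igraph::graph_from_adjacency_matrix()` for the source/sink inspection described on the following slides.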
<img src="graphs.gif" alt="" width="600px" style="display: block; margin: auto;" />

---

## Preference graphs

* **Participant-Specific Preference Graphs**: We built a graph for each person, showing their face preferences.

--

* **Edges Represent Dominance**: Arrows point from the less preferred to the more preferred face (based on ratings).

--

* **Focus on Ranking**: For now, we care whether a preference existed, not about the size of the rating difference. --> We will return to this point.

<img src="preference.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Consensus graphs

* Individual preference graphs were combined by calculating the proportion of participants showing each preference (edge).

--

* **Thresholding for Consensus**: We retained edges supported by ≥ 66.7% or ≥ 80% of participants. <a name=cite-Krippendorff2004></a>([Krippendorff, 2004](#bib-Krippendorff2004))

--

* **"Sink" Nodes**: Nodes with only incoming edges represent stimuli consistently preferred.

--

* **"Source" Nodes**: Nodes with only outgoing edges represent stimuli consistently not preferred.

--

* **Visualization**: Node size reflects the degree (number of connections) to highlight overall preference strength.

<img src="consensus.gif" alt="" width="220px" style="display: block; margin: auto;" />

---

## Permutation Test (Consensus vs. Chance):

* **Shuffled Ratings**: We randomly re-assigned which faces each participant rated, keeping their overall rating patterns the same.

--

* **Repeated Calculation**: We recalculated consensus edges with these shuffled ratings 10,000 times to create a "null" distribution. --> This tells us what consensus looks like by chance.

<img src="shuffle.gif" alt="" width="350px" style="display: block; margin: auto;" />

---

## Bootstrap Confidence Intervals (Edge uncertainty of consensus graph):

* **Resampled Participants**: We repeatedly re-sampled participants with replacement (1,000 times).
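--

A sketch of the resampling step (hypothetical data; `edge_present` and the numbers are made up for illustration):

```r
set.seed(2)
# Hypothetical indicator per rater: did this rater's graph contain edge A -> B?
edge_present <- rbinom(60, 1, 0.67)

# Resample raters with replacement and recompute the edge proportion each time
boot_props <- replicate(1000, mean(sample(edge_present, replace = TRUE)))

# Percentile 95% CI for this edge's weight in the consensus graph
quantile(boot_props, c(0.025, 0.975))
```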
--

* **Edge Weight Intervals**: This gave us confidence intervals for the estimated strength of each edge in the consensus graph.

<img src="bootstrap.gif" alt="" width="350px" style="display: block; margin: auto;" />

---

## Split-Half Reliability (Ranking Stability):

* **Divided Participants**: We split participants randomly into two groups.

--

* **Compared Rankings**: We built consensus networks for each group and correlated the resulting face rankings (Spearman's ρ, repeated 1,000 times). --> This shows how stable the overall preference order is.

<img src="split_half.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Implementation

* This is all done in R with `tidyverse`, `igraph`, and `bootnet`

--

* Made some custom functions (thanks, Claude) so that others can do this.

<img src="implementation.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Men rating women - 66.7%

<img src="plot_m_f_67.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Men rating women - 80%

<img src="plot_m_f_80.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Women rating men - 66.7%

<img src="plot_f_m_67.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Women rating men - 80%

<img src="plot_f_m_80.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Some checks

* **Permutation tests**: All _p_ < .0001. (Unsurprising, given the threshold.)

--

* **Bootstrapping**: Edge-based metrics. The largest confidence interval widths found were .08, .07, .07, and .05 (men rating women at 66.7% and 80%; women rating men at 66.7% and 80%, respectively).

--

--> Example: If we had selected the edge between stimulus 062 and stimulus 083, the estimate of .671 has a 95% CI ranging from .629 to .713. --> We could use this to enforce a stricter threshold.
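--

The split-half check can be sketched as follows (toy data with a shared preference signal; the real analysis rebuilt the consensus networks per half and repeated the split 1,000 times):

```r
set.seed(3)
n_raters <- 60
n_faces  <- 20
# Hypothetical ratings: a shared "true" preference plus per-rater noise
base    <- sample(1:7, n_faces, replace = TRUE)
ratings <- t(replicate(n_raters,
             pmin(7, pmax(1, base + sample(-1:1, n_faces, replace = TRUE)))))

# Random split into two halves; compare the face rankings they imply
half <- sample(rep(c(TRUE, FALSE), length.out = n_raters))
rho  <- cor(colMeans(ratings[half, , drop = FALSE]),
            colMeans(ratings[!half, , drop = FALSE]),
            method = "spearman")
rho  # high rho = stable preference order across halves
```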
--

* **Split-half reliability**: All ρ for rankings > .98

---

## Interim summary

* Our approach allows us to pick stimuli for which we can make concrete claims. For example: in a rating study, more than 66.7% of the participants preferred the 'attractive' over the 'unattractive' stimulus.

--

* We are not saying this is THE way - rather, it might be a tool for making a principled decision.

--> In an ideal world: pre-register the decision on how you will select your stimuli.

<img src="way.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Extension: Weights

**Flexibility**: Our method can incorporate the size of rating differences (e.g., require a 2-point difference instead of 1).

--

**Weighting by Difference**: We included a function to weight preferences by the magnitude of the rating differences.

--

**Combined Preference Score**: Preference strength is calculated as: proportion of raters showing the preference × mean difference size (prioritizing widespread and decisive preferences; note this is scale dependent).

--

**Custom Weighting Options**: More complex weighting schemes are possible (e.g., w1 \* proportion + w2 \* mean_weight).

--

**Simple Thresholding Sufficient**: For many cases, specifying a larger minimum difference threshold is enough; complex weighting isn't always needed.

---

## Pretty animation example (corresponding to very first analysis)

<img src="threshold_sweep_67.gif" alt="" width="575px" style="display: block; margin: auto;" />

---

## Where does this leave us?

* A minuscule contribution to a small problem, one which mostly bothered me when thinking about stimulus selection. This approach could potentially lead to less arbitrary selections?

--

* Several limitations (e.g., choice paralysis)

--

* Other applications? Extensions (tournaments between stimuli...)?
<img src="lost.gif" alt="" width="575px" style="display: block; margin: auto;" />

---

## Acknowledgments

**AI declaration**: The author(s) have made use of AI tools to assist with R code ([Claude Opus 4.5](https://www.anthropic.com/news/claude-opus-4-5)) and to help improve the presentation ([Gemma 3 27b](https://lmstudio.ai/models/google/gemma-3-27b)). In all cases, the author(s) made the final decision and take(s) full responsibility.

Slides were built with [Xaringan](https://github.com/yihui/xaringan) and [XaringanExtra](https://pkg.garrickadenbuie.com/xaringanExtra/). Many thanks to those building packages and supporting the [R](https://www.r-project.org/) eco-system.

Thank you for listening!

<img src="https://media.giphy.com/media/10avZ0rqdGFyfu/giphy.gif" alt="" width="400px" style="display: block; margin: auto;" />

---

## References and further reading (errors = blame RefManageR)

<a name=bib-Bovet2022></a>[Bovet, J., A. Tognetti, and T. V. Pollet](#cite-Bovet2022) (2022). "Methodological Issues When Using Face Prototypes: A Case Study on the Faceaurus Dataset". In: _Evolutionary Human Sciences_ 4, p. e48. DOI: [10.1017/ehs.2022.25](https://doi.org/10.1017%2Fehs.2022.25).

<a name=bib-DeBruine2017></a>[DeBruine, L. and B. Jones](#cite-DeBruine2017) (2017). "Face Research Lab London Set". DOI: [10.6084/m9.figshare.5047666.v3](https://doi.org/10.6084%2Fm9.figshare.5047666.v3). URL: [https://figshare.com/articles/Face_Research_Lab_London_Set/5047666](https://figshare.com/articles/Face_Research_Lab_London_Set/5047666).

<a name=bib-Jokela2009b></a>[Jokela, M.](#cite-Jokela2009b) (2009b).
"Physical Attractiveness and Reproductive Success in Humans: Evidence from the Late 20th Century United States". In: _Evolution and Human Behavior_ 30.5, pp. 342-350. ISSN: 1090-5138.

---

## More references

<a name=bib-Jung2012></a>[Jung, K., E. Ruthruff, J. M. Tybur, et al.](#cite-Jung2012) (2012). "Perception of Facial Attractiveness Requires Some Attentional Resources: Implications for the “Automaticity” of Psychological Adaptations". In: _Evolution and Human Behavior_ 33.3, pp. 241-250. ISSN: 1090-5138. DOI: [10.1016/j.evolhumbehav.2011.10.001](https://doi.org/10.1016%2Fj.evolhumbehav.2011.10.001).

<a name=bib-Kalick1998></a>[Kalick, S. M., L. A. Zebrowitz, J. H. Langlois, et al.](#cite-Kalick1998) (1998). "Does Human Facial Attractiveness Honestly Advertise Health? Longitudinal Data on an Evolutionary Question". In: _Psychological Science_ 9.1, pp. 8-13. ISSN: 0956-7976.

<a name=bib-Krippendorff2004></a>[Krippendorff, K.](#cite-Krippendorff2004) (2004). _Content Analysis: An Introduction to Its Methodology_. 2nd ed. Sage Publications. ISBN: 1-5063-9567-8.

<a name=bib-Langlois1990></a>[Langlois, J. H. and L. A. Roggman](#cite-Langlois1990) (1990). "Attractive Faces Are Only Average". In: _Psychological Science_ 1.2, pp. 115-121. ISSN: 0956-7976. DOI: [10.1111/j.1467-9280.1990.tb00079.x](https://doi.org/10.1111%2Fj.1467-9280.1990.tb00079.x).

---

## More references 2

<a name=bib-Little2012b></a>[Little, A. C., S. C. Roberts, B. C. Jones, et al.](#cite-Little2012b) (2012). "The Perception of Attractiveness and Trustworthiness in Male Faces Affects Hypothetical Voting Decisions Differently in Wartime and Peacetime Scenarios". In: _The Quarterly Journal of Experimental Psychology_ 65.10, pp. 2018-2032. DOI: [10.1080/17470218.2012.677048](https://doi.org/10.1080%2F17470218.2012.677048).
<a name=bib-Massar2010></a>[Massar, K. and A. P. Buunk](#cite-Massar2010) (2010). "Judging a Book by Its Cover: Jealousy after Subliminal Priming with Attractive and Unattractive Faces". In: _Personality and Individual Differences_ 49.6, pp. 634-638. ISSN: 0191-8869. DOI: [10.1016/j.paid.2010.05.037](https://doi.org/10.1016%2Fj.paid.2010.05.037).