class: center, middle, inverse, title-slide .title[ # Preference graphs: A tool to help select pre-rated stimuli? ] .author[ ### Prof. Thomas Pollet, Northumbria University (
thomas.pollet@northumbria.ac.uk
) ] .date[ ### 2026-02-25 |
disclaimer
] ---
## Outline

* Minimal theory - just framing the problem...

--

* OMG --> GRAPHS --> Look at the Shiny-Shiny

--

* The end

<img src="summary.gif" alt="" width="550px" style="display: block; margin: auto;" />

---

## Small goals...

This is a very small project dealing with a rather specific problem.

--

--> Selecting stimuli based on some criterion

--

--> Also relevant to some other problems --> e.g., pseudoreplication: see <a name=cite-Bovet2022></a>[Bovet, Tognetti, and Pollet (2022)](https://doi.org/10.1017/ehs.2022.25)

<img src="puffin.jpg" alt="" width="400px" style="display: block; margin: auto;" />

---

## Stimuli selection

* Many designs --> some form of stimulus selection (e.g., "attractive vs. unattractive" faces)

--

* For example, <a name=cite-Massar2010></a>[Massar and Buunk (2010)](https://doi.org/10.1016/j.paid.2010.05.037) used an attractive versus an unattractive stimulus in a priming paradigm.

--

* Vignettes / "base faces" which are then morphed / priming ...

<img src="stimulus.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## However, also relevant to other designs...

* Some designs examine 'attractive' people and then look at correlates. Some examples: <a name=cite-Jokela2009b></a><a name=cite-Kalick1998></a>([Jokela, 2009b](#bib-Jokela2009b); [Kalick, Zebrowitz, Langlois, and Johnson, 1998](#bib-Kalick1998))

--

* Not limited to evolutionary psychology... Such designs are common in cognitive, developmental, and social psychology.

<img src="brain_explosion.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Pre-rated stimuli

* Traits (attractiveness, health, etc.)

--

* No real information or irreproducible information (e.g., pre-test data classified into categories).

--

For example, <a name=cite-Jung2012></a>[Jung, Ruthruff, Tybur, Gaspelin, and Miller (2012)](https://doi.org/10.1016/j.evolhumbehav.2011.10.001) _"The original picture pool consisted of 49 male faces.
Based on ratings of attractiveness in a pilot study (not shown here), we selected six pictures for each of the following four attractiveness levels: very attractive, somewhat attractive, somewhat unattractive, and very unattractive."_ (p. 244)

--

Focus on three rule sets:

* Mean±SD rule <a name=cite-Langlois1990></a>([Langlois and Roggman, 1990](https://doi.org/10.1111/j.1467-9280.1990.tb00079.x))
* top % rule ([Kalick, Zebrowitz, Langlois et al., 1998](#bib-Kalick1998))
* top n rule <a name=cite-Little2012b></a>([Little, Roberts, Jones, and DeBruine, 2012](https://doi.org/10.1080/17470218.2012.677048))

???

Other rules: Median split

---

## Limitations of selection rules: I

* **Selection rules can be misleading**: they rely on rating distributions and may not reflect the true clustering of preferences or participant agreement.

--

--> Can mask small, highly preferred sets or falsely inflate selection numbers (no true difference).

--

--> Also implicit is a symmetry. BUT: agreement could be greater for low-rated as opposed to high-rated stimuli (or vice versa).

<img src="limitations.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Limitations of selection rules: II

* **Reversal paradoxes**: Any form of aggregation of ratings can lead to counterintuitive results.

--

--> If we look at a stimulus pair, there could be a mean difference A > B in rating

--

BUT the majority actually prefer B > A.

--

--> Especially plausible in diverse samples. But even seemingly homogeneous groups can exhibit this; consensus on attractiveness is not guaranteed, and selected stimuli are not necessarily interchangeable.

<img src="reverse.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Extremely simple simulation...

* You can ask me more about the simulations if we have time (_n_ = 1,000 sims, 20 faces, 60 raters).
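--

A toy illustration of the reversal (hypothetical ratings, not the simulation itself): one enthusiastic rater pulls A's mean above B's, yet most raters prefer B head-to-head.

```r
# Hypothetical 1-7 ratings from five raters for two stimuli
a <- c(7, 2, 2, 2, 3)  # one outlier rates A very highly
b <- c(3, 3, 3, 3, 2)

mean(a)      # 3.2 -> mean rule picks A
mean(b)      # 2.8
sum(b > a)   # 3 of 5 raters prefer B head-to-head
sum(a > b)   # 2 of 5 prefer A -> majority rule picks B
```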
--

* The model includes some confounding, and selective use of the scale can give rise to reversals (in this case, a swap of which stimulus you would have picked under one rule versus another).

--

* The Mean/SD rule offers some protection, but you still find some reversals.

--

* You would have picked another face if you had compared head-to-head (majority rule) rather than via an aggregate rule. --> It is at least plausible that you can observe reversals.

---

## Simulation results

<img src="reversal_sims.png" alt="" width="800px" style="display: block; margin: auto;" />

---

## More fundamentally: No consideration of agreement.

The rule-based heuristics employed do **not** capture agreement in any meaningful way...

<img src="agreement.gif" alt="" width="600px" style="display: block; margin: auto;" />

???

And as we will show if you employ a Mean +/- SD heuristic then

---

## An illustration of a potential solution

* London Face Research Database <a name=cite-DeBruine2017></a>([DeBruine and Jones, 2017](https://doi.org/10.6084/m9.figshare.5047666.v3))

--

* 1-7 Likert ratings for each stimulus (102 faces) from 2,513 raters.

--

* Two faces (IDs 031 and 135) excluded from the analyses (no age provided).

--

* Mimics a "typical" evolutionary psychology design: limited age range (18-35 years old for both stimuli and raters) and focussed on heterosexual preferences

--

* 499 men rating 41 women / 838 women rating 44 men

<img src="ratings.gif" alt="" width="250px" style="display: block; margin: auto;" />

---

## Mean and SD rule in this set

* Men rating women: only 1 high-rated stimulus (ID 124).

--

* Women rating men: NO STIMULI

<img src="oh_geez.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Graphs!

My potential solution to some of this: GRAPHS.
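A minimal base-R sketch of the core idea (hypothetical data; the actual analysis used custom tidyverse/`igraph` functions): one directed preference matrix per rater, averaged into a consensus matrix, then thresholded.

```r
set.seed(1)
n_faces  <- 4
n_raters <- 30
# Hypothetical Likert ratings: one row per rater, one column per face
ratings <- matrix(sample(1:7, n_raters * n_faces, replace = TRUE),
                  nrow = n_raters)

# For one rater: entry [i, j] = 1 if face j was rated above face i
# (an arrow from the less preferred to the more preferred face)
pref_graph <- function(r) outer(r, r, function(lo, hi) as.numeric(hi > lo))

# Consensus: proportion of raters whose graph contains each edge
consensus <- Reduce(`+`, lapply(seq_len(n_raters),
                    function(k) pref_graph(ratings[k, ]))) / n_raters

# Retain edges supported by at least two-thirds of raters
consensus_edges <- consensus >= 2/3
```

The thresholded matrix could then be handed to `igraph::graph_from_adjacency_matrix()` for the source/sink inspection described on the following slides.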
<img src="graphs.gif" alt="" width="600px" style="display: block; margin: auto;" />

---

## Preference graphs

* **Participant-Specific Preference Graphs**: We built a graph for each person, showing their face preferences.

--

* **Edges Represent Dominance**: Arrows point from the less preferred to the more preferred face (based on ratings).

--

* **Focus on Ranking**: For now, we care whether a preference existed, not about the size of the rating difference. --> We will return to this point.

<img src="preference.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Consensus graphs

* Individual preference graphs were combined by calculating the proportion of participants showing each preference (edge).

--

* **Thresholding for Consensus**: We retained edges supported by ≥ 66.7% or ≥ 80% of participants. <a name=cite-Krippendorff2004></a>([Krippendorff, 2004](#bib-Krippendorff2004))

--

* **"Sink" Nodes**: Nodes with only incoming edges represent stimuli consistently preferred.

--

* **"Source" Nodes**: Nodes with only outgoing edges represent stimuli consistently not preferred.

--

* **Visualization**: Node size reflects the degree (number of connections) to highlight overall preference strength.

<img src="consensus.gif" alt="" width="220px" style="display: block; margin: auto;" />

---

## Permutation Test (Consensus vs. Chance):

* **Shuffled Ratings**: We randomly re-assigned which faces each participant rated, keeping their overall rating patterns the same.

--

* **Repeated Calculation**: We recalculated consensus edges with these shuffled ratings 10,000 times to create a "null" distribution. --> This tells us what consensus looks like by chance.

<img src="shuffle.gif" alt="" width="350px" style="display: block; margin: auto;" />

---

## Bootstrap Confidence Intervals (Edge uncertainty of consensus graph):

* **Resampled Participants**: We repeatedly re-sampled participants with replacement (1,000 times).
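--

A sketch of the resampling step (hypothetical data; `edge_present` and the numbers are made up for illustration):

```r
set.seed(2)
# Hypothetical indicator per rater: did this rater's graph contain edge A -> B?
edge_present <- rbinom(60, 1, 0.67)

# Resample raters with replacement and recompute the edge proportion each time
boot_props <- replicate(1000, mean(sample(edge_present, replace = TRUE)))

# Percentile 95% CI for this edge's weight in the consensus graph
quantile(boot_props, c(0.025, 0.975))
```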
--

* **Edge Weight Intervals**: This gave us confidence intervals for the estimated strength of each edge in the consensus graph.

<img src="bootstrap.gif" alt="" width="350px" style="display: block; margin: auto;" />

---

## Split-Half Reliability (Ranking Stability):

* **Divided Participants**: We split participants randomly into two groups.

--

* **Compared Rankings**: We built consensus networks for each group and correlated the resulting face rankings (Spearman's ρ, repeated 1,000 times). --> This shows how stable the overall preference order is.

<img src="split_half.gif" alt="" width="300px" style="display: block; margin: auto;" />

---

## Implementation

* This is all done in R with `tidyverse`, `igraph`, and `bootnet`

--

* Made some custom functions (thanks, Claude) so that others can do this.

<img src="implementation.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Men rating women - 66.7%

<img src="plot_m_f_67.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Men rating women - 80%

<img src="plot_m_f_80.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Women rating men - 66.7%

<img src="plot_f_m_67.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Results: Women rating men - 80%

<img src="plot_f_m_80.png" alt="" width="500px" style="display: block; margin: auto;" />

---

## Some checks

* **Permutation tests**: All _p_ < .0001. (Unsurprising, given the threshold.)

--

* **Bootstrapping**: Edge-based metrics. The largest confidence interval widths found were .08, .07, .07, and .05 (men rating women at 66.7% and 80%; women rating men at 66.7% and 80%, respectively).

--

--> Example: If we had selected the edge between stimulus 062 and stimulus 083, the estimate of .671 has a 95% CI ranging from .629 to .713. --> We could use this to enforce a stricter threshold.
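--

The split-half check can be sketched as follows (toy data with a shared preference signal; the real analysis rebuilt the consensus networks per half and repeated the split 1,000 times):

```r
set.seed(3)
n_raters <- 60
n_faces  <- 20
# Hypothetical ratings: a shared "true" preference plus per-rater noise
base    <- sample(1:7, n_faces, replace = TRUE)
ratings <- t(replicate(n_raters,
             pmin(7, pmax(1, base + sample(-1:1, n_faces, replace = TRUE)))))

# Random split into two halves; compare the face rankings they imply
half <- sample(rep(c(TRUE, FALSE), length.out = n_raters))
rho  <- cor(colMeans(ratings[half, , drop = FALSE]),
            colMeans(ratings[!half, , drop = FALSE]),
            method = "spearman")
rho  # high rho = stable preference order across halves
```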
--

* **Split-half reliability**: All ρ for rankings > .98

---

## Interim summary

* Our approach allows us to pick stimuli for which we can make concrete claims. For example: in a rating study, more than 66.7% of the participants preferred the 'attractive' over the 'unattractive' stimulus.

--

* We are not saying this is THE way - rather, it might be a tool for making a principled decision.

--> In an ideal world: pre-register the decision on how you will select your stimuli.

<img src="way.gif" alt="" width="500px" style="display: block; margin: auto;" />

---

## Extension: Weights

**Flexibility**: Our method can incorporate the size of rating differences (e.g., require a 2-point difference instead of 1).

--

**Weighting by Difference**: We included a function to weight preferences by the magnitude of the rating differences.

--

**Combined Preference Score**: Preference strength is calculated as: proportion of raters showing the preference × mean difference size (prioritizing widespread and decisive preferences; note this is scale dependent).

--

**Custom Weighting Options**: More complex weighting schemes are possible (e.g., w1 \* proportion + w2 \* mean_weight).

--

**Simple Thresholding Sufficient**: For many cases, specifying a larger minimum difference threshold is enough; complex weighting isn't always needed.

---

## Pretty animation example (corresponding to very first analysis)

<img src="threshold_sweep_67.gif" alt="" width="575px" style="display: block; margin: auto;" />

---

## Where does this leave us?

* A minuscule contribution to a small problem, one which mostly bothered me when thinking about stimulus selection. This approach could potentially lead to less arbitrary selections?

--

* Several limitations (e.g., choice paralysis)

--

* Other applications? Extensions (tournaments between stimuli...)?
<img src="lost.gif" alt="" width="575px" style="display: block; margin: auto;" />

---

## Acknowledgments

**AI declaration**: The author(s) have made use of AI tools to assist with R code ([Claude Opus 4.5](https://www.anthropic.com/news/claude-opus-4-5)) and to help improve the presentation ([Gemma 3 27b](https://lmstudio.ai/models/google/gemma-3-27b)). In all cases, the author(s) made the final decision and take(s) full responsibility.

Slides were built with [Xaringan](https://github.com/yihui/xaringan) and [XaringanExtra](https://pkg.garrickadenbuie.com/xaringanExtra/). Many thanks to those building packages and supporting the [R](https://www.r-project.org/) eco-system.

Thank you for listening!

<img src="https://media.giphy.com/media/10avZ0rqdGFyfu/giphy.gif" alt="" width="400px" style="display: block; margin: auto;" />

---

## References and further reading (errors = blame RefManageR)

<a name=bib-Bovet2022></a>[Bovet, J., A. Tognetti, and T. V. Pollet](#cite-Bovet2022) (2022). "Methodological Issues When Using Face Prototypes: A Case Study on the Faceaurus Dataset". In: _Evolutionary Human Sciences_ 4, p. e48. DOI: [10.1017/ehs.2022.25](https://doi.org/10.1017%2Fehs.2022.25).

<a name=bib-DeBruine2017></a>[DeBruine, L. and B. Jones](#cite-DeBruine2017) (2017). "Face Research Lab London Set". DOI: [10.6084/m9.figshare.5047666.v3](https://doi.org/10.6084%2Fm9.figshare.5047666.v3). URL: [https://figshare.com/articles/Face_Research_Lab_London_Set/5047666](https://figshare.com/articles/Face_Research_Lab_London_Set/5047666).

<a name=bib-Jokela2009b></a>[Jokela, M.](#cite-Jokela2009b) (2009b).
"Physical Attractiveness and Reproductive Success in Humans: Evidence from the Late 20th Century United States". In: _Evolution and Human Behavior_ 30.5, pp. 342-350. ISSN: 1090-5138.

---

## More references

<a name=bib-Jung2012></a>[Jung, K., E. Ruthruff, J. M. Tybur, et al.](#cite-Jung2012) (2012). "Perception of Facial Attractiveness Requires Some Attentional Resources: Implications for the “Automaticity” of Psychological Adaptations". In: _Evolution and Human Behavior_ 33.3, pp. 241-250. ISSN: 1090-5138. DOI: [10.1016/j.evolhumbehav.2011.10.001](https://doi.org/10.1016%2Fj.evolhumbehav.2011.10.001).

<a name=bib-Kalick1998></a>[Kalick, S. M., L. A. Zebrowitz, J. H. Langlois, et al.](#cite-Kalick1998) (1998). "Does Human Facial Attractiveness Honestly Advertise Health? Longitudinal Data on an Evolutionary Question". In: _Psychological Science_ 9.1, pp. 8-13. ISSN: 0956-7976.

<a name=bib-Krippendorff2004></a>[Krippendorff, K.](#cite-Krippendorff2004) (2004). _Content Analysis: An Introduction to Its Methodology_. 2nd ed. Sage Publications. ISBN: 1-5063-9567-8.

<a name=bib-Langlois1990></a>[Langlois, J. H. and L. A. Roggman](#cite-Langlois1990) (1990). "Attractive Faces Are Only Average". In: _Psychological Science_ 1.2, pp. 115-121. ISSN: 0956-7976. DOI: [10.1111/j.1467-9280.1990.tb00079.x](https://doi.org/10.1111%2Fj.1467-9280.1990.tb00079.x).

---

## More references 2

<a name=bib-Little2012b></a>[Little, A. C., S. C. Roberts, B. C. Jones, et al.](#cite-Little2012b) (2012). "The Perception of Attractiveness and Trustworthiness in Male Faces Affects Hypothetical Voting Decisions Differently in Wartime and Peacetime Scenarios". In: _The Quarterly Journal of Experimental Psychology_ 65.10, pp. 2018-2032. DOI: [10.1080/17470218.2012.677048](https://doi.org/10.1080%2F17470218.2012.677048).
<a name=bib-Massar2010></a>[Massar, K. and A. P. Buunk](#cite-Massar2010) (2010). "Judging a Book by Its Cover: Jealousy after Subliminal Priming with Attractive and Unattractive Faces". In: _Personality and Individual Differences_ 49.6, pp. 634-638. ISSN: 0191-8869. DOI: [10.1016/j.paid.2010.05.037](https://doi.org/10.1016%2Fj.paid.2010.05.037).