Paul Röttger

Lecturer, University of Oxford

Joined

January 2022

Names

Paul Röttger (Preferred)

Paul Rottger

Emails

****@oii.ox.ac.uk (Confirmed)

****@unibocconi.it (Confirmed)

Career & Education History

Lecturer

University of Oxford (oxford.ac.uk)

2025 – Present

Postdoc

Bocconi University (unibocconi.it)

2023 – 2025

PhD student

University of Oxford (ox.ac.uk)

2019 – 2023

Advisors, Relations & Conflicts

PhD Advisor

Janet Pierrehumbert

Present

PhD Advisor

Helen Margetts

Present

Postdoc Advisor

Dirk Hovy

Present

Expertise

AI Safety

Present

Social NLP

Present

Hate Speech Detection

Present

Computational Social Science

Present

Publications

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
- ICLR 2026 Poster
- Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
- Social Sim'25
- Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
- NLPOR 2025
- Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
- ACL-SRW 2025 Oral
- Readers: Everyone
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
- NAACL 2024 Main
- Readers: Everyone
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger
- WOAH @ NAACL 2024
- Readers: Everyone
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank
- ICLR 2025 Poster
- Readers: Everyone
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Michael Bean, Katerina Margatina, Rafael Mosquera, Juan Manuel Ciro, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale
- NeurIPS 2024 Track Datasets and Benchmarks Oral
- Readers: Everyone
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Rottger, Barbara Plank
- COLM
- Readers: Everyone
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Thomas Jackson, Paul Rottger et al. (4 additional authors not shown)
- ICML 2024 Oral
- Readers: Everyone

22 publications in total (selection shown above)

Co-Authors

63 co-authors in total