OpenReview.net
Paul Röttger
Lecturer, University of Oxford
Joined
January 2022
Names
Paul Röttger (Preferred), Paul Rottger
Emails
****@oii.ox.ac.uk (Confirmed), ****@unibocconi.it (Confirmed)
Personal Links
Homepage
Google Scholar
DBLP
ORCID
LinkedIn
Semantic Scholar
ACL Anthology
Career & Education History
Lecturer, University of Oxford (oxford.ac.uk), 2025–Present
Postdoc, Bocconi University (unibocconi.it), 2023–2025
PhD student, University of Oxford (ox.ac.uk), 2019–2023
Advisors, Relations & Conflicts
PhD Advisor: Janet Pierrehumbert (Present)
PhD Advisor: Helen Margetts (Present)
Postdoc Advisor: Dirk Hovy (Present)
Expertise
AI Safety (Present)
Social NLP (Present)
Hate Speech Detection (Present)
Computational Social Science (Present)
Publications
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
ICLR 2026 Poster
Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
Social Sim'25
Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
NLPOR 2025
Readers: Everyone
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
ACL-SRW 2025 Oral
Readers: Everyone
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
NAACL 2024 Main
Readers: Everyone
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger
WOAH @ NAACL 2024
Readers: Everyone
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank
ICLR 2025 Poster
Readers: Everyone
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Michael Bean, Katerina Margatina, Rafael Mosquera, Juan Manuel Ciro, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale
NeurIPS 2024 Track Datasets and Benchmarks Oral
Readers: Everyone
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Rottger, Barbara Plank
COLM
Readers: Everyone
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Thomas Jackson, Paul Rottger et al. (4 additional authors not shown)
ICML 2024 Oral
Readers: Everyone
View all 22 publications
Co-Authors
Adel Bibi
Adina Williams
Aleksandar Petrov
Alexander Whitefield
Andrew Michael Bean
Barbara Plank
Bertie Vidgen
Bolei Ma
Botos Csaba
Chengzhi Hu
Christian Schroeder de Witt
Dan Jurafsky
Debora Nozza
Dirk Hovy
Diyi Liu
Dong Nguyen
Fabio Pizzati
Fabro Steibel
Fazl Barez
Federico Bianchi
Francisco Eiras
Genevieve Smith
Gianluca Guadagni
Giuseppe Attanasio
Haitham Seelawi
View all 63 co-authors