William Saunders
Researcher, Alignment Science, Anthropic
Joined
January 2023
Emails
****@williamsaunders.net (Confirmed)
****@anthropic.com (Confirmed)
Career & Education History
Researcher
Alignment Science, Anthropic (anthropic.com)
2025 – Present
Researcher
Model Evaluation and Threat Research (metr.org)
2024 – 2025
Researcher
OpenAI (openai.com)
2021 – 2024
Expertise
interpretability
2022 – Present
language models
2020 – 2024
scalable oversight, debate, critiques
2020 – 2022
Publications
Co-Authors
- Aarohi Srivastava
- Abhinav Rastogi
- Abhishek Rao
- Abu Awal Md Shoeb
- Abubakar Abid
- Adam Bales
- Adam Fisch
- Adam R. Brown
- Adam Santoro
- Aditya Gupta
- Adrià Garriga-Alonso
- Agnieszka Kluska
- Aiman Soliman
- Aitor Lewkowycz
- Akshat Agarwal
- Alec Radford
- Alejandro Ortega
- Alethea Power
- Alex Nichol
- Alex Paino
- Alex Ray
- Alex Warstadt
- Alexander W. Kocurek
- Ali Safaya
- Ali Tazarv