William Saunders

Researcher, Alignment Science, Anthropic

Researcher, Model Evaluation and Threat Research

  • Joined January 2023

Names

William Saunders

Emails

****@williamsaunders.net (Confirmed)

Career & Education History

Researcher
Alignment Science, Anthropic (anthropic.com)
2025 – Present
 
Researcher
Model Evaluation and Threat Research (metr.org)
2024 – 2025
 
Researcher
OpenAI (openai.com)
2021 – 2024
 

Advisors, Relations & Conflicts

Coworker
2021 – 2024
 
Coauthor
2017 – 2021
 
Coworker
2017 – 2021
 

Expertise

interpretability
2022 – 2024
 
language models
2020 – 2024
 
scalable oversight, debate, critiques
2020 – 2022
 
