Noah Y. Siegel

PhD student, AI Center, University College London, University of London

Researcher, Google DeepMind

Joined

September 2019

Names

Noah Y. Siegel (Preferred)

Noah Yamamoto Siegel

Noah Siegel

Emails

****@google.com (Confirmed)

****@deepmind.com (Confirmed)

****@uw.edu (Confirmed)

****@ucl.ac.uk (Confirmed)

****@gmail.com (Confirmed)

Career & Education History

PhD student

AI Center, University College London, University of London (ucl.ac.uk)

2023 – Present

Researcher

Google DeepMind (deepmind.com)

2017 – Present

Researcher

Allen Institute for Artificial Intelligence (allenai.org)

2015 – 2017

Undergrad student

University of Washington, Seattle (uw.edu)

2009 – 2015

Advisors, Relations & Conflicts

PhD Advisor

Maria Perez-Ortiz

2023 – Present

PhD Advisor

Oana-Maria Camburu

2023 – Present

Coworker

Nicolas Heess

2019 – Present

Expertise

scalable oversight

amplified oversight

debate

2024 – Present

faithfulness

explainability

natural language explanations

chain of thought

reasoning

2023 – Present

ai safety

ai alignment

catastrophic risk

existential risk

2022 – Present

process-based supervision

large language models

2022 – 2023

Publications

Training Large Language Models for Self-Explanation Faithfulness
Yeoktatt Cheah, Maria Perez-Ortiz, Noah Y. Siegel, Oana-Maria Camburu
- ICLR 2026 Re-Align Workshop
- Readers: Everyone
A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior
Harry Mayne, Justin Singh Kang, Dewi Sid William Gould, Kannan Ramchandran, Adam Mahdi, Noah Y. Siegel
- ICLR 2026 Trustworthy AI
- Readers: Everyone
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
Noah Y. Siegel, Nicolas Heess, Maria Perez-Ortiz, Oana-Maria Camburu
- ICLR 2026 Trustworthy AI
- Readers: Everyone
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
Noah Y. Siegel, Nicolas Heess, Maria Perez-Ortiz, Oana-Maria Camburu
- Submitted to ICLR 2026
- Readers: Everyone
LLMs Can Covertly Sandbag On Capability Evaluations Against Chain-of-Thought Monitoring
Chloe Li, Mary Phuong, Noah Y. Siegel
- ICML 2025 Workshop TAIG Oral
- Readers: Everyone
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton, Noah Yamamoto Siegel, Janos Kramar, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah Goodman, Rohin Shah
- NeurIPS 2024 poster
- Readers: Everyone
The Effect of Model Size on LLM Post-hoc Explainability via LIME
Henning Heyen, Amy Widdicombe, Noah Yamamoto Siegel, Philip Colin Treleaven, Maria Perez-Ortiz
- SeT LLM @ ICLR 2024
- Readers: Everyone
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
Noah Y. Siegel, Oana-Maria Camburu, Nicolas Heess, María Pérez-Ortiz
- ACL (Short Papers) 2024
- Readers: Everyone
Learning agile soccer skills for a bipedal robot with deep reinforcement learning
Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game et al. (8 additional authors not shown)
- Sci. Robotics 2024
- Readers: Everyone
