scalable oversight, amplified oversight, debate
2024 – Present
faithfulness, explainability, natural language explanations, chain of thought, reasoning
2023 – Present
AI safety, AI alignment, catastrophic risk, existential risk
2022 – Present
process-based supervision, large language models
2022 – 2023