Scalable oversight
2023 – Present
Foundation-model-based agents
2022 – Present
Alignment, collective alignment
2022 – Present
Evaluation and benchmarking of foundation models
2022 – Present
Jailbreaks, safety, and responsible generative AI
2022 – Present
Multi-agent learning, multi-agent RL
2014 – Present