mechanistic interpretability, explainable AI
Present
deep learning, machine learning, large language models, transformers
2021 – Present
AI safety, ethics, accountability, fairness, transparency
2021 – Present
inverse reinforcement learning, imitation learning, reinforcement learning
2017 – 2022