Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt

Published: 2023 (ICLR 2023), Last Modified: 30 Sept 2024, License: CC BY-SA 4.0
