Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
Published: 01 Jan 2023, Last Modified: 30 Sept 2024
ICLR 2023
CC BY-SA 4.0