Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Qi Zhang, Yifei Wang, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang

Published: 2025, Last Modified: 16 May 2025, Venue: ICLR 2025, License: CC BY-SA 4.0