Cosine Similarity is Almost All You Need (for Prototypical-Part Models)

Luke Moffett, Frank Willard, Maximillian Machado, Emmanuel Mokel, Jon Donnelly, Zhicheng Guo, Adam Costarino, Julia Yang, Giyoung Kim, Alina Jade Barnett, Cynthia Rudin

Published: 01 Jan 2026, Last Modified: 07 May 2026, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: Prototypical-part networks are a popular interpretable alternative to black-box deep learning models for computer vision because of their faithful, prototype-based self-explanations. In practice, however, they have proven difficult to train because they are highly sensitive to hyperparameter tuning, and difficult to comprehend because they contain a large number of prototypes. We show that replacing l_2 distance with an angular prototype similarity in the original ProtoPNet greatly improves robustness to hyperparameter selection and is sufficient to produce accuracy and sparsity competitive with the state of the art on many backbones and datasets. We also show that cosine similarity leads to superior accuracy for five different ProtoPNet architectures (ProtoPNet, TesNet, Deformable ProtoPNet, ProtoTree, and ST-ProtoPNet). Finally, we demonstrate that ProtoPNet with cosine similarity produces better semantics than l_2: prototypes from cosine models score better on prototype quality metrics and are perceived as more similar at a 3:2 ratio in a user study.
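To make the distinction between the two comparison functions concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the function names, the toy three-dimensional vectors, and the `eps` guard are all assumptions chosen for the example, and a real prototypical-part model would compare prototypes against patches of a convolutional feature map. The sketch shows one property that separates the two measures: cosine similarity depends only on the angle between a feature patch and a prototype, whereas squared l_2 distance also changes with the magnitude of the features.

```python
import math


def squared_l2_distance(z, p):
    """Squared Euclidean distance between a feature patch z and a prototype p."""
    return sum((a - b) ** 2 for a, b in zip(z, p))


def cosine_similarity(z, p, eps=1e-12):
    """Cosine of the angle between z and p; eps (an assumed guard) avoids division by zero."""
    dot = sum(a * b for a, b in zip(z, p))
    norm_z = math.sqrt(sum(a * a for a in z))
    norm_p = math.sqrt(sum(b * b for b in p))
    return dot / max(norm_z * norm_p, eps)


# Hypothetical toy vectors: the patch points in the same direction as the
# prototype but has twice its magnitude.
prototype = [1.0, 2.0, 2.0]
patch = [2.0, 4.0, 4.0]
scaled_patch = [10.0 * x for x in patch]  # same direction, 10x larger activations

# Squared l_2 distance grows with feature magnitude.
dist = squared_l2_distance(patch, prototype)                # 9.0
dist_scaled = squared_l2_distance(scaled_patch, prototype)  # 3249.0

# Cosine similarity is unchanged by rescaling the patch.
cos = cosine_similarity(patch, prototype)
cos_scaled = cosine_similarity(scaled_patch, prototype)
```

In this toy case both cosine values are equal (up to floating-point error) even though the distances differ by orders of magnitude. Whether this scale invariance accounts for the robustness reported in the abstract is not something the sketch establishes.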