Finding Better Prototypes For Interpretable Text Classifiers With LLM Optimization

ICLR 2026 Conference Submission21456 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Interpretable ML, Prototypes, Large Language Models, Optimization, Text Classifier
TL;DR: We show that prototypes in interpretable text classifiers can be made more intelligible and accurate by using LLMs as optimization tools.
Abstract: Prototype neural networks are the most popular form of interpretable-by-design classifiers in machine learning. Within this field, prototypes are typically learned as black-box vectors and then projected onto the nearest example from the training data for visualization and inference purposes. This improves interpretability because we can understand the logic behind predictions from the similarity between the input instance and the nearest prototype in the network. However, because these prototypes are real training instances, this approach has at least two major issues. Firstly, since the projected prototypes do not represent the learned ``black-box'' vectors that were optimized for accuracy, there is typically a performance drop-off. Secondly, because the prototypes are real training instances, they are usually overly specific and full of spurious or irrelevant details, making them difficult to interpret readily. In this study, we address these problems by using large language models (LLMs) as an optimization tool to find better prototypes for the network. Across a series of experiments, we find that our method produces prototypes which sacrifice less performance and are more intelligible than projection-based baselines. Previously, it was not possible to visualize a learned prototype, because methods were constrained to projection onto actual training data; our approach suggests a possible path to overcome this limitation.
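The projection step criticized in the abstract can be illustrated with a minimal sketch. All names and the toy embeddings below are hypothetical (the paper's actual encoders and LLM-based optimization are not shown); this only demonstrates the standard baseline of snapping each learned prototype vector to its nearest training example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: embeddings of training texts, and learned
# "black-box" prototype vectors (one per class) optimized for accuracy.
train_embs = rng.normal(size=(100, 8))
learned_protos = rng.normal(size=(3, 8))

def project(prototypes, pool):
    """Replace each learned prototype with its nearest training embedding --
    the projection step that makes prototypes visualizable but can cost
    accuracy, since the projected vectors differ from the optimized ones."""
    dists = np.linalg.norm(pool[None, :, :] - prototypes[:, None, :], axis=-1)
    return pool[dists.argmin(axis=1)]

def classify(x, prototypes):
    """Predict the class whose prototype is closest to input embedding x."""
    return int(np.linalg.norm(prototypes - x, axis=1).argmin())

projected = project(learned_protos, train_embs)
x = rng.normal(size=8)
pred_learned, pred_projected = classify(x, learned_protos), classify(x, projected)
```

The paper's method would instead use an LLM to synthesize a prototype text whose embedding stays close to the learned vector, rather than restricting prototypes to real training instances as `project` does here.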
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 21456