Keywords: Embedding Interpretability Score, Interpretable Models, Black-Box Models, Trade-Off Analysis, Accuracy vs. Interpretability, Dimensionality, Sparsity, Clusterability
Abstract: Interpretability has long influenced the selection of machine learning models, yet the role of data representations remains relatively underexplored. Although model choice is known to influence performance, the interpretability of embedding models can be equally critical. In this study, we present a comparative analysis of black-box and interpretable embedding models across multiple domains, including natural language processing and computer vision. We introduce a domain-agnostic quantitative score, the Embedding Interpretability Score (EIS), that measures the interpretability of embedding models based on three fundamental properties: dimensionality, which reflects representational compactness; sparsity, which highlights feature selectivity; and clusterability, which measures semantic organization. Our results indicate that, in general, the choice of embedding technique exerts a stronger influence on downstream performance than classifier selection. Interestingly, the relationship between interpretability and performance differs across modalities: in NLP tasks, higher-performing embeddings tend to have lower interpretability, whereas in CV tasks, embeddings with higher interpretability often achieve better downstream performance.
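The abstract does not give the EIS formula, so the sketch below is only an illustration of how the three named properties could be quantified and combined. The normalization constant `max_dim`, the near-zero threshold `zero_tol`, the use of k-means with the number of classes, and the simple averaging are all assumptions, not the authors' actual definition.

```python
# Hypothetical sketch of an Embedding Interpretability Score (EIS) combining
# dimensionality, sparsity, and clusterability. All constants and the
# aggregation scheme are assumptions for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def eis(embeddings: np.ndarray, labels: np.ndarray,
        max_dim: int = 4096, zero_tol: float = 1e-6) -> float:
    """Return a score in [0, 1]; higher = more interpretable (assumed convention)."""
    n, d = embeddings.shape

    # Dimensionality: more compact representations score higher (assumed scaling).
    dim_score = 1.0 - min(d, max_dim) / max_dim

    # Sparsity: fraction of near-zero activations, reflecting feature selectivity.
    sparsity_score = float(np.mean(np.abs(embeddings) < zero_tol))

    # Clusterability: silhouette of a k-means partition with k = number of classes,
    # rescaled from [-1, 1] to [0, 1] as a proxy for semantic organization.
    k = len(np.unique(labels))
    preds = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    cluster_score = (silhouette_score(embeddings, preds) + 1.0) / 2.0

    return float(np.mean([dim_score, sparsity_score, cluster_score]))
```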
Primary Area: interpretability and explainable AI
Submission Number: 20318