LLM-Generated Class Descriptions for Semantically Meaningful Image Classification

Published: 01 Jan 2024, Last Modified: 16 May 2025 · EXPLAINS 2024 · CC BY-SA 4.0
Abstract: Neural networks have become the primary approach for tackling computer vision tasks, but their lack of transparency and interpretability remains a challenge. Integrating neural networks with symbolic knowledge bases, which could provide valuable context for visual concepts, is not yet common in the machine learning community. In image classification, class labels are often treated as independent, orthogonal concepts, so misclassifications are penalized equally regardless of the semantic similarity between the true and predicted labels. Previous studies have attempted to address this by using ontologies to establish relationships among classes, but such data structures are generally not available. In this paper, we use a large language model (LLM) to generate textual descriptions for each class label, aiming to capture the visual characteristics of the corresponding concepts. These descriptions are then encoded into embedding vectors, which are used as the ground truth for training.
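The abstract's core idea — replacing one-hot class targets with embeddings of LLM-generated class descriptions, so that semantically similar classes lie close together — can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the description embeddings here are random stand-ins for vectors that would come from a real text encoder, and the nearest-neighbor decoding by cosine similarity is one common way to map a predicted embedding back to a label.

```python
import numpy as np

rng = np.random.default_rng(0)
CLASSES = ["cat", "dog", "car"]

# Stand-ins for the encoded LLM descriptions of each class
# (in practice these would come from a sentence/text encoder).
class_embeddings = {c: rng.normal(size=8) for c in CLASSES}

def normalize(v):
    """Scale a vector to unit length for cosine comparison."""
    return v / np.linalg.norm(v)

def predict_label(output_embedding):
    """Map a model's predicted embedding to the class whose
    description embedding is most cosine-similar."""
    sims = {c: float(normalize(output_embedding) @ normalize(e))
            for c, e in class_embeddings.items()}
    return max(sims, key=sims.get)

# A model output near the "cat" description embedding decodes to "cat".
noisy_output = class_embeddings["cat"] + 0.05 * rng.normal(size=8)
print(predict_label(noisy_output))
```

Under this framing, the network is trained to regress onto the description embedding of the true class (e.g. with a cosine or mean-squared loss), so confusing "cat" with "dog" costs less than confusing "cat" with "car" whenever the description embeddings of similar classes are closer together.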