Hybrid Concept-based Models: Using Concepts to Improve Neural Networks’ Accuracy

Published: 05 Nov 2025, Last Modified: 27 Nov 2025, NLDL 2026 Poster, CC BY 4.0
Keywords: Deep learning, concepts, machine learning, concept-based models, interpretability, dataset, adversarial attacks
TL;DR: We propose new neural network architectures that use concepts during training, a new synthetic dataset with concept labels, and an algorithm for producing adversarial examples for interpretable concept-based models.
Abstract: Most datasets used for supervised machine learning consist of a single label per data point. However, in cases where more information than just the class label is available, would it be possible to train models more efficiently? We introduce two novel model architectures, which we call \emph{hybrid concept-based models}, that train using both class labels and additional information in the dataset referred to as \emph{concepts}. In order to thoroughly assess their performance, we introduce \emph{ConceptShapes}, an open and flexible class of datasets with concept labels. We show that the hybrid concept-based models can outperform standard computer vision models and previously proposed concept-based models with respect to accuracy. We also introduce an algorithm for performing \emph{adversarial concept attacks}, where an image is perturbed in a way that does not change a concept-based model's concept predictions, but changes the class prediction. The existence of such adversarial examples raises questions about the interpretable qualities promised by concept-based models.
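The success condition for an adversarial concept attack, as described in the abstract, can be written as a small check: the perturbed image must keep the same concept predictions while the class prediction changes. The sketch below is illustrative only and is not taken from the authors' repository. The function names, the assumption that concepts are binary, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Turn concept probabilities into binary concept predictions.

    Assumption: concepts are binary and thresholded at `threshold`.
    """
    return [p >= threshold for p in probs]


def is_adversarial_concept_example(
    concepts_before: Sequence[float],
    concepts_after: Sequence[float],
    class_before: int,
    class_after: int,
    threshold: float = 0.5,
) -> bool:
    """Check the condition stated in the abstract (hypothetical helper).

    Returns True only if the binarized concept predictions are identical
    before and after the perturbation AND the predicted class differs.
    """
    if len(concepts_before) != len(concepts_after):
        raise ValueError("concept vectors must have the same length")
    same_concepts = binarize(concepts_before, threshold) == binarize(
        concepts_after, threshold
    )
    return same_concepts and class_before != class_after
```

A search procedure for such perturbations would call a check like this on the model's outputs for the original and perturbed image. The check itself says nothing about how the perturbation is found.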
Git: https://github.com/Tobias-Opsahl/Hybrid-Concept-based-Models
Serve As Reviewer: ~Tobias_Aanderaa_Opsahl1
Project: https://github.com/Tobias-Opsahl/ConceptShapes
Submission Number: 12