Counterfactual Explanations for Visual Recommender Systems

Published: 13 May 2024 · Last Modified: 30 Sep 2024 · WWW '24: Companion Proceedings of the ACM Web Conference 2024 · CC BY 4.0
Abstract: Users rely on clever recommendations for items they might like to buy, and service providers rely on clever recommender systems to ensure that their products are recommended to their target audience. Providing explanations for recommendations helps to increase transparency and the users' overall trust in the system, besides helping practitioners debug their recommendation model. Modern recommender systems utilize multi-modal data such as reviews and images to provide recommendations. In this work, we propose CAVIAR (Counterfactual explanations for VIsual Recommender systems), a novel method to explain recommender systems that utilize visual features of items. Our explanation is counterfactual and is optimized to be simultaneously simple and effective. Given an item in the user's top-K recommended list, CAVIAR makes a minimal, yet meaningful, perturbation to the item's image embedding such that the item is no longer part of the list. In this way, CAVIAR aims to find the visual features of the item that were most relevant to the recommendation. To lend meaning to the perturbations, we leverage the CLIP model to connect the perturbed image features to textual features. We frame the explanation as a natural-language counterfactual by contrasting the visual features observed in the item before and after the perturbation.
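The abstract's core idea can be illustrated with a minimal sketch. The snippet below assumes a simple linear recommender (score = user vector · item image embedding), for which the smallest L2 perturbation that pushes an item's score below the K-th ranked item's score has a closed form: a projection along the user vector. Random vectors stand in for CLIP text-attribute embeddings; all names (`counterfactual_perturbation`, `attribute_shift`, the margin value) are hypothetical illustrations, not the paper's actual CAVIAR implementation, which operates on a real recommender and real CLIP embeddings.

```python
import numpy as np

def counterfactual_perturbation(user_vec, item_emb, threshold, margin=1e-3):
    """Smallest (L2) perturbation delta so that the assumed linear score
    u . (e + delta) drops below `threshold` (hypothetically, the score of
    the K-th ranked item). For a linear scorer this is a projection
    along user_vec."""
    score = user_vec @ item_emb
    excess = score - (threshold - margin)
    if excess <= 0:
        return np.zeros_like(item_emb)  # item is already outside the top-K
    return -excess * user_vec / (user_vec @ user_vec)

def attribute_shift(item_emb, delta, text_embs):
    """Cosine similarity of the item to each (CLIP-like) text attribute
    embedding, before and after the perturbation."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    before = np.array([cos(item_emb, t) for t in text_embs])
    after = np.array([cos(item_emb + delta, t) for t in text_embs])
    return before, after

rng = np.random.default_rng(0)
d = 8
user = rng.normal(size=d)           # stand-in for a learned user vector
item = rng.normal(size=d)           # stand-in for an item image embedding
threshold = user @ item - 0.5       # hypothetical score of the K-th item

delta = counterfactual_perturbation(user, item, threshold)
assert user @ (item + delta) < threshold  # item has left the top-K list

texts = rng.normal(size=(3, d))     # stand-ins for CLIP text embeddings
before, after = attribute_shift(item, delta, texts)
changed = int(np.argmax(np.abs(after - before)))
print(f"score before: {user @ item:.3f}, after: {user @ (item + delta):.3f}")
print(f"attribute index with largest shift: {changed}")
```

The attribute whose similarity changes most under the perturbation is the candidate ingredient for the natural-language counterfactual ("the item was recommended because it looked like X; without X it would not be"). In the paper's setting, a non-linear recommender would require an iterative (e.g. gradient-based) search for the minimal perturbation rather than this closed form.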