Neural Additive Adapters for Interpretable Nutrition Prediction

Vitalii Emelianov, Niki Martinel

Published: 27 Oct 2025, Last Modified: 07 Nov 2025 · Crossref · CC BY-SA 4.0
Abstract: We study how large vision models (LVMs) can predict food nutrition through lightweight and interpretable adapters---machine-learning modules whose predictions humans can understand. We introduce novel nutrition adapters that take features extracted by pre-trained LVMs and output so-called nutrition maps. A nutrition map indicates the concentration of a nutrition value at each image location. This interpretable representation lets us obtain the nutrition targets as the sum of all nutrition concentrations on the maps. To understand our approach's generalization capability, we systematically analyze the behavior of our novel interpretable adapters across different LVMs and different food image-nutrition datasets. Our lightweight approach delivers performance better than or on par with state-of-the-art models on the Nutrition5k and Nutritionverse-Real benchmarks. The code is provided at https://github.com/vitaly-emelianov/nutrition-adapters.
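The abstract's core mechanism---per-location nutrition maps whose spatial sum gives the predicted targets---can be sketched with a toy linear adapter. This is a minimal illustration, not the authors' implementation: the function name, feature shapes, and the linear map `W`, `b` are all assumptions for the example.

```python
import numpy as np

def nutrition_adapter(features, W, b):
    """Toy additive adapter (hypothetical): `features` is an (H, Wd, D) grid of
    per-location LVM features. Returns one nutrition map per nutrient and the
    predicted targets obtained by summing each map over all image locations."""
    H, Wd, D = features.shape
    maps = features.reshape(-1, D) @ W + b   # (H*Wd, K) per-location concentrations
    maps = maps.reshape(H, Wd, -1)           # K nutrition maps over the image grid
    totals = maps.sum(axis=(0, 1))           # targets = sum of concentrations
    return maps, totals

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 8))           # toy 4x4 feature grid, 8-dim features
W = rng.normal(size=(8, 3))                  # 3 hypothetical nutrients
b = np.zeros(3)
maps, totals = nutrition_adapter(feats, W, b)
```

Because the targets are a plain sum over the maps, each spatial location's contribution to the final prediction can be read off directly, which is what makes the representation interpretable.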