VolE$^{++}$: A Text-Guided Point-Cloud Framework for Food 3D Reconstruction and Volume Estimation

Umair Haroon, Ahmad AlMughrabi, Ricardo Marques, Petia Radeva

Published: 01 Jan 2026, Last Modified: 11 May 2026, Crossref, CC BY-SA 4.0

Abstract: Accurate food volume estimation is crucial for health monitoring, medical nutrition management, and food intake applications. Current 3D food volume estimation methods are too generic: they ignore the context of the objects being estimated, and their performance is therefore suboptimal. We present VolE$^{++}$, a framework for 3D reconstruction and volume estimation of food objects. The approach lets users specify a target food item through text input, enabling precise segmentation of that specific food object in a real-world environment. Once segmented, the object is reconstructed using the VolE 3D reconstruction framework. This process uses Multi-View Stereo techniques to transform a point cloud into a refined mesh, ensuring high spatial fidelity for accurate 3D volume estimation. Extensive evaluations on the FoodKit and MetaFood3D datasets demonstrate the effectiveness of our method in isolating and reconstructing food items, with improvements across multiple datasets and a MAPE of 0.2%, highlighting its superior performance in food volume estimation.
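For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of how a volume can be read off a closed triangle mesh and how MAPE is computed. It is not the VolE$^{++}$ implementation: the abstract does not state how volume is extracted from the refined mesh, so the signed-tetrahedron (divergence theorem) formula used here is an assumption, and the function names `mesh_volume` and `mape` as well as the toy tetrahedron are hypothetical.

```python
def mesh_volume(vertices, faces):
    """Volume of a closed triangle mesh with consistently oriented faces.

    Assumption: the mesh is watertight. Each face (i, j, k) forms a
    tetrahedron with the origin; the signed volumes sum to the enclosed volume.
    """
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c) equals 6x the signed tetrahedron volume.
        total += (ax * (by * cz - bz * cy)
                  - ay * (bx * cz - bz * cx)
                  + az * (bx * cy - by * cx))
    return abs(total) / 6.0


def mape(true_values, predicted_values):
    """Mean absolute percentage error, in percent. True values must be non-zero."""
    errors = [abs((t - p) / t) for t, p in zip(true_values, predicted_values)]
    return 100.0 * sum(errors) / len(errors)


# Toy example (hypothetical data): a right tetrahedron with unit legs.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]  # outward-facing winding
estimated = mesh_volume(tetra_vertices, tetra_faces)
print(estimated, mape([1.0 / 6.0], [estimated]))
```

The formula only gives a meaningful volume for a watertight mesh, which is one reason a refined mesh, rather than a raw point cloud, is the natural input for this step.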

External IDs: doi:10.1007/978-3-032-04968-1_33