Abstract: Elderly populations often face significant challenges in tracking dietary intake, frequently exacerbated by health complications. Unfortunately, conventional diet assessment techniques such as food frequency questionnaires, food diaries, and 24-hour recalls are subject to substantial bias. Recent advances in machine learning and computer vision show promise for automated nutrition tracking, but such methods require large, high-quality datasets to accurately identify the nutrients in the food on a plate. However, manually creating large-scale datasets with such diversity is time-consuming and hard to scale. In contrast, synthesized 3D food models enable view augmentation, generating countless photorealistic 2D renderings from arbitrary viewpoints and reducing imbalance across camera angles. In this paper, we present a process for collecting a large image dataset of food scenes spanning diverse viewpoints and highlight its use in dietary intake estimation. We first collect high-quality 3D models of food items (NV-3D), use them to generate photorealistic synthetic 2D food images (NV-Synth), and then manually collect a real 2D food image dataset for validation (NV-Real). We benchmark various intake estimation approaches on these datasets and present NutritionVerse3D2D, a collection of datasets containing the 3D objects and 2D images, along with models that estimate dietary intake from 2D food images. We release all datasets and developed models to accelerate machine learning research on dietary sensing.
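As a rough illustration of the view-augmentation idea described above, the sketch below renders a single 3D food mesh from a grid of sampled camera viewpoints using trimesh and pyrender. The file name, camera grid, lighting, and image size are illustrative assumptions, not the authors' actual NV-Synth pipeline.

```python
# Minimal sketch (assumptions throughout): render one food mesh from many
# sampled viewpoints on a sphere around the object using trimesh + pyrender.
import numpy as np
import trimesh
import pyrender
import imageio.v2 as imageio

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera pose at `eye` looking at `target` (OpenGL convention)."""
    z = eye - target
    z /= np.linalg.norm(z)
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = x, y, z, eye
    return pose

mesh = trimesh.load("apple.glb", force="mesh")  # hypothetical 3D food asset
mesh.apply_translation(-mesh.centroid)          # center the object at the origin
scene = pyrender.Scene(ambient_light=np.full(3, 0.3))
scene.add(pyrender.Mesh.from_trimesh(mesh))
scene.add(pyrender.DirectionalLight(color=np.ones(3), intensity=3.0),
          pose=look_at(np.array([1.0, 1.0, 2.0])))

renderer = pyrender.OffscreenRenderer(640, 480)
radius = 2.5 * mesh.scale  # pull the camera back relative to object size
# Sample 3 elevations x 8 azimuths = 24 viewpoints (an illustrative grid).
views = [(a, e) for e in (20, 45, 70) for a in range(0, 360, 45)]
for i, (azim, elev) in enumerate(views):
    az, el = np.deg2rad(azim), np.deg2rad(elev)
    eye = radius * np.array([np.cos(el) * np.cos(az),
                             np.cos(el) * np.sin(az),
                             np.sin(el)])
    node = scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0),
                     pose=look_at(eye))
    color, _ = renderer.render(scene)
    imageio.imwrite(f"view_{i:03d}.png", color)
    scene.remove_node(node)  # swap the camera out for the next viewpoint
renderer.delete()
```

Sampling azimuth and elevation on a fixed-radius sphere is one simple way to balance coverage across camera angles; a full synthetic pipeline would additionally vary scene composition, lighting, and occlusion, and export per-view annotations.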