Answerability Fields: Answerable Location Estimation via Diffusion Models

Published: 01 Jan 2024, Last Modified: 05 Mar 2025, Venue: IROS 2024, License: CC BY-SA 4.0
Abstract: We propose Answerability Fields (AnsFields), a novel approach for predicting the answerability of questions at different locations within indoor environments. An AnsField is represented as a map in which each grid cell’s score reflects how well a question can be answered using the panoramic image captured at that location. Using a 3D question-answering dataset, we construct comprehensive AnsFields covering diverse scenes from ScanNet. Additionally, we employ a diffusion model to infer AnsFields from a scene’s top-down view image and the question. We then conduct 3D question answering using these predicted AnsFields and achieve a 24% improvement in accuracy over the standard 3D-QA method. Our results demonstrate the importance of object locations for answering questions in the environment, highlighting the potential of AnsFields for applications in robotics, augmented reality, and human-robot interaction.
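
To make the map representation concrete, the following is a minimal sketch of how a predicted AnsField might be used to choose a viewpoint. It is an illustration only: the abstract does not specify the grid resolution, the score range, or how a location is selected, so the nested-list layout, the scores in [0, 1], and the names `best_location` and `example_field` are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: an AnsField is stored as a 2D grid of scores, one per map cell.
Grid = List[List[float]]


def best_location(field: Grid) -> Tuple[int, int]:
    """Return (row, col) of the highest-scoring cell.

    Ties resolve to the first cell met in row-major order.
    """
    best = (0, 0)
    for r, row in enumerate(field):
        for c, score in enumerate(row):
            if score > field[best[0]][best[1]]:
                best = (r, c)
    return best


# Hypothetical 3x4 field with made-up scores; not data from the paper.
example_field: Grid = [
    [0.10, 0.20, 0.15, 0.05],
    [0.30, 0.85, 0.40, 0.10],
    [0.05, 0.25, 0.20, 0.00],
]

row, col = best_location(example_field)
print(f"most answerable cell: row={row}, col={col}")
```

Under these assumptions, the panoramic image at the selected cell would then be passed to a question-answering model; picking the single highest-scoring cell is only one possible selection rule.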