Multi-Timestep-Ahead Prediction with Mixture of Experts for Embodied Question Answering

Kanata Suzuki, Yuya Kamiwano, Naoya Chiba, Hiroki Mori, Tetsuya Ogata

Published: 01 Jan 2023, Last Modified: 24 May 2025, ICANN (6) 2023, CC BY-SA 4.0

Abstract: In this study, we propose a method that integrates visual field predictions at different timescales and investigate its effectiveness for embodied question answering (EQA). In EQA, it is desirable to select the prediction timescale automatically according to the situation, because the path to the target object depends on the instructions provided. However, previous studies have investigated subtask learning only with a limited prediction timescale and target. We propose a mixture-of-experts model in which multiple expert networks predict future images at different time steps and a higher-level gating network estimates the distribution of the experts' outputs. By sequentially adjusting the outputs of the expert networks, the proposed method enables robot navigation that takes multi-timestep-ahead predictions into account. Comparison experiments on the EQA MP3D dataset show that the proposed method improves the model's prediction accuracy regardless of the distance to the target.
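The abstract gives no architectural details, so the following is only a minimal sketch of a generic soft mixture of experts, not the authors' implementation. It assumes that each expert is a single linear map from a current visual feature vector to a predicted feature vector at its own horizon (standing in for the image-predicting expert networks), and that the gating network is a linear layer followed by a softmax. The names `LinearExpert`, `MixtureOfExperts`, `matvec`, and `softmax` are hypothetical, and the example omits training and the sequential adjustment the abstract describes.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: shifts by the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearExpert:
    """Hypothetical expert: maps the current visual feature to a predicted
    feature `horizon` time steps ahead with a single linear layer."""

    def __init__(self, horizon: int, weights: Matrix) -> None:
        self.horizon = horizon  # prediction time step this expert is responsible for
        self.weights = weights

    def predict(self, x: Sequence[float]) -> Vector:
        return matvec(self.weights, x)


class MixtureOfExperts:
    """Soft mixture of experts: a linear gating layer (one row per expert)
    produces softmax weights, and the output is the weighted sum of the
    experts' predictions."""

    def __init__(self, experts: List[LinearExpert], gate_weights: Matrix) -> None:
        assert len(gate_weights) == len(experts), "one gate row per expert"
        self.experts = experts
        self.gate_weights = gate_weights

    def gate(self, x: Sequence[float]) -> Vector:
        return softmax(matvec(self.gate_weights, x))

    def forward(self, x: Sequence[float]) -> Tuple[Vector, Vector]:
        g = self.gate(x)
        preds = [expert.predict(x) for expert in self.experts]
        dim = len(preds[0])
        mixed = [sum(g[k] * preds[k][i] for k in range(len(preds))) for i in range(dim)]
        return mixed, g


# Usage: three experts for horizons of 1, 5, and 10 steps on 2-D features.
experts = [
    LinearExpert(h, [[s, 0.0], [0.0, s]]) for h, s in [(1, 1.0), (5, 2.0), (10, 3.0)]
]
uniform_moe = MixtureOfExperts(experts, [[0.0, 0.0]] * 3)  # zero gate -> equal weights
mixed, weights = uniform_moe.forward([1.0, 2.0])
print(weights)  # each weight is 1/3
print(mixed)    # approximately [2.0, 4.0], the average of the three predictions
```

Because the softmax weights are nonnegative and sum to one, the mixed output is a convex combination of the experts' predictions. A gate that strongly favors one horizon therefore effectively selects that expert, which is one way to read "automatically selecting a prediction timescale."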