[Tiny Paper] Integrating Simulation and Chain-of-thought Reasoning in Multimodal-Language Models For Physical Reasoning
Keywords: Intuitive Physics, Physical Reasoning, Multimodal-Language Models, Diffusion Model-based Simulator
Abstract: In this work, we present a cognitively inspired model that addresses the question of resource rationality in physical reasoning, a question that the current AI community has consistently overlooked. Because several tools, such as Chain-of-Thought reasoning and physics-grounded simulators, can be applied to physical reasoning tasks, a question naturally arises: which tool should be used when, and what is the criterion for choosing the best tool in a given scenario? To address this question, our model aims to optimize the scheme for deciding when to use which tool, in order to reach the best tradeoff between computational cost and accuracy. Our model (1) improves overall accuracy while reducing computational cost by nearly 50%, and (2) reveals that the two methods perform better on physical reasoning tasks from different categories, without being trained on category labels.
Submission Number: 71