Keywords: mechanical reasoning; vision language models; model-based reasoning; intuitive physics; cognitive AI
TL;DR: A large-scale evaluation of 26 Vision Language Models on 155 cognitive experiments spanning system stability, gears and pulleys, leverage, inertia, and fluid mechanics shows that these models consistently underperform humans.
Abstract: Mechanical reasoning is a hallmark of human intelligence, defined by its ubiquitous and irreplaceable role in human activities ranging from routine tasks to civil engineering. Embedding machines with mechanical reasoning is therefore an important step towards building human-level artificial intelligence. Here, we leveraged 155 cognitive experiments to test the understanding of system stability, gears and pulley systems, the principle of leverage, inertia and motion, and fluid mechanics in 26 Vision Language Models (VLMs). Results indicate that VLMs consistently perform worse than humans across all domains, and demonstrate particular difficulty in reasoning about gear systems and fluid mechanics. Notably, their performance on these tasks does not improve as the number of parameters increases, suggesting that current attention-based architectures may fail to grasp certain underlying mechanisms required for mechanical reasoning, particularly those pertaining to mental simulation.
Submission Type: Long Paper (9 Pages)
Archival Option: This is an archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 74