PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?

Atharva Gundawar; Som Sagar; Ransalu Senanayake

PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?

Atharva Gundawar, Som Sagar, Ransalu Senanayake

Published: 18 Sept 2025, Last Modified: 16 Jan 2026NeurIPS 2025 Datasets and Benchmarks Track posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: VLMs, VLAs, Properties, Affordances, Constraints

TL;DR: PAC is a benchmark designed to test whether pre‑trained vision‑language models truly understand the object properties, affordances, and real‑world constraints required for executable robot manipulation.

Abstract: Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that is largely unverified. To perform actions reliably, robots must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspable, stackable), and physical constraints (e.g., stability, reachability, or an object's state like being closed). Despite their ubiquitous use in manipulation, we argue that off-the-shelf VLMs may lack this granular, physically-grounded understanding, as these specific prerequisites are often overlooked during training. Addressing this critical gap, we introduce PAC Bench, a comprehensive benchmark designed to systematically evaluate VLMs on their understanding of these core Properties, Affordances, and Constraints (PAC) from a task executability perspective. PAC Bench features a diverse dataset with more than 30,000 annotations, comprising 673 real-world images (115 object classes, 15 property types, 1–3 affordances defined per object class), 100 real-world humanoid view scenarios, and 120 unique simulated constraint scenarios across four tasks. Our evaluations reveal significant gaps in the ability of VLMs to grasp fundamental physical concepts, underscoring their current limitations for reliable robot manipulation and pointing to key areas that require targeted research. PAC Bench also serves as a standardized benchmark for rigorously evaluating the physical reasoning capabilities of VLMs guiding the development of more robust and physically grounded models for robot manipulation.

Croissant File: json

Dataset URL: https://huggingface.co/datasets/lens-lab/pacbench

Code URL: https://github.com/Atharva-Gundawar/pacbench

Supplementary Material: pdf

Primary Area: Evaluation (e.g., data collection methodology, data processing methodology, data analysis methodology, meta studies on data sources, extracting signals from data, replicability of data collection and data analysis and validity of metrics, validity of data collection experiments, human-in-the-loop for data collection, human-in-the-loop for data evaluation)

Submission Number: 2220

Loading