Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models

Yunqing Liu; Nan Zhang; Zhiming Tan

Published: 26 Jan 2026, Last Modified: 11 Feb 2026. ICLR 2026 Poster. License: CC BY 4.0

Keywords: Vision–Language Models, Error Notebook, Specification-Aware Part Retrieval, CoT Reasoning, Human-Preference Dataset

TL;DR: A training-free framework that uses Error Notebooks + RAG to guide VLMs in specification-aware part retrieval within CAD assemblies, yielding significant accuracy gains (e.g., +23.4 absolute accuracy points on human-preference data).

Abstract: Effective specification-aware part retrieval within complex CAD assemblies is essential for automated engineering tasks. However, applying LLMs/VLMs to this task is challenging: the metadata sequences often exceed token budgets, and high-performing proprietary models (e.g., GPT, Gemini) cannot be fine-tuned. A practical framework must therefore deliver engineering value while handling the long, non-natural-language metadata associated with real 3D assemblies. We propose an inference-time adaptation framework that combines corrected Error Notebooks with retrieval-augmented generation (RAG) to substantially improve VLM-based part retrieval. Each Error Notebook is built by correcting initial chain-of-thought (CoT) traces through reflective refinement and then filtering each trajectory with a grammar-constraint (GC) verifier to ensure structural well-formedness. The resulting notebook forms a high-quality repository of specification-CoT-answer triplets, from which RAG retrieves specification-relevant exemplars to condition the model's inference. We additionally contribute a CAD dataset with human-preference annotations. Experiments with proprietary models (GPT-4o, Gemini, etc.) show large gains, with GPT-4o (Omni) improving by up to +23.4 absolute accuracy points on the human-preference benchmark. The proposed GC verifier contributes a further +4.5 accuracy points. Our approach also surpasses other training-free baselines (standard few-shot learning, self-consistency) and yields substantial improvements for open-source VLMs (Qwen2-VL-2B-Instruct, Aya-Vision-8B). Under the cross-model GC setting, where the Error Notebook is constructed using GPT-4o (Omni), inference with the 2B model comes within roughly 4 accuracy points of GPT-4o mini.
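The abstract describes a two-stage flow: build an Error Notebook of corrected specification-CoT-answer triplets that pass a GC verifier, then retrieve specification-relevant exemplars to condition inference. The sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The grammar checked by `gc_verify` (numbered "Step n:" lines followed by a single "Answer:" line) is hypothetical, and the bag-of-words cosine retriever is a stand-in for whatever retriever the paper actually uses. All names (`NotebookEntry`, `gc_verify`, `build_notebook`, `retrieve`, `build_prompt`) are illustrative; reflective refinement and the VLM call itself are omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical CoT grammar (assumption): "Step <n>: ..." lines numbered 1, 2, ...,
# followed by exactly one final "Answer: <part>" line.
STEP_RE = re.compile(r"^Step (\d+): .+$")
ANSWER_RE = re.compile(r"^Answer: \S.*$")


@dataclass
class NotebookEntry:
    """One specification-CoT-answer triplet stored in the Error Notebook."""
    specification: str
    cot: str
    answer: str


def gc_verify(cot: str) -> bool:
    """Toy grammar-constraint check: in-order numbered steps, then one answer line."""
    lines = [ln.strip() for ln in cot.strip().splitlines() if ln.strip()]
    if len(lines) < 2 or not ANSWER_RE.match(lines[-1]):
        return False
    for expected, line in enumerate(lines[:-1], start=1):
        m = STEP_RE.match(line)
        if m is None or int(m.group(1)) != expected:
            return False
    return True


def build_notebook(corrected: list[NotebookEntry]) -> list[NotebookEntry]:
    """Keep only corrected trajectories that pass the (hypothetical) GC verifier."""
    return [e for e in corrected if gc_verify(e.cot)]


def _bow(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(notebook: list[NotebookEntry], query_spec: str, k: int = 2) -> list[NotebookEntry]:
    """Return the k entries whose specifications are most similar to the query."""
    q = _bow(query_spec)
    ranked = sorted(notebook, key=lambda e: _cosine(q, _bow(e.specification)), reverse=True)
    return ranked[:k]


def build_prompt(exemplars: list[NotebookEntry], query_spec: str) -> str:
    """Place retrieved exemplars as few-shot demonstrations before the new specification."""
    parts = [f"Specification: {e.specification}\n{e.cot}" for e in exemplars]
    parts.append(f"Specification: {query_spec}\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    notebook = build_notebook([
        NotebookEntry("M6 hex bolt, steel, length 20 mm",
                      "Step 1: Filter parts by thread size M6.\nStep 2: Match length 20 mm.\nAnswer: bolt_m6x20",
                      "bolt_m6x20"),
        NotebookEntry("aluminium bracket with two mounting holes",
                      "Answer: bracket_a",  # no reasoning steps, so the GC check rejects it
                      "bracket_a"),
        NotebookEntry("steel washer for M6 bolt",
                      "Step 1: Filter parts by inner diameter 6 mm.\nStep 2: Material steel.\nAnswer: washer_m6",
                      "washer_m6"),
    ])
    top = retrieve(notebook, "M6 steel bolt, length 25 mm", k=1)
    print(len(notebook), top[0].answer)  # prints: 2 bolt_m6x20
```

Filtering before retrieval means only well-formed trajectories can ever be shown to the model as exemplars; swapping the toy retriever for an embedding-based one would not change the rest of the flow.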

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 737
