Feynman: Knowledge-Infused Diagramming Agent for Scaling Visual Reasoning Data

Zixin Wen; Yifu Cai; Kyle Lee; Sam Estep; Joshua Sunshine; Aarti Singh; Yuejie Chi; Wode Ni

28 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Vision-Language Dataset, Synthetic Data, Visual Reasoning Benchmark

TL;DR: We create a scalable vision-language data generation pipeline that produces knowledge-infused images.

Abstract: Visual reasoning is an essential ability of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet image and text data, knowledge-rich and well-aligned image-text pairs are rare. In this paper, we present a scalable data generation pipeline built with our diagramming agent, **Feynman**. To create diagrams, Feynman first enumerates domain-specific knowledge components ("ideas") and performs code planning based on the ideas. Given the plan, Feynman translates the ideas into simple declarative programs and iterates on them, receiving feedback and visually refining the diagrams. Finally, the declarative programs are rendered by the Penrose diagramming system. The optimization-based rendering of Penrose preserves the visual semantics while injecting fresh randomness into the layout, thereby producing diagrams with visual consistency and diversity. As a result, Feynman can author diagrams along with grounded captions at very low cost and in little time. Using Feynman, we synthesized a dataset with more than 100k well-aligned diagram-caption pairs. We also curate a vision-language benchmark, **Diagramma**, from freshly generated data.
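The control flow described in the abstract (enumerate ideas, plan, write a declarative program, refine with feedback, render several layouts) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: every name here (`Diagram`, `generate_diagrams`, and the four callables) is hypothetical, the model calls are left as injected functions, the caption is a placeholder string, and the Penrose renderer is represented only by a random layout seed per variant.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Diagram:
    idea: str         # the knowledge component the diagram illustrates
    program: str      # declarative program text (hypothetical format)
    layout_seed: int  # stands in for one randomized rendering of the program
    caption: str      # placeholder caption derived from the idea


def generate_diagrams(
    domain: str,
    enumerate_ideas: Callable[[str], List[str]],
    plan_code: Callable[[str], str],
    write_program: Callable[[str, Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
    variants_per_program: int = 2,
    seed: int = 0,
) -> List[Diagram]:
    """Run the enumerate -> plan -> write -> refine -> render loop.

    All four callables are assumptions standing in for model calls;
    `critique` returns None once a program needs no further changes.
    """
    rng = random.Random(seed)
    results: List[Diagram] = []
    for idea in enumerate_ideas(domain):
        plan = plan_code(idea)
        feedback: Optional[str] = None
        program = ""
        # Rewrite the program until the critic is satisfied or rounds run out.
        for _ in range(max_rounds):
            program = write_program(plan, feedback)
            feedback = critique(program)
            if feedback is None:
                break
        # One program yields several layouts: same semantics, different seeds.
        for _ in range(variants_per_program):
            results.append(
                Diagram(
                    idea=idea,
                    program=program,
                    layout_seed=rng.randrange(2**32),
                    caption=f"A diagram illustrating {idea}.",
                )
            )
    return results
```

Passing the stages in as functions keeps the sketch independent of any particular model or renderer; in a real pipeline each seed would be handed to the layout optimizer so that every variant satisfies the same constraints with a different arrangement.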

Primary Area: datasets and benchmarks

Submission Number: 13096
