Analyze, Generate, Improve: Failure-Based Data Generation for Large Multimodal Models

Published: 06 May 2025, Last Modified: 06 May 2025 · SynData4CV · CC BY 4.0
Keywords: Synthetic data, model failures, visual instruction tuning
Abstract: Training models on synthetic data is an effective strategy for improving large multimodal models (LMMs), given the scarcity of high-quality paired image-text data. Existing methods generate multimodal datasets but do not address specific reasoning deficiencies in LMMs. In contrast, humans learn efficiently by focusing on past failures. Inspired by this, we propose a synthetic data generation approach that analyzes an LMM's reasoning failures and uses frontier models to generate and filter high-quality examples targeting those failures. Our method produces a 553k-example multimodal instruction tuning dataset that improves LMM performance, even surpassing models trained on an equivalent amount of real data, demonstrating the high value of synthetic data targeted at specific reasoning failure modes in LMMs.
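The abstract describes a three-stage pipeline: analyze the target LMM's failures, generate similar examples with a frontier model, and filter them for quality. A minimal sketch of that loop is below; all function names, data structures, and the stand-in generation/filtering logic are illustrative assumptions, not the authors' actual implementation (which relies on frontier models for the generate and filter stages).

```python
from dataclasses import dataclass

@dataclass
class Example:
    image_desc: str   # textual stand-in for an image
    question: str
    answer: str

def analyze_failures(examples, model_answers, gold_answers):
    """Stage 1: collect the examples the target LMM got wrong."""
    return [ex for ex, pred, gold in zip(examples, model_answers, gold_answers)
            if pred != gold]

def generate_variant(failure):
    """Stage 2 (hypothetical): a frontier model would synthesize new
    examples probing the same failure mode; here we emit a templated
    variant as a placeholder."""
    return Example(failure.image_desc,
                   f"(variant) {failure.question}",
                   failure.answer)

def quality_filter(candidates):
    """Stage 3 (hypothetical): a frontier model would score and filter
    candidates; here we simply keep candidates with a non-empty answer."""
    return [c for c in candidates if c.answer]

def build_dataset(examples, model_answers, gold_answers):
    failures = analyze_failures(examples, model_answers, gold_answers)
    candidates = [generate_variant(f) for f in failures]
    return quality_filter(candidates)
```

In the paper's setting, the generated examples accumulate into an instruction-tuning set (553k examples) used to retrain the LMM; the sketch above only shows the per-example data flow.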
Submission Number: 45