AutoRaC: An Automatic Retrieval Data Construction Method Based on Multimodal Large Language Model Preferences

ACL ARR 2026 January Submission2981 Authors

04 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Multimodal RAG, Retrieval Data Construction, MLLM Preference
Abstract: Multimodal retrievers play a pivotal role in multimodal retrieval-augmented generation: their performance directly determines the quality of the external knowledge acquired. Because a retriever's effectiveness depends heavily on the accuracy and coverage of its training data, the quality and diversity of that data are critically important. However, existing approaches to constructing multimodal retrieval training data rely primarily on imprecise pseudo-relevance and single-document paradigms within an isolated knowledge base. This results in inaccurate relevance annotations, limited expansion of external knowledge bases, and a failure to guarantee accuracy and diversity simultaneously in data construction. To address these challenges, we propose an Automatic Retrieval Data Construction Method Based on Multimodal Large Language Model (MLLM) Preferences (AutoRaC), which implements MLLM-preference-guided construction through a two-stage filtering pipeline. The method automatically generates high-fidelity retrieval data while enabling knowledge base expansion, thereby enhancing data diversity. Results on InfoSeek and EVQA demonstrate that our method achieves accurate relevance annotations while also enabling knowledge base expansion, with the constructed data matching the quality of existing high-quality datasets.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Multimodal RAG, Retrieval Data Construction, MLLM Preference
Contribution Types: Data resources
Languages Studied: English
Submission Number: 2981