EURECOM Participation @ CRAG-MM 2025 Challenge

Semere Wubshet Berhanu; Raphael Troncy

Published: 20 Aug 2025, Last Modified: 01 Feb 2026 | 2025 KDD Cup CRAG-MM Workshop | CC BY-NC 4.0

Keywords: RAG, MM-RAG, VLM, LLM, question answering, image search, Web search, entity extraction, LettRAGraph

TL;DR: This paper describes the approach and results of the D2KLab team at EURECOM in the CRAG-MM 2025 challenge organized by Meta.

Abstract: This technical report details the approach of D2KLab at EURECOM for the Comprehensive Retrieval-Augmented Generation Multi-Modal Challenge (CRAG-MM) 2025, organized by Meta at KDD 2025. Our solution relies on a modular pipeline that integrates a Vision Language Model (VLM) and uses both image and web search APIs. It tackles the three proposed sub-tasks by combining pipeline components that perform domain classification, entity extraction, image segmentation for refined image search, and web content re-ranking. On the Truthfulness metric, our approach ranked 39th with a score of -0.081 on the multi-turn, multi-source Task 3, 43rd with a score of -0.176 on Task 2, and 52nd with a score of -0.205 on Task 1. It uses less than half of the allocated 10-second budget per query. To support reproducibility, we release the source code of our approach at https://gitlab.aicrowd.com/semere_wubshet/d2klab-meta-crag-mm-2025.
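
As an illustration only, a modular pipeline with a per-query time budget could be organized roughly as follows. This is a minimal sketch, not the authors' implementation (see the released repository for that): `QueryState`, `classify_domain`, `extract_entities`, `rerank_passages`, and `run_pipeline` are hypothetical names, the stage bodies are trivial placeholders standing in for real models and search APIs, and the image segmentation stage is omitted.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """Hypothetical container passed from one pipeline stage to the next."""
    question: str
    domain: str = "unknown"
    entities: List[str] = field(default_factory=list)
    passages: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


Stage = Callable[[QueryState], QueryState]


def classify_domain(state: QueryState) -> QueryState:
    # Placeholder rule standing in for a real domain classifier.
    state.domain = "food" if "recipe" in state.question.lower() else "general"
    return state


def extract_entities(state: QueryState) -> QueryState:
    # Placeholder: treat capitalized tokens (after the first word) as entities.
    tokens = state.question.split()[1:]
    state.entities = [t.strip("?.,") for t in tokens if t[:1].isupper()]
    return state


def rerank_passages(state: QueryState) -> QueryState:
    # Placeholder: order passages by how many extracted entities they mention.
    def score(passage: str) -> int:
        return sum(e.lower() in passage.lower() for e in state.entities)

    state.passages = sorted(state.passages, key=score, reverse=True)
    return state


def run_pipeline(state: QueryState, stages: List[Stage],
                 budget_s: float = 10.0) -> QueryState:
    """Run stages in order; skip remaining stages once the budget is spent."""
    start = time.monotonic()
    for stage in stages:
        if time.monotonic() - start >= budget_s:
            state.trace.append(f"skipped:{stage.__name__}")
            continue
        state = stage(state)
        state.trace.append(f"ran:{stage.__name__}")
    return state


STAGES = [classify_domain, extract_entities, rerank_passages]
```

Each stage takes and returns the same state object, so stages can be added, removed, or reordered per sub-task, and the budget check lets the pipeline fall back to whatever has been computed so far instead of exceeding the time limit.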

Submission Number: 13
