SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
Abstract: Highlights
• In the ROOMELSA challenge, we formulate 3D object retrieval as a mask-conditioned, language-driven grounding task in 3D environments, combining spatial reference and natural language understanding in a single task.
• We introduce ROOMELSA, a novel benchmark dataset comprising 1,622 scenes, 5,197 rooms, and 44,445 (mask, text) query pairs, with a dedicated unseen-category test split designed to evaluate cross-domain generalization.
• We analyze the top five challenge entries, revealing how scene-level inference, depth-aware reconstruction, and adaptive cross-modal fusion affect Mean Reciprocal Rank.
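For readers unfamiliar with the metric named in the last highlight, the following is a minimal sketch of the standard Mean Reciprocal Rank definition: for each query, take the reciprocal of the 1-based rank at which the ground-truth object appears, then average over all queries. The function name `mean_reciprocal_rank`, the list-based input format, and the choice to score a missing target as 0 are illustrative assumptions; the challenge's actual evaluation protocol (for example, any rank cutoff or tie handling) is not specified in this text.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Standard MRR: mean over queries of 1 / (rank of the correct item).

    rankings[i] is the ordered candidate list returned for query i
    (best match first); targets[i] is the ground-truth item for query i.
    Assumption: a query whose target is absent from its ranking scores 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked, target in zip(rankings, targets):
        # enumerate from 1 so the first position yields a reciprocal rank of 1.0
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical object identifiers, for illustration only.
example_rankings = [
    ["chair_01", "sofa_07", "lamp_03"],  # target ranked 1st -> 1.0
    ["sofa_07", "chair_01", "lamp_03"],  # target ranked 2nd -> 0.5
    ["lamp_03", "sofa_07", "desk_02"],   # target missing    -> 0.0
]
example_targets = ["chair_01", "chair_01", "bed_09"]
example_score = mean_reciprocal_rank(example_rankings, example_targets)
```

In this toy example the three queries contribute 1.0, 0.5, and 0.0, so the score is 0.5. A higher value means the correct object tends to appear nearer the top of the retrieved list.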