Abstract: Cross-modal retrieval models have shown impressive performance on the image-to-recipe retrieval task, a common benchmark in the multimedia field. However, the task assumes that an exact recipe match for a query image exists in the target database, an assumption that rarely holds in real-world scenarios. When exact matches are excluded from the target domain, our analysis reveals that visual and textual similarity between recipes alone is insufficient to achieve good retrieval results; other similarities should also be considered. Since ingredient similarity aligns with human intuition and nutritional similarity is crucial for health-conscious applications, we propose a model that incorporates ingredient and nutritional relevance into the retrieval process. We measure the similarity of unpaired recipes using three new metrics: mean absolute scaled error (MASE) for assessing nutritional similarity, and intersection over union (IOU) and weighted IOU (WIOU) for measuring ingredient overlap. Our proposed method can also be applied with existing image-recipe retrieval models, and it improved top-1 MASE, IOU, and WIOU by up to 18.13%, 9.91%, and 6.88%, respectively.
External IDs: dblp:conf/icmcs/ParinayokSAY25
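
The abstract names three metrics without giving their formulas, so the following is only a minimal illustrative sketch of how such scores could be computed, not the authors' definitions. It assumes that IOU is the plain Jaccard index over ingredient sets, that WIOU weights each ingredient by a non-negative quantity (for example, its mass) and divides the sum of per-ingredient minima by the sum of maxima, and that MASE averages per-nutrient absolute errors after dividing each by a caller-supplied scale. The function names `iou`, `weighted_iou`, and `mase`, and the `scale` argument, are hypothetical.

```python
from typing import Dict, Sequence, Set


def iou(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two ingredient sets (assumed reading of IOU)."""
    union = a | b
    if not union:
        return 0.0  # two empty recipes: define the overlap as zero
    return len(a & b) / len(union)


def weighted_iou(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Weighted overlap: sum of per-ingredient minima over sum of maxima.

    The weights (e.g. grams per ingredient) are an assumption; the abstract
    does not say how WIOU weights ingredients.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator > 0 else 0.0


def mase(
    query: Sequence[float],
    retrieved: Sequence[float],
    scale: Sequence[float],
) -> float:
    """Mean of per-nutrient absolute errors, each divided by a scale term.

    The scale is supplied by the caller because the abstract does not
    specify how the error is scaled; lower values mean closer nutrition.
    """
    if not (len(query) == len(retrieved) == len(scale)) or not query:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(s <= 0 for s in scale):
        raise ValueError("scale values must be positive")
    errors = [abs(q - r) / s for q, r, s in zip(query, retrieved, scale)]
    return sum(errors) / len(errors)
```

Under these assumptions, higher IOU and WIOU indicate more ingredient overlap between the query recipe and a retrieved recipe, while lower MASE indicates closer nutritional content.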