Abstract: Cooking recipe sharing sites on the Web are widely used and play a
major role in everyday home cooking. Since a recipe consists of a dish photo
and a recipe text, cross-modal recipe search is being actively explored. To
enable such search, food image features and recipe text features are generally
embedded into a shared space. However, most existing studies assume a
one-to-one correspondence between a recipe text and a dish image in the
embedding space, even though an unlimited number of photos with different
serving styles and different plates can be associated with the same recipe.
In this paper, we propose RDE-GAN (Recipe Disentangled Embedding GAN), which
disentangles food image information into a recipe image feature and a
non-recipe shape feature. In addition, we generate a food image by integrating
the recipe embedding with a shape feature. Because the proposed embedding is
free of serving and plate styles, which are unrelated to the recipe itself, it
outperformed existing methods on cross-modal recipe search in our experiments.
We also confirmed that either the shape or the recipe elements alone can be
changed at food image generation time.
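The disentangle-and-recombine interface described above can be sketched as follows. This is a minimal illustration only: the dimensions, the linear maps, and all names are placeholder assumptions, whereas the actual RDE-GAN uses trained deep encoders and a GAN generator.

```python
import numpy as np

# Hypothetical feature dimensions (not specified in the abstract).
IMG_DIM, RECIPE_DIM, SHAPE_DIM = 64, 16, 8

rng = np.random.default_rng(0)
# Stand-in linear maps; the real model learns deep networks with GAN training.
W_recipe = rng.standard_normal((IMG_DIM, RECIPE_DIM))
W_shape = rng.standard_normal((IMG_DIM, SHAPE_DIM))
W_gen = rng.standard_normal((RECIPE_DIM + SHAPE_DIM, IMG_DIM))

def encode(image):
    """Disentangle an image vector into a recipe feature and a shape feature."""
    return image @ W_recipe, image @ W_shape

def generate(recipe_feat, shape_feat):
    """Synthesize an image vector from a recipe feature plus a shape feature."""
    return np.concatenate([recipe_feat, shape_feat]) @ W_gen

img_a = rng.standard_normal(IMG_DIM)
img_b = rng.standard_normal(IMG_DIM)
r_a, s_a = encode(img_a)
_, s_b = encode(img_b)

# Keep the recipe content of image A, but borrow the serving/plate style of B.
same_dish_new_plate = generate(r_a, s_b)
```

Swapping only the shape feature (as in the last line) corresponds to the paper's claim that serving style can be changed without altering the recipe content, while cross-modal search would use the shape-free recipe feature alone.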