Learning to Taste: A Multimodal Wine Dataset

Thoranna Bender; Simon Moe Sørensen; Alireza Kashani; Kristjan Eldjarn Hjorleifsson; Grethe Hyldig; Søren Hauberg; Serge Belongie; Frederik Rahbæk Warburg

Published: 26 Sept 2023, Last Modified: 15 Jan 2024

Venue: NeurIPS 2023 Datasets and Benchmarks Poster

Keywords: Crowd annotations, Multi-modal, Concept embeddings

Abstract: We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique bottlings, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces both for coarse flavor classification (alcohol percentage, country, grape, price, rating) and for representing human perception of flavor.

Supplementary Material: pdf

Submission Number: 754
