TasteBench: multimodal benchmark for sensory prediction, from molecules to sustainable foods

Published: 30 May 2026, Last Modified: 30 May 2026
Venue: ICML2026-AI4Science Poster
Readers: Everyone
License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: sustainable proteins, sensory prediction, food science, multimodal learning, benchmark, foundation models, alternative proteins
TL;DR: TasteBench: a multimodal benchmark for sensory prediction of sustainable proteins. 21K+ human ratings across 215 plant-based foods; a 15K-molecule taste prediction task. Best model matches the median individual panelist.
Abstract: Sustainable protein discovery lacks the fast computational proxies, analogous to molecular docking or density functional theory, that accelerate drug and materials discovery. Evaluating whether a novel food tastes like its animal-based target requires expensive human sensory panels, bottlenecking the design-build-test loop. We introduce TasteBench, a multimodal benchmark and privacy-preserving competition for sensory prediction, spanning two tasks: a food-level ranking task built on 21K+ human evaluations across 215 plant-based foods in 24 product categories, yielding 935 within-category ranking pairs, and a supporting molecular-level taste classification task over 15K flavor molecules. To enable rigorous interpretation of model performance, we characterize the ground truth: inter-rater agreement among panelists is low (Krippendorff's $\alpha$ = .077), and the split-half reliability ceiling of panel-aggregated rankings is .825, establishing the range within which ML systems on this benchmark should be assessed. We evaluate baselines across four input modalities; on the same pairs panelists rated, the best model achieves .661 pairwise accuracy, competitive with the median individual panelist (.650). TasteBench provides the evaluation infrastructure and baselines for measuring progress on computational screening for sustainable protein discovery.
Submission Number: 110
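
The abstract reports pairwise accuracy over within-category ranking pairs. As a rough illustration of how such a metric could be computed, here is a minimal sketch. It is not the benchmark's actual evaluation code: the function names, the data layout (item id, category, panel-aggregated score), and the choice to skip tied pairs are all assumptions made for this example.

```python
from itertools import combinations


def within_category_pairs(items):
    """Build (higher_rated, lower_rated) pairs from items in the same category.

    `items` is a list of (item_id, category, panel_score) tuples.
    This layout is hypothetical. Pairs with tied panel scores are skipped,
    which is an assumption, not something the abstract specifies.
    """
    by_category = {}
    for item_id, category, score in items:
        by_category.setdefault(category, []).append((item_id, score))

    pairs = []
    for members in by_category.values():
        for (id_a, score_a), (id_b, score_b) in combinations(members, 2):
            if score_a == score_b:
                continue  # no ground-truth order for a tie
            pairs.append((id_a, id_b) if score_a > score_b else (id_b, id_a))
    return pairs


def pairwise_accuracy(pairs, predicted):
    """Fraction of pairs where the model scores the panel-preferred item higher.

    `predicted` maps item_id to a model score (hypothetical interface).
    """
    if not pairs:
        raise ValueError("no ranking pairs to evaluate")
    correct = sum(predicted[winner] > predicted[loser] for winner, loser in pairs)
    return correct / len(pairs)


# Toy data, invented purely for illustration.
items = [
    ("burger_a", "burger", 4.1),
    ("burger_b", "burger", 3.2),
    ("burger_c", "burger", 2.5),
    ("milk_a", "milk", 3.9),
    ("milk_b", "milk", 3.0),
]
predicted = {"burger_a": 0.9, "burger_b": 0.4, "burger_c": 0.6,
             "milk_a": 0.7, "milk_b": 0.2}

pairs = within_category_pairs(items)
print(len(pairs), pairwise_accuracy(pairs, predicted))
```

Restricting pairs to a single category means a model is never asked to rank, say, a burger against a milk, which mirrors the within-category framing described in the abstract.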