SmellNet: A Large-scale Dataset for Real-world Smell Recognition

ICLR 2026 Conference Submission4927 Authors

14 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: Smell sensing, multimodal AI, AI for smell, smell recognition, chemistry, physical sensing
TL;DR: We introduce SMELLNET, a large-scale real-world odor dataset captured with portable sensors, and SCENTFORMER, a Transformer-based model that performs strongly on classification and mixture prediction tasks.
Abstract: The ability of AI to sense and identify substances from their smell alone could have a profound impact on allergen detection (e.g., smelling gluten or peanuts in a cake), manufacturing process monitoring, and the sensing of hormones that indicate emotional states, stress levels, and diseases. Despite this potential, there are virtually no large-scale benchmarks for training and evaluating AI systems’ ability to smell in the real world, and consequently little progress. In this paper, we use portable gas and chemical sensors to create SMELLNET, the first large-scale database that digitizes a diverse range of smells in the natural world. SMELLNET contains about 828,000 data points across 50 substances (spanning nuts, spices, herbs, fruits, and vegetables) and 43 mixtures among them, totaling 68 hours of collected data. Using SMELLNET, we develop SCENTFORMER, a Transformer-based architecture that combines temporal differencing and sliding-window augmentation for smell data. On the SMELLNET-BASE classification task, SCENTFORMER achieves 58.5% Top-1 accuracy, and on the SMELLNET-MIXTURE distribution prediction task, it achieves 50.2% Top-1@0.1 on the test-seen split. SCENTFORMER’s ability to generalize across conditions and capture transient chemical dynamics demonstrates the promise of temporal modeling in olfactory AI. SMELLNET and SCENTFORMER lay the groundwork for real-world olfactory applications across healthcare, food and beverage, environmental monitoring, manufacturing, and entertainment.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 4927
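Illustrative sketch: The abstract names temporal differencing and sliding-window augmentation as preprocessing steps for the sensor time series but does not give their details. The following is a minimal sketch of what such steps commonly look like, not the authors' implementation. The function names (`temporal_difference`, `sliding_windows`), the parameters (`lag`, `window`, `stride`), and the two-channel toy readings are all assumptions made for illustration.

```python
from typing import List

Reading = List[float]  # one multi-channel sensor sample (hypothetical layout)


def temporal_difference(series: List[Reading], lag: int = 1) -> List[Reading]:
    """Return per-channel differences series[t] - series[t - lag].

    Differencing emphasizes how readings change over time rather than
    their absolute level. The lag value is an assumed parameter.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [
        [cur - prev for cur, prev in zip(series[t], series[t - lag])]
        for t in range(lag, len(series))
    ]


def sliding_windows(
    series: List[Reading], window: int, stride: int = 1
) -> List[List[Reading]]:
    """Cut a series into overlapping fixed-length windows.

    Each window can serve as a separate training example, which is one
    common way to augment a long recording. Window and stride are assumed.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be at least 1")
    return [
        series[start:start + window]
        for start in range(0, len(series) - window + 1, stride)
    ]


# Toy two-channel recording with five time steps (made-up values).
toy_series = [
    [1.0, 10.0],
    [2.0, 12.0],
    [4.0, 15.0],
    [7.0, 19.0],
    [11.0, 24.0],
]

diffed = temporal_difference(toy_series, lag=1)
windows = sliding_windows(diffed, window=2, stride=1)
```

In this sketch the differenced series is one step shorter than the input, and each window would be fed to a sequence model as an independent example. The actual lag, window length, and stride used for SCENTFORMER are not stated in the abstract.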