Scent of Health (S-O-H): Olfactory Multivariate Time-Series Dataset for Non-Invasive Disease Screening

20 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; License: CC BY 4.0
Keywords: eNose, dataset, medicine, olfactory
TL;DR: A multivariate time-series dataset of exhaled-breath recordings from an eNose sensor for non-invasive disease screening, with data from over 1,000 unique patients.
Abstract: Exhaled breath analysis offers a promising, non-invasive alternative to traditional medical diagnostics. Electronic nose (eNose) sensors enable low-cost screening, but progress has been limited by small, site-specific datasets and by sensor-specific temporal artifacts such as baseline drift. We introduce Scent of Health (S-O-H), a large clinical eNose dataset of 1,027 patients across eight diagnostic groups, and reframe breath-based diagnosis as a realistic multivariate time-series classification task. Our contributions include curated temporal splits that control for sensor drift and mimic real-world deployment. We provide a reproducible benchmark covering classical feature-based models, convolutional neural networks, and specialized time-series classifiers. The results demonstrate the dataset's utility, with methods reaching promising performance (e.g., ROC AUC up to 0.75 for lung cancer and 0.70 for hepatitis), while also revealing significant gaps in robustness under sensor drift and with limited data. By releasing the dataset, splits, and code, we provide a foundational resource for research into robust, generalizable machine learning for clinical breathomics.
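The evaluation protocol in the abstract rests on two ideas: temporal splits (train on earlier recordings and test on later ones, so that sensor drift between sessions is not hidden by random shuffling) and per-disease ROC AUC. Below is a minimal Python sketch of both, written under stated assumptions; it is not the authors' released split code. The `BreathRecord` layout, the single-cutoff rule in `temporal_split` (including dropping test recordings from patients already seen in training), and the one-vs-rest binary labels are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass
class BreathRecord:
    # Hypothetical record layout; the released S-O-H files may be organized differently.
    patient_id: str
    recorded_on: date
    label: int  # assumed one-vs-rest: 1 = target diagnosis, 0 = any other group
    series: List[List[float]]  # sensor channels x time steps


def temporal_split(
    records: Sequence[BreathRecord], cutoff: date
) -> Tuple[List[BreathRecord], List[BreathRecord]]:
    """Train on recordings made before `cutoff`; test on recordings made on or after it.

    Assumption: test recordings from patients already present in the training
    set are dropped, so the test set contains only unseen patients.
    """
    train = [r for r in records if r.recorded_on < cutoff]
    seen = {r.patient_id for r in train}
    test = [r for r in records if r.recorded_on >= cutoff and r.patient_id not in seen]
    return train, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up records; not S-O-H data.
    records = [
        BreathRecord("p1", date(2023, 1, 10), 1, [[0.1, 0.2]]),
        BreathRecord("p2", date(2023, 3, 5), 0, [[0.0, 0.1]]),
        BreathRecord("p1", date(2023, 9, 1), 1, [[0.3, 0.4]]),
        BreathRecord("p3", date(2023, 10, 2), 0, [[0.2, 0.2]]),
    ]
    train, test = temporal_split(records, date(2023, 6, 1))
    print([r.patient_id for r in train])  # ['p1', 'p2']
    print([r.patient_id for r in test])   # ['p3'] (p1's later visit is dropped)
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A random split would let a model exploit drift (recordings from the same session tend to share a baseline offset), whereas a date cutoff forces it to generalize to later sensor states, which is closer to deployment. A real protocol might use several cutoffs or site-aware splits; the single cutoff here is only the simplest instance.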
Primary Area: datasets and benchmarks
Submission Number: 24960