Keywords: Industrial sound dataset, Machine monitoring, Manufacturing, Self-supervised learning, Pretrained model
TL;DR: We introduce an industrial sound dataset and benchmark with a reference pretrained model.
Abstract: Industrial acoustic signals encode machine state, yet prevailing data-driven approaches are task-specific supervised pipelines that generalize poorly beyond their design conditions. Progress is further limited by the scarcity of large-scale datasets and pretrained models tailored to audio from active shop floors. To address this, we introduce DINOS (Diverse INdustrial Operation Sounds), a dataset of 74,149 recordings totaling over 1,093 hours collected from active manufacturing lines across diverse processes and operating regimes. We also provide IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), a reference model pretrained on DINOS to standardize evaluation. Our benchmark is structured in four machine-specific steps: (1) baseline discrimination, (2) moderate operational complexity, (3) scalability to unseen equipment, and (4) domain shift and sensor modality adaptation. Across tasks, models pretrained or fine-tuned on DINOS consistently outperform general-purpose audio models, demonstrating the value of domain-specific pretraining for industrial acoustic perception.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 19409