IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer

ICLR 2026 Conference Submission19409 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Industrial sound dataset, Machine monitoring, Manufacturing, Self-supervised learning, Pretrained model

TL;DR: We introduce an industrial sound dataset and benchmark, along with a reference pretrained model.

Abstract: Industrial acoustic signals encode machine state, yet prevailing data-driven approaches are task-specific supervised pipelines that generalize poorly beyond their design conditions. Progress is further limited by the scarcity of large-scale datasets and pretrained models tailored to active shop-floor audio. To address this, we introduce DINOS (Diverse INdustrial Operation Sounds), a dataset of 74,149 recordings totaling over 1,093 hours collected from active manufacturing lines across diverse processes and operating regimes. We also provide IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), a reference model pretrained on DINOS to standardize evaluation. Our benchmark is structured in four machine-specific steps: (1) baseline discrimination, (2) moderate operational complexity, (3) scalability to unseen equipment, and (4) domain shift and sensor modality adaptation. Across tasks, models pretrained or fine-tuned on DINOS consistently outperform general-purpose audio models, demonstrating the value of domain-specific pretraining for industrial acoustic perception.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 19409