Keywords: Controlled Synthesis, Concept Learning, Disentangled Audio Representations, Factorized Representations, Generative Models, Signal Processing, Component Analysis, Neural Networks, VAEs
Abstract: This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce **SynTone**, a synthetic dataset with explicit ground-truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone highlights its utility for method evaluation. Our results underscore both strengths and limitations of these methods in audio disentanglement, motivating future research.
Submission Number: 122