Keywords: Controllable Audio Generation, Timbre Enhancement, Neural Audio Effects, Wave-U-Net, Conditional Generation, Style Modeling
TL;DR: We introduce TimbrePalette, a conditional Wave-U-Net trained using "Style Anchors", a novel paradigm in which high-quality DSP chains define subjective aesthetics, allowing the model to controllably enhance audio timbre and smoothly blend between styles.
Abstract: The growing accessibility of music creation tools and the rise of AI music generation models have increased demand for efficient, high-quality, and user-friendly tools for audio timbre enhancement. However, traditional Digital Signal Processing (DSP) effect chains often lack content awareness, while naive deep learning approaches frequently suffer from training instability when directly imitating complex audio effects. To address these challenges, we propose TimbrePalette, a controllable multi-style timbre enhancement model based on a conditional Wave-U-Net. We first systematically investigate the stability challenges inherent in waveform-to-waveform generation and establish a robust training framework that combines a stable loss function with an advanced model architecture. Building on this framework, we introduce a novel paradigm. First, we design and implement three high-quality DSP algorithms, each representing a distinct perceptual dimension ("Fullness", "Warmth", "Layeredness"), to serve as "Style Anchors". Then, we train a single, unified TimbrePalette model to generate the corresponding enhanced audio given an explicit style command. Comprehensive objective evaluations show that this single model not only reproduces the target styles with high fidelity but also significantly outperforms both specialized single-style models and strong time-domain baselines, including Conv-TasNet. Furthermore, we show quantitatively that the model can smoothly "blend" between styles, indicating that it has learned a meaningful and continuous latent space of timbre aesthetics. TimbrePalette offers musicians and creators working with AI-generated content a powerful, efficient, and creative tool for improving audio quality.
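Illustrative sketch (not from the paper): the abstract does not say how the "explicit style command" is encoded or how blending is performed. One plausible reading, assumed here, is a one-hot vector over the three Style Anchors, with blending done by linear interpolation between two such vectors. The names `STYLES`, `one_hot`, and `blend` are hypothetical.

```python
# Assumption: the style command is a one-hot vector over the three
# Style Anchors named in the abstract. The paper may encode it differently.
STYLES = ("fullness", "warmth", "layeredness")


def one_hot(style):
    """Return a one-hot condition vector for a single named style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in STYLES]


def blend(style_a, style_b, alpha):
    """Linearly interpolate between two one-hot style vectors.

    alpha = 0.0 yields style_a, alpha = 1.0 yields style_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    a, b = one_hot(style_a), one_hot(style_b)
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


# A condition vector that is 75% "fullness" and 25% "warmth".
condition = blend("fullness", "warmth", 0.25)
```

Under this assumption, a vector such as `condition` would be passed to the conditional network alongside the input waveform; how the conditioning is injected into the Wave-U-Net is not specified in the abstract.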
Primary Area: generative models
Submission Number: 5048