Diff-Pitcher: Diffusion-Based Singing Voice Pitch Correction

Jiarui Hai, Mounya Elhilali

Published: 01 Jan 2023, Last Modified: 13 Feb 2025, WASPAA 2023, CC BY-SA 4.0

Abstract: Pitch correction is the process of adjusting the original pitch of a recording or live performance to fit a specific key or match a target profile. Pitch correction systems typically consist of several stages: original pitch estimation, pitch curve modification, and resynthesis of the audio with the target pitch curve. Unfortunately, resynthesis often introduces significant artifacts that degrade the overall quality of the modified audio, rendering it unnatural and unpleasant. In this work, we introduce Diff-Pitcher, a pitch control model that leverages diffusion modeling and source-filter mechanisms to generate high-quality, natural-sounding voice signals matched to a target pitch while preserving content and timbre. To demonstrate the effectiveness of the proposed method, we evaluate Diff-Pitcher through both subjective and objective experiments in pitch-shifting and automatic pitch correction scenarios. Our results show that Diff-Pitcher outperforms previous pitch control methods in sound quality and naturalness while retaining strong pitch controllability. Furthermore, we apply Diff-Pitcher in template-based and score-based automatic pitch correction systems and explore their potential applications. For score-based automatic pitch correction, we also improve the pitch predictor proposed in KaraTuner so that it handles variable-length inputs.
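
As a rough illustration of the "pitch curve modification" stage in a conventional pitch correction pipeline, the sketch below snaps each frame of an estimated F0 curve to the nearest note of a key. It is not the Diff-Pitcher method and does not come from the paper: the function names, the C major scale, the A4 = 440 Hz reference, and the convention that 0 Hz marks an unvoiced frame are all assumptions made for this example. Pitch estimation and resynthesis are left out entirely.

```python
import math

A4_HZ = 440.0    # assumed reference tuning
A4_MIDI = 69     # MIDI note number of A4

# Pitch classes of C major (assumed target key for this sketch).
C_MAJOR = (0, 2, 4, 5, 7, 9, 11)


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def midi_to_hz(midi):
    """Convert a MIDI note number to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)


def snap_to_scale(f0_hz, scale=C_MAJOR):
    """Move one F0 value to the nearest note whose pitch class is in `scale`.

    Frames with f0_hz <= 0 are treated as unvoiced and returned as 0.0.
    """
    if f0_hz <= 0:
        return 0.0
    midi = hz_to_midi(f0_hz)
    base = round(midi)
    # Any 13-semitone window contains at least one note of a non-empty scale.
    candidates = [n for n in range(base - 6, base + 7) if n % 12 in scale]
    nearest = min(candidates, key=lambda n: abs(n - midi))
    return midi_to_hz(nearest)


def correct_pitch_curve(f0_curve, scale=C_MAJOR):
    """Apply snap_to_scale to every frame of an F0 curve (list of Hz values)."""
    return [snap_to_scale(f, scale) for f in f0_curve]


if __name__ == "__main__":
    # Hypothetical F0 frames: slightly sharp A4, an unvoiced frame, a sharp B4.
    print(correct_pitch_curve([450.0, 0.0, 500.0]))
```

Hard snapping of this kind flattens vibrato and note transitions, and it only produces a target curve; the audio still has to be resynthesized at that curve, which is the stage the abstract identifies as the main source of artifacts.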