Abstract: This paper introduces a polyphonic piano transcription framework that can robustly transcribe piano notes from real-world audio recordings in real time. State-of-the-art (SoTA) automatic music transcription (AMT) methods lack real-time processing capabilities and generalization to unseen recording conditions, making them difficult to deploy in a real-world scenario. To address these challenges, we propose mobile-AMT, a new AMT framework that consists of 1) an online and lightweight network architecture with efficient recurrent and convolutional layers, and 2) a data augmentation scheme to enhance robustness against out-of-domain recordings. The mobile-AMT model reduces the computational cost by 82.9% while retaining comparable accuracy to the recent SoTA method, allowing real-time AMT on mobile devices. The proposed augmentation improves the note F1-score by 14.3 points when evaluated on realistic audio with various recording conditions.