Abstract: A system able to automatically transpose an audio recording would have many potential applications, from music production to hearing aid design. We present a deep learning approach that transposes an audio recording directly from the raw time-domain signal. We train recurrent neural networks on raw audio samples of simple waveforms (sine, square, triangle, sawtooth) covering the linear range of possible frequencies. We examine our generated transpositions for each musical semitone step size up to the octave and compare our results against two popular pitch-shifting algorithms. Although our approach accurately transposes the frequencies in a signal, the transposed signals suffer from a significant amount of added noise. This work represents exploratory steps towards the development of a general deep transposition model able to quickly transpose to any desired spectral mapping.
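To make the training setup concrete, the sketch below illustrates one way such a model could be set up: raw-audio examples of the four simple waveforms are paired with semitone-transposed targets, and a small recurrent network is fit to map one to the other. The sample rate, frequency range, network size, and optimizer settings here are illustrative assumptions, not the configuration reported in the paper.

```python
# Hypothetical sketch only: pairs raw simple waveforms with semitone-shifted
# targets and trains a small LSTM to map source audio to transposed audio.
import numpy as np
import torch
import torch.nn as nn
from scipy import signal as sg

SR = 16_000        # sample rate (assumed)
N = 2048           # samples per training example (assumed)
SEMITONES = 12     # transpose up one octave

def waveform(kind, freq, n=N, sr=SR):
    """Generate one of the four simple waveforms at the given frequency."""
    t = np.arange(n) / sr
    if kind == "sine":
        return np.sin(2 * np.pi * freq * t)
    if kind == "square":
        return sg.square(2 * np.pi * freq * t)
    if kind == "triangle":
        return sg.sawtooth(2 * np.pi * freq * t, width=0.5)
    return sg.sawtooth(2 * np.pi * freq * t)  # sawtooth

def make_batch(batch_size=32):
    """Random (source, transposed-target) pairs of raw audio."""
    ratio = 2 ** (SEMITONES / 12)  # equal-tempered frequency ratio
    x, y = [], []
    for _ in range(batch_size):
        kind = np.random.choice(["sine", "square", "triangle", "sawtooth"])
        freq = np.random.uniform(110.0, 880.0)  # source pitch range (assumed)
        x.append(waveform(kind, freq))
        y.append(waveform(kind, freq * ratio))  # target: same shape, shifted pitch
    to_tensor = lambda a: torch.tensor(np.stack(a), dtype=torch.float32).unsqueeze(-1)
    return to_tensor(x), to_tensor(y)

class Transposer(nn.Module):
    """Sample-to-sample recurrent model: raw audio in, transposed raw audio out."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden,
                           num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h)

model = Transposer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):  # toy training loop
    x, y = make_batch()
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

In this toy setup the loss is a plain sample-wise MSE between the predicted and target waveforms; any evaluation against dedicated pitch-shifting algorithms, as described in the abstract, would be done separately on the generated audio.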
DOI: 10.1007/978-3-031-29956-8_22