Abstract: Singing melody extraction is a foundational task in music information retrieval (MIR). Fully convolutional neural networks (CNNs) are commonly used for this task, but they are constrained by their inductive biases and struggle to capture long-range dependencies. Transformer-based networks perform better, but at a high computational cost. Recently, many multi-layer perceptron (MLP) architectures have been applied to a variety of computer vision tasks and have shown competitive performance; however, their potential for singing melody extraction has yet to be fully explored. In this paper, we propose the lightweight convolutional MLP (LcMLP), an ultra-lightweight model that does not sacrifice performance. First, we improve the original MLP-Mixer by replacing its sequential MLPs with parallel ones and adding skip connections. Second, we propose a multi-level convolution fusion module that facilitates the interaction of features at different depths of the MLP-Mixer. Extensive experiments on several well-known public datasets show that our model has significant advantages in inference speed and computational load while achieving competitive performance.
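To make the first modification concrete, the following is a minimal NumPy sketch contrasting a standard sequential MLP-Mixer block with a parallel variant that has a skip connection. It is an illustration under stated assumptions, not the LcMLP implementation: the function names, the layer sizes, the choice to sum the two branches, and the placement of the skip connection are all hypothetical, and the multi-level convolution fusion module is not covered.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-5):
    # Normalize over the channel axis (no learned scale or shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp(x, w1, w2):
    # Two-layer perceptron applied along the last axis of x.
    return gelu(x @ w1) @ w2


def init_params(rng, tokens, channels, hidden):
    # Hypothetical initialization; sizes are illustrative only.
    s = 0.02
    return {
        "tok_w1": rng.normal(0.0, s, (tokens, hidden)),
        "tok_w2": rng.normal(0.0, s, (hidden, tokens)),
        "ch_w1": rng.normal(0.0, s, (channels, hidden)),
        "ch_w2": rng.normal(0.0, s, (hidden, channels)),
    }


def sequential_mixer_block(x, p):
    # Original MLP-Mixer ordering: token mixing first, then channel mixing.
    # x has shape (tokens, channels).
    y = layer_norm(x)
    x = x + mlp(y.T, p["tok_w1"], p["tok_w2"]).T
    y = layer_norm(x)
    return x + mlp(y, p["ch_w1"], p["ch_w2"])


def parallel_mixer_block(x, p):
    # Assumed parallel variant: both MLPs read the same normalized input,
    # and their outputs are summed with a skip connection from x.
    y = layer_norm(x)
    tok = mlp(y.T, p["tok_w1"], p["tok_w2"]).T
    ch = mlp(y, p["ch_w1"], p["ch_w2"])
    return x + tok + ch
```

In the parallel form the token-mixing and channel-mixing branches have no data dependency on each other, so they can be evaluated concurrently. This is one plausible reading of how parallel MLPs could help inference speed; the abstract itself does not state the mechanism.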