Abstract: Singing techniques are important skills in professional vocal performance and usually involve deliberate fluctuations of timbre, pitch, duration, and loudness. Recognizing types of singing techniques can be quite challenging because 1) the time-frequency features in singing are highly dynamic and may span a long range of the audio signal; 2) different singing techniques, such as vibrato and trill, tend to have similar local features; and 3) the class distribution of singing technique datasets is long-tailed. To address these problems, we propose a novel Radial Attention Transformer (RAT) with a Radial Attention (RA) Module that captures fine-grained local features as well as the long-range inter-dependency of audio features. The experimental results show that the proposed method, RAT with Adaptive Logit Adjustment (ALA) Loss, significantly outperforms previous state-of-the-art models (Convolutional Neural Networks and Deformable CNN) on the recognition of singing technique categories.