Request and complaint recognition in call-center speech using a pointwise-convolution recurrent network
Abstract: Request and complaint recognition in call-center speech aims to identify the intentions and emotional states of callers by analysing the paralinguistic information conveyed in their speech signals. Existing work, however, does not fully exploit the fusion of the multi-layer representations derived from foundation models, and it usually encodes the temporal information in speech insufficiently. To recognise requests and complaints in call-center speech, we propose an approach based on a PointWise-Convolution Recurrent network (PWCR). The approach first uses a pointwise-convolution module to perform layer-wise aggregation of the representations produced by the multiple Transformer layers of a pre-trained foundation model. A recurrent module, consisting of a recurrent layer with multi-head self-attention, then captures temporal and contextual information. Experimental results on the HealthCall30 Corpus indicate that the proposed approach achieves better recognition performance than state-of-the-art approaches, with unweighted average recalls of \(78.7\%\) (maximum) / \(77.3\%\) (average) and \(60.6\%\) (maximum) / \(59.8\%\) (average) for the request and complaint tasks, respectively.
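The results above are reported as unweighted average recall (UAR), which is the mean of the per-class recalls, so each class contributes equally regardless of how many samples it has. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels are assumptions made for illustration only.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognised samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of a class = correct predictions / samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data (not taken from the HealthCall30 Corpus).
y_true = ["request"] * 8 + ["no_request"] * 2
y_pred = ["request"] * 8 + ["request", "no_request"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this toy data the majority class is recognised perfectly while half of the minority class is missed, so plain accuracy is higher than UAR. This is why UAR is a common choice when class sizes differ.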