Abstract: In recent years, machine learning models for automated speech scoring have mainly been built using data-driven approaches in which handcrafted features are a central component. However, the remarkable successes of deep learning (DL) across a variety of machine learning tasks have demonstrated its effectiveness at extracting features. Although there have been some efforts to apply DL to automated speech scoring, a thorough investigation of how to learn useful features for this task is still missing. In this paper, we propose an end-to-end solution that uses deep neural network models to encode both lexical and acoustic cues and thereby learn predictive features automatically. Experiments confirm the effectiveness of our proposed solution compared to conventional methods based on handcrafted features.