Graph-Enhanced Transformer Architecture with Novel Use of CEFR Vocabulary Profile and Filled Pauses in Automated Speaking Assessment

Published: 01 Jan 2023 · Last Modified: 20 May 2025 · SLaTE 2023 · CC BY-SA 4.0
Abstract: Deep learning (DL)-based approaches, such as LSTMs and Transformers, have achieved remarkable advances in automated speaking assessment (ASA). Nevertheless, two challenges persist: faithfully modeling hierarchical context, such as word-to-paragraph relationships, and seamlessly integrating hand-crafted knowledge into DL-based models. In this work, we propose using heterogeneous graph neural networks (HGNNs) as the backbone model to handle hierarchical context effectively. Furthermore, to enrich the node embeddings of the HGNN, we integrate external knowledge from the spoken content, namely text-based features (a CEFR vocabulary profile) and speech-based features (filled pauses). Experimental results on the NICT JLE corpus validate the efficacy of our approach, which outperforms existing Transformer-based language models. Our findings also highlight the utility of the method for accurately evaluating speaking proficiency, showcasing its practical promise.
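Although the abstract does not disclose implementation details, the following minimal PyTorch sketch illustrates the general idea it describes: hierarchical (word → sentence → response) message passing over a heterogeneous graph, with word-node embeddings augmented by external knowledge (a CEFR vocabulary-profile band and a filled-pause indicator). All names, dimensions, and pooling choices here are illustrative assumptions, not the authors' actual architecture; concatenating hand-crafted features into node embeddings is simply one straightforward integration strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_word_features(token_emb, cefr_band, is_filled_pause):
    """Hypothetical word-node features: contextual embedding + external knowledge.

    token_emb:       (num_words, d) contextual embeddings, e.g., from a Transformer
    cefr_band:       (num_words,) long tensor, CEFR level A1..C2 mapped to 0..5
    is_filled_pause: (num_words,) float tensor, 1.0 for filled pauses ("uh", "um")
    """
    cefr_onehot = F.one_hot(cefr_band, num_classes=6).float()
    return torch.cat([token_emb, cefr_onehot, is_filled_pause.unsqueeze(-1)], dim=-1)

class HierarchicalLayer(nn.Module):
    """One round of word -> sentence -> response message passing (assumed scheme)."""

    def __init__(self, dim):
        super().__init__()
        self.word_to_sent = nn.Linear(dim, dim)
        self.sent_to_resp = nn.Linear(dim, dim)

    def forward(self, word_h, sent_h, resp_h, word2sent):
        # Mean-pool transformed word embeddings into their parent sentence nodes.
        msg = torch.zeros_like(sent_h)
        msg.index_add_(0, word2sent, self.word_to_sent(word_h))
        counts = torch.bincount(word2sent, minlength=sent_h.size(0)).clamp(min=1)
        sent_h = torch.relu(sent_h + msg / counts.unsqueeze(-1).float())
        # Pool sentence nodes into the single response-level node.
        resp_h = torch.relu(resp_h + self.sent_to_resp(sent_h).mean(0, keepdim=True))
        return word_h, sent_h, resp_h

# Toy usage: 5 words in 2 sentences of one spoken response. The node dimension
# must match build_word_features' output: 8 (token) + 6 (CEFR) + 1 (pause) = 15.
d = 15
word_h = build_word_features(torch.randn(5, 8),
                             torch.tensor([0, 2, 1, 5, 3]),
                             torch.tensor([0., 0., 1., 0., 0.]))
layer = HierarchicalLayer(d)
_, _, resp_h = layer(word_h, torch.zeros(2, d), torch.zeros(1, d),
                     torch.tensor([0, 0, 0, 1, 1]))  # word -> sentence mapping
score = nn.Linear(d, 1)(resp_h)  # scalar proficiency-score head
```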
