An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding
Abstract: This paper develops a Chinese text classification workflow for educational settings, where a grade can swing because of subjective cognitive load. This problem is often observed as an inconsistency between the review comments on academic papers and the grades assigned to them, which makes classifying the Chinese comment texts challenging. To address it, we introduce a Chinese text classifier that extends the popular seed-word-based model into a complete workflow. In the proposed method, we first convert the texts into vectors using Chinese preprocessing. We then use Bidirectional Encoder Representations from Transformers (BERT) to integrate contextualized features and apply a hierarchical attention network for classification. For this study, we collected 4,310 short review comment texts involving 140 universities in China. Because these texts carry noisy grades from experts, the proposed method generates seed words for each category and uses them to produce pseudo labels, which weakly supervise network training in place of the noisy labels. Finally, we evaluated the workflow on the real-world datasets, where it achieved good performance in Chinese classification compared with traditional models. This study provides insight into a real educational text case in which a review grade can swing because of subjective cognitive load, and it offers a workflow for automatically grading these Chinese expert comment texts, supporting a more precise academic evaluation system.
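As a rough illustration of the seed-word idea mentioned in the abstract, the sketch below assigns a pseudo label to a tokenized comment by counting matches against per-category seed words. It is a minimal sketch under stated assumptions, not the authors' implementation: the category names, the seed words, and the function `pseudo_label` are all hypothetical, the input is assumed to be already segmented into tokens, and the contextualized (BERT-based) handling of seed words described in the abstract is not modeled here.

```python
from collections import Counter

# Hypothetical seed words per grade category (illustrative only, not from the paper).
SEED_WORDS = {
    "good": {"创新", "严谨", "充分"},
    "poor": {"不足", "错误", "混乱"},
}


def pseudo_label(tokens, seed_words=SEED_WORDS):
    """Return the category whose seed words occur most often in `tokens`.

    Returns None when no seed word matches or when the top two
    categories tie, so ambiguous comments stay unlabeled.
    """
    counts = Counter()
    for tok in tokens:
        for label, seeds in seed_words.items():
            if tok in seeds:
                counts[label] += 1
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two best categories
    return ranked[0][0]


# Usage: tokens are assumed to come from a Chinese word segmenter.
label = pseudo_label(["论文", "创新", "严谨"])
```

Comments that receive a pseudo label in this way could then serve as weakly supervised training examples in place of the noisy expert grades; comments that return `None` would simply be left out of that training set in this sketch.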