Multi-features Enhanced Multi-task Learning for Vietnamese Treebank Conversion

Published: 01 Jan 2024, Last Modified: 17 May 2025 | CCL 2024 | CC BY-SA 4.0
Abstract: Dependency parsing models built on pre-trained language representations have achieved clear improvements for resource-rich languages. However, their performance depends heavily on the quality and scale of the training data. Compared with Chinese and English, Vietnamese dependency treebanks are scarce. Because human annotation is labor-intensive and time-consuming, we propose a multi-feature enhanced multi-task learning framework that converts heterogeneous Vietnamese treebanks into a unified one. On the one hand, we use a Tree BiLSTM and pattern embeddings to extract global and local dependency tree features from the source treebank. On the other hand, we integrate these features into a multi-task learning framework in which source-side dependency parsing assists the conversion process. Experiments on benchmark datasets show that the proposed model converts heterogeneous treebanks effectively, which in turn improves Vietnamese dependency parsing accuracy by about 7.12 points in LAS.
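For readers unfamiliar with the reported metric, the labeled attachment score (LAS) is the percentage of tokens whose predicted head and dependency label both match the gold annotation; the unlabeled variant (UAS) checks the head only. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code. The function name `attachment_scores` and the toy sentence are assumptions made for this example.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold, pred: lists of (head_index, relation_label), one pair per token.
    This is an illustrative helper, not taken from the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    total = len(gold)
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the relation label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Toy four-token sentence (hypothetical data).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case the third token has the right head but the wrong label, so it counts toward UAS only, and the fourth token has the wrong head, so it counts toward neither.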