Abstract: Document classification is a classical and fundamental text mining problem with many applications. In such classifiers, text representation is an intermediate step, but it plays an important role in building the models. Recently, graph neural networks have been shown to be a promising approach to text representation, since they not only capture rich relational structure but also preserve global word co-occurrence correlations. However, most such models have been proposed for English documents. In this paper, we present a model based on a graph convolutional network (GCN) for Vietnamese document classification. We first present the detailed steps for building the graph from Vietnamese documents and a two-layer GCN architecture for graph embedding. We then propose a method, named PMI filter, to improve the classification accuracy of the model. Furthermore, several aspects of the proposed model are investigated to provide a better understanding of its behavior. The proposed work is evaluated on two large Vietnamese datasets. In experiments, the proposed model achieves better results than its baseline and competitive performance compared to existing feature-selection-based methods.
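The abstract does not spell out how the PMI filter works. As a rough illustration only, the sketch below assumes a common setup in text-graph construction: word-word edges are weighted by pointwise mutual information (PMI) computed over sliding windows, and only pairs whose PMI exceeds a threshold are kept. The function name `pmi_edges`, the `window` and `threshold` parameters, and the toy tokens are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs, window=3, threshold=0.0):
    """Return {(w1, w2): pmi} for word pairs whose PMI exceeds `threshold`.

    `docs` is a list of token lists (assumed to be already word-segmented).
    This is an illustrative sketch, not the paper's PMI filter.
    """
    # Collect sliding windows; a short document counts as a single window.
    windows = []
    for tokens in docs:
        if not tokens:
            continue
        if len(tokens) <= window:
            windows.append(tokens)
        else:
            for i in range(len(tokens) - window + 1):
                windows.append(tokens[i:i + window])

    n = len(windows)
    word_count = Counter()  # number of windows containing each word
    pair_count = Counter()  # number of windows containing each word pair
    for w in windows:
        uniq = sorted(set(w))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))

    edges = {}
    for (a, b), c in pair_count.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with window-level counts.
        pmi = math.log(c * n / (word_count[a] * word_count[b]))
        if pmi > threshold:  # assumed filtering rule: drop weak pairs
            edges[(a, b)] = pmi
    return edges


# Toy usage with made-up, pre-segmented tokens.
docs = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "c"]]
edges = pmi_edges(docs, window=3, threshold=0.0)
```

In this toy input the pair `("a", "c")` has negative PMI and is dropped, while `("a", "b")` and `("c", "d")` are kept as weighted edges. Raising `threshold` would prune more word-word edges, which is one plausible way such a filter could change the graph that the GCN sees.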