Abstract: Text classification is an important task in natural language processing. Contextual information is essential for text classification, and different words usually require contextual information of different sizes. However, most existing methods learn contextual features with predefined, fixed sizes and thus cannot extract context of the appropriate size for each word. To this end, we propose a new model, Deformable Self-Attention (DSA), that flexibly learns word-specific contextual features rather than extracting features over fixed context sizes. Our model is mainly composed of a Deformable Local Attention Weight Generation (DLAWG) module and a Multi-Range Feature Integration (MRFI) module. The DLAWG module adaptively determines a different context size for each word within a given range and then learns word-specific contextual features; multiple ranges are employed to capture contextual dependencies at different scales. The MRFI module then integrates the features from these ranges by modeling their interactions, filtering out irrelevant features while enhancing discriminative ones. Extensive experiments on benchmark datasets, together with visualizations, demonstrate the effectiveness of our model.
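To make the core idea concrete, here is a minimal, self-contained sketch of a deformable local attention layer in the spirit of DLAWG: each word predicts its own soft context window, and attention is restricted to that window through a differentiable mask. This is an illustrative assumption of how such a layer could look, not the paper's actual implementation; the class name `DeformableLocalAttention`, the `max_range` parameter, and the steep-sigmoid masking are all hypothetical.

```python
# Hypothetical sketch of deformable local attention (NOT the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableLocalAttention(nn.Module):
    """Each word predicts a soft context window within `max_range` and
    attends mainly to neighbors inside that window (illustrative only)."""

    def __init__(self, dim: int, max_range: int = 5):
        super().__init__()
        self.max_range = max_range
        # Predicts a per-word window size in (0, max_range].
        self.size_proj = nn.Linear(dim, 1)
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        B, T, D = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Per-word soft window size in (0, max_range].
        win = torch.sigmoid(self.size_proj(x)) * self.max_range  # (B, T, 1)

        # Absolute relative distance between every query/key position pair.
        pos = torch.arange(T, device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs().float()       # (T, T)

        # Soft mask: keys farther than the word's window are suppressed.
        # A steep sigmoid approximates a hard but differentiable cutoff.
        mask = torch.sigmoid(10.0 * (win - dist[None]))          # (B, T, T)

        # Standard scaled dot-product attention, biased toward the window.
        scores = q @ k.transpose(-2, -1) / D ** 0.5              # (B, T, T)
        scores = scores + (mask + 1e-9).log()
        attn = F.softmax(scores, dim=-1)
        return attn @ v                                          # (B, T, D)
```

For instance, `DeformableLocalAttention(dim=128)(torch.randn(2, 16, 128))` returns a `(2, 16, 128)` tensor. Running several such layers with different `max_range` values and fusing their outputs would correspond, loosely, to the multi-range setup that MRFI integrates.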