Abstract: For text classification, the traditional attention mechanism tends to over-attend to frequently occurring words and requires a large amount of labeled data to learn a good attention distribution. Introducing human attention is a classical remedy, but it incurs a high manual-labeling cost. This paper proposes a perturbation-based self-supervised attention approach that guides attention learning without any annotation overhead. Specifically, we add as much noise as possible to all the words in a sentence simultaneously, without changing their semantics or the model's prediction. Since words that tolerate more noise are presumably less significant, we can derive attention supervision information from the tolerated noise and use it to refine the attention distribution. Experimental results on three text classification tasks show that our approach significantly improves the performance of current attention-based models and is more effective than existing self-supervised methods. We also provide a visualization analysis to verify the effectiveness of our approach.
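The core idea above can be illustrated with a minimal sketch: once a per-word noise tolerance has been estimated (the largest perturbation each word survives without flipping the prediction), it can be mapped into a soft attention target. The function name, the input vector, and the softmax-over-negated-tolerances mapping below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def attention_supervision(noise_tolerance):
    """Turn per-word noise tolerances into a soft attention target.

    `noise_tolerance` is a hypothetical array holding, for each word,
    the largest noise scale it tolerates without changing the model's
    prediction. Words that tolerate MORE noise are treated as LESS
    significant, so the target is a softmax over the negated values.
    """
    scores = -np.asarray(noise_tolerance, dtype=float)
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

# Example: the third word tolerates the most noise,
# so it receives the least attention supervision.
target = attention_supervision([0.1, 0.5, 2.0])
```

Such a target could then be used as a soft label to regularize the model's own attention weights, e.g. via a KL-divergence term added to the classification loss.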