Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training

Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training

ACL ARR 2024 June Submission1713 Authors

14 Jun 2024 (modified: 02 Jul 2024)ACL ARR 2024 June SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Abstract: We propose an end-to-end differentiable training paradigm for stable training of a rationalized transformer classifier. Our approach results in a single model that simultaneously classifies a sample and scores input tokens based on their relevance to the classification. To this end, we build on the widely-used three-player-game for training rationalized models, which typically relies on training a rationale selector, a classifier and a complement classifier. We simplify this approach by making a single model fulfill all three roles, leading to a more efficient training paradigm that is not susceptible to the common training instabilities that plague existing approaches. Further, we extend this paradigm to produce class-wise rationales while incorporating recent advances in parameterizing and regularizing the resulting rationales, thus leading to substantially improved and state-of-the-art alignment with human annotations without any explicit supervision.

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: explanation faithfulness, feature attribution, counterfactual/contrastive explanations, self-supervised learning, passage retrieval

Contribution Types: Model analysis & interpretability, NLP engineering experiment

Languages Studied: English

Submission Number: 1713

Loading