Causal Transformers: Improving the Robustness on Spurious Correlations

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: The fully-connected dependencies in self-attention overfit spurious correlations and limit generalization to out-of-distribution data. Pre-trained language models (PLMs) alleviate this problem by benefiting from the abundant counterexamples in large-scale pre-training corpora. However, no prior study addresses this problem by improving the model structure. We enforce a causal independence mechanism in the self-attention network that constrains attention mapping graphs (AMGs) to causal structures. To implement it, we define a smooth loss on Markov-boundary-constrained directed acyclic graphs (DAGs) via Lagrange duality, and use it to optimize the AMGs toward causal structures. We further apply this causal attention network to the Transformer, yielding the Causal Transformer. Empirical results on two spurious-correlation-challenging (SCC) datasets, covering neural machine translation (NMT) and natural language inference (NLI), demonstrate that the Causal Transformer outperforms state-of-the-art models and improves out-of-distribution prediction.
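The abstract does not spell out the exact loss (the Markov-boundary constraint and its Lagrange dual are defined in the paper itself), but the general idea of a smooth DAG constraint on attention maps can be illustrated with a NOTEARS-style acyclicity penalty. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names, the penalty form h(A) = tr(exp(A ∘ A)) − n, and the augmented-Lagrangian weighting are all generic choices standing in for the paper's specific formulation.

```python
# Illustrative sketch only: a smooth acyclicity penalty on a self-attention map,
# showing how attention weights could be pushed toward a DAG during training.
# All names and hyperparameters here are hypothetical.
import torch


def acyclicity_penalty(attn: torch.Tensor) -> torch.Tensor:
    """NOTEARS-style penalty h(A) = tr(exp(A * A)) - n, zero iff the weighted
    graph given by A (including self-loops) contains no cycles."""
    n = attn.size(-1)
    # element-wise square keeps the penalty smooth and non-negative;
    # in practice the diagonal (attention to self) could be masked out first
    return torch.matrix_exp(attn * attn).diagonal(dim1=-2, dim2=-1).sum(-1) - n


def causal_attention_loss(task_loss: torch.Tensor, attn: torch.Tensor,
                          lam: float = 1.0, rho: float = 10.0) -> torch.Tensor:
    """Augmented-Lagrangian-style objective: task loss plus DAG constraint terms."""
    h = acyclicity_penalty(attn).mean()
    return task_loss + lam * h + 0.5 * rho * h ** 2


if __name__ == "__main__":
    scores = torch.randn(2, 8, 16, 16)        # (batch, heads, tokens, tokens)
    attn = torch.softmax(scores, dim=-1)      # standard self-attention map
    task_loss = torch.tensor(0.0)             # placeholder for an NMT/NLI loss
    print(causal_attention_loss(task_loss, attn).item())
```

In an actual training loop, the penalty would be computed from the attention maps produced on each forward pass and backpropagated together with the task loss, with the Lagrangian coefficients updated over training rather than held fixed as in this sketch.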