- Keywords: Transformers, lexical overlap bias, predicate-argument structures
- TL;DR: Enhancing the robustness of pretrained transformer models against the lexical overlap bias by extending the input sentences of the training data with their corresponding predicate-argument structures
- Abstract: Recent pretrained transformer-based language models have set state-of-the-art performances on various NLP datasets. However, despite their great progress, they suffer from various structural and syntactic biases. In this work, we investigate the lexical overlap bias, e.g., the model classifies two sentences that have a high lexical overlap as entailing regardless of their underlying meaning. To improve the robustness, we enrich input sentences of the training data with their automatically detected predicate-argument structures. This enhanced representation allows the transformer-based models to learn different attention patterns by focusing on and recognizing the major semantically and syntactically important parts of the sentences. We evaluate our solution for the tasks of natural language inference and grounded commonsense inference using the BERT, RoBERTa, and XLNET models. We evaluate the models' understanding of syntactic variations, antonym relations, and named entities in the presence of lexical overlap. Our results show that the incorporation of predicate-argument structures during fine-tuning considerably improves the robustness, e.g., about 20pp on discriminating different named entities, while it incurs no additional cost at the test time and does not require changing the model or the training procedure.