QA Domain Adaptation using Data Augmentation and Contrastive Adaptation

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Domain adaptation for question answering (QA) has recently shown impressive results for answering out-of-domain questions. Yet, a common challenge is to build approaches that are effective for niche domains with small text corpora. In this paper, we propose a novel framework called QADA for QA domain adaptation. QADA has two components: (1) A question generation model generates synthetic question-answer samples from the target domain. Unlike existing baselines, we enrich the samples via a novel data augmentation pipeline: for questions, we introduce token-level augmentation (i.e., synonym replacement and token swapping), and, for contexts, we develop hidden-space augmentation, which learns to drop context spans via a custom attentive sampling strategy. (2) The QA model is based on transformers; unlike existing approaches, we train it via a novel attention-based contrastive adaptation. Here, we use the attention weights to sample informative tokens for discrepancy estimation, which helps the QA model separate answers and generalize across the source and target domains. To the best of our knowledge, our work is the first in QA domain adaptation to leverage data augmentation and attention-based contrastive adaptation. Our evaluation shows that QADA achieves considerable improvements over state-of-the-art baselines for QA domain adaptation.
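The abstract only names the token-level question augmentation at a high level. The following minimal Python sketch illustrates one plausible reading of synonym replacement plus adjacent-token swapping; the synonym table, function name, and probabilities are illustrative assumptions, not the authors' implementation (the paper would presumably draw synonyms from a lexical resource such as WordNet).

```python
import random

# Hypothetical toy synonym table, for illustration only (assumption:
# the actual pipeline would use a lexical resource such as WordNet).
SYNONYMS = {"purchase": ["buy", "acquire"], "city": ["town", "municipality"]}

def augment_question(tokens, p_syn=0.1, p_swap=0.1):
    """Token-level augmentation: synonym replacement and token swapping."""
    tokens = list(tokens)
    # Synonym replacement: each eligible token is replaced with probability p_syn.
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS and random.random() < p_syn:
            tokens[i] = random.choice(SYNONYMS[tok])
    # Token swapping: each adjacent pair is swapped with probability p_swap.
    for i in range(len(tokens) - 1):
        if random.random() < p_swap:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

print(augment_question("where can i purchase tickets in the city".split()))
```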
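Likewise, the attention-based contrastive adaptation is only sketched in the abstract. The snippet below shows one plausible reading under stated assumptions: attention weights are used to sample informative token representations from source and target batches, and a maximum mean discrepancy (MMD) term over the sampled tokens is added to the QA loss. The RBF kernel, the [CLS]-query attention, all function names, and the weight `lam` are assumptions; the sketch also shows only the domain-alignment term, not the full answer-separation objective.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """MMD between two token-feature sets (n, d) and (m, d) with an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def sample_informative_tokens(hidden, attn, k=32):
    """Sample k token embeddings per sequence, weighted by attention mass.

    hidden: (batch, seq_len, dim) last-layer hidden states
    attn:   (batch, seq_len) attention weights, e.g. head-averaged weights
            for the [CLS] query (an assumption; the abstract does not fix this)
    """
    probs = attn.clamp_min(0) + 1e-8
    idx = torch.multinomial(probs, num_samples=k, replacement=True)  # (batch, k)
    picked = torch.gather(
        hidden, 1, idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    )
    return picked.flatten(0, 1)  # (batch * k, dim)

def adaptation_loss(qa_loss, src_hidden, src_attn, tgt_hidden, tgt_attn, lam=0.1):
    """QA loss on source data plus an MMD term aligning source and target tokens."""
    src = sample_informative_tokens(src_hidden, src_attn)
    tgt = sample_informative_tokens(tgt_hidden, tgt_attn)
    return qa_loss + lam * rbf_mmd(src, tgt)
```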