DoCoGen: Domain Counterfactual Generation for Low Resource Domain Adaptation

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Natural language processing (NLP) algorithms have become highly successful, but they still struggle when applied to out-of-distribution examples. In this paper we propose a controllable generation approach to address this domain adaptation (DA) challenge. Given an input text example, our DoCoGen algorithm generates a domain-counterfactual textual example (D-CON) – an example that is similar to the original in all aspects, including the task label, but whose domain is changed to a desired one. Importantly, DoCoGen is trained using only unlabeled examples from multiple domains – no NLP task labels or pairs of textual examples and their domain-counterfactuals are required. We use the D-CONs generated by DoCoGen to augment a sentiment classifier in 20 DA setups where source-domain labeled data is scarce. Our model outperforms strong baselines and improves the accuracy of a state-of-the-art unsupervised DA algorithm.
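
Below is a minimal sketch of the augmentation step described in the abstract: each scarce source-domain labeled example is paired with a generated D-CON in the target domain that inherits the original sentiment label, and a classifier is trained on the enlarged set. The `generate_dcon` function, the toy data, and the `kitchen` domain name are illustrative placeholders, not the paper's actual implementation; the real D-CONs come from the trained DoCoGen generator.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the trained DoCoGen generator: given a source-domain
# text and a target domain, it would return a domain-counterfactual (D-CON)
# that keeps the task (sentiment) label but rewrites domain-specific content.
def generate_dcon(text: str, target_domain: str) -> str:
    # Placeholder only; the real model is a trained controllable generator.
    return f"[{target_domain}] {text}"

# Scarce source-domain labeled data (toy example; 1 = positive, 0 = negative).
labeled_source = [
    ("The plot of this book was gripping.", 1),
    ("The characters were flat and boring.", 0),
]

# Augment: each labeled example gains a D-CON in the target domain,
# which inherits the original sentiment label.
target_domain = "kitchen"
augmented = list(labeled_source) + [
    (generate_dcon(text, target_domain), label) for text, label in labeled_source
]

# Train a simple sentiment classifier on the augmented data.
texts, labels = zip(*augmented)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["This blender stopped working after a week."]))
```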
