Tailor: Generating and Perturbing Text with Semantic Controls

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Controlled text perturbation is useful for evaluating model generalizability and improving model robustness to dataset artifacts. However, current techniques rely on training a perturbation model for every targeted attribute, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on $\textbf{control codes}$ derived from semantic representations. We craft a set of operations that modify the control codes, which in turn steer generation toward targeted attributes. These operations can be further composed into higher-level ones, allowing for flexible perturbation strategies. Tailor can be applied in various scenarios. We use it to automatically create high-quality contrast sets for four distinct natural language processing (NLP) tasks. These contrast sets contain fewer spurious biases and are complementary to manually annotated ones in terms of lexical diversity. We show that Tailor helps improve model generalization through data augmentation, achieving a 5.8-point gain on an NLI challenge set by perturbing just $\sim2\%$ of the training data.
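To make the idea of composable control-code operations concrete, here is a minimal Python sketch. It is not Tailor's actual API: the field names (AGENT, PATIENT, TENSE), the operations, and the serialization format are all hypothetical illustrations of the abstract's description, in which operations rewrite a semantic control code that then conditions a seq2seq generator.

```python
# Illustrative sketch only (hypothetical names, not Tailor's real interface).
# A control code is modeled as a dict of semantic-role fields; operations are
# functions that rewrite those fields, and they compose into higher-level ones.

def swap_core_roles(code):
    """Swap the AGENT and PATIENT fields (e.g., a step toward passivization)."""
    new = dict(code)
    new["AGENT"], new["PATIENT"] = code["PATIENT"], code["AGENT"]
    return new

def change_tense(tense):
    """Return an operation that sets the TENSE field to the given value."""
    def op(code):
        new = dict(code)
        new["TENSE"] = tense
        return new
    return op

def compose(*ops):
    """Compose primitive operations into a single higher-level operation."""
    def composed(code):
        for op in ops:
            code = op(code)
        return code
    return composed

def to_prompt(code):
    """Serialize the control code into a conditioning prefix for a generator."""
    return " | ".join(f"{k}: {v}" for k, v in sorted(code.items()))

code = {"VERB": "chase", "TENSE": "present",
        "AGENT": "the dog", "PATIENT": "the cat"}
perturb = compose(swap_core_roles, change_tense("past"))
print(to_prompt(perturb(code)))
# → AGENT: the cat | PATIENT: the dog | TENSE: past | VERB: chase
```

In the real system the serialized control code would be prepended to the input of a pretrained seq2seq model, which then generates text realizing the perturbed semantics; here the sketch only shows how primitive edits compose without retraining anything per attribute.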
