Keywords: natural language processing, nlp, language generation
Abstract: Conditional set generation learns a mapping from an input sequence of tokens to a set. Several popular natural language processing (NLP) tasks, such as entity typing and dialogue emotion tagging, are instances of set generation. Sequence-to-sequence (seq2seq) models are a popular choice for set generation, but the typical approach of treating a set as a sequence does not fully leverage two key properties of sets: order-invariance and cardinality. We propose a novel data augmentation approach that recovers informative orders for the labels using their dependence information. Further, we jointly model the set cardinality and the set's elements by listing the set size as the first token of the output sequence, taking advantage of the autoregressive factorization used by seq2seq models. Our experiments in simulated settings and on three diverse NLP datasets show that our method improves over strong seq2seq baselines by about 9 points of absolute F1 score. We will release all code and data upon acceptance.
One-sentence Summary: We propose a method for generating sets with sequence generation models by converting each set to a sequence in an informative order and jointly modeling its cardinality.
Supplementary Material: zip
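
Below is a minimal sketch (not the authors' released code) of the target-side encoding the abstract describes: the set's cardinality is emitted as the first token, followed by the labels in an informative order, so the autoregressive decoder conditions each label on the set size. The frequency-based ordering used here is an illustrative stand-in for the paper's dependence-based ordering, and all names (`set_to_target_sequence`, the `<size_k>` token format) are hypothetical.

```python
from collections import Counter

def set_to_target_sequence(labels, label_freq, sep=" "):
    """Serialize a label set into a seq2seq decoder target.

    labels: set of label strings for one example.
    label_freq: Counter mapping label -> corpus frequency, used here as a
        proxy for an informative order (an assumption of this sketch).
    """
    # Order labels from most to least frequent as a simple informative order.
    ordered = sorted(labels, key=lambda l: -label_freq[l])
    # Prepend the set size so every generated label is conditioned on the
    # cardinality, exploiting the autoregressive factorization.
    return sep.join([f"<size_{len(labels)}>"] + ordered)

# Example usage on a toy corpus of emotion-tag sets.
corpus = [{"joy", "surprise"}, {"joy"}, {"anger", "joy", "surprise"}]
freq = Counter(label for s in corpus for label in s)
print(set_to_target_sequence({"surprise", "anger", "joy"}, freq))
# -> "<size_3> joy surprise anger"
```

At inference time, the decoded `<size_k>` prefix would be stripped and the remaining tokens read back as an (unordered) set.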