Abstract: State-of-the-art keyphrase generation methods generally depend on large annotated datasets, which limits their performance in domains with constrained resources. To overcome this challenge, we investigate pre-training strategies to learn an intermediate representation suitable for the keyphrase generation task. We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives that condense the domain-specific knowledge essential for keyphrase generation. Through experiments on benchmarks spanning multiple domains, we show the effectiveness of the proposed approaches in facilitating low-resource and zero-shot keyphrase generation.
Paper Type: short