Abstract: With the emergence and continuous development of pre-trained language models, prompt-based training has become a widely adopted paradigm that drastically improves how these models are exploited for many NLP tasks. Prompting also performs strongly compared to traditional fine-tuning in zero-shot and few-shot scenarios, where the amount of annotated data is limited. In this framework, verbalizers play an important role in mapping the masked-word distributions produced by language models to output predictions. In this work, we propose MaVEN, a new approach to verbalizer construction that enriches class labels using neighborhood relations in the word embedding space. In addition, we design a benchmarking procedure to evaluate typical verbalizer baselines for document classification in few-shot learning contexts. Our model achieves state-of-the-art results while using significantly fewer resources, and we show that our approach is particularly effective when supervision data is extremely limited. Our code is available at {https://anonymous.4open.science/r/verbalizer_benchmark-66E6}.
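The abstract describes enriching class labels with neighboring words in an embedding space. The sketch below illustrates that general idea only; it is not MaVEN's actual algorithm (the paper's details are not given here), and the vocabulary, embedding dimensions, and `enrich_label` helper are all hypothetical, with random vectors standing in for real pretrained embeddings.

```python
import numpy as np

# Toy vocabulary and random embeddings, purely illustrative.
# A real setup would use pretrained vectors (e.g. GloVe) or the
# language model's own input-embedding matrix.
vocab = ["sports", "football", "game", "politics", "election", "vote", "banana"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(vocab), 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize rows
word2idx = {w: i for i, w in enumerate(vocab)}

def enrich_label(label: str, k: int = 3) -> list[str]:
    """Return the k vocabulary words closest to the label's embedding
    (cosine similarity on unit vectors), excluding the label itself.
    These neighbors would extend the verbalizer's word set for that class."""
    v = emb[word2idx[label]]
    sims = emb @ v                       # cosine similarities to every word
    order = np.argsort(-sims)            # most similar first
    return [vocab[i] for i in order if vocab[i] != label][:k]

neighbors = enrich_label("sports")
```

With real embeddings, a label such as "sports" would pull in semantically related words (e.g. "football", "game"), giving the verbalizer more mask-token candidates per class than the bare label alone.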
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, French