MaChAmp at SemEval-2023 Tasks 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12: On the Effectiveness of Intermediate Training on an Uncurated Collection of Datasets
Abstract: To improve the ability of language models to handle Natural Language Processing (NLP) tasks, an intermediate step of pre-training has recently been introduced. In this setup, one takes a pre-trained language model, trains it on a (set of) NLP dataset(s), and then finetunes it for a target task. It is known that the selection of relevant transfer tasks is important, but recent work has shown substantial performance gains from intermediate training on a very large set of datasets. Most previous work uses generative language models or focuses on only one or a few tasks, with a carefully curated setup. We compare intermediate training with one or many tasks in a setup where the choice of datasets is more arbitrary: we use all SemEval 2023 text-based tasks. We reach performance improvements for most tasks when using intermediate training. Gains are higher when doing intermediate training on single tasks than on all tasks, provided the right transfer task is identified. Dataset smoothing and heterogeneous batching did not lead to robust gains in our setup.
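To make the setup described above concrete, the sketch below illustrates the two-stage pipeline (intermediate training on several datasets, then fine-tuning on a single target task) together with one common form of dataset smoothing (sampling dataset i with probability proportional to |D_i|^alpha) and heterogeneous vs. homogeneous batching. This is a minimal illustration, not MaChAmp's actual implementation: the dataset names, sizes, featurizer, and the tiny encoder/heads are hypothetical stand-ins for a real pre-trained language model, and the exact smoothing formula used in the paper may differ.

```python
# Minimal sketch (assumptions: toy datasets, toy encoder, |D_i|**alpha smoothing).
import random

import torch
import torch.nn as nn

# Hypothetical intermediate datasets: name -> list of (text, label).
intermediate_datasets = {
    "task_a": [("example sentence", 0)] * 1000,
    "task_b": [("another example", 1)] * 100,
    "task_c": [("a third example", 2)] * 10,
}

def smoothed_sampling_probs(datasets, alpha=0.5):
    """Dataset smoothing: sample dataset i proportionally to |D_i|**alpha,
    so large datasets dominate less (alpha=1 -> size-proportional,
    alpha=0 -> uniform). One common formulation; an assumption here."""
    weights = {name: len(data) ** alpha for name, data in datasets.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def sample_batch(datasets, probs, batch_size=8, heterogeneous=True):
    """Homogeneous batching draws a whole batch from one dataset;
    heterogeneous batching mixes examples from several datasets."""
    names = list(datasets)
    w = [probs[n] for n in names]
    if heterogeneous:
        picks = random.choices(names, weights=w, k=batch_size)
    else:
        picks = random.choices(names, weights=w, k=1) * batch_size
    return [(name, random.choice(datasets[name])) for name in picks]

# Stand-ins for a pre-trained LM encoder and per-task classification heads.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
intermediate_head = nn.Linear(32, 3)
target_head = nn.Linear(32, 2)

def fake_featurize(text):
    # Placeholder for real tokenization + contextual embedding.
    torch.manual_seed(abs(hash(text)) % (2 ** 31))
    return torch.randn(16)

# Stage 1: intermediate training on the smoothed mix of datasets.
probs = smoothed_sampling_probs(intermediate_datasets, alpha=0.5)
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(intermediate_head.parameters()), lr=1e-3
)
for step in range(100):
    batch = sample_batch(intermediate_datasets, probs, heterogeneous=True)
    x = torch.stack([fake_featurize(text) for _, (text, _) in batch])
    y = torch.tensor([label for _, (_, label) in batch])
    loss = nn.functional.cross_entropy(intermediate_head(encoder(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tuning the same encoder on the target task only.
target_data = [("target task example", 1)] * 200  # hypothetical target dataset
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(target_head.parameters()), lr=1e-3
)
for step in range(100):
    sample = random.sample(target_data, 8)
    x = torch.stack([fake_featurize(t) for t, _ in sample])
    y = torch.tensor([l for _, l in sample])
    loss = nn.functional.cross_entropy(target_head(encoder(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

In this sketch, lowering `alpha` flattens the sampling distribution across datasets, which is one way to keep the many small SemEval-style datasets from being drowned out by the largest ones during intermediate training.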