Curriculum Learning Driven Domain Adaptation for Low-Resource Machine Reading Comprehension

Published: 01 Jan 2024, Last Modified: 21 Feb 2025, IEEE Signal Process. Lett. 2024, License: CC BY-SA 4.0
Abstract: Although pre-trained language models (PTLMs) have achieved great success on the machine reading comprehension task, they often rely on large-scale annotated data, while only a small amount of data is available in most real-world scenarios. To enhance the capabilities of PTLMs in low-resource scenarios, we propose a curriculum learning driven domain adaptation method for low-resource machine reading comprehension. Its basic paradigm is to train a source model with sufficient data and then adapt it to the target domain. In the adaptation procedure, we introduce a curriculum learning strategy, whose core idea is to arrange training examples from easy to difficult, in order to bridge the gap between the source and target domains and enable the source model to adapt to the target domain progressively. Specifically, before fine-tuning the well-trained source model on target data, we first compute the loss of each target example under the source model and use it as an estimate of that example's difficulty. We then sample batches according to an increasing sampling function at each fine-tuning step, so that the source model starts learning from easy examples in the target domain and gradually transitions to difficult ones. Experiments conducted on two public datasets demonstrate the effectiveness of our method.
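The sampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the form of the increasing sampling function, so the square-root pacing function below, its starting fraction `c0`, and the function names `rank_by_difficulty`, `competence`, and `sample_batch` are all assumptions made for illustration. The per-example losses are assumed to have been computed beforehand with the source model.

```python
import math
import random


def rank_by_difficulty(losses):
    """Return example indices sorted from lowest to highest source-model loss."""
    return sorted(range(len(losses)), key=lambda i: losses[i])


def competence(step, total_steps, c0=0.1):
    """Assumed pacing function: fraction of the ranked data available at `step`.

    Grows from about c0 at step 0 to 1.0 at total_steps (square-root schedule).
    The paper's actual sampling function may differ.
    """
    progress = min(1.0, step / total_steps)
    return min(1.0, math.sqrt(progress * (1.0 - c0 ** 2) + c0 ** 2))


def sample_batch(ranked, step, total_steps, batch_size, rng, c0=0.1):
    """Sample a batch uniformly from the easiest examples allowed at this step."""
    limit = math.ceil(competence(step, total_steps, c0) * len(ranked))
    # Always expose at least one batch worth of examples.
    limit = min(len(ranked), max(batch_size, limit))
    pool = ranked[:limit]
    return rng.sample(pool, min(batch_size, len(pool)))
```

In a fine-tuning loop, one would call `sample_batch(ranked, step, total_steps, batch_size, random.Random(0))` at each step and train on the returned indices. Early batches then come only from low-loss target examples, and the pool widens until it covers the full target set.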