TL;DR: We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion to train efficient domain-specific language models.
Abstract: We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion
to train efficient domain-specific language models.
Compared with traditional pre-training approaches,
VE-KD achieves competitive performance on downstream tasks
while reducing model size and computational resources.
Our experiments on several biomedical domain tasks show
that VE-KD performs competitively with models such as BioBERT (+1% on HoC) and PubMedBERT (+1% on PubMedQA),
while reducing training time by about 96%.
Furthermore,
it outperforms DistilBERT
and offers a significant improvement on document-level tasks.
Investigation of vocabulary size and tolerance,
which are hyperparameters of our method,
provides insights for further model optimization.
VE-KD consistently maintains its advantages
even when the corpus size is small, suggesting that it is a practical approach for domain-specific language tasks
and that it is transferable to other domains for broader applications.
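To make the two ingredients named in the abstract concrete, below is a minimal sketch of vocabulary expansion combined with a generic distillation loss in PyTorch/Transformers. This is not the paper's actual VE-KD objective or implementation; the model names, the added domain tokens, and the temperature and weighting hyperparameters are illustrative assumptions.

```python
# Hedged sketch: vocabulary expansion + a generic knowledge-distillation loss.
# Not the paper's exact VE-KD objective; models, tokens, and hyperparameters
# below are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1) Vocabulary expansion: add domain-specific (e.g., biomedical) tokens to
#    the student tokenizer and resize its embedding matrix to match.
student_tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
new_tokens = ["angiogenesis", "immunohistochemistry"]   # assumed examples
num_added = student_tok.add_tokens(new_tokens)
print(f"added {num_added} domain tokens")

student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
teacher = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
student.resize_token_embeddings(len(student_tok))       # grow student embeddings
# In practice, teacher and student inputs are tokenized with their own tokenizers.

# 2) Knowledge distillation: blend hard-label cross-entropy with a
#    temperature-scaled KL term against the teacher's soft predictions.
def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```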
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings - efficiency
Languages Studied: English