TL;DR: We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion to train efficient domain-specific language models.
Abstract: We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion
to train efficient domain-specific language models.
Compared with traditional pre-training approaches,
VE-KD achieves competitive performance on downstream tasks
while reducing model size and computational resources.
Our experiments on several biomedical domain tasks show
that VE-KD performs competitively with models such as BioBERT (+1% on HoC) and PubMedBERT (+1% on PubMedQA),
while reducing training time by about 96%.
Furthermore,
it outperforms DistilBERT
and offers a significant improvement on document-level tasks.
Investigation of vocabulary size and tolerance,
which are hyperparameters of our method,
provides insights for further model optimization.
VE-KD consistently maintains its advantages
even when the corpus size is small, suggesting that it is a practical approach for domain-specific language tasks
and that it is transferable to other domains for broader applications.
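To make the two ingredients named in the abstract concrete, below is a minimal sketch of vocabulary expansion combined with a generic distillation loss in PyTorch/Transformers. This is not the paper's actual VE-KD objective or implementation; the model names, the added domain tokens, and the temperature and weighting hyperparameters are illustrative assumptions.

```python
# Hedged sketch: vocabulary expansion + a generic knowledge-distillation loss.
# Not the paper's exact VE-KD objective; models, tokens, and hyperparameters
# below are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1) Vocabulary expansion: add domain-specific (e.g., biomedical) tokens to
#    the student tokenizer and resize its embedding matrix to match.
student_tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
new_tokens = ["angiogenesis", "immunohistochemistry"]   # assumed examples
num_added = student_tok.add_tokens(new_tokens)
print(f"added {num_added} domain tokens")

student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
teacher = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
student.resize_token_embeddings(len(student_tok))       # grow student embeddings
# In practice, teacher and student inputs are tokenized with their own tokenizers.

# 2) Knowledge distillation: blend hard-label cross-entropy with a
#    temperature-scaled KL term against the teacher's soft predictions.
def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```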
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings - efficiency
Languages Studied: English