VE-KD: Vocabulary-Expansion Knowledge-Distillation for Training Smaller Domain-Specific Language Models

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone
Abstract: We propose VE-KD, a novel method that balances knowledge distillation and vocabulary expansion with the aim of training efficient domain-specific language models. Compared with traditional pre-training approaches, VE-KD exhibits competitive performance in downstream tasks while reducing model size and using fewer computational resources. Additionally, VE-KD avoids overfitting during domain adaptation. Our experiments on different biomedical domain tasks demonstrate that VE-KD performs well compared with models such as BioBERT (+1% at HoC) and PubMedBERT (+1% at PubMedQA), with about 96% less training time. Furthermore, it outperforms DistilBERT and Adapt-and-Distill, showing a significant improvement in document-level tasks. Investigation of vocabulary size and tolerance, which are hyperparameters of our method, provides insights for further model optimization. The fact that VE-KD consistently maintains its advantages even when the corpus size is small suggests that it is a practical approach for domain-specific language tasks and is transferable to different domains for broader applications.
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
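
The abstract combines knowledge distillation with vocabulary expansion but gives no implementation details, so the following is only a minimal sketch of how the two ingredients might be wired together: the student's embedding and LM head are sized for an expanded in-domain vocabulary, and a temperature-scaled KL distillation loss is computed on the vocabulary slice shared with the general-domain teacher. All sizes, the TinyLM architecture, the shared-slice assumption, and the temperature are illustrative assumptions, not the paper's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

BASE_VOCAB = 30522        # assumed general-domain (teacher) vocabulary size
DOMAIN_TOKENS = 5000      # assumed number of added in-domain tokens

class TinyLM(nn.Module):
    # Toy language-model stand-in: embedding -> small Transformer encoder -> LM head.
    def __init__(self, vocab_size, hidden, layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=layers,
        )
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, ids):
        return self.lm_head(self.encoder(self.embed(ids)))

teacher = TinyLM(BASE_VOCAB, hidden=256, layers=4).eval()           # frozen general-domain teacher
student = TinyLM(BASE_VOCAB + DOMAIN_TOKENS, hidden=128, layers=2)  # smaller student, expanded vocab

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Distill only on the vocabulary slice shared with the teacher
    # (assumes shared token ids occupy the first BASE_VOCAB positions).
    shared = teacher_logits.size(-1)
    s = F.log_softmax(student_logits[..., :shared] / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# One illustrative distillation step on dummy token ids in the shared range.
ids = torch.randint(0, BASE_VOCAB, (2, 16))
with torch.no_grad():
    teacher_logits = teacher(ids)
loss = kd_loss(student(ids), teacher_logits)
loss.backward()

Restricting the KL term to the shared vocabulary slice is one simple way to reconcile the teacher's and student's mismatched output dimensions; the added in-domain tokens would then be trained only through the task or language-modeling objective, which may or may not match the paper's actual handling.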