Clustering-Based Knowledge Distillation with Sentence Pruning Processing for Efficient Student Model Training

ACL ARR 2025 February Submission5995 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large-scale pre-trained language models such as BERT, RoBERTa, and GPT achieve state-of-the-art performance across a wide range of NLP tasks, but their high computational demands make deployment in resource-constrained environments difficult. To address this, we propose Clustering-Based Knowledge Distillation with Sentence Pruning Processing, a framework that enhances knowledge transfer by integrating multiple teacher models and refining sentence-level representations. Our approach uses cosine similarity to identify cluster centers from the strongest edge weights, ensuring that the most informative sentences are preserved. Clustering-based pruning with dynamic thresholds, guided by TF-IDF importance scores, then removes redundant information while retaining critical knowledge. By aggregating diverse knowledge from multiple teachers, the framework improves model robustness and generalization. Experiments on multiple NLP benchmarks show that our method outperforms existing knowledge distillation techniques, achieving higher accuracy while substantially reducing computational overhead and inference time. Notably, our framework achieves up to a 23× speedup on SST-2 and improves RTE accuracy by 4.5%, demonstrating its effectiveness for efficient NLP model deployment.
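To make the pruning idea concrete, the sketch below illustrates one plausible reading of the abstract: sentences are scored by their strongest cosine-similarity edge (a proxy for being a cluster center) and by TF-IDF importance, and a quantile-based dynamic threshold decides which sentences to keep. This is a minimal illustration under assumed names (`prune_sentences`, `keep_ratio`) and an assumed combination rule, not the authors' implementation.

```python
# Illustrative sketch only: cluster-center selection via strongest cosine-similarity
# edges plus TF-IDF-guided dynamic-threshold pruning. Function name, keep_ratio,
# and the score-combination rule are assumptions, not the paper's exact method.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prune_sentences(sentences, sentence_embeddings, keep_ratio=0.5):
    # Edge weights: pairwise cosine similarity between sentence embeddings.
    sim = cosine_similarity(sentence_embeddings)
    np.fill_diagonal(sim, 0.0)

    # Cluster-center score: strength of each sentence's strongest edge.
    center_scores = sim.max(axis=1)

    # TF-IDF importance: mean TF-IDF weight of each sentence's terms.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    importance = np.asarray(tfidf.mean(axis=1)).ravel()

    # Dynamic threshold: keep sentences whose combined score clears the
    # quantile implied by keep_ratio (an assumed pruning rule).
    combined = center_scores * importance
    threshold = np.quantile(combined, 1.0 - keep_ratio)
    return [s for s, c in zip(sentences, combined) if c >= threshold]
```

In such a scheme, the retained sentences would form the distillation inputs passed to the student, with the multiple teachers' outputs aggregated over the pruned text rather than the full input.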
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Natural Language Processing, Information Extraction, Language Modeling, Efficient and Low-Resource NLP, Knowledge Distillation, Clustering Techniques, Sentence Pruning, Model Compression
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 5995