BHAVNET: Efficient Knowledge Transfer from Large Knowledge Models to Task-Specific Architectures through Features
Keywords: GCN, Bhav, Interpretability
TL;DR: Knowledge from complex models can be transferred across architectures, e.g., from BERT to GCNs, and GCNs in a dual-encoder setup efficiently model the semantics of antonyms and synonyms
Abstract: While large language models (LLMs) like BERT achieve remarkable performance across diverse NLP tasks, their computational demands limit practical deployment. Traditional knowledge distillation approaches compress these models but often require retaining the original architecture, limiting architectural innovation for specific tasks. We propose a feature-based knowledge transfer framework that leverages fine-tuned BERT representations to initialize and guide specialized student architectures, enabling both efficiency gains and architectural flexibility. Using multilingual antonym-synonym distinction as our testbed, we demonstrate how task-specific dual-encoder networks can be effectively initialized with BERT features and trained independently, achieving superior performance compared to BERT baselines while requiring only 20 minutes of training, versus 4 hours for training from scratch, on an NVIDIA RTX 4060. Our approach shows improvements across eight languages, with student models performing similarly to their BERT teachers while using significantly fewer parameters. This framework opens new possibilities for transferring knowledge from large models to innovative architectures without architectural constraints.
Primary Area: interpretability and explainable AI
Submission Number: 3460