Abstract: Recent research in deep learning has produced large models that excel in many non-industrial settings. However, deploying these models in practical applications often poses significant computational and storage challenges. The research community commonly applies knowledge distillation to reduce the size of large models to meet industrial requirements. Knowledge distillation, however, typically separates knowledge transfer from downstream adaptation. To address this, we propose Parameter-Diminish Fine-Tuning (PDFT), a technique that compresses Transformer-based large models during fine-tuning, producing a lightweight model on downstream datasets without significantly sacrificing performance. We further introduce a Probabilistic Stepping Replacement (PSR) method and an improved training schedule to enhance the performance of PDFT. PDFT can compress SAM and BERT models to parameter budgets acceptable for computation-constrained devices, according to specific needs. Experiments conducted on SAM and BERT validate the versatility and effectiveness of PDFT: it achieves accuracy improvements of up to 0.93% and 1.6% on the two models, respectively. Code is available at https://github.com/zhang-mu-yang/PDFT.
External IDs: dblp:journals/vc/ZhangMJGSWXMZ25