BackSlash: Rate Constrained Optimized Training of Large Language Models

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-SA 4.0
Abstract: The rapid advancement of large language models (LLMs) has driven extensive research into parameter compression after training has been completed, yet compression during the training phase remains largely unexplored. In this work, we introduce Rate-Constrained Training (BackSlash), a novel training-time compression approach based on rate-distortion optimization (RDO). BackSlash enables a flexible trade-off between model accuracy and complexity, significantly reducing parameter redundancy while preserving performance. Experiments across various architectures and tasks demonstrate that BackSlash can reduce memory usage by 60\% - 90\% without accuracy loss and provides significant compression gains compared to post-training compression. Moreover, BackSlash proves to be highly versatile: it enhances generalization with small Lagrange multipliers, improves model robustness to pruning (maintaining accuracy even at 80\% pruning rates), and enables network simplification for accelerated inference on edge devices.
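To make the rate-distortion trade-off described above concrete, the sketch below shows a generic rate-regularized training step in PyTorch: the task loss plays the role of distortion, and a differentiable code-length proxy on the parameters plays the role of rate, weighted by a Lagrange multiplier. This is a minimal illustration only; the function names (`rate_proxy`, `training_step`), the Laplacian-style rate proxy, and the hyperparameters `lam` and `b` are assumptions for exposition and are not the paper's actual implementation.

```python
# Illustrative sketch of rate-constrained training (not the paper's code).
import torch

def rate_proxy(model, b=1e-2):
    # Differentiable proxy for the coding cost of the parameters:
    # under a Laplacian-like prior, each weight costs roughly log(1 + |w|/b).
    # The scale b is an assumed hyperparameter.
    return sum(torch.log1p(p.abs() / b).sum() for p in model.parameters())

def training_step(model, batch, task_loss_fn, optimizer, lam=1e-6):
    # Rate-distortion objective: distortion (task loss) + lambda * rate.
    # A small Lagrange multiplier lam lightly penalizes parameter entropy;
    # larger values push the model toward more compressible weights.
    optimizer.zero_grad()
    distortion = task_loss_fn(model, batch)
    loss = distortion + lam * rate_proxy(model)
    loss.backward()
    optimizer.step()
    return distortion.item()
```

In this reading, sweeping `lam` traces out an accuracy-versus-rate curve, which is the trade-off the abstract refers to when it mentions small Lagrange multipliers improving generalization and larger rate penalties yielding more compressible, pruning-robust models.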
Lay Summary: This work introduces a new method called Rate-Constrained Training (BackSlash) that makes AI models smaller and more efficient while they are being trained, rather than afterward. Unlike traditional approaches that shrink models post-training, BackSlash balances model size against performance from the start. Tests show it can cut memory usage by 60–90% without losing accuracy, while also making models more adaptable, robust to pruning (e.g., removing 80% of unnecessary parts), and faster for edge devices. This could make it easier to deploy advanced AI on low-power gadgets.
Primary Area: Deep Learning->Large Language Models
Keywords: Model Compression, Rate-Distortion Optimization, Entropy Encoding
Submission Number: 11395