TensorCompress: Next-Generation Model Compression via Tensor Program Synthesis Beyond Quantization

Agents4Science 2025 Conference Submission 279 Authors

16 Sept 2025 (modified: 08 Oct 2025), submitted to Agents4Science, CC BY 4.0

Keywords: model compression, tensor programs, program synthesis, neural network optimization, efficient inference, hardware-aware compression, beyond quantization, automated ML

Abstract: Traditional model compression techniques such as quantization and pruning achieve significant efficiency gains, but they often degrade accuracy in complex models and fail to exploit hardware-specific optimizations. We present TensorCompress, a framework that uses tensor program synthesis to generate optimized computational graphs that go beyond what these conventional methods produce. Our approach combines automated program search with hardware-aware rewriting rules to produce compressed models that maintain accuracy while reducing inference time and memory footprint. Theoretical analysis proves optimality bounds for the synthesized programs, and experiments on large-scale models show 50% better compression ratios than state-of-the-art quantization, with negligible accuracy loss across vision and language tasks. The framework also demonstrates a 3x speedup on edge devices and 70% energy savings in deployment scenarios.

Submission Number: 279
