Repeated Integer Linear Programming for Bit Selection in Neural Network Quantization

17 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: neural network quantization, mixed precision, bit selection, model compression
TL;DR: We propose an efficient algorithm for bit selection using repeated integer linear programming.
Abstract: Network quantization methods, widely studied as a way to reduce model size and computational cost, are now well established as practical solutions. Mixed-precision quantization, which assigns optimal bit widths to layers, blocks, or other substructures, offers a promising way to balance model performance and efficiency. However, determining the optimal bit configuration is a challenging combinatorial optimization problem, as it requires selecting discrete bit widths for multiple substructures across the network. In this paper, we propose an efficient algorithm that approximates the problem as an integer linear program and iteratively explores the bit-configuration space. Our method uses only a small set of unlabeled samples and incurs low computational overhead, making it compatible with both widely adopted quantization paradigms: post-training quantization (PTQ) and quantization-aware training (QAT). We demonstrate the effectiveness of our approach in both settings, consistently outperforming single-precision baselines and existing bit-selection methods. The code will be released upon acceptance.
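To make the ILP formulation concrete, below is a minimal sketch of a single bit-selection round, written with `scipy.optimize.milp`. All names (`solve_bit_ilp`, `cost`, `params`, `bits`, `budget_bits`) are illustrative assumptions, not the paper's actual interface; the per-layer sensitivity costs stand in for whatever error proxy the method would estimate from its small unlabeled sample set.

```python
# Illustrative sketch (not the paper's implementation): one ILP round of
# bit selection. Binary variable x[l, b] = 1 iff layer l uses bit width b.
# Minimize total sensitivity cost subject to a model-size budget.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_bit_ilp(cost, params, bits, budget_bits):
    """Pick one bit width per layer, minimizing summed sensitivity cost
    under a total storage budget (in bits). `cost` is (layers, widths)."""
    L, B = cost.shape
    c = cost.ravel()                       # objective over flattened x[l, b]

    # One-hot constraint: each layer selects exactly one bit width.
    one_hot = np.zeros((L, L * B))
    for l in range(L):
        one_hot[l, l * B:(l + 1) * B] = 1.0
    eq = LinearConstraint(one_hot, lb=1.0, ub=1.0)

    # Budget constraint: total storage = sum over layers of params * bits.
    size = (params[:, None] * bits[None, :]).ravel()
    budget = LinearConstraint(size[None, :], lb=0.0, ub=budget_bits)

    res = milp(c=c,
               constraints=[eq, budget],
               integrality=np.ones(L * B),  # all variables integer
               bounds=Bounds(0, 1))         # ... and hence binary
    x = res.x.reshape(L, B).round().astype(int)
    return bits[x.argmax(axis=1)]           # chosen bit width per layer

# Toy example: 4 layers, candidate widths {2, 4, 8}, ~4-bit average budget.
rng = np.random.default_rng(0)
bits = np.array([2, 4, 8])
params = np.array([1e5, 5e5, 5e5, 1e5])
cost = rng.random((4, 3)) / bits[None, :]   # fewer bits -> higher cost
print(solve_bit_ilp(cost, params, bits, budget_bits=4 * params.sum()))
```

In a repeated scheme like the one the abstract describes, one would re-estimate the sensitivity costs around the configuration returned by each round and re-solve until the selection stabilizes.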
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8667