∇QDARTS: Quantization as an Elastic Dimension to Differentiable NAS

TMLR Paper 3991 Authors

16 Jan 2025 (modified: 31 Mar 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: Differentiable Neural Architecture Search (NAS) methods efficiently find high-accuracy architectures using gradient-based optimization in a continuous domain, saving computational resources. Mixed-precision search optimizes bit widths within a fixed architecture; however, applying it to a network produced by NAS does not guarantee optimal performance, because the best quantized architecture may not emerge from a standalone NAS method. In light of these considerations, this paper introduces ∇QDARTS, a novel approach that combines differentiable NAS with mixed-precision search over both weights and activations. ∇QDARTS aims to identify the optimal mixed-precision neural architecture that achieves high accuracy at minimal computational cost, within a single-shot, end-to-end differentiable framework that obviates the need for pretraining and proxies. On CIFAR10 with (2,4)-bit precision, ∇QDARTS reduces bit operations by 160× with only a 1.57% accuracy drop relative to fp32. Increasing the capacity enables ∇QDARTS to match fp32 accuracy while still reducing bit operations by 18×. On ImageNet, with only (2,4)-bit precision, ∇QDARTS outperforms state-of-the-art methods such as APQ, SPOS, OQA, and MNAS by 2.3%, 2.9%, 0.3%, and 2.7% in accuracy, respectively. Incorporating (2,4,8)-bit precision further reduces the accuracy drop to 1% relative to fp32, while cutting bit operations by 17× and memory footprint by 2.6×. At similar accuracy, ∇QDARTS improves on APQ, SPOS, OQA, and MNAS in bit operations (memory footprint) by 2.3× (12×), 2.4× (3×), 13% (6.2×), and 3.4× (37%), respectively. ∇QDARTS also enhances overall search and training efficiency, achieving 3.1× and 1.54× improvements over APQ and OQA, respectively.
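The abstract describes relaxing both the operation choice and the bit-width choice into a single differentiable search. The following is a minimal sketch of that general idea, not the authors' implementation: a DARTS-style mixed operation whose output also mixes candidate precisions via a second set of softmax-weighted parameters. All names (MixedQuantOp, alpha, beta, fake_quant) and the specific quantizer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quant(x, bits):
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()  # STE: forward uses q, backward passes gradient through x


class MixedQuantOp(nn.Module):
    """Softmax-weighted mixture over candidate ops and candidate bit widths.

    alpha relaxes the architecture choice (as in DARTS); beta relaxes the
    precision choice per op, so both are trained jointly by gradient descent.
    Weight fake-quantization would be handled analogously inside each op.
    """

    def __init__(self, ops, bit_choices=(2, 4, 8)):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.bit_choices = bit_choices
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))                    # architecture params
        self.beta = nn.Parameter(1e-3 * torch.randn(len(ops), len(bit_choices)))   # precision params

    def forward(self, x):
        arch_w = F.softmax(self.alpha, dim=0)
        out = 0.0
        for i, op in enumerate(self.ops):
            prec_w = F.softmax(self.beta[i], dim=0)
            # mix the op's output computed at each candidate activation precision
            y = sum(w * op(fake_quant(x, b)) for w, b in zip(prec_w, self.bit_choices))
            out = out + arch_w[i] * y
        return out


if __name__ == "__main__":
    # quick smoke test with two toy candidate ops
    op = MixedQuantOp([nn.Conv2d(16, 16, 3, padding=1), nn.Identity()])
    y = op(torch.randn(2, 16, 8, 8))
    print(y.shape)  # torch.Size([2, 16, 8, 8])
```

After the search, the highest-weighted op and bit width per edge would be selected to derive the final mixed-precision architecture; bit-operation or memory costs can be added to the loss to steer the precision parameters toward efficient configurations.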
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1) We added Appendix B to address reviewers' concerns. 2) We added the publication year in Table 1. 3) We mentioned Table 3 in Subsections 4.5 and 4.5.1.
Assigned Action Editor: ~Naigang_Wang1
Submission Number: 3991
